1. Purpose
Aqari can automatically fetch (crawl) content from your organisation's public website to bootstrap and maintain your AI knowledge base. This allows your Aqari-powered chat widget to answer buyer questions about your projects, units, amenities, pricing, and developer background using your own published information.
This document records your explicit, informed consent to that fetching activity, as required under the Aqari Terms of Service and applicable data-protection law.
2. What We Fetch
When consent is active, Aqari may fetch:
- Publicly accessible pages on the domain(s) you register in Settings → Website;
- Linked sub-pages discovered during crawling, including project pages, about pages, FAQs, gallery pages, and brochure landing pages;
- Downloadable PDFs, images, and other media files referenced by those pages;
- Structured data (JSON-LD, Open Graph metadata) embedded in page markup.
All fetching respects your site's robots.txt directives and <meta name="robots"> tags. Pages that disallow crawling are skipped.
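As an illustration only, the robots.txt check described above can be sketched with Python's standard urllib.robotparser module. This is not Aqari's implementation: the "AqariBot" token, the function name is_fetch_allowed, and the sample robots.txt body are assumptions made for the example, and the sketch does not cover <meta name="robots"> tags.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent token for illustration; the full AqariBot string is not given here.
USER_AGENT = "AqariBot"


def is_fetch_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the supplied robots.txt body permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks one directory for every crawler.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

allowed = is_fetch_allowed(SAMPLE_ROBOTS, "https://example.com/projects/tower-a")
blocked = is_fetch_allowed(SAMPLE_ROBOTS, "https://example.com/private/prices")
```

In this sketch, a page under /private/ would be skipped, while the project page would be eligible for fetching.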
3. What We Don't Fetch
Aqari will not:
- Attempt to access password-protected, members-only, or otherwise restricted pages;
- Submit any forms or simulate user login;
- Bypass paywalls, CAPTCHAs, or other access controls;
- Fetch content from domains not explicitly listed by you in Settings;
- Access sitemaps that require authentication.
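The restriction to explicitly listed domains can be pictured as a simple allow-list check before any request is made. The sketch below is a hypothetical illustration, not Aqari's code: the function name is_registered_domain is invented, and it assumes an exact host match, so a subdomain such as www.example-developer.com would need to be listed separately.

```python
from urllib.parse import urlsplit


def is_registered_domain(url: str, registered_domains: set[str]) -> bool:
    """Return True only if url is http(s) and its host is explicitly registered."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # hostname is already lower-cased by urlsplit; it is None for malformed URLs.
    host = parts.hostname or ""
    return host in registered_domains


# Hypothetical domain list, as it might be entered in Settings.
registered = {"example-developer.com"}

in_scope = is_registered_domain("https://example-developer.com/projects", registered)
out_of_scope = is_registered_domain("https://unrelated-site.com/listing", registered)
```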
4. How We Fetch
Aqari uses two complementary methods:
- Cheerio server-side parser: a Node.js HTML parser that parses individual page HTML in-process on Aqari's backend. Used for targeted single-page fetches and the admin assistant's real-time lookups.
- Spider Cloud crawl provider: a managed crawling service used for full-site discovery and bulk ingestion. Spider Cloud respects robots.txt and applies rate-limiting to avoid impact on your server.
Both methods identify themselves with an AqariBot user-agent string that includes a link to our bot policy page.
5. Content Storage
All fetched content is:
- Stored exclusively within your workspace's tenant namespace — no other customer can see or access it;
- Chunked and converted into vector embeddings stored in a dedicated embeddings database, used only to answer queries from your widget and admin assistant;
- Never used to train any underlying AI model (see Privacy Policy § 11);
- Never shared, sold, or disclosed to any third party except as described in the Privacy Policy.
6. Frequency
Fetching occurs in the following scenarios:
- Initial approved scrape — a full crawl of your registered domain, triggered once after Aqari's platform team reviews and approves your workspace (see Section 10);
- Manual re-scrape — triggered by you clicking "Re-scrape" in Settings → Website at any time;
- Chat-time deep-crawl — if a buyer asks a question that cannot be answered from existing indexed content, the widget may fetch the most relevant page(s) in real time. This is bounded by per-session and per-day caps.
7. Caps & Budgets
To prevent unintended load on your web server:
- Soft caps apply by default to the number of pages crawled per scrape cycle and per chat-time deep-crawl. Default values are shown in Settings → Website.
- Workspace admins can adjust these caps within the limits of their subscription plan.
- Spider Cloud applies server-side rate-limiting; Aqari will never intentionally exceed safe request rates.
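To make the idea of a page cap concrete, the following minimal sketch shows one way a per-cycle budget could stop a crawl once the limit is reached. The class CrawlBudget, the function crawl, and the cap value are all hypothetical; actual defaults are the ones shown in Settings → Website, and the real enforcement logic may differ.

```python
from dataclasses import dataclass


@dataclass
class CrawlBudget:
    """Tracks how many pages may still be fetched in one scrape cycle."""
    max_pages: int
    used: int = 0

    def try_consume(self) -> bool:
        """Reserve one page fetch; return False once the cap is reached."""
        if self.used >= self.max_pages:
            return False
        self.used += 1
        return True


def crawl(urls: list[str], budget: CrawlBudget) -> list[str]:
    """Return the URLs that would be fetched before the budget runs out."""
    fetched = []
    for url in urls:
        if not budget.try_consume():
            break  # cap reached: remaining pages are left for a later cycle
        fetched.append(url)  # a real crawler would issue the request here
    return fetched


budget = CrawlBudget(max_pages=2)  # illustrative cap, not a real default
result = crawl(["/a", "/b", "/c"], budget)
```

With a cap of two pages, only the first two URLs are processed and the third is not requested.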
8. Deduplication
Before creating or updating a Document from fetched content, Aqari computes a content hash of the page body. If the hash matches an existing indexed Document, no new Document is created and no storage quota is consumed. This prevents duplicate indexing of unchanged pages during re-scrapes.
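The deduplication step can be sketched as follows. This is an assumption-laden illustration rather than Aqari's implementation: the document does not name the hash algorithm, so SHA-256 is assumed, the whitespace normalisation is an added assumption, and DocumentIndex and upsert are hypothetical names.

```python
import hashlib


def content_hash(body: str) -> str:
    """Hash the page body after collapsing whitespace (normalisation is an assumption)."""
    normalised = " ".join(body.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class DocumentIndex:
    """Hypothetical in-memory index keyed by content hash."""

    def __init__(self) -> None:
        self._url_by_hash: dict[str, str] = {}

    def upsert(self, url: str, body: str) -> bool:
        """Index the page and return True, or return False if the content is unchanged."""
        digest = content_hash(body)
        if digest in self._url_by_hash:
            return False  # duplicate: no new Document, no quota consumed
        self._url_by_hash[digest] = url
        return True


index = DocumentIndex()
first = index.upsert("https://example.com/faq", "Handover is planned for 2026.")
repeat = index.upsert("https://example.com/faq", "Handover  is planned for 2026.")
changed = index.upsert("https://example.com/faq", "Handover is planned for 2027.")
```

Here the second call differs only in spacing, so it is treated as unchanged, while the third call has new text and would be indexed.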
9. Your Warranties
By accepting this consent, you represent and warrant that:
- You own or have been expressly authorised to permit third-party crawling of the domain(s) you register;
- The content on those domains does not infringe any third-party intellectual property rights, and you have the right to grant Aqari a licence to process it for AI indexing;
- You will promptly update or remove the registered domain(s) in Settings if your authority to permit crawling changes or is revoked.
10. Super-Admin Review
Before any fetching begins for a new workspace:
- A member of the Aqari platform team reviews the registered domain to confirm it relates to real-estate development and does not raise abuse, trademark, or compliance concerns.
- Aqari may decline or pause fetching at its sole discretion — for example, if the domain is unrelated to real estate, hosts third-party content without authorisation, or is flagged for malware.
- You will receive an email notification of the review outcome. Fetching does not begin until explicit platform-team approval is recorded in the system.
11. Copyright & Third-Party Content
You are solely responsible for ensuring that the content published on your website, and therefore fetched by Aqari, complies with applicable copyright, trademark, and other intellectual property laws. If your website embeds or reproduces third-party content, you must hold an appropriate licence or rely on a recognised legal exception.
Aqari will remove indexed content from your knowledge base upon a valid takedown request from a legitimate rights holder. Send such requests to privacy@tryaqari.ai.
12. Revocation
You may revoke this consent at any time by either:
- Clearing the Website URL field in Settings → Website and saving; or
- Emailing privacy@tryaqari.ai with your workspace name and a request to halt fetching.
Revocation takes effect within 24 hours. Future fetching ceases immediately. You may additionally request deletion of all previously fetched content from your knowledge base; Aqari will complete such deletion within 30 days.
Note: revoking fetching consent does not affect your workspace or any other Aqari features.
13. Records
Aqari maintains an immutable acceptance log for each version of this consent document, recording:
- The user who accepted (name and email);
- The version number and effective date of the document accepted;
- The timestamp of acceptance (UTC);
- The IP address and user-agent string of the accepting browser.
These records are retained for the life of your workspace plus seven years, as evidence of lawful authorisation to crawl.
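For readers who want a picture of what one acceptance entry might contain, here is a hypothetical record shape built from the fields listed above. The names AcceptanceRecord and record_acceptance are assumptions, and a frozen dataclass only prevents in-process modification; it is not a statement about how Aqari makes its stored log immutable.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone


@dataclass(frozen=True)
class AcceptanceRecord:
    """One entry in a hypothetical acceptance log; fields cannot be reassigned."""
    user_name: str
    user_email: str
    document_version: str
    document_effective_date: str
    accepted_at: datetime  # always stored in UTC
    ip_address: str
    user_agent: str


def record_acceptance(user_name: str, user_email: str, document_version: str,
                      document_effective_date: str, ip_address: str,
                      user_agent: str) -> AcceptanceRecord:
    """Build a record stamped with the current UTC time."""
    return AcceptanceRecord(
        user_name=user_name,
        user_email=user_email,
        document_version=document_version,
        document_effective_date=document_effective_date,
        accepted_at=datetime.now(timezone.utc),
        ip_address=ip_address,
        user_agent=user_agent,
    )


# Illustrative values only; the IP is from a documentation-reserved range.
record = record_acceptance("Sample Admin", "admin@example.com", "1.0",
                           "2025-01-01", "203.0.113.7", "ExampleBrowser/1.0")
```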
14. Contact
Questions about website content fetching or this consent document?
privacy@tryaqari.ai · Aqari FZ-LLC · Dubai, United Arab Emirates