Knowledge Base
The knowledge base is how your AI assistant learns your content. Upload documents or scrape websites, and Awentail’s RAG engine takes care of the rest.
Supported file formats
| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Text-based PDF (scanned PDF with OCR coming soon) |
| Word | .docx | Full text extraction including tables |
| Plain text | .txt | Direct content loading |
| CSV | .csv | Row-by-row processing |
Uploading documents
- Go to your assistant’s Knowledge Base tab
- Click Upload or drag and drop files
- Awentail automatically:
  - Extracts text from the document
  - Splits it into optimized chunks (150 words with 30-word overlap)
  - Generates vector embeddings using OpenAI text-embedding-3-small
  - Stores vectors in PostgreSQL with pgvector for fast similarity search
You can upload multiple files at once. Each file appears in the document list with its name, size and chunk count.
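The chunking step described above (150 words with a 30-word overlap) can be sketched roughly as follows. This is a minimal illustration, not Awentail's actual implementation: the function name `chunk_words` and the choice to split on whitespace are assumptions made for the example.

```python
def chunk_words(text: str, chunk_size: int = 150, overlap: int = 30) -> list[str]:
    """Split text into word-based chunks that share `overlap` words with the previous chunk.

    Hypothetical helper: the defaults mirror the documented 150/30 settings,
    but whitespace tokenization is an assumption.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # with the defaults, a new chunk starts every 120 words
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which is the usual reason for overlapping windows in retrieval pipelines.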
Web scraping
For content that lives on the web:
- Go to the Knowledge Base tab
- Click Scrape web
- Enter a URL (e.g. https://example.com/pricing)
- Awentail downloads the page, extracts text content (removes navigation, footer, scripts) and indexes it
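To illustrate the extraction step, here is a minimal sketch that strips navigation, footer and script content from an HTML string using Python's standard `html.parser` module. The class name `TextExtractor`, the helper `extract_text` and the exact set of skipped tags are assumptions for the example; the real scraper may work differently, and the download step is left out.

```python
from html.parser import HTMLParser

# Assumed list of elements to drop; the actual scraper may skip other tags too.
SKIPPED_TAGS = {"nav", "footer", "script", "style"}


class TextExtractor(HTMLParser):
    """Collects visible text while ignoring anything inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())

    def text(self) -> str:
        return " ".join(self._parts)


def extract_text(html: str) -> str:
    """Return the page's text content without navigation, footer or scripts."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The resulting text would then go through the same chunking and embedding steps as an uploaded document.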
Scraping limits depend on your plan:
| Plan | Scrapes per month |
|---|---|
| Free | — |
| Starter | 1 |
| Pro | 3 |
| Business | 10 |
How RAG works
When a visitor asks a question, Awentail uses a hybrid search approach:
- Vector search (70% weight): Finds semantically similar chunks using cosine similarity of embeddings
- Keyword search (30% weight): Uses PostgreSQL tsvector full-text search for exact term matches
- LLM Reranking: When the top result score is below 0.78, the LLM reranks results for better accuracy
This hybrid approach tends to outperform pure vector search, especially for technical or domain-specific content where exact terms matter.
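The weighting described above can be sketched as a simple linear combination. This is an illustration under stated assumptions, not Awentail's actual scoring code: it assumes the two scores are combined as a weighted sum, that the keyword score is already normalized to the range 0 to 1, and that the 0.78 threshold applies to the combined score. The names `hybrid_score` and `rank_chunks` are hypothetical.

```python
import math

VECTOR_WEIGHT = 0.7      # documented weight for vector search
KEYWORD_WEIGHT = 0.3     # documented weight for keyword search
RERANK_THRESHOLD = 0.78  # documented threshold; applying it to the combined score is an assumption


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(vector_score: float, keyword_score: float) -> float:
    """Weighted sum of the two scores (assumed combination rule)."""
    return VECTOR_WEIGHT * vector_score + KEYWORD_WEIGHT * keyword_score


def rank_chunks(query_embedding: list[float], chunks: list[dict]):
    """Rank chunks by hybrid score and report whether LLM reranking would be triggered.

    Each chunk is a dict with 'id', 'embedding' and 'keyword_score' keys
    (a hypothetical shape chosen for this example).
    """
    scored = [
        (c["id"], hybrid_score(cosine_similarity(query_embedding, c["embedding"]), c["keyword_score"]))
        for c in chunks
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    needs_rerank = bool(scored) and scored[0][1] < RERANK_THRESHOLD
    return scored, needs_rerank
```

In a production setup the similarity and keyword scores would typically come from database queries rather than being computed in application code; the sketch only shows how the 70/30 weights and the rerank threshold interact.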
Managing documents
- View — Browse all uploaded documents with chunk counts
- Delete — Remove a document and all its chunks/embeddings
- Re-upload — Upload a new version to replace outdated content
Tip: Split large documents into focused topics for better search accuracy. A 5-page product FAQ will perform better than a 200-page manual.
Best practices
- Keep content focused — Upload documents directly related to what visitors will ask about
- Update regularly — Delete outdated documents and upload current versions
- Use descriptive file names — Helps you manage the knowledge base
- Combine sources — Upload FAQs, product docs, pricing and support articles for comprehensive coverage