Knowledge Base

The knowledge base is how your AI assistant learns your content. Upload documents or scrape websites, and Awentail’s RAG engine takes care of the rest.

Supported file formats

Format	Extension	Notes
PDF	`.pdf`	Text-based PDFs only (OCR for scanned PDFs is coming soon)
Word	`.docx`	Full text extraction including tables
Plain text	`.txt`	Direct content loading
CSV	`.csv`	Row-by-row processing

1. Go to your assistant’s Knowledge Base tab
2. Click Upload or drag and drop files
3. Awentail automatically:
- Extracts text from the document
- Splits it into optimized chunks (150 words with 30-word overlap)
- Generates vector embeddings using OpenAI’s `text-embedding-3-small` model
- Stores vectors in PostgreSQL with pgvector for fast similarity search
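
To make the chunking step concrete, here is a minimal sketch of word-based chunking with overlap, using the 150-word and 30-word figures above. The function name `chunk_words` and its exact boundary handling are assumptions for illustration, not Awentail’s actual implementation.

```python
def chunk_words(text: str, size: int = 150, overlap: int = 30) -> list[str]:
    """Split text into windows of `size` words.

    Each window repeats the last `overlap` words of the previous one,
    so a sentence cut at a boundary still appears whole in one chunk.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")

    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(300))
    for i, chunk in enumerate(chunk_words(sample)):
        print(i, len(chunk.split()), chunk.split()[0])
```

With these defaults the window advances 120 words at a time, so consecutive chunks share 30 words.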

You can upload multiple files at once. Each file appears in the document list with its name, size and chunk count.

For content that lives on the web:

1. Go to the Knowledge Base tab
2. Click Scrape web
3. Enter a URL (e.g. https://example.com/pricing)
4. Awentail downloads the page, extracts the text content (stripping navigation, footers and scripts) and indexes it
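
As a rough illustration of the extraction step, the sketch below strips navigation, footer and script elements from an HTML string with Python’s standard `html.parser` module. The class name `TextExtractor`, the helper `extract_text` and the decision to also skip `style` tags are assumptions made for this example. Awentail’s own scraper may work differently.

```python
from html.parser import HTMLParser

# Elements whose contents are dropped. "style" is an added assumption;
# the other three mirror the navigation, footer and scripts named above.
SKIPPED_TAGS = {"nav", "footer", "script", "style"}


class TextExtractor(HTMLParser):
    """Collects visible text while ignoring anything inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


if __name__ == "__main__":
    page = (
        "<html><body><nav>Home</nav><h1>Pricing</h1>"
        "<p>Pro plan: $10</p><script>var x = 1;</script>"
        "<footer>Copyright</footer></body></html>"
    )
    print(extract_text(page))
```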

Scraping limits depend on your plan.

When a visitor asks a question, Awentail uses a hybrid search approach:

Vector search (70% weight) — Finds semantically similar chunks using cosine similarity of embeddings
Keyword search (30% weight) — Uses PostgreSQL tsvector full-text search for exact term matches
LLM Reranking — When the top result score is below 0.78, the LLM reranks results for better accuracy

This hybrid approach outperforms pure vector search, especially for technical or domain-specific content.
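
The weighting can be sketched in a few lines of Python. This is an illustrative model only: it assumes the keyword score is already normalized to the range 0 to 1, and that the 0.78 threshold applies to the combined score, neither of which is stated above. The names `hybrid_rank` and `needs_rerank` are hypothetical.

```python
import math

VECTOR_WEIGHT = 0.7      # weight of the semantic (embedding) score
KEYWORD_WEIGHT = 0.3     # weight of the full-text score
RERANK_THRESHOLD = 0.78  # below this top score, reranking is requested


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rank(query_vec, candidates):
    """Rank chunks by the weighted sum of vector and keyword scores.

    `candidates` is a list of (chunk_id, embedding, keyword_score) tuples,
    where keyword_score is assumed to lie in [0, 1].
    Returns (ranked, needs_rerank): ranked is a list of (chunk_id, score)
    sorted best first, and needs_rerank says whether the top score fell
    below RERANK_THRESHOLD.
    """
    ranked = [
        (chunk_id,
         VECTOR_WEIGHT * cosine_similarity(query_vec, embedding)
         + KEYWORD_WEIGHT * keyword_score)
        for chunk_id, embedding, keyword_score in candidates
    ]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    needs_rerank = bool(ranked) and ranked[0][1] < RERANK_THRESHOLD
    return ranked, needs_rerank


if __name__ == "__main__":
    query = [1.0, 0.0]
    chunks = [("pricing", [1.0, 0.0], 1.0), ("about", [0.0, 1.0], 0.5)]
    print(hybrid_rank(query, chunks))
```

In this model a chunk that matches the query both semantically and by exact terms scores close to 1.0, while a chunk that matches only on keywords can reach at most 0.3, which is why a weak top result triggers the reranking step.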

Tip: Split large documents into focused topics for better search accuracy. A 5-page product FAQ will perform better than a 200-page manual.

Keep content focused — Upload documents directly related to what visitors will ask about
Update regularly — Delete outdated documents and upload current versions
Use descriptive file names — Helps you manage the knowledge base
Combine sources — Upload FAQs, product docs, pricing and support articles for comprehensive coverage