Documentation
Everything you need to get started with FAQai.app.
Getting Started
1. Upload a Document
Drag and drop or browse to upload your file. Supported formats: PDF, DOCX, and TXT. File size and page limits vary by plan (Free: 5 MB / 20 pages, Basic: 10 MB / 50 pages, Starter: 15 MB / 100 pages, Pro: 25 MB / 250 pages).
2. Generate Dataset
Our AI reads the full document, identifies the key topics, and generates structured training and evaluation datasets (Q&A pairs, contexts) for RAG chatbots.
3. Review & Export
Browse RAG-ready chunks, generated datasets (Q&A, variants, evaluation, adversarial), and quality insights. Export in 9 formats: JSON, CSV, Markdown, LangChain, LlamaIndex, Evaluation, Pinecone, Qdrant, and pgvector.
Supported File Formats
| Format | MIME Type | Notes |
|---|---|---|
| PDF | application/pdf | Text-based PDFs, max pages per plan (Free: 20, Basic: 50, Starter: 100, Pro: 250) |
| DOCX | application/vnd.openxmlformats… | Microsoft Word 2007+ format |
| TXT | text/plain | Plain text, UTF-8 encoded |
Upload Limits
- File size: Varies by plan (Free: 5 MB, Basic: 10 MB, Starter: 15 MB, Pro: 25 MB).
- Page count: Varies by plan (Free: 20, Basic: 50, Starter: 100, Pro: 250 pages per document).
- Content: Documents must contain extractable text. Scanned or image-only PDFs are not supported.
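If you script uploads, a client-side pre-check against these limits lets you fail fast before sending the file. Below is a minimal sketch (the helper name and plan keys are our own; the page limit still has to be enforced server-side, since counting pages requires parsing the document):

```python
import os

# Plan limits from the sections above: (max file size in MB, max pages)
PLAN_LIMITS = {
    "free": (5, 20),
    "basic": (10, 50),
    "starter": (15, 100),
    "pro": (25, 250),
}
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}

def check_upload(path: str, plan: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")
    max_mb, _max_pages = PLAN_LIMITS[plan]
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > max_mb:
        problems.append(f"file is {size_mb:.1f} MB, {plan} plan allows {max_mb} MB")
    return problems
```

This catches the cheap failures (wrong format, oversized file) locally; scanned or image-only PDFs can only be detected after text extraction.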
Plans & Quotas
| Plan | Pricing | Page Quota | Features |
|---|---|---|---|
| Free | Free | 30 pages / month | Core dataset generation, JSON & CSV exports |
| Basic | £9/mo or £79/yr (save £29) | 1,000 pages / month | Everything in Free + all 9 export formats + API access (500 calls/mo) |
| Starter | £19/mo or £190/yr (save £38) | 2,000 pages / month | Everything in Basic + bulk upload + 1,000 API calls/mo |
| Pro | £39/mo or £390/yr (save £78) | 6,000 pages / month | Everything in Starter + webhooks + priority support |
Free plan quotas reset on the 1st of each calendar month. Paid plan quotas follow your Stripe billing cycle.
Yearly plans receive the same monthly page quota as monthly plans (quotas reset each month rather than being granted upfront). Yearly billing saves roughly two to three months of cost, depending on plan.
Export Options
FAQai supports 9 dataset export formats. Available formats depend on your plan.
| Format | Description | Plans |
|---|---|---|
| JSON | Standard format with all metadata | All plans |
| CSV | Spreadsheet-compatible for bulk editing | All plans |
| Markdown | Human-readable format for documentation | Basic, Starter, Pro |
| LangChain JSON | Compatible with LangChain document loaders | Basic, Starter, Pro |
| LlamaIndex JSON | Compatible with LlamaIndex data structures | Basic, Starter, Pro |
| Evaluation | RAGAS / DeepEval compatible evaluation format | Basic, Starter, Pro |
| Pinecone | Ready for Pinecone vector database upsert | Basic, Starter, Pro |
| Qdrant | Ready for Qdrant vector database | Basic, Starter, Pro |
| pgvector | SQL INSERT statements for PostgreSQL pgvector | Basic, Starter, Pro |
You can also copy individual items to clipboard from the dataset browser. JSON export includes all dataset types; other formats export the selected dataset type.
RAG Chunks Export
The RAG Chunks tab has its own dedicated export dropdown supporting all 9 formats. Chunk exports include enriched metadata (keywords, entities, topic, section), quality signals, and retrieval hints optimised for direct ingestion into vector databases and RAG pipelines.
JSON Export Diagnostics
JSON exports include two diagnostic sections: failure_analysis (RAG quality issues such as ambiguous queries, chunking gaps, and semantic overlap) and internal_errors (pipeline-level diagnostics like variant count or parsing issues). These help identify and fix weaknesses in your RAG system.
Dataset Types
FAQai generates six dataset sections from your documents for comprehensive RAG data preparation, training, and evaluation.
RAG Chunks
Structured, enriched document chunks with metadata (keywords, entities, topic, section), quality signals (completeness, overlap risk, fragmentation risk), and retrieval hints (search type, priority, use-case tags). Optimised for feeding into RAG pipelines.
Canonical Q&A
Core question-answer pairs grounded in document content. Each includes context, confidence score, and question type classification.
Query Variants
10 alternative phrasings per canonical question across 8 variant types (short, vague, broken, typo, incorrect assumption, partial, conversational, multi-intent) to improve retrieval robustness.
Evaluation Dataset
Question-context-expected_answer triples for retrieval and generation evaluation. Compatible with RAGAS and similar frameworks.
Adversarial Dataset
Misleading and edge-case questions to test RAG system robustness against hallucination and out-of-scope queries.
Quality Insights
AI-generated analysis of RAG system weaknesses across 9 categories (e.g. ambiguous queries, chunking issues, semantic overlap) with risk levels and actionable recommendations.
Pages per Month
Usage is measured by pages processed per document. Paid plans can purchase overage page packs.
Free
30 pages
Basic
1,000 pages
Starter
2,000 pages
Pro
6,000 pages
Dataset Items per Document (Per-Type Caps)
Each dataset type has its own guaranteed allocation so no single type can starve the others.
| Dataset Type | Free | Basic | Starter | Pro |
|---|---|---|---|---|
| Canonical Q&A | 80 | 200 | 400 | 600 |
| Query Variants | 200 | 500 | 1,000 | 1,500 |
| Evaluation | 60 | 150 | 300 | 450 |
| Adversarial | 60 | 150 | 300 | 450 |
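The caps above translate into a simple upper bound on how many items one document can yield. A quick-reference sketch (the dict and helper are ours, transcribed from the table):

```python
# Per-type item caps per document, keyed by plan (from the table above)
PER_TYPE_CAPS = {
    "free":    {"canonical_qa": 80,  "query_variants": 200,  "evaluation": 60,  "adversarial": 60},
    "basic":   {"canonical_qa": 200, "query_variants": 500,  "evaluation": 150, "adversarial": 150},
    "starter": {"canonical_qa": 400, "query_variants": 1000, "evaluation": 300, "adversarial": 300},
    "pro":     {"canonical_qa": 600, "query_variants": 1500, "evaluation": 450, "adversarial": 450},
}

def max_items_per_document(plan: str) -> int:
    # Upper bound across all four capped dataset types for one document
    return sum(PER_TYPE_CAPS[plan].values())
```

For example, a Free-plan document tops out at 400 generated items across the four capped types.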
Browsing Datasets
Once a document is processed, you can browse all six dataset sections directly on the document detail page using the tab bar.
Dataset Tabs
Switch between tabs to view items from each dataset type. Each tab shows its item count. All tabs share the same search bar.
Confidence Scores (Canonical Q&A)
Each Canonical Q&A item displays a confidence score indicating how directly the answer is supported by the document. Hover over the badge for a detailed explanation.
| Level | Meaning |
|---|---|
| High | Answer is directly stated in the document |
| Medium | Answer is supported but requires some inference |
| Low | Answer requires significant inference from the document |
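When post-processing a JSON export, the confidence level makes a useful filter, e.g. auto-publish high-confidence pairs and queue the rest for human review. A sketch over hypothetical sample items (that the confidence field is a lowercase string is our assumption, based on the export examples in the Usage Guide):

```python
# Hypothetical sample shaped like items in the canonical_qa section of a JSON export
canonical_qa = [
    {"question": "What formats are supported?", "confidence": "high"},
    {"question": "Can I upload scanned PDFs?", "confidence": "medium"},
    {"question": "What happens at the quota limit?", "confidence": "low"},
]

def by_confidence(items, levels):
    # Keep only items whose confidence level is in the accepted set
    return [qa for qa in items if qa["confidence"] in levels]

high_only = by_confidence(canonical_qa, {"high"})
reviewable = by_confidence(canonical_qa, {"medium", "low"})
```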
Difficulty Badges (Evaluation & Adversarial)
Evaluation and Adversarial dataset items display a difficulty badge. Hover over the badge for a detailed explanation.
| Level | Meaning |
|---|---|
| Easy | Straightforward question answerable from a single passage |
| Medium | Requires combining information from multiple parts of the document |
| Hard | Requires deep reasoning, inference, or handling ambiguous information |
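Difficulty is handy for staging evaluation runs, e.g. gate CI on easy items and track hard items as a stretch metric. A sketch over hypothetical sample items (that the difficulty field is a lowercase string is our assumption):

```python
from collections import defaultdict

# Hypothetical sample shaped like exported evaluation items
eval_items = [
    {"question": "What is the free page quota?", "difficulty": "easy"},
    {"question": "Compare Basic and Pro export options.", "difficulty": "medium"},
    {"question": "Which plan suits 150 mixed documents a month?", "difficulty": "hard"},
    {"question": "Which formats does the free plan export?", "difficulty": "easy"},
]

# Bucket items by difficulty so each tier can be evaluated separately
buckets = defaultdict(list)
for item in eval_items:
    buckets[item["difficulty"]].append(item)

for level in ("easy", "medium", "hard"):
    print(f"{level}: {len(buckets[level])} items")
```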
Pagination
Dataset items are paginated at 25 items per page across all tabs. Use the page number buttons or Previous / Next to navigate. The search bar filters items across question and answer text within the active tab.
Cancel Processing
You can cancel a document that is currently being processed. Cancellation stops dataset generation and refunds the page credits back to your monthly quota.
From the document list
Click the three-dot menu on a processing document and select "Cancel Processing".
From the document detail page
Click the amber "Cancel Processing" button next to the Delete button while the document is processing.
Via API
Send POST /v1/documents/{id}/cancel to cancel programmatically. See the API Reference section below.
Cancelled documents appear with a “Cancelled” status badge. Any partially generated datasets are marked as failed and excluded from exports.
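Programmatic cancellation is a single authenticated POST. A minimal Python sketch using only the standard library (the function names are ours; the endpoint is the one documented above):

```python
import json
import urllib.request

BASE = "https://faqai.app/api/v1"

def cancel_url(doc_id: str) -> str:
    # Endpoint documented above: POST /v1/documents/{id}/cancel
    return f"{BASE}/documents/{doc_id}/cancel"

def cancel_document(doc_id: str, api_key: str) -> dict:
    # POST with no body; a 2xx response confirms the cancellation
    req = urllib.request.Request(
        cancel_url(doc_id),
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```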
Usage Guide
How to use FAQai datasets and exports in production RAG systems. Code examples in Python, JavaScript, cURL, and Go.
Using RAG Chunks
RAG Chunks are structured, enriched document chunks ready for ingestion into vector databases. Each chunk includes metadata (keywords, entities, topic, section), quality signals (completeness, overlap risk, fragmentation risk), and retrieval hints (search type, priority, use-case tags).
Common Use Cases
- Feed vector databases (Pinecone, Qdrant, pgvector) with enriched chunks
- Generate embeddings from the pre-cleaned embedding_text field
- Use retrieval_hints to configure per-chunk search strategy
- Filter or weight results using chunk_quality signals
Python
import os, json, requests, openai
API_KEY = os.environ["FAQAI_API_KEY"]
BASE = "https://faqai.app/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# 1. Export RAG chunks in Pinecone format
resp = requests.get(
f"{BASE}/datasets/DATASET_ID/export?format=pinecone&chunk_mode=true",
headers=HEADERS,
)
data = resp.json()
# 2. Generate embeddings and upsert
client = openai.OpenAI()
for record in data["records"]:
embedding = client.embeddings.create(
model="text-embedding-3-small",
input=record["metadata"]["text"],
).data[0].embedding
record["values"] = embedding
# 3. Upsert to Pinecone
from pinecone import Pinecone
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("my-index")
index.upsert(vectors=data["records"])
JavaScript / TypeScript
import OpenAI from "openai";
import { Pinecone } from "@pinecone-database/pinecone";
const API_KEY = process.env.FAQAI_API_KEY;
const BASE = "https://faqai.app/api/v1";
// 1. Export RAG chunks in Pinecone format
const resp = await fetch(
`${BASE}/datasets/DATASET_ID/export?format=pinecone&chunk_mode=true`,
{ headers: { Authorization: `Bearer ${API_KEY}` } },
);
const data = await resp.json();
// 2. Generate embeddings
const openai = new OpenAI();
for (const record of data.records) {
const emb = await openai.embeddings.create({
model: "text-embedding-3-small",
input: record.metadata.text,
});
record.values = emb.data[0].embedding;
}
// 3. Upsert to Pinecone
const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pc.index("my-index");
await index.upsert(data.records);
cURL
# Export RAG chunks as Pinecone-ready JSON
curl "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=pinecone&chunk_mode=true" \
  -H "Authorization: Bearer faq_YOUR_API_KEY" \
  -o rag-chunks.pinecone.json
# Export as Qdrant-ready JSON
curl "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=qdrant&chunk_mode=true" \
  -H "Authorization: Bearer faq_YOUR_API_KEY" \
  -o rag-chunks.qdrant.json
# Export as pgvector-ready JSON
curl "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=pgvector&chunk_mode=true" \
  -H "Authorization: Bearer faq_YOUR_API_KEY" \
  -o rag-chunks.pgvector.json
Go
package main
import (
"encoding/json"
"fmt"
"io"
"net/http"
"os"
)
func main() {
apiKey := os.Getenv("FAQAI_API_KEY")
url := "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=pinecone&chunk_mode=true"
req, _ := http.NewRequest("GET", url, nil)
req.Header.Set("Authorization", "Bearer "+apiKey)
resp, err := http.DefaultClient.Do(req)
if err != nil {
panic(err)
}
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
var data map[string]interface{}
json.Unmarshal(body, &data)
records := data["records"].([]interface{})
fmt.Printf("Exported %d chunks ready for Pinecone upsert\n", len(records))
}
Using Canonical Q&A
Canonical Q&A pairs are the core question-answer dataset grounded in your document content. Each item includes the question, answer, source context, confidence score, and full traceability (chunk ID, page, section).
Common Use Cases
- Validate retrieval accuracy by asserting correct answers
- Seed FAQ chatbots or support knowledge bases (Zendesk, Intercom)
- Fine-tune domain-specific Q&A models
- Create automated regression tests for your RAG pipeline
Python - Automated Retrieval Test
import os, requests
API_KEY = os.environ["FAQAI_API_KEY"]
BASE = "https://faqai.app/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# Export canonical Q&A as JSON
resp = requests.get(
f"{BASE}/datasets/DATASET_ID/export?format=json",
headers=HEADERS,
)
dataset = resp.json()
# Run retrieval test against your RAG system
passed, failed = 0, 0
for qa in dataset["canonical_qa"]:
result = my_rag_system.query(qa["question"])
if qa["chunk_id"] in [r["chunk_id"] for r in result["sources"][:5]]:
passed += 1
else:
failed += 1
print(f"MISS: {qa['question']}")
print(f"Retrieval accuracy: {passed}/{passed + failed} ({100*passed/(passed+failed):.1f}%)")
JavaScript / TypeScript - Automated Retrieval Test
const API_KEY = process.env.FAQAI_API_KEY;
const BASE = "https://faqai.app/api/v1";
// Export canonical Q&A
const resp = await fetch(
`${BASE}/datasets/DATASET_ID/export?format=json`,
{ headers: { Authorization: `Bearer ${API_KEY}` } },
);
const dataset = await resp.json();
// Run retrieval test
let passed = 0, failed = 0;
for (const qa of dataset.canonical_qa) {
const result = await myRagSystem.query(qa.question);
const topChunks = result.sources.slice(0, 5).map((s) => s.chunk_id);
if (topChunks.includes(qa.chunk_id)) {
passed++;
} else {
failed++;
console.log(`MISS: ${qa.question}`);
}
}
console.log(`Retrieval accuracy: ${passed}/${passed + failed}`);
cURL
# Export canonical Q&A as JSON
curl "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=json" \
  -H "Authorization: Bearer faq_YOUR_API_KEY" \
  -o canonical-qa.json
# Export as CSV for spreadsheet analysis
curl "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=csv" \
  -H "Authorization: Bearer faq_YOUR_API_KEY" \
  -o canonical-qa.csv
Go
package main
import (
"encoding/json"
"fmt"
"io"
"net/http"
"os"
)
type QAItem struct {
Question string `json:"question"`
Answer string `json:"answer"`
ChunkID string `json:"chunk_id"`
Confidence string `json:"confidence"`
}
type Dataset struct {
CanonicalQA []QAItem `json:"canonical_qa"`
}
func main() {
apiKey := os.Getenv("FAQAI_API_KEY")
url := "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=json"
req, _ := http.NewRequest("GET", url, nil)
req.Header.Set("Authorization", "Bearer "+apiKey)
resp, _ := http.DefaultClient.Do(req)
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
var ds Dataset
json.Unmarshal(body, &ds)
for _, qa := range ds.CanonicalQA {
fmt.Printf("[%s] Q: %s\n", qa.Confidence, qa.Question)
}
}
Using Query Variants
Query Variants provide 10 alternative phrasings per canonical question across 8 types: short, vague, broken, typo, incorrect assumption, partial, conversational, and multi-intent. They simulate how real users actually search.
Common Use Cases
- Stress-test retrieval by running all variants through your retriever
- Identify which variant types cause the most retrieval failures
- Train query rewriting or intent classification models
- A/B test different search configurations across variant types
Python - Retrieval Robustness Test
import os, requests
from collections import defaultdict
API_KEY = os.environ["FAQAI_API_KEY"]
BASE = "https://faqai.app/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# Export full dataset with variants
resp = requests.get(f"{BASE}/datasets/DATASET_ID/export?format=json", headers=HEADERS)
dataset = resp.json()
# Test retrieval for each variant type
failures_by_type = defaultdict(list)
for qa in dataset["canonical_qa"]:
variants = dataset.get("query_variants", {}).get(qa["id"], [])
for variant in variants:
result = my_retriever.search(variant["text"], top_k=5)
retrieved_ids = [r["chunk_id"] for r in result]
if qa["chunk_id"] not in retrieved_ids:
failures_by_type[variant["type"]].append(variant["text"])
# Report failures by variant type
for vtype, fails in sorted(failures_by_type.items(), key=lambda x: -len(x[1])):
print(f"{vtype}: {len(fails)} failures")
for f in fails[:3]:
print(f" - {f}")
JavaScript / TypeScript
const resp = await fetch(
`${BASE}/datasets/DATASET_ID/export?format=json`,
{ headers: { Authorization: `Bearer ${API_KEY}` } },
);
const dataset = await resp.json();
const failuresByType = {};
for (const qa of dataset.canonical_qa) {
const variants = dataset.query_variants?.[qa.id] ?? [];
for (const variant of variants) {
const results = await myRetriever.search(variant.text, { topK: 5 });
const ids = results.map((r) => r.chunk_id);
if (!ids.includes(qa.chunk_id)) {
failuresByType[variant.type] = failuresByType[variant.type] || [];
failuresByType[variant.type].push(variant.text);
}
}
}
for (const [type, fails] of Object.entries(failuresByType)) {
console.log(`${type}: ${fails.length} failures`);
}
cURL
# Export full dataset including query variants
curl "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=json" \
-H "Authorization: Bearer faq_YOUR_API_KEY" \
-o dataset-with-variants.json
# The JSON includes query_variants keyed by canonical question ID:
# { "query_variants": { "qa-001": [{ "text": "...", "type": "typo" }, ...] } }
Go
package main
import (
"encoding/json"
"fmt"
"io"
"net/http"
"os"
)
type Variant struct {
Text string `json:"text"`
Type string `json:"type"`
}
type FullDataset struct {
QueryVariants map[string][]Variant `json:"query_variants"`
}
func main() {
apiKey := os.Getenv("FAQAI_API_KEY")
url := "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=json"
req, _ := http.NewRequest("GET", url, nil)
req.Header.Set("Authorization", "Bearer "+apiKey)
resp, _ := http.DefaultClient.Do(req)
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
var ds FullDataset
json.Unmarshal(body, &ds)
for qaID, variants := range ds.QueryVariants {
fmt.Printf("Question %s: %d variants\n", qaID, len(variants))
for _, v := range variants {
fmt.Printf(" [%s] %s\n", v.Type, v.Text)
}
}
}
Using Evaluation Data
The Evaluation dataset provides question-context-expected_answer triples designed for measuring RAG performance. Compatible with RAGAS, DeepEval, and similar evaluation frameworks.
Common Use Cases
- Run RAGAS benchmarks for faithfulness, relevancy, and context recall
- Run DeepEval metrics for answer correctness and hallucination
- Set up CI/CD quality gates that fail builds if scores drop
- Track improvements over time after changes to chunking, embeddings, or prompts
Python - RAGAS Evaluation
import os, requests
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_recall
from datasets import Dataset
API_KEY = os.environ["FAQAI_API_KEY"]
BASE = "https://faqai.app/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# 1. Export evaluation dataset
resp = requests.get(
f"{BASE}/datasets/DATASET_ID/export?format=evaluation",
headers=HEADERS,
)
eval_data = resp.json()
# 2. Get your RAG system's answers
questions = [d["question"] for d in eval_data["data"]]
ground_truths = [d["ground_truth"] for d in eval_data["data"]]
contexts = [d["contexts"] for d in eval_data["data"]]
answers = [my_rag.query(q)["answer"] for q in questions]
# 3. Run RAGAS evaluation
dataset = Dataset.from_dict({
"question": questions,
"answer": answers,
"contexts": contexts,
"ground_truth": ground_truths,
})
result = evaluate(dataset, metrics=[faithfulness, answer_relevancy, context_recall])
print(result)
# 4. CI/CD gate: fail if below threshold
assert result["faithfulness"] > 0.8, f"Faithfulness too low: {result['faithfulness']}"
assert result["context_recall"] > 0.7, f"Context recall too low: {result['context_recall']}"
JavaScript / TypeScript
const resp = await fetch(
`${BASE}/datasets/DATASET_ID/export?format=evaluation`,
{ headers: { Authorization: `Bearer ${API_KEY}` } },
);
const evalData = await resp.json();
// Run evaluation against your RAG system
let totalScore = 0;
for (const item of evalData.data) {
const result = await myRag.query(item.question);
const isCorrect = result.answer.includes(item.ground_truth.substring(0, 50));
totalScore += isCorrect ? 1 : 0;
}
const accuracy = totalScore / evalData.data.length;
console.log(`Evaluation accuracy: ${(accuracy * 100).toFixed(1)}%`);
// CI/CD gate
if (accuracy < 0.8) {
process.exit(1);
}
cURL
# Export evaluation dataset (RAGAS / DeepEval compatible)
curl "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=evaluation" \
-H "Authorization: Bearer faq_YOUR_API_KEY" \
-o evaluation.eval.json
# Structure: { "data": [{ "question", "ground_truth", "contexts", "metadata" }] }
Go
package main
import (
"encoding/json"
"fmt"
"io"
"net/http"
"os"
)
type EvalItem struct {
Question string `json:"question"`
GroundTruth string `json:"ground_truth"`
Contexts []string `json:"contexts"`
}
type EvalDataset struct {
Data []EvalItem `json:"data"`
}
func main() {
apiKey := os.Getenv("FAQAI_API_KEY")
url := "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=evaluation"
req, _ := http.NewRequest("GET", url, nil)
req.Header.Set("Authorization", "Bearer "+apiKey)
resp, _ := http.DefaultClient.Do(req)
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
var ds EvalDataset
json.Unmarshal(body, &ds)
fmt.Printf("Loaded %d evaluation items\n", len(ds.Data))
for _, item := range ds.Data[:min(3, len(ds.Data))] {
fmt.Printf("Q: %s\nExpected: %s\n\n", item.Question, item.GroundTruth)
}
}
Using Adversarial Data
The Adversarial dataset contains deliberately misleading and edge-case questions designed to break your RAG system. Use it to detect hallucinations, test out-of-scope handling, and validate safety guardrails before production launch.
Common Use Cases
- Detect hallucinations: check if your RAG generates confident but wrong answers
- Verify out-of-scope handling: ensure your system says “I don't know”
- Red-team your system before production launch
- Test safety guardrails against manipulative queries
Python - Hallucination Detection
import os, requests
API_KEY = os.environ["FAQAI_API_KEY"]
BASE = "https://faqai.app/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# Export adversarial dataset
resp = requests.get(f"{BASE}/datasets/DATASET_ID/export?format=json", headers=HEADERS)
dataset = resp.json()
# Test each adversarial question
hallucinations = []
for adv in dataset.get("adversarial", []):
result = my_rag.query(adv["question"])
# Adversarial questions should be refused or flagged as uncertain
if result.get("confidence", 0) > 0.8:
hallucinations.append({
"question": adv["question"],
"system_answer": result["answer"],
"expected": adv.get("expected_answer", "Should refuse or flag uncertainty"),
})
print(f"Hallucination rate: {len(hallucinations)}/{len(dataset.get('adversarial', []))}")
for h in hallucinations:
print(f" Q: {h['question']}")
print(f" Got: {h['system_answer'][:100]}")
JavaScript / TypeScript
const resp = await fetch(
`${BASE}/datasets/DATASET_ID/export?format=json`,
{ headers: { Authorization: `Bearer ${API_KEY}` } },
);
const dataset = await resp.json();
const hallucinations = [];
for (const adv of dataset.adversarial ?? []) {
const result = await myRag.query(adv.question);
if (result.confidence > 0.8) {
hallucinations.push({ question: adv.question, answer: result.answer });
}
}
console.log(`Hallucination rate: ${hallucinations.length}/${dataset.adversarial?.length ?? 0}`);
if (hallucinations.length > 0) {
console.error("FAIL: System hallucinated on adversarial questions");
process.exit(1);
}
cURL
# Export full dataset (includes adversarial section)
curl "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=json" \
-H "Authorization: Bearer faq_YOUR_API_KEY" \
-o dataset.json
# The JSON includes: { "adversarial": [{ "question", "answer", "difficulty", ... }] }
Go
package main
import (
"encoding/json"
"fmt"
"io"
"net/http"
"os"
)
type AdvItem struct {
Question string `json:"question"`
Answer string `json:"answer"`
Difficulty string `json:"difficulty"`
}
type AdvDataset struct {
Adversarial []AdvItem `json:"adversarial"`
}
func main() {
apiKey := os.Getenv("FAQAI_API_KEY")
url := "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=json"
req, _ := http.NewRequest("GET", url, nil)
req.Header.Set("Authorization", "Bearer "+apiKey)
resp, _ := http.DefaultClient.Do(req)
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
var ds AdvDataset
json.Unmarshal(body, &ds)
fmt.Printf("Loaded %d adversarial questions\n", len(ds.Adversarial))
for _, adv := range ds.Adversarial {
fmt.Printf("[%s] %s\n", adv.Difficulty, adv.Question)
}
}
Using Quality Insights
Quality Insights provide AI-generated analysis of your RAG system's weaknesses across 9 categories (ambiguous queries, chunking issues, semantic overlap, missing context, and more). Each insight includes a risk level and actionable recommendation.
Common Use Cases
- Prioritise improvements by focusing on high-risk insights first
- Fix chunking strategy if chunking_issues is flagged
- Reduce semantic overlap between chunks
- Use recommendations to refine your system prompt and retrieval config
Python - Parse and Prioritise Insights
import os, requests
API_KEY = os.environ["FAQAI_API_KEY"]
BASE = "https://faqai.app/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# Export JSON (includes dataset_quality_insights and failure_analysis)
resp = requests.get(f"{BASE}/datasets/DATASET_ID/export?format=json", headers=HEADERS)
dataset = resp.json()
# Parse quality insights by risk level
insights = dataset.get("dataset_quality_insights", [])
high_risk = [i for i in insights if i["risk_level"] == "high"]
medium_risk = [i for i in insights if i["risk_level"] == "medium"]
print(f"High risk: {len(high_risk)}, Medium risk: {len(medium_risk)}")
for insight in high_risk:
print(f" [{insight['insight_type']}] {insight['description']}")
print(f" Recommendation: {insight['recommendation']}")
# Parse failure_analysis for RAG-specific issues
failures = dataset.get("failure_analysis", [])
for f in failures:
print(f" [{f['severity']}] {f['issue']} -> {f.get('suggestedFix', 'N/A')}")
JavaScript / TypeScript
const resp = await fetch(
`${BASE}/datasets/DATASET_ID/export?format=json`,
{ headers: { Authorization: `Bearer ${API_KEY}` } },
);
const dataset = await resp.json();
const insights = dataset.dataset_quality_insights ?? [];
const highRisk = insights.filter((i) => i.risk_level === "high");
console.log(`High-risk insights: ${highRisk.length}`);
highRisk.forEach((i) => {
console.log(` [${i.insight_type}] ${i.description}`);
console.log(` Fix: ${i.recommendation}`);
});
// Also check failure_analysis for RAG issues
const failures = dataset.failure_analysis ?? [];
failures.forEach((f) => {
console.log(` [${f.severity}] ${f.issue}`);
});
cURL
# Export JSON with quality insights and failure analysis
curl "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=json" \
-H "Authorization: Bearer faq_YOUR_API_KEY" \
-o dataset-with-insights.json
# JSON includes:
# "dataset_quality_insights": [{ "insight_type", "description", "risk_level", "recommendation" }]
# "failure_analysis": [{ "issue", "severity", "suggestedFix", "affectedSection" }]
# "internal_errors": [{ "error_type", "message", "severity" }]
Go
package main
import (
"encoding/json"
"fmt"
"io"
"net/http"
"os"
)
type Insight struct {
InsightType string `json:"insight_type"`
Description string `json:"description"`
RiskLevel string `json:"risk_level"`
Recommendation string `json:"recommendation"`
}
type InsightDataset struct {
Insights []Insight `json:"dataset_quality_insights"`
}
func main() {
apiKey := os.Getenv("FAQAI_API_KEY")
url := "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=json"
req, _ := http.NewRequest("GET", url, nil)
req.Header.Set("Authorization", "Bearer "+apiKey)
resp, _ := http.DefaultClient.Do(req)
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
var ds InsightDataset
json.Unmarshal(body, &ds)
for _, i := range ds.Insights {
fmt.Printf("[%s] %s: %s\n", i.RiskLevel, i.InsightType, i.Description)
}
}
Export Format Guide
All 9 export formats are available via the API. Add chunk_mode=true to export enriched RAG chunks instead of Q&A dataset items.
| Format | API Parameter | Extension | Target System |
|---|---|---|---|
| JSON | json | .json | Any language, custom pipelines |
| CSV | csv | .csv | Excel, Google Sheets, pandas |
| Markdown | markdown | .md | GitHub, Notion, Confluence |
| LangChain | langchain_json | .langchain.json | LangChain (Python / JS) |
| LlamaIndex | llamaindex_json | .llamaindex.json | LlamaIndex (Python) |
| Evaluation | evaluation | .eval.json | RAGAS, DeepEval |
| Pinecone | pinecone | .pinecone.json | Pinecone vector DB |
| Qdrant | qdrant | .qdrant.json | Qdrant vector DB |
| pgvector | pgvector | .pgvector.json | PostgreSQL + pgvector |
API Endpoint
GET /v1/datasets/{id}/export?format={format} for Q&A datasets. Add &chunk_mode=true for RAG chunk exports.
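Mapping the format parameter to its conventional output filename is worth centralising if you script several exports. A sketch transcribed from the table above (the helper name and the dataset / rag-chunks stems follow the cURL examples in this guide rather than a documented API contract):

```python
# File extensions per export format, from the format table above
EXTENSIONS = {
    "json": ".json",
    "csv": ".csv",
    "markdown": ".md",
    "langchain_json": ".langchain.json",
    "llamaindex_json": ".llamaindex.json",
    "evaluation": ".eval.json",
    "pinecone": ".pinecone.json",
    "qdrant": ".qdrant.json",
    "pgvector": ".pgvector.json",
}

def export_filename(fmt: str, chunk_mode: bool = False) -> str:
    # Chunk exports use a rag-chunks stem in the examples; Q&A exports use dataset
    stem = "rag-chunks" if chunk_mode else "dataset"
    return stem + EXTENSIONS[fmt]
```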
Python - Download Any Format
import os, requests
API_KEY = os.environ["FAQAI_API_KEY"]
BASE = "https://faqai.app/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
DATASET_ID = "your-dataset-id"
# Download Q&A dataset in any format
formats = ["json", "csv", "markdown", "langchain_json", "llamaindex_json",
"evaluation", "pinecone", "qdrant", "pgvector"]
for fmt in formats:
resp = requests.get(f"{BASE}/datasets/{DATASET_ID}/export?format={fmt}", headers=HEADERS)
filename = f"dataset.{fmt}"  # one file per format, so json-based formats don't overwrite each other
with open(filename, "wb") as f:
f.write(resp.content)
print(f"Downloaded {fmt} -> {filename}")
# Download RAG chunks in Pinecone format
resp = requests.get(
f"{BASE}/datasets/{DATASET_ID}/export?format=pinecone&chunk_mode=true",
headers=HEADERS,
)
with open("rag-chunks.pinecone.json", "wb") as f:
f.write(resp.content)
JavaScript / TypeScript
import fs from "fs";
const API_KEY = process.env.FAQAI_API_KEY;
const BASE = "https://faqai.app/api/v1";
const DATASET_ID = "your-dataset-id";
// Download in any format
const formats = ["json", "csv", "markdown", "langchain_json", "evaluation", "pinecone"];
for (const fmt of formats) {
const resp = await fetch(
`${BASE}/datasets/${DATASET_ID}/export?format=${fmt}`,
{ headers: { Authorization: `Bearer ${API_KEY}` } },
);
const data = await resp.text();
fs.writeFileSync(`dataset.${fmt}`, data);
console.log(`Downloaded ${fmt}`);
}
// Download RAG chunks
const chunkResp = await fetch(
`${BASE}/datasets/${DATASET_ID}/export?format=pinecone&chunk_mode=true`,
{ headers: { Authorization: `Bearer ${API_KEY}` } },
);
fs.writeFileSync("rag-chunks.pinecone.json", await chunkResp.text());
cURL - All Formats
# Q&A dataset exports
curl "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=json" \
  -H "Authorization: Bearer faq_YOUR_API_KEY" -o dataset.json
curl "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=csv" \
  -H "Authorization: Bearer faq_YOUR_API_KEY" -o dataset.csv
curl "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=langchain_json" \
  -H "Authorization: Bearer faq_YOUR_API_KEY" -o dataset.langchain.json
curl "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=evaluation" \
  -H "Authorization: Bearer faq_YOUR_API_KEY" -o dataset.eval.json
# RAG chunk exports (add chunk_mode=true)
curl "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=pinecone&chunk_mode=true" \
  -H "Authorization: Bearer faq_YOUR_API_KEY" -o rag-chunks.pinecone.json
curl "https://faqai.app/api/v1/datasets/DATASET_ID/export?format=qdrant&chunk_mode=true" \
  -H "Authorization: Bearer faq_YOUR_API_KEY" -o rag-chunks.qdrant.json
Go
package main
import (
"fmt"
"io"
"net/http"
"os"
)
func downloadExport(datasetID, format string, chunkMode bool) {
apiKey := os.Getenv("FAQAI_API_KEY")
url := fmt.Sprintf(
"https://faqai.app/api/v1/datasets/%s/export?format=%s",
datasetID, format,
)
if chunkMode {
url += "&chunk_mode=true"
}
req, _ := http.NewRequest("GET", url, nil)
req.Header.Set("Authorization", "Bearer "+apiKey)
resp, _ := http.DefaultClient.Do(req)
defer resp.Body.Close()
filename := fmt.Sprintf("dataset.%s.json", format)
if chunkMode {
filename = fmt.Sprintf("rag-chunks.%s.json", format)
}
out, _ := os.Create(filename)
defer out.Close()
io.Copy(out, resp.Body)
fmt.Printf("Downloaded %s -> %s\n", format, filename)
}
func main() {
id := "DATASET_ID"
downloadExport(id, "json", false)
downloadExport(id, "evaluation", false)
downloadExport(id, "pinecone", true)
downloadExport(id, "qdrant", true)
}
Using Webhooks
Webhooks let you receive real-time HTTP callbacks when events occur, eliminating the need to poll for status changes. This is the recommended approach for production workflows.
Available Events
- document.uploaded - File accepted
- document.completed - Dataset ready
- document.failed - Processing error
- document.cancelled - User cancelled
- dataset.exported - Export delivered
Common Use Cases
- Auto-export datasets to your vector database when processing completes
- Send Slack / Teams / email notifications on completion or failure
- Trigger downstream ML pipelines (embedding generation, index rebuild)
- Sync results to Google Sheets, Airtable, or any external system
- Build event-driven architectures without polling overhead
cURL - Register a Webhook
# Register a webhook for all events
curl -X POST https://faqai.app/api/v1/webhooks \
-H "Authorization: Bearer faq_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://your-server.com/webhooks/faqai",
"events": ["document.completed", "document.failed", "dataset.exported"],
"description": "Production RAG pipeline"
}'
# Response includes a "secret" for signature verification - save it securely
# { "id": "wh_...", "secret": "whsec_...", "url": "...", "events": [...] }Python - Webhook Receiver with Auto-Export (Flask)
import os, hmac, hashlib, json, requests
from flask import Flask, request, jsonify
app = Flask(__name__)
WEBHOOK_SECRET = os.environ["FAQAI_WEBHOOK_SECRET"]
API_KEY = os.environ["FAQAI_API_KEY"]
BASE = "https://faqai.app/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
def verify_signature(payload: bytes, signature: str) -> bool:
expected = hmac.new(
WEBHOOK_SECRET.encode(), payload, hashlib.sha256
).hexdigest()
return hmac.compare_digest(f"sha256={expected}", signature)
@app.route("/webhooks/faqai", methods=["POST"])
def handle_webhook():
# 1. Verify signature
signature = request.headers.get("X-FAQai-Signature", "")
if not verify_signature(request.data, signature):
return jsonify({"error": "Invalid signature"}), 401
event = request.json
event_type = event["event"]
data = event["data"]
# 2. Auto-export when processing completes
if event_type == "document.completed":
doc_id = data["id"]
# Get the dataset for this document
ds_resp = requests.get(
f"{BASE}/documents/{doc_id}/datasets", headers=HEADERS
)
datasets = ds_resp.json()["datasets"]
for ds in datasets:
# Export as Pinecone format for vector DB ingestion
export = requests.get(
f"{BASE}/datasets/{ds['id']}/export?format=pinecone&chunk_mode=true",
headers=HEADERS,
)
# Forward to your vector DB pipeline
ingest_to_pinecone(export.json())
print(f"Exported {ds['dataset_type']} -> Pinecone")
elif event_type == "document.failed":
print(f"Processing failed for {data['id']}: {data.get('error', 'unknown')}")
send_slack_alert(f"FAQai processing failed for {data['name']}")
return jsonify({"received": True}), 200
if __name__ == "__main__":
app.run(port=8080)

JavaScript / TypeScript - Webhook Receiver with Auto-Export (Express)
import express from "express";
import crypto from "crypto";
const app = express();
app.use(express.json());
const WEBHOOK_SECRET = process.env.FAQAI_WEBHOOK_SECRET!;
const API_KEY = process.env.FAQAI_API_KEY!;
const BASE = "https://faqai.app/api/v1";
function verifySignature(payload: string, signature: string): boolean {
const expected = crypto
.createHmac("sha256", WEBHOOK_SECRET)
.update(payload)
.digest("hex");
const sigBuf = Buffer.from(signature || "");
const expBuf = Buffer.from(`sha256=${expected}`);
// Constant-time comparison (mirrors hmac.compare_digest in the Python example)
return sigBuf.length === expBuf.length && crypto.timingSafeEqual(sigBuf, expBuf);
}
app.post("/webhooks/faqai", async (req, res) => {
// 1. Verify signature
const sig = req.headers["x-faqai-signature"] as string;
// Note: re-serializing req.body assumes the sender signs compact JSON;
// for byte-exact verification, capture and verify the raw request body instead.
if (!verifySignature(JSON.stringify(req.body), sig)) {
return res.status(401).json({ error: "Invalid signature" });
}
const { event, data } = req.body;
// 2. Auto-export when processing completes
if (event === "document.completed") {
const dsResp = await fetch(`${BASE}/documents/${data.id}/datasets`, {
headers: { Authorization: `Bearer ${API_KEY}` },
});
const { datasets } = await dsResp.json();
for (const ds of datasets) {
const exportResp = await fetch(
`${BASE}/datasets/${ds.id}/export?format=pinecone&chunk_mode=true`,
{ headers: { Authorization: `Bearer ${API_KEY}` } },
);
const chunks = await exportResp.json();
await ingestToPinecone(chunks);
console.log(`Exported ${ds.dataset_type} -> Pinecone`);
}
}
if (event === "document.failed") {
console.error(`Processing failed: ${data.name}`);
await sendSlackAlert(`FAQai failed: ${data.name}`);
}
res.json({ received: true });
});
app.listen(8080, () => console.log("Webhook receiver on :8080"));

Go - Webhook Receiver with Signature Verification
package main
import (
"crypto/hmac"
"crypto/sha256"
"encoding/hex"
"encoding/json"
"fmt"
"io"
"net/http"
"os"
)
type WebhookEvent struct {
Event string `json:"event"`
Timestamp string `json:"timestamp"`
Data map[string]interface{} `json:"data"`
}
func verifySignature(payload []byte, signature, secret string) bool {
mac := hmac.New(sha256.New, []byte(secret))
mac.Write(payload)
expected := "sha256=" + hex.EncodeToString(mac.Sum(nil))
return hmac.Equal([]byte(expected), []byte(signature))
}
func webhookHandler(w http.ResponseWriter, r *http.Request) {
body, _ := io.ReadAll(r.Body)
sig := r.Header.Get("X-FAQai-Signature")
secret := os.Getenv("FAQAI_WEBHOOK_SECRET")
if !verifySignature(body, sig, secret) {
http.Error(w, "Invalid signature", http.StatusUnauthorized)
return
}
var event WebhookEvent
json.Unmarshal(body, &event)
switch event.Event {
case "document.completed":
docID := event.Data["id"].(string)
fmt.Printf("Document completed: %s - ready to export\n", docID)
// Call export API and forward to your vector DB
case "document.failed":
fmt.Printf("Document failed: %s\n", event.Data["name"])
}
w.Header().Set("Content-Type", "application/json")
w.Write([]byte(`{"received": true}`))
}
func main() {
http.HandleFunc("/webhooks/faqai", webhookHandler)
fmt.Println("Webhook receiver on :8080")
http.ListenAndServe(":8080", nil)
}

Automation Workflows
FAQai's REST API and webhooks integrate with popular no-code and low-code automation platforms. Below are practical workflow recipes you can build today.
Prerequisites
All recipes require a Basic, Starter or Pro plan with an API key. Store your key as a credential / environment variable in your automation platform. Set the Authorization header to Bearer faq_YOUR_API_KEY on all HTTP requests.
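Before wiring a key into an automation platform, it can help to confirm it authenticates. A minimal Python sketch against the documented list-documents endpoint (the helper names `auth_headers` and `check_key` are ours, not part of the API):

```python
import os
import requests

BASE = "https://faqai.app/api/v1"

def auth_headers(api_key: str) -> dict:
    """Build the Authorization header exactly as the API expects it."""
    return {"Authorization": f"Bearer {api_key}"}

def check_key(api_key: str) -> bool:
    """Return True if the key is accepted by GET /v1/documents."""
    resp = requests.get(f"{BASE}/documents", headers=auth_headers(api_key))
    return resp.status_code == 200

if __name__ == "__main__":
    print("API key valid:", check_key(os.environ["FAQAI_API_KEY"]))
```

A 401 response here usually means the key was mistyped or revoked; fix that before debugging the workflow itself.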
n8n - Auto-Process Google Drive Uploads
Automatically process new documents added to a Google Drive folder, export the generated dataset, and send results to Google Sheets.
Google Drive Trigger → HTTP Request (Upload) → Wait → Webhook Trigger (Completion) → HTTP Request (Export) → Google Sheets
- Google Drive Trigger - Set to “File Created” on your target folder. This fires whenever a new PDF, DOCX, or TXT is added.
- HTTP Request (Upload) - Method: POST, URL: https://faqai.app/api/v1/documents. Body type: Form-Data. Fields: file (binary from trigger), auto_process=true, name (from trigger filename). Header: Authorization: Bearer faq_YOUR_API_KEY.
- Wait - Use n8n's Wait node set to “Webhook” mode. Copy the resume URL. Register it as a FAQai webhook for document.completed. Alternatively, use a polling loop: Wait 10s → HTTP GET status → IF completed → continue.
- HTTP Request (Export) - Method: GET, URL: https://faqai.app/api/v1/datasets/{{$json.dataset_id}}/export?format=json.
- Google Sheets - Append rows from the canonical Q&A array. Map question, answer, confidence, page to columns.
# Register the n8n webhook with FAQai
curl -X POST https://faqai.app/api/v1/webhooks \
-H "Authorization: Bearer faq_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://your-n8n.example.com/webhook/WAIT_NODE_ID",
"events": ["document.completed", "document.failed"],
"description": "n8n Google Drive pipeline"
}'

Make.com - Document Pipeline with Slack Notifications
Upload documents via a custom webhook, wait for processing, export to Pinecone format, and notify your team on Slack.
Custom Webhook → HTTP (Upload) → Sleep → HTTP (Poll Status) → Router → HTTP (Export) → Slack
- Custom Webhook - Create a webhook trigger in Make.com. Send files to this URL from your app or another automation.
- HTTP Module (Upload) - Method: POST. URL: https://faqai.app/api/v1/documents. Body type: multipart/form-data. Map the file from the webhook trigger. Set auto_process=true.
- Sleep Module - Wait 30-60 seconds for processing. For larger documents, use a loop with a Router to poll status.
- HTTP Module (Check Status) - GET https://faqai.app/api/v1/documents/{{doc_id}}. Use a Router: if status = completed continue, otherwise loop back to Sleep.
- HTTP Module (Export) - GET https://faqai.app/api/v1/datasets/{{dataset_id}}/export?format=pinecone&chunk_mode=true.
- Slack - Post a message to your channel: “Dataset ready for [document name] - [chunk count] chunks exported to Pinecone format.”
Webhook alternative: Instead of polling, register a FAQai webhook pointing to a second Make.com scenario with a Custom Webhook trigger. This eliminates the Sleep/Poll loop entirely.
Register Make.com Webhook with FAQai
# Register Make.com Custom Webhook URL with FAQai
curl -X POST https://faqai.app/api/v1/webhooks \
-H "Authorization: Bearer faq_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://hook.eu1.make.com/YOUR_WEBHOOK_ID",
"events": ["document.completed", "document.failed"],
"description": "Make.com document pipeline"
}'

Zapier - Email-to-Dataset Pipeline
Automatically process email attachments, generate datasets, and save results to Google Drive.
Gmail Trigger → Webhooks by Zapier (Upload) → Delay → Webhooks by Zapier (Poll) → Filter → Webhooks by Zapier (Export) → Google Drive
- Gmail Trigger - Trigger: “New Attachment”. Filter for PDF/DOCX files. Use a specific label like “FAQai” to control which emails are processed.
- Webhooks by Zapier (POST) - URL: https://faqai.app/api/v1/documents. Payload type: Form. Fields: file (attachment URL), auto_process=true, name (attachment filename).
- Delay by Zapier - Wait 2 minutes (Zapier delays are in minute increments).
- Webhooks by Zapier (GET) - URL: https://faqai.app/api/v1/documents/{{doc_id}}. Check the status field.
- Filter by Zapier - Only continue if status equals completed.
- Webhooks by Zapier (GET Export) - URL: https://faqai.app/api/v1/datasets/{{dataset_id}}/export?format=csv.
- Google Drive - Create File action. Save the CSV export to a specific folder.
Webhook alternative: Use Webhooks by Zapier → Catch Hook as the trigger for a second Zap. Register the Catch Hook URL as a FAQai webhook for document.completed. This eliminates the Delay + Poll steps.
Register Zapier Catch Hook with FAQai
# Register Zapier Catch Hook URL with FAQai
curl -X POST https://faqai.app/api/v1/webhooks \
-H "Authorization: Bearer faq_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://hooks.zapier.com/hooks/catch/YOUR_ACCOUNT_ID/YOUR_HOOK_ID/",
"events": ["document.completed", "document.failed"],
"description": "Zapier email-to-dataset pipeline"
}'

Power Automate - SharePoint Document Processor
Process documents uploaded to SharePoint, generate RAG datasets, and post results to Microsoft Teams.
SharePoint Trigger → HTTP (Upload) → Do Until (Poll) → HTTP (Export) → Parse JSON → Teams Message
- SharePoint - When a file is created - Select your document library and folder. Filter for PDF/DOCX content types.
- HTTP Action (Upload) - Method: POST. URI: https://faqai.app/api/v1/documents. Body: multipart/form-data with the file content from SharePoint. Headers: Authorization: Bearer faq_YOUR_API_KEY. Set auto_process=true.
- Do Until Loop - Condition: status equals completed or failed. Inside: Delay 15 seconds → HTTP GET https://faqai.app/api/v1/documents/{{doc_id}} → Parse JSON.
- Condition - If status = completed: proceed to export. If failed: send error notification.
- HTTP Action (Export) - GET https://faqai.app/api/v1/datasets/{{dataset_id}}/export?format=json.
- Post message to Teams - Channel: your RAG team channel. Message: document name, item count, link to dashboard.
Webhook alternative: Use Power Automate's “When a HTTP request is received” trigger in a separate flow. Register the trigger URL as a FAQai webhook to eliminate the polling loop.
Register Power Automate HTTP Trigger with FAQai
# Register Power Automate "When a HTTP request is received" URL with FAQai
curl -X POST https://faqai.app/api/v1/webhooks \
-H "Authorization: Bearer faq_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://prod-XX.westeurope.logic.azure.com:443/workflows/YOUR_FLOW_ID/triggers/manual/paths/invoke?api-version=2016-06-01&sp=%2Ftriggers%2Fmanual%2Frun&sv=1.0&sig=YOUR_SIG",
"events": ["document.completed", "document.failed"],
"description": "Power Automate SharePoint pipeline"
}'

Pipedream - Event-Driven RAG Pipeline
Build a fully event-driven pipeline using Pipedream's webhook triggers and Node.js steps.
HTTP Trigger (Upload) → FAQai Upload → Register Webhook → Webhook Trigger (Callback) → Export → Forward
- Workflow 1 - Upload: HTTP trigger receives a file URL. A Node.js step downloads and uploads it to FAQai with auto_process=true.
- Workflow 2 - Callback: Create a separate workflow with an HTTP trigger. Register its URL as a FAQai webhook for document.completed.
- Export Step - When the webhook fires, a Node.js step calls the export API in any format.
- Forward - Send the exported data to any destination: Pinecone, Qdrant, Supabase, Airtable, Slack, email, S3, or another API.
Pipedream Node.js Step - Webhook Callback Handler
// Pipedream Node.js step - runs when FAQai webhook fires
export default defineComponent({
async run({ steps, $ }) {
const event = steps.trigger.event.body;
const API_KEY = process.env.FAQAI_API_KEY;
const BASE = "https://faqai.app/api/v1";
if (event.event !== "document.completed") {
$.flow.exit("Not a completion event");
}
const docId = event.data.id;
// Get datasets for this document
const dsResp = await fetch(`${BASE}/documents/${docId}/datasets`, {
headers: { Authorization: `Bearer ${API_KEY}` },
});
const { datasets } = await dsResp.json();
// Export RAG chunks in Pinecone format
const ds = datasets[0];
const exportResp = await fetch(
`${BASE}/datasets/${ds.id}/export?format=pinecone&chunk_mode=true`,
{ headers: { Authorization: `Bearer ${API_KEY}` } },
);
const chunks = await exportResp.json();
// Forward to next step (Pinecone, Slack, S3, etc.)
return {
document: event.data.name,
chunkCount: chunks.records.length,
data: chunks,
};
},
});

Register Pipedream HTTP Trigger with FAQai
# Register Pipedream Workflow 2 HTTP trigger URL with FAQai
curl -X POST https://faqai.app/api/v1/webhooks \
-H "Authorization: Bearer faq_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://YOUR_ENDPOINT.m.pipedream.net",
"events": ["document.completed", "document.failed"],
"description": "Pipedream RAG pipeline callback"
}'

Tips for All Platforms
- Prefer webhooks over polling for production workflows - they are faster and use fewer API calls
- Store your API key as a credential or secret, never hardcode it in workflow steps
- Add error handling for document.failed events to alert your team
- Use chunk_mode=true when exporting for vector database ingestion
- For detailed API reference and webhook payload schemas, see the Webhooks and Automation Tools sections in the API Reference
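When a platform cannot receive webhooks and polling is the only option, a capped exponential backoff keeps you well under the per-minute rate limits. A Python sketch, assuming only the documented GET /v1/documents/{id} endpoint and its status field (the helper names are ours):

```python
import os
import time
import requests

BASE = "https://faqai.app/api/v1"

def backoff_delays(initial=5.0, factor=2.0, cap=60.0, attempts=8):
    """Yield exponentially growing delays, capped at `cap` seconds."""
    delay = initial
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor

def wait_for_completion(doc_id, api_key):
    """Poll GET /v1/documents/{id} until a terminal status is reached."""
    headers = {"Authorization": f"Bearer {api_key}"}
    for delay in backoff_delays():
        doc = requests.get(f"{BASE}/documents/{doc_id}", headers=headers).json()
        if doc["status"] in ("completed", "failed"):
            return doc
        time.sleep(delay)
    raise TimeoutError(f"Document {doc_id} did not reach a terminal status")

if __name__ == "__main__":
    doc = wait_for_completion("DOC_ID", os.environ["FAQAI_API_KEY"])
    print("Final status:", doc["status"])
```

Starting at 5 seconds and doubling up to a 60-second cap keeps the worst case at a handful of requests even for slow documents.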
End-to-End Workflows
Complete workflows that combine multiple dataset sections and export formats to solve real production challenges.
Workflow 1: Production RAG Chatbot
- Upload your document via POST /v1/documents with auto_process=true
- Poll GET /v1/documents/{id} until status is completed
- Export RAG Chunks: GET /v1/datasets/{id}/export?format=pinecone&chunk_mode=true
- Generate embeddings (OpenAI, Cohere) and upsert into your vector database
- Export Canonical Q&A: GET /v1/datasets/{id}/export?format=json
- Run automated retrieval tests using Q&A pairs as ground truth
- Export Evaluation: GET /v1/datasets/{id}/export?format=evaluation
- Run RAGAS evaluation to get baseline scores
- Run adversarial tests to check hallucination handling
- Review Quality Insights and iterate on chunking, prompts, and retrieval
Python - Full Production RAG Chatbot
#!/usr/bin/env python3
"""Production RAG Chatbot - end-to-end pipeline using FAQai datasets."""
import os, time, json, requests, openai
API_KEY = os.environ["FAQAI_API_KEY"]
OPENAI_KEY = os.environ["OPENAI_API_KEY"]
BASE = "https://faqai.app/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# ---- Step 1: Upload document ----
with open("handbook.pdf", "rb") as f:
upload = requests.post(
f"{BASE}/documents",
headers=HEADERS,
files={"file": ("handbook.pdf", f, "application/pdf")},
data={"auto_process": "true"},
)
doc = upload.json()
doc_id = doc["id"]
print(f"Uploaded document: {doc_id}")
# ---- Step 2: Poll until processing completes ----
while True:
status = requests.get(f"{BASE}/documents/{doc_id}", headers=HEADERS).json()
print(f" Status: {status['status']}")
if status["status"] == "completed":
dataset_id = status["dataset_id"]
break
if status["status"] == "failed":
raise RuntimeError(f"Processing failed: {status.get('error')}")
time.sleep(5)
# ---- Step 3: Export RAG chunks (Pinecone format) ----
chunks_resp = requests.get(
f"{BASE}/datasets/{dataset_id}/export?format=pinecone&chunk_mode=true",
headers=HEADERS,
)
chunks = chunks_resp.json()
print(f"Exported {len(chunks.get('records', []))} RAG chunks")
# ---- Step 4: Generate embeddings and upsert ----
client = openai.OpenAI(api_key=OPENAI_KEY)
vectors = []
for record in chunks["records"]:
embedding = client.embeddings.create(
model="text-embedding-3-small",
input=record["metadata"].get("embedding_text", record["metadata"].get("text", "")),
)
vectors.append({
"id": record["id"],
"values": embedding.data[0].embedding,
"metadata": record["metadata"],
})
print(f"Generated {len(vectors)} embeddings")
# Upsert to Pinecone (example)
# import pinecone
# index = pinecone.Index("my-index")
# index.upsert(vectors=vectors, namespace="handbook")
# ---- Step 5: Export Canonical Q&A for retrieval testing ----
qa_resp = requests.get(
f"{BASE}/datasets/{dataset_id}/export?format=json", headers=HEADERS
)
qa_data = qa_resp.json()
canonical = qa_data.get("canonical_qa", [])
print(f"Exported {len(canonical)} Q&A pairs for testing")
# ---- Step 6: Automated retrieval test ----
passed, failed = 0, 0
for item in canonical[:20]: # test first 20
query_embedding = client.embeddings.create(
model="text-embedding-3-small", input=item["question"]
).data[0].embedding
# Replace with your vector DB search
# results = index.query(vector=query_embedding, top_k=5, include_metadata=True)
# retrieved_texts = [m["metadata"]["text"] for m in results["matches"]]
# if any(item["context"][:100] in t for t in retrieved_texts):
# passed += 1
# else:
# failed += 1
passed += 1 # placeholder
print(f"Retrieval accuracy: {passed}/{passed + failed}")
# ---- Step 7: Export Evaluation dataset ----
eval_resp = requests.get(
f"{BASE}/datasets/{dataset_id}/export?format=evaluation", headers=HEADERS
)
eval_data = eval_resp.json()
print(f"Exported {len(eval_data.get('data', []))} evaluation items")
# ---- Step 8: Run RAGAS evaluation (requires ragas library) ----
# from ragas import evaluate
# from ragas.metrics import faithfulness, context_recall, answer_relevancy
# result = evaluate(eval_data["data"], metrics=[faithfulness, context_recall, answer_relevancy])
# print(f"Faithfulness: {result['faithfulness']:.2f}")
# print(f"Context Recall: {result['context_recall']:.2f}")
# ---- Step 9: Adversarial hallucination check ----
adversarial = qa_data.get("adversarial", [])
print(f"Running {len(adversarial)} adversarial tests...")
# for item in adversarial:
# response = your_rag_chain.query(item["question"])
# if response["confidence"] > 0.8:
# print(f" HALLUCINATION: {item['question'][:80]}")
# ---- Step 10: Review Quality Insights ----
insights = qa_data.get("dataset_quality_insights", [])
for insight in insights:
risk = insight.get("risk_level", "unknown")
print(f" [{risk.upper()}] {insight.get('insight_type')}: {insight.get('description', '')[:80]}")
print("\nPipeline complete. Review insights above to iterate on your RAG system.")

JavaScript / TypeScript - Full Production RAG Chatbot
/**
* Production RAG Chatbot - end-to-end pipeline using FAQai datasets.
* Run with: npx tsx rag-chatbot.ts
*/
import OpenAI from "openai";
import fs from "fs";
const API_KEY = process.env.FAQAI_API_KEY!;
const OPENAI_KEY = process.env.OPENAI_API_KEY!;
const BASE = "https://faqai.app/api/v1";
const headers = { Authorization: `Bearer ${API_KEY}` };
const openai = new OpenAI({ apiKey: OPENAI_KEY });
async function sleep(ms: number) {
return new Promise((r) => setTimeout(r, ms));
}
async function main() {
// ---- Step 1: Upload document ----
const form = new FormData();
form.append("file", new Blob([fs.readFileSync("handbook.pdf")]), "handbook.pdf");
form.append("auto_process", "true");
const uploadRes = await fetch(`${BASE}/documents`, {
method: "POST",
headers: { Authorization: `Bearer ${API_KEY}` },
body: form,
});
const doc = await uploadRes.json();
const docId = doc.id;
console.log(`Uploaded document: ${docId}`);
// ---- Step 2: Poll until processing completes ----
let datasetId = "";
while (true) {
const statusRes = await fetch(`${BASE}/documents/${docId}`, { headers });
const status = await statusRes.json();
console.log(` Status: ${status.status}`);
if (status.status === "completed") {
datasetId = status.dataset_id;
break;
}
if (status.status === "failed") {
throw new Error(`Processing failed: ${status.error}`);
}
await sleep(5000);
}
// ---- Step 3: Export RAG chunks (Pinecone format) ----
const chunksRes = await fetch(
`${BASE}/datasets/${datasetId}/export?format=pinecone&chunk_mode=true`,
{ headers }
);
const chunks = await chunksRes.json();
console.log(`Exported ${chunks.records?.length ?? 0} RAG chunks`);
// ---- Step 4: Generate embeddings and upsert ----
const vectors = [];
for (const record of chunks.records ?? []) {
const text = record.metadata?.embedding_text ?? record.metadata?.text ?? "";
const embedding = await openai.embeddings.create({
model: "text-embedding-3-small",
input: text,
});
vectors.push({
id: record.id,
values: embedding.data[0].embedding,
metadata: record.metadata,
});
}
console.log(`Generated ${vectors.length} embeddings`);
// Upsert to Pinecone (example)
// import { Pinecone } from "@pinecone-database/pinecone";
// const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
// const index = pc.index("my-index");
// await index.upsert(vectors);
// ---- Step 5: Export Canonical Q&A for retrieval testing ----
const qaRes = await fetch(
`${BASE}/datasets/${datasetId}/export?format=json`,
{ headers }
);
const qaData = await qaRes.json();
const canonical = qaData.canonical_qa ?? [];
console.log(`Exported ${canonical.length} Q&A pairs for testing`);
// ---- Step 6: Automated retrieval test ----
let passed = 0;
let failed = 0;
for (const item of canonical.slice(0, 20)) {
const queryEmb = await openai.embeddings.create({
model: "text-embedding-3-small",
input: item.question,
});
// Replace with your vector DB search
// const results = await index.query({ vector: queryEmb.data[0].embedding, topK: 5 });
// const match = results.matches.some(m => m.metadata.text.includes(item.context?.slice(0, 100)));
// match ? passed++ : failed++;
passed++; // placeholder
}
console.log(`Retrieval accuracy: ${passed}/${passed + failed}`);
// ---- Step 7: Export Evaluation dataset ----
const evalRes = await fetch(
`${BASE}/datasets/${datasetId}/export?format=evaluation`,
{ headers }
);
const evalData = await evalRes.json();
console.log(`Exported ${evalData.data?.length ?? 0} evaluation items`);
// ---- Step 8: RAGAS evaluation (use ragas npm package or Python) ----
// See Python example above for RAGAS integration
// ---- Step 9: Adversarial hallucination check ----
const adversarial = qaData.adversarial ?? [];
console.log(`Running ${adversarial.length} adversarial tests...`);
// for (const item of adversarial) {
// const response = await yourRagChain.query(item.question);
// if (response.confidence > 0.8) console.log(` HALLUCINATION: ${item.question.slice(0, 80)}`);
// }
// ---- Step 10: Review Quality Insights ----
const insights = qaData.dataset_quality_insights ?? [];
for (const insight of insights) {
const risk = (insight.risk_level ?? "unknown").toUpperCase();
console.log(` [${risk}] ${insight.insight_type}: ${(insight.description ?? "").slice(0, 80)}`);
}
console.log("\nPipeline complete. Review insights above to iterate on your RAG system.");
}
main().catch(console.error);

Workflow 2: CI/CD Quality Gate
- Generate datasets once per document version
- In your CI pipeline, export Evaluation and Adversarial datasets via API
- Run RAGAS evaluation - fail if faithfulness < 0.8 or context recall < 0.7
- Run adversarial tests - fail if hallucination rate > 10%
- Run variant retrieval tests - fail if recall drops below threshold
- Deploy only when all quality gates pass
Example CI Script (Python)
#!/usr/bin/env python3
"""CI/CD quality gate for RAG system."""
import os, sys, requests
API_KEY = os.environ["FAQAI_API_KEY"]
BASE = "https://faqai.app/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
DATASET_ID = os.environ["DATASET_ID"]
# Export evaluation dataset
eval_resp = requests.get(
f"{BASE}/datasets/{DATASET_ID}/export?format=evaluation", headers=HEADERS
)
eval_data = eval_resp.json()
# Export full dataset (includes adversarial)
full_resp = requests.get(
f"{BASE}/datasets/{DATASET_ID}/export?format=json", headers=HEADERS
)
full_data = full_resp.json()
# Gate 1: Evaluation accuracy
# (my_rag is a placeholder for your own RAG client - replace with your system)
correct = sum(1 for d in eval_data["data"] if my_rag.check(d["question"], d["ground_truth"]))
accuracy = correct / len(eval_data["data"])
print(f"Evaluation accuracy: {accuracy:.1%}")
if accuracy < 0.8:
print("FAIL: Accuracy below 80%")
sys.exit(1)
# Gate 2: Adversarial hallucination rate
adversarial = full_data.get("adversarial", [])
hallucinations = sum(1 for a in adversarial if my_rag.query(a["question"])["confidence"] > 0.8)
hall_rate = hallucinations / max(len(adversarial), 1)
print(f"Hallucination rate: {hall_rate:.1%}")
if hall_rate > 0.1:
print("FAIL: Hallucination rate above 10%")
sys.exit(1)
print("All quality gates passed")

Workflow 3: LangChain / LlamaIndex Integration
- Export RAG Chunks in LangChain format: format=langchain_json&chunk_mode=true
- Load directly into your framework using JSONLoader or custom loader
- Build your retrieval chain (VectorStore, RetrievalQA)
- Export Evaluation dataset for framework-native evaluation
- Review Quality Insights for tuning guidance
LangChain Example (Python)
import os, requests, json
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.schema import Document
API_KEY = os.environ["FAQAI_API_KEY"]
BASE = "https://faqai.app/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# Export RAG chunks in LangChain format
resp = requests.get(
f"{BASE}/datasets/DATASET_ID/export?format=langchain_json&chunk_mode=true",
headers=HEADERS,
)
data = resp.json()
# Load into LangChain Documents
docs = [
Document(page_content=d["page_content"], metadata=d["metadata"])
for d in data["documents"]
]
# Build vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(docs, embeddings)
# Query
results = vectorstore.similarity_search("How does authentication work?", k=5)
for r in results:
print(f"[Page {r.metadata.get('page')}] {r.page_content[:100]}...")

LlamaIndex Example (Python)
import os, requests
from llama_index.core import VectorStoreIndex, Document
API_KEY = os.environ["FAQAI_API_KEY"]
BASE = "https://faqai.app/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# Export RAG chunks in LlamaIndex format
resp = requests.get(
f"{BASE}/datasets/DATASET_ID/export?format=llamaindex_json&chunk_mode=true",
headers=HEADERS,
)
data = resp.json()
# Build index from exported chunks
documents = [
Document(text=ex["query"], metadata={"contexts": ex["reference_contexts"]})
for ex in data["examples"]
]
index = VectorStoreIndex.from_documents(documents)
# Query
query_engine = index.as_query_engine()
response = query_engine.query("What are the main features?")
print(response)

API Reference
Programmatic access to FAQai.app. Available on Basic, Starter and Pro plans.
Authentication
All API requests require an API key passed in the Authorization header as a Bearer token.
Create and manage your API keys from Settings → API Keys.
Authorization: Bearer faq_aBcDeFgHiJkLmNoPqRsTuVwXyZ012345
Keep your API keys safe
API keys are shown only once at creation time. Store them in an environment variable or secrets manager. Never expose keys in client-side code or public repositories.
Base URL
https://<your-domain>/api/v1
Replace <your-domain> with your deployment domain (e.g. faqai.app). All endpoints below are relative to this base URL. Responses are JSON with Content-Type: application/json.
Rate Limits
| Plan | Requests / min | API calls / month | Pages / month |
|---|---|---|---|
| Basic | 5 | 500 | 1,000 |
| Starter | 10 | 1,000 | 2,000 |
| Pro | 30 | 3,000 | 6,000 |
Rate limit headers X-RateLimit-Remaining and Retry-After are included in every response.
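The Retry-After header can drive a simple client-side retry when you hit the per-minute limit. A Python sketch (the helper names are ours; the 60-second fallback is an assumption, not a documented default):

```python
import time
import requests

def retry_after_seconds(headers, default=60):
    """Parse the Retry-After header, falling back to `default` seconds."""
    try:
        return int(headers.get("Retry-After", default))
    except (TypeError, ValueError):
        return default

def get_with_retry(url, api_key, max_retries=3):
    """GET that sleeps and retries when the API responds 429 Too Many Requests."""
    auth = {"Authorization": f"Bearer {api_key}"}
    for _ in range(max_retries):
        resp = requests.get(url, headers=auth)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp
        time.sleep(retry_after_seconds(resp.headers))
    raise RuntimeError(f"Still rate-limited after {max_retries} retries: {url}")
```

Checking X-RateLimit-Remaining proactively and slowing down before it reaches zero avoids the 429 round-trip entirely.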
POST Upload a Document
POST /v1/documents
Upload a file for dataset generation. Send the file as multipart/form-data.
Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | PDF, DOCX, or TXT. Size limit varies by plan. |
| name | string | No | Custom document name. Defaults to the filename. |
| auto_process | boolean | No | Automatically generate dataset after upload. Defaults to true. |
Example Request (cURL)
curl -X POST https://faqai.app/api/v1/documents \
  -H "Authorization: Bearer faq_YOUR_API_KEY" \
  -F "file=@./product-manual.pdf" \
  -F "name=Product Manual" \
  -F "auto_process=true"
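The same upload from Python; a sketch using the requests library. The `guess_mime` helper and its extension check are ours (the API validates formats server-side; this just fails fast on unsupported files):

```python
import os
import requests

# MIME types for the three documented formats (full DOCX type spelled out)
SUPPORTED = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".txt": "text/plain",
}

def guess_mime(path):
    """Map a supported file extension to its MIME type, rejecting anything else."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED:
        raise ValueError(f"Unsupported format: {ext} (use PDF, DOCX, or TXT)")
    return SUPPORTED[ext]

def upload_document(path, api_key, auto_process=True):
    """POST the file as multipart/form-data, mirroring the cURL example."""
    with open(path, "rb") as f:
        resp = requests.post(
            "https://faqai.app/api/v1/documents",
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": (os.path.basename(path), f, guess_mime(path))},
            data={"auto_process": str(auto_process).lower()},
        )
    resp.raise_for_status()
    return resp.json()
```

`requests` builds the multipart body and Content-Type boundary automatically, so only the Authorization header is set explicitly.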
Response
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"name": "Product Manual",
"original_filename": "product-manual.pdf",
"file_type": "application/pdf",
"file_size": 245760,
"status": "processing",
"created_at": "2026-02-26T10:30:00.000Z"
}

POST /v1/jobs
Upload a document and process it asynchronously in a single request. Returns 202 Accepted immediately. Processing runs in the background. Use webhook_url for push notification or poll GET /v1/documents/{id} to check status.
Request (multipart/form-data)
| Field | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | PDF, DOCX, or TXT (size limit varies by plan) |
| name | string | No | Custom document name |
| webhook_url | string | No | HTTP(S) URL to receive completion/failure callback |
Example Request
curl -X POST https://faqai.app/api/v1/jobs \
  -H "Authorization: Bearer faq_YOUR_API_KEY" \
  -F "file=@document.pdf" \
  -F "webhook_url=https://your-server.com/webhook"
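A Python equivalent that submits the job and reconstructs an absolute polling URL from the relative `poll_url` in the 202 response (the helper names are ours):

```python
import os
import requests

BASE = "https://faqai.app/api/v1"

def submit_job(path, api_key, webhook_url=None):
    """Submit a file to POST /v1/jobs; returns the 202 body with job_id/poll_url."""
    data = {"webhook_url": webhook_url} if webhook_url else {}
    with open(path, "rb") as f:
        resp = requests.post(
            f"{BASE}/jobs",
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": f},
            data=data,
        )
    resp.raise_for_status()
    return resp.json()

def full_poll_url(job, domain="https://faqai.app"):
    """poll_url in the 202 response is relative; prepend your deployment domain."""
    return domain.rstrip("/") + job["poll_url"]

if __name__ == "__main__":
    job = submit_job("document.pdf", os.environ["FAQAI_API_KEY"],
                     webhook_url="https://your-server.com/webhook")
    print("Poll at:", full_poll_url(job))
```

If you pass a `webhook_url`, polling is optional; the callback payload below arrives when processing finishes or fails.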
Response (202 Accepted)
{
"job_id": "d290f1ee-6c54-4b01-90e6-d701748f0851",
"document_id": "d290f1ee-6c54-4b01-90e6-d701748f0851",
"status": "processing",
"webhook_url": "https://your-server.com/webhook",
"poll_url": "/api/v1/documents/d290f1ee-6c54-4b01-90e6-d701748f0851",
"created_at": "2026-02-16T10:30:00.000Z"
}

Webhook Callback Payload
Sent as POST to your webhook_url when processing completes or fails.
{
"event": "document.completed",
"job_id": "d290f1ee-6c54-4b01-90e6-d701748f0851",
"document_id": "d290f1ee-6c54-4b01-90e6-d701748f0851",
"status": "completed",
"dataset_item_count": 245,
"canonical_qa_count": 80,
"query_variant_count": 80,
"evaluation_count": 45,
"adversarial_count": 40,
"timestamp": "2026-02-16T10:31:15.000Z"
}

POST Generate Datasets
POST /v1/documents/{id}/process

Trigger dataset generation for a document that was uploaded with auto_process=false, or re-process an existing document. The document must be in pending status.
Example Request
curl -X POST https://faqai.app/api/v1/documents/a1b2c3d4-e5f6-7890-abcd-ef1234567890/process \
  -H "Authorization: Bearer faq_YOUR_API_KEY"
Response
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"name": "Product Manual",
"status": "completed",
"dataset_item_count": 120,
"canonical_qa_count": 15,
"query_variant_count": 60,
"evaluation_count": 30,
"adversarial_count": 15,
"created_at": "2026-02-26T10:30:00.000Z",
"processing_completed_at": "2026-02-26T10:31:15.000Z"
}

GET List Documents
GET /v1/documents
Retrieve all documents for the authenticated user, sorted by most recent first.
Query Parameters
| Param | Type | Description |
|---|---|---|
| status | string | Filter by status: pending, processing, completed, or failed. |
| limit | integer | Max results to return (default 20, max 100). |
| offset | integer | Number of results to skip for pagination. |
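The limit/offset parameters can be walked in a loop to retrieve every document. A sketch in Python: the page-walking logic is factored into a helper that takes any fetch callable, so the same loop works against the live endpoint or a stub. The helper name is ours, not part of the API.

```python
def iter_all(fetch_page, limit=100):
    """Yield every item from a limit/offset-paginated endpoint.

    fetch_page(limit, offset) must return a dict shaped like the
    /v1/documents response: {"data": [...], "total": N, ...}.
    """
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        yield from page["data"]
        offset += len(page["data"])
        if offset >= page["total"] or not page["data"]:
            break

# Wiring it to the live API would look like (requests assumed):
#   fetch = lambda limit, offset: requests.get(
#       "https://faqai.app/api/v1/documents",
#       headers={"Authorization": "Bearer faq_YOUR_API_KEY"},
#       params={"limit": limit, "offset": offset},
#   ).json()
#   docs = list(iter_all(fetch))
```

Stopping on an empty page as well as on `total` guards against an off-by-one if documents are deleted between requests.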
Example Request
curl "https://faqai.app/api/v1/documents?status=completed&limit=10" \
  -H "Authorization: Bearer faq_YOUR_API_KEY"
Response
{
"data": [
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"name": "Product Manual",
"file_type": "application/pdf",
"file_size": 245760,
"status": "completed",
"dataset_count": 4,
"created_at": "2026-02-26T10:30:00.000Z",
"processing_completed_at": "2026-02-26T10:31:15.000Z"
}
],
"total": 1,
"limit": 10,
"offset": 0
}
GET Get Document Details
GET /v1/documents/{id}
Retrieve full details for a single document, including metadata and processing status.
Response
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"name": "Product Manual",
"original_filename": "product-manual.pdf",
"file_type": "application/pdf",
"file_size": 245760,
"status": "completed",
"dataset_count": 4,
"dataset_item_count": 120,
"created_at": "2026-02-26T10:30:00.000Z",
"processing_completed_at": "2026-02-26T10:31:15.000Z"
}
GET Get Dataset Items
GET /v1/documents/{id}/datasets
Retrieve all datasets for a document, grouped by type with metadata (status, item count).
Example Request
curl https://faqai.app/api/v1/documents/a1b2c3d4-.../datasets \
  -H "Authorization: Bearer faq_YOUR_API_KEY"
Response
{
"document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"datasets": [
{
"id": "ds-001",
"dataset_type": "canonical_qa",
"status": "completed",
"item_count": 344,
"created_at": "2026-03-10T14:30:00.000Z"
},
{
"id": "ds-002",
"dataset_type": "query_variants",
"status": "completed",
"item_count": 344,
"created_at": "2026-03-10T14:31:00.000Z"
},
{
"id": "ds-003",
"dataset_type": "evaluation",
"status": "completed",
"item_count": 278,
"created_at": "2026-03-10T14:32:00.000Z"
},
{
"id": "ds-004",
"dataset_type": "adversarial",
"status": "completed",
"item_count": 139,
"created_at": "2026-03-10T14:33:00.000Z"
}
]
}
This endpoint returns dataset metadata (type, status, item count). To retrieve the actual items for a dataset, use the export endpoint: GET /v1/datasets/{id}/export?format=json
GETDataset Coverage
GET /v1/datasets/coverage/{dataset_id}
Analyze how well a dataset covers the source document. Returns coverage percentage and identifies uncovered chunks.
Example Request
curl https://faqai.app/api/v1/datasets/coverage/ds-001 \
  -H "Authorization: Bearer faq_YOUR_API_KEY"
Response
{
"dataset_id": "ds-001",
"total_chunks": 12,
"chunks_with_qa": 10,
"chunks_without_qa": 2,
"coverage_percent": 83.3,
"uncovered_chunks": [
{ "chunk_id": "chunk-07", "page": 5 },
{ "chunk_id": "chunk-11", "page": 8 }
]
}
GET Search Datasets
GET /v1/datasets/search?q={keyword}
Search across all your generated Q&A dataset items by keyword. Useful when you have uploaded many documents and need to find specific questions or answers. Matches against both the question and answer fields.
Query Parameters
| Param | Type | Required | Description |
|---|---|---|---|
| q | string | Yes | Search keyword (min 2 chars) |
| dataset_type | string | No | canonical_qa, query_variants, evaluation, adversarial |
| document_id | string | No | Filter by specific document UUID |
| difficulty | string | No | easy, medium, hard |
| question_type | string | No | definition, explanation, process, comparison, inference, misleading |
| limit | number | No | Results per page (default 20, max 100) |
| offset | number | No | Pagination offset (default 0) |
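When calling search programmatically, it helps to validate parameters client-side against the table above before issuing the request. A small Python sketch; the helper is ours, while the parameter names, allowed values, and limits come from the table:

```python
ALLOWED_TYPES = {"canonical_qa", "query_variants", "evaluation", "adversarial"}

def search_params(q, dataset_type=None, document_id=None,
                  difficulty=None, question_type=None, limit=20, offset=0):
    """Build the query-string dict for GET /v1/datasets/search,
    dropping unset filters and enforcing the documented limits."""
    if len(q) < 2:
        raise ValueError("q must be at least 2 characters")
    if dataset_type is not None and dataset_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown dataset_type: {dataset_type}")
    params = {"q": q, "limit": min(limit, 100), "offset": offset}
    for key, val in [("dataset_type", dataset_type),
                     ("document_id", document_id),
                     ("difficulty", difficulty),
                     ("question_type", question_type)]:
        if val is not None:
            params[key] = val
    return params

# Usage with requests (assumed):
#   requests.get("https://faqai.app/api/v1/datasets/search",
#                headers={"Authorization": "Bearer faq_YOUR_API_KEY"},
#                params=search_params("onboarding", dataset_type="canonical_qa"))
```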
Example Request
curl "https://faqai.app/api/v1/datasets/search?q=onboarding&dataset_type=canonical_qa&limit=10" \
  -H "Authorization: Bearer faq_YOUR_API_KEY"
Response
{
"query": "onboarding",
"data": [
{
"id": "item-001",
"dataset_id": "ds-001",
"document_id": "doc-001",
"question": "What is the employee onboarding process?",
"answer": "The onboarding process consists of three phases: pre-boarding paperwork, orientation week, and a 30-day integration period with assigned mentors.",
"expected_answer": null,
"dataset_type": "canonical_qa",
"context": "Chapter 3: Employee Onboarding...",
"source_document": "hr-handbook.pdf",
"page_number": 12,
"difficulty": "medium",
"confidence": 0.92,
"question_type": "process",
"tags": ["onboarding", "HR", "new-hire"],
"created_at": "2025-01-15T10:30:00Z"
}
],
"total": 47,
"limit": 10,
"offset": 0
}
POST Cancel Processing
POST /v1/documents/{id}/cancel
Cancel a document that is currently being processed. The document must be in processing status. Cancellation stops dataset generation and refunds page credits back to your monthly quota. Any partially generated datasets are marked as failed.
Response
{
"id": "a1b2c3d4-...",
"status": "cancelled",
"message": "Document processing cancelled. Page credits refunded."
}
Error Responses
| Status | Reason |
|---|---|
| 400 | Document is not currently processing |
| 404 | Document not found or not owned by you |
DELETE Delete a Document
DELETE /v1/documents/{id}
Remove a document from your account. The document and its generated datasets will no longer appear in your dashboard or API responses. Deleting a document does not restore your monthly page quota - deleted pages still count towards your billing cycle usage.
Response
{
"message": "Document deleted successfully"
}
Webhooks
Webhooks let you receive real-time HTTP callbacks when events occur (e.g. document processing completes or fails). This is essential for automation workflows where polling is impractical.
Events
| Event | Fired when |
|---|---|
| document.uploaded | File accepted and document record created |
| document.completed | Dataset generation finishes successfully |
| document.failed | Processing encounters an error |
| document.cancelled | Processing cancelled by user; page credits refunded |
| dataset.exported | Dataset export finished and file delivered |
Create a Webhook
POST /api/v1/webhooks
{
"url": "https://your-server.com/webhook",
"events": ["document.uploaded", "document.completed", "document.failed", "document.cancelled", "dataset.exported"],
"description": "My n8n workflow"
}
The response includes a secret - save it securely. It is used to verify webhook signatures and is shown only once.
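The same registration can be done from Python using only the standard library. A sketch - the payload mirrors the JSON above; the environment-variable name and the `secret` response-field access are assumptions based on the note about the one-time secret:

```python
import json
import os
import urllib.request

payload = {
    "url": "https://your-server.com/webhook",
    "events": ["document.completed", "document.failed"],
    "description": "CI pipeline notifications",
}

req = urllib.request.Request(
    "https://faqai.app/api/v1/webhooks",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('FAQAI_API_KEY', '')}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# The response includes the one-time signing secret - persist it
# somewhere safe before discarding the response:
# with urllib.request.urlopen(req) as resp:
#     secret = json.load(resp)["secret"]
```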
Webhook Payloads
Each event type includes different data fields. All payloads share the same envelope structure.
document.uploaded
{
"event": "document.uploaded",
"timestamp": "2026-02-26T12:00:00.000Z",
"data": {
"id": "a1b2c3d4-...",
"name": "Onboarding Guide",
"file_type": "application/pdf",
"file_size": 1048576,
"status": "pending"
}
}
document.completed
{
"event": "document.completed",
"timestamp": "2026-02-26T12:00:00.000Z",
"data": {
"id": "a1b2c3d4-...",
"name": "Onboarding Guide",
"status": "completed",
"dataset_item_count": 245,
"canonical_qa_count": 80,
"query_variant_count": 80,
"evaluation_count": 45,
"adversarial_count": 40,
"processing_completed_at": "2026-02-26T12:00:00.000Z"
}
}
document.failed
{
"event": "document.failed",
"timestamp": "2026-02-26T12:00:00.000Z",
"data": {
"id": "a1b2c3d4-...",
"status": "failed",
"error": "Content moderation flagged the document"
}
}
dataset.exported
{
"event": "dataset.exported",
"timestamp": "2026-02-26T12:00:00.000Z",
"data": {
"dataset_id": "e5f6g7h8-...",
"document_id": "a1b2c3d4-...",
"format": "json",
"item_count": 120,
"file_size": 45230
}
}
Signature Verification
Every delivery includes an X-FAQai-Signature header containing an HMAC-SHA256 signature of the request body, using your webhook secret as the key:
X-FAQai-Signature: sha256=<hmac_hex>
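This check in Python, using hmac.compare_digest for a constant-time comparison (a sketch; it assumes you have the raw request body bytes and the header value):

```python
import hashlib
import hmac

def verify_signature(raw_body: bytes, header_value: str, secret: str) -> bool:
    """Return True if the X-FAQai-Signature header matches the body."""
    expected = "sha256=" + hmac.new(
        secret.encode("utf-8"), raw_body, hashlib.sha256
    ).hexdigest()
    # compare_digest avoids leaking the match position via timing
    return hmac.compare_digest(header_value, expected)
```

Always compute the HMAC over the raw bytes as received; re-serializing parsed JSON can change whitespace or key order and break the comparison.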
// Verify in Node.js (constant-time comparison):
const crypto = require("crypto");
const expected = `sha256=${crypto
  .createHmac("sha256", WEBHOOK_SECRET)
  .update(rawBody)
  .digest("hex")}`;
const isValid =
  signature.length === expected.length &&
  crypto.timingSafeEqual(Buffer.from(signature), Buffer.from(expected));
Manage Webhooks
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/webhooks | Create a webhook |
| GET | /api/v1/webhooks | List your webhooks |
| GET | /api/v1/webhooks/{id} | Details + recent deliveries |
| DELETE | /api/v1/webhooks/{id} | Remove a webhook |
Automation Tools
FAQai's REST API and webhooks work out of the box with popular automation platforms. Below are detailed, step-by-step guides for each.
Prerequisites
You need a Basic, Starter or Pro plan with an API key. Create one at Settings → API Keys. All examples below use faq_YOUR_API_KEY - replace it with your actual key.
n8n
Two approaches - polling-based (simpler) and webhook-based (real-time).
Option A: Form Upload + Polling
Workflow: n8n Form Trigger → Upload → Wait → Poll Status → Get Dataset Items → Output
- n8n Form Trigger - Add a Form Trigger node with a File field (name: document) and an optional Text field (name: name).
- HTTP Request - Upload - POST to the FAQai API:
Method: POST
URL: https://<your-domain>/api/v1/documents
Auth: Header Auth → Authorization: Bearer faq_YOUR_API_KEY
Body (Form-Data / Multipart):
file = {{ $binary.document }} ← binary from form
name = {{ $json.name }} ← optional
auto_process = true
Response returns the document id with status: "processing".
- Wait - Add a Wait node set to 15 seconds.
- HTTP Request - Poll Status - GET the document:
Method: GET
URL: https://<your-domain>/api/v1/documents/{{ $json.id }}
Auth: Same Bearer token
- IF node - Check {{ $json.status }} === "completed". True → continue. False → loop back to the Wait node.
- HTTP Request - Get Dataset Items:
Method: GET
URL: https://<your-domain>/api/v1/documents/{{ $json.id }}/datasets
Auth: Same Bearer token
- Output - Connect to Google Sheets, Slack, Email, Notion, or any action node to use the generated dataset items.
Option B: Webhook-based (no polling)
Uses two separate n8n workflows - one to upload and one to receive results.
Workflow 1 - Upload:
- n8n Form Trigger with a file field (same as above).
- HTTP Request - POST to /api/v1/documents (same as above). Done - no waiting needed.
Workflow 2 - Receive results:
- Webhook Trigger node - copy the generated URL.
- Register the webhook (one-time setup):
curl -X POST https://<your-domain>/api/v1/webhooks \
-H "Authorization: Bearer faq_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://your-n8n.example.com/webhook/abc123",
"events": ["document.uploaded", "document.completed", "document.failed", "document.cancelled", "dataset.exported"],
"description": "n8n dataset results"
}'
- When FAQai finishes processing, it POSTs to your n8n webhook with the document ID. The Webhook Trigger fires automatically.
- HTTP Request - GET /api/v1/documents/{{ $json.data.id }}/datasets to fetch results → pipe to any output.
Make.com (Integromat)
Use HTTP modules with Bearer token authentication and Custom Webhooks for callbacks.
Scenario: File Upload → Generate Datasets → Google Sheets
Flow: Custom Webhook → HTTP Upload → Sleep → HTTP Poll → Router → HTTP Get Dataset Items → Google Sheets
- Trigger: Custom Webhook - Create a Custom Webhook module. Copy its URL. You'll send files to this URL from an external form, email, or another scenario.
- HTTP Module - Upload document:
Module: HTTP → Make a Request
URL: https://<your-domain>/api/v1/documents
Method: POST
Headers: Authorization: Bearer faq_YOUR_API_KEY
Body type: Multipart/form-data
Fields:
  file → Map from webhook binary data
  name → Map from webhook JSON (optional)
  auto_process → true
- Sleep module - Set to 20 seconds (Tools → Sleep).
- HTTP Module - Poll status:
URL: https://<your-domain>/api/v1/documents/{{2.id}}
Method: GET
Headers: Authorization: Bearer faq_YOUR_API_KEY
{{2.id}} maps the document ID from the upload response (module 2).
- Router - Route 1: if {{4.status}} = completed → continue. Route 2: if still processing → loop back to Sleep.
- HTTP Module - Get Dataset Items:
URL: https://<your-domain>/api/v1/documents/{{2.id}}/datasets
Method: GET
Headers: Authorization: Bearer faq_YOUR_API_KEY
- Google Sheets - Create Rows - Use an Iterator to loop through data[] and write each dataset item as a row with columns: Question, Answer, Category, Confidence.
Webhook-based alternative
Replace the Sleep + Poll loop with a webhook listener:
- Create a separate scenario with a Custom Webhook trigger.
- Register that URL with FAQai:
curl -X POST https://<your-domain>/api/v1/webhooks \
-H "Authorization: Bearer faq_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://hook.eu1.make.com/your-webhook-id",
"events": ["document.uploaded", "document.completed", "document.failed", "document.cancelled", "dataset.exported"]
}'
- FAQai POSTs to Make.com when done. The payload contains data.id - use it to GET the dataset items.
Zapier
Uses “Webhooks by Zapier” for both triggering and making API calls.
Zap: Upload from Google Drive → Generate Datasets → Send via Email
Flow: Google Drive Trigger → Webhooks POST → Delay → Webhooks GET → Filter → Webhooks GET → Gmail
- Trigger: New File in Google Drive Folder - fires when a new PDF/DOCX/TXT is added to a specific folder.
- Webhooks by Zapier - POST (upload document):
URL: https://<your-domain>/api/v1/documents
Payload Type: Form
Data:
  file = (map file from Google Drive step)
  name = (map filename from Google Drive step)
  auto_process = true
Headers: Authorization: Bearer faq_YOUR_API_KEY
- Delay by Zapier - Wait 1 minute (Zapier delays are in increments of minutes).
- Webhooks by Zapier - GET (check status):
URL: https://<your-domain>/api/v1/documents/{id from step 2}
Headers: Authorization: Bearer faq_YOUR_API_KEY
- Filter by Zapier - Only continue if status equals completed.
- Webhooks by Zapier - GET (retrieve dataset items):
URL: https://<your-domain>/api/v1/documents/{id from step 2}/datasets
Headers: Authorization: Bearer faq_YOUR_API_KEY
- Gmail - Send Email - Format the dataset items into the email body. Map data[].question and data[].answer from the previous step using Zapier's Formatter or Looping.
Webhook-based alternative (Catch Hook)
Eliminates the delay and polling steps:
- Create a Zap with Webhooks by Zapier → Catch Hook as the trigger. Copy the webhook URL.
- Register it with FAQai (one-time):
curl -X POST https://<your-domain>/api/v1/webhooks \
-H "Authorization: Bearer faq_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://hooks.zapier.com/hooks/catch/12345/abcdef/",
"events": ["document.uploaded", "document.completed", "document.failed", "document.cancelled", "dataset.exported"]
}'
- FAQai POSTs to Zapier when done. The Catch Hook trigger fires with the document ID in data.id.
- Add a GET step to fetch dataset items, then connect any action (Sheets, Slack, Email, etc.).
Tips for all automation tools:
- Store your API key in a credential / connection (not hardcoded) - all platforms support Header Auth or Custom Auth.
- Set the Authorization header to Bearer faq_YOUR_API_KEY in every HTTP request.
- For high-volume workflows, prefer webhooks over polling to reduce API calls and get instant results.
- Check the response status field - handle both completed and failed in your error paths.
Error Codes
All errors return a JSON object with a message field describing the issue.
| Status | Code | Description |
|---|---|---|
| 400 | bad_request | Invalid parameters, file format, or document not in expected state |
| 401 | unauthorized | Missing or invalid API key |
| 403 | forbidden | API key does not have access, or export format not available on your plan |
| 404 | not_found | Document or resource not found |
| 409 | conflict | Document is already being processed or has been completed |
| 413 | payload_too_large | File exceeds plan file size or page limit |
| 422 | unprocessable | Processing failed or dataset is not in a completed state for export |
| 429 | rate_limited | Too many requests, monthly page quota exceeded, or API call limit reached |
| 500 | server_error | Internal server error - contact support |
Error response example
{
"status": 429,
"code": "rate_limited",
"message": "Monthly page limit reached. Upgrade your plan or purchase overage pages."
}
Code Examples
Complete examples for uploading a document and retrieving the generated dataset items.
Node.js / JavaScript
import fs from "fs";
const API_KEY = process.env.FAQAI_API_KEY;
const BASE = "https://faqai.app/api/v1";
// 1. Upload a document
const form = new FormData();
// Native fetch's FormData requires a Blob, not a stream:
form.append("file", new Blob([fs.readFileSync("./manual.pdf")]), "manual.pdf");
form.append("auto_process", "true");
const upload = await fetch(`${BASE}/documents`, {
method: "POST",
headers: { Authorization: `Bearer ${API_KEY}` },
body: form,
});
const doc = await upload.json();
console.log("Document ID:", doc.id); // status: "processing"
// 2. Poll until processing completes
let status = doc.status;
while (status === "processing" || status === "pending") {
await new Promise((r) => setTimeout(r, 3000));
const res = await fetch(`${BASE}/documents/${doc.id}`, {
headers: { Authorization: `Bearer ${API_KEY}` },
});
const detail = await res.json();
status = detail.status;
}
if (status === "failed" || status === "cancelled") {
console.error("Processing did not complete:", status);
process.exit(1);
}
// 3. Retrieve generated datasets
const datasetsRes = await fetch(`${BASE}/documents/${doc.id}/datasets`, {
headers: { Authorization: `Bearer ${API_KEY}` },
});
const datasets = await datasetsRes.json();
console.log(`Generated ${datasets.datasets.length} datasets`);
datasets.datasets.forEach((ds) => {
console.log(` ${ds.dataset_type}: ${ds.item_count} items`);
});
Python
import os, time, requests
API_KEY = os.environ["FAQAI_API_KEY"]
BASE = "https://faqai.app/api/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# 1. Upload a document
with open("manual.pdf", "rb") as f:
resp = requests.post(
f"{BASE}/documents",
headers=HEADERS,
files={"file": f},
data={"auto_process": "true"},
)
doc = resp.json()
print(f"Document ID: {doc['id']}") # status: "processing"
# 2. Poll until processing completes
status = doc["status"]
while status in ("processing", "pending"):
time.sleep(3)
detail = requests.get(
f"{BASE}/documents/{doc['id']}", headers=HEADERS
).json()
status = detail["status"]
if status in ("failed", "cancelled"):
raise SystemExit(f"Processing did not complete: {status}")
# 3. Retrieve generated datasets
datasets = requests.get(
f"{BASE}/documents/{doc['id']}/datasets", headers=HEADERS
).json()
for ds in datasets["datasets"]:
    print(f"  {ds['dataset_type']}: {ds['item_count']} items")
cURL
# Upload
curl -X POST https://faqai.app/api/v1/documents \
  -H "Authorization: Bearer faq_YOUR_API_KEY" \
  -F "file=@manual.pdf" \
  -F "auto_process=true"

# Check status
curl https://faqai.app/api/v1/documents/DOC_ID \
  -H "Authorization: Bearer faq_YOUR_API_KEY"

# Get datasets
curl https://faqai.app/api/v1/documents/DOC_ID/datasets \
  -H "Authorization: Bearer faq_YOUR_API_KEY"

# Get dataset coverage
curl https://faqai.app/api/v1/datasets/coverage/DATASET_ID \
  -H "Authorization: Bearer faq_YOUR_API_KEY"

# Search across all datasets
curl "https://faqai.app/api/v1/datasets/search?q=onboarding&dataset_type=canonical_qa" \
  -H "Authorization: Bearer faq_YOUR_API_KEY"
Processing is asynchronous
Dataset generation typically takes 10-60 seconds depending on document size. Poll the GET /v1/documents/{id} endpoint and check the status field until it changes to completed, failed, or cancelled. Alternatively, use webhooks for real-time notifications.
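The fixed-interval polling in the code examples can be made gentler with exponential backoff. A sketch in Python - the helper takes any status-fetching callable so it works against the live endpoint or a stub; the function name and defaults are ours:

```python
import time

def wait_until_done(get_status, first_delay=2.0, max_delay=30.0,
                    timeout=600.0, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status.

    Terminal statuses per the API: completed, failed, cancelled.
    Doubles the delay between polls up to max_delay; raises
    TimeoutError after `timeout` seconds of waiting.
    """
    terminal = {"completed", "failed", "cancelled"}
    delay, waited = first_delay, 0.0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if waited >= timeout:
            raise TimeoutError(f"still {status} after {timeout}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)

# With the live API (requests assumed):
#   status = wait_until_done(lambda: requests.get(
#       f"https://faqai.app/api/v1/documents/{doc_id}",
#       headers={"Authorization": "Bearer faq_YOUR_API_KEY"},
#   ).json()["status"])
```

Injecting the sleep function keeps the helper testable and lets a worker swap in an async-friendly equivalent.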
Security & Privacy
Your uploaded documents are processed securely using TLS encryption in transit and AES-256 encryption at rest. Document text is sent to AI models for dataset generation and is not used for model training.
All data is protected by Supabase Row Level Security - your documents and datasets are only accessible by your account.
For more details, see our Privacy Policy, Security Page, and GDPR Compliance.