Audra Flow

AI Service

The Audra Flow AI Service is a multi-agent system powered by a Retrieval-Augmented Generation (RAG) pipeline. It provides intelligent, context-aware assistance across every phase of product development — from user research and specification authoring to technical architecture and delivery tracking.

Multi-Agent Architecture

Rather than relying on a single, general-purpose model, Audra Flow deploys four specialised AI agents. Each agent carries its own system prompt, domain context, and retrieval scope so that responses are consistently relevant to the task at hand.

UX Researcher

Analyses user interviews, survey data, and behavioural analytics to surface insights. The UX Researcher agent helps teams synthesise personas, journey maps, and usability findings without manually combing through raw data.

Product Owner

Assists with writing and refining product specifications, user stories, acceptance criteria, and release plans. It draws on your uploaded requirements documents and historical project artefacts to maintain consistency and traceability.

Architect

Focuses on technical specifications, service contracts, and system design decisions. The Architect agent can reference your existing architecture documents to recommend patterns that align with established standards.

Product Guru

A knowledge-base agent that answers questions about your product domain. It retrieves information from ingested documents — competitive analyses, market research, internal wikis — and delivers precise, citation-backed answers.
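The four agents above could be represented as a simple registry, each pairing a system prompt with its own retrieval scope. The keys, prompt text, and scope names below are illustrative placeholders, not the service's actual configuration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Agent:
    name: str
    system_prompt: str
    retrieval_scope: str  # document collection this agent is allowed to search

# Hypothetical registry mirroring the four agents described above.
AGENTS = {
    "ux_researcher": Agent("UX Researcher", "Analyse user research data and surface insights.", "research"),
    "product_owner": Agent("Product Owner", "Write and refine specs, stories, and acceptance criteria.", "specs"),
    "architect": Agent("Architect", "Advise on technical design aligned with existing standards.", "architecture"),
    "product_guru": Agent("Product Guru", "Answer domain questions with citations from ingested docs.", "knowledge_base"),
}

def get_agent(key: str) -> Agent:
    """Look up an agent by key, failing loudly on unknown names."""
    try:
        return AGENTS[key]
    except KeyError:
        raise ValueError(f"Unknown agent: {key}") from None
```

Keeping prompt and retrieval scope together per agent is what keeps each agent's answers scoped to its own domain.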

RAG Pipeline

Retrieval-Augmented Generation ensures that every AI response is grounded in your own data rather than relying solely on the foundation model's training corpus. The pipeline consists of four stages:

1. Document Ingestion

Upload documents through the API or the Audra Flow UI. Supported formats include PDF, Markdown, plain text, and common office documents. Each file is parsed, cleaned, and split into semantically meaningful chunks.
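The chunking step above can be sketched as a minimal character-window splitter with overlap. Real pipelines typically split on semantic boundaries (headings, paragraphs); the window and overlap sizes here are arbitrary illustrations:

```python
def chunk_text(text: str, max_chars: int = 500, overlap: int = 50) -> list[str]:
    """Split cleaned text into overlapping chunks of at most max_chars characters."""
    text = " ".join(text.split())  # normalise whitespace from the parsing step
    if not text:
        return []
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap  # overlap preserves context across chunk borders
    return chunks
```

The overlap ensures that a sentence falling on a chunk boundary is still retrievable from at least one chunk.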

2. Embedding & Storage

Chunks are transformed into high-dimensional vector embeddings and stored directly in PostgreSQL using the pgvector extension. There is no separate vector database to provision or maintain — your relational data and vector data live side by side.

3. Retrieval

When a user sends a query, the system converts it into an embedding and performs a similarity search against the stored vectors. The most relevant chunks are retrieved, ranked, and passed to the generation step as additional context.
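A minimal in-memory sketch of the retrieval step follows. In production this ranking runs inside PostgreSQL via pgvector's distance operators; the cosine-similarity loop below only illustrates the idea:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Rank stored (chunk, embedding) pairs by similarity and return the top k chunks."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```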

4. Generation

The retrieved context, the agent's system prompt, and the user's message are combined into a single prompt sent to the LLM. The model produces a response that is both informed by your proprietary data and constrained by the agent's domain instructions.
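Assembling the final prompt might look like the sketch below. The chat-message structure and the numbered context markers are assumptions for illustration, not the service's actual format:

```python
def build_prompt(system_prompt: str, context_chunks: list[str], user_message: str) -> list[dict]:
    """Combine agent instructions, retrieved context, and the user message
    into a chat-style message list, keeping user input in its own message."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(context_chunks))
    return [
        {"role": "system", "content": f"{system_prompt}\n\nContext:\n{context}"},
        {"role": "user", "content": user_message},
    ]
```

Placing retrieved context in the system message, separate from the user turn, is one common way to keep the generation grounded while preserving the instruction/input boundary.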

LLM Integration

Audra Flow integrates with leading large language model providers and is designed so that switching or adding providers requires minimal configuration changes.

  Provider    Role       Model
  OpenAI      Primary    GPT-4
  DeepSeek    Fallback   DeepSeek Chat

If the primary provider is unavailable or rate-limited, the service automatically falls back to the secondary provider. This ensures high availability for AI-assisted workflows without manual intervention.
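The fallback behaviour can be sketched as an ordered try-each loop. `ProviderError` and the provider callables below are hypothetical stand-ins for real client SDK calls:

```python
class ProviderError(Exception):
    """Raised when a provider is unavailable or rate-limited."""

def complete_with_fallback(prompt: str, providers: list[tuple[str, callable]]) -> tuple[str, str]:
    """Try each (name, call) pair in order; return the first successful response
    together with the name of the provider that served it."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append((name, str(exc)))  # record and fall through to the next provider
    raise RuntimeError(f"All providers failed: {errors}")
```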

Prompt Management

Each agent's behaviour is governed by a structured prompt that includes:

  • System instructions — define the agent's role, tone, and output format.
  • Domain context — injected dynamically from the RAG retrieval step.
  • User message — the end-user's query or command.

System instructions are strictly separated from user input to mitigate prompt injection risks. Output validation ensures that responses conform to the requested format (JSON or Markdown) and never leak internal system instructions.
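Output validation along these lines might be sketched as follows; the function name and the leak check are illustrative, not the service's actual implementation:

```python
import json

def validate_response(raw: str, fmt: str, system_prompt: str):
    """Reject responses that echo system instructions; parse JSON when requested."""
    if system_prompt.strip() and system_prompt.strip() in raw:
        raise ValueError("response leaks system instructions")
    if fmt == "json":
        return json.loads(raw)  # raises on malformed output
    return raw
```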

API Endpoints

The AI Service exposes a small, focused REST API built on FastAPI. All endpoints require authentication with a valid JWT.

  Method    Endpoint             Description
  GET       /health              Service health check
  POST      /api/v1/chat         Send a message to one of the AI agents and receive a streamed or synchronous response
  POST      /api/v1/documents    Upload documents for ingestion into the RAG pipeline
  GET       /api/v1/agents       List the available AI agents and their capabilities
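A client call to the chat endpoint might be assembled as in the sketch below. The base URL and the request-body fields (`agent`, `message`) are assumptions about the payload shape, not documented contract:

```python
import json
import urllib.request

BASE_URL = "https://flow.example.com"  # placeholder host, not a real deployment

def build_chat_request(token: str, agent: str, message: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to /api/v1/chat."""
    body = json.dumps({"agent": agent, "message": message}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/chat",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # JWT required on all endpoints
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` (or any HTTP client) once a valid token is in hand.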

Technology Stack

  • Framework — FastAPI (Python) for high-performance async request handling.
  • Agent Orchestration — LangGraph for defining and executing multi-step agent workflows.
  • Vector Storage — PostgreSQL with the pgvector extension for embedding storage and similarity search.
  • Caching — Redis for session state and response caching.

Zero-Retention Policy: Customer data is never used to train public foundation models. RAG retrieval is strictly scoped to the authenticated user's project permissions, ensuring full context isolation between tenants.