Audra Flow

Troubleshooting

This page covers the most common issues teams encounter when developing, deploying, and operating Audra Flow, along with step-by-step remediation guidance.

Common Development Issues

Port Conflicts

Symptom: A service fails to start with an “address already in use” error.

Solution: Identify and stop the process occupying the port:

# Check which process is using port 3000
lsof -i :3000

# Check PostgreSQL default port
lsof -i :5432

Stop the conflicting process, or configure Audra Flow to use an alternative port via the .env file.
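If stopping the process is not an option, the affected service can be moved to a different port in the .env file. A minimal sketch — the variable names below are assumptions; check your project's .env.example for the actual keys:

```shell
# Hypothetical .env overrides — confirm the real variable names in .env.example
PORT=3001            # run the web app on 3001 instead of 3000
POSTGRES_PORT=5433   # map PostgreSQL to 5433 on the host instead of 5432
```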

Database Connection Errors

Symptom: The web app or AI service cannot connect to PostgreSQL on startup.

Checklist:

  1. Verify that the database container is running: docker ps.
  2. Confirm that DATABASE_URL in your .env file matches the Docker Compose service name and port.
  3. If using the “dependencies only” profile, ensure you started it before the application: make deps.
  4. Reset the database if it is in a corrupt state:
    make down   # stop all services
    make clean  # remove volumes
    make deps   # restart fresh
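For step 2, the host and port embedded in DATABASE_URL can be extracted with plain shell parameter expansion and compared against the compose file. A minimal sketch — the example URL is illustrative, not a real credential:

```shell
# Extract host and port from a PostgreSQL connection URL so they can be
# compared against the docker-compose service definition.
DATABASE_URL="postgresql://audra:secret@localhost:5432/audraflow"  # example value

hostport="${DATABASE_URL#*@}"   # drop the scheme and credentials
hostport="${hostport%%/*}"      # drop the database name
db_host="${hostport%%:*}"
db_port="${hostport##*:}"

echo "host: $db_host"   # → host: localhost
echo "port: $db_port"   # → port: 5432
```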

Redis Connection Errors

Symptom: Authentication or caching fails with a Redis connection error.

Solution: Confirm Redis is running and that REDIS_URL in your .env file points to the correct host and port (default localhost:6379 for local development). Restart the Redis container if necessary:

docker restart audraflow-redis-dev

Docker Troubleshooting

Containers Fail to Build

Symptom: docker compose up --build exits with errors during the build stage.

Solution: Rebuild without cache to rule out stale layers:

make rebuild

Viewing Service Logs

If a container starts but the application inside it crashes, inspect the logs:

# All services
make logs

# Single service
docker compose -f docker-compose.dev.yml logs -f web
docker compose -f docker-compose.dev.yml logs -f ai-service
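Log output can be long; piping it through grep narrows it to failures. The sketch below runs against a sample excerpt — in practice you would pipe `make logs` or the `docker compose ... logs` commands above instead:

```shell
# Filter log output down to error lines. Sample text stands in for the
# real stream; replace `echo "$sample_logs"` with your logs command.
sample_logs='2024-06-01 12:00:01 INFO  web started on :3000
2024-06-01 12:00:02 ERROR could not connect to redis at localhost:6379
2024-06-01 12:00:03 INFO  retrying in 5s'

echo "$sample_logs" | grep -iE 'error|fatal'
# → 2024-06-01 12:00:02 ERROR could not connect to redis at localhost:6379
```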

Stale Volumes

Symptom: Database migrations or seed data appear outdated after pulling new code.

Solution: Remove existing volumes and recreate them:

make clean  # stops services and removes volumes
make dev    # fresh start

Database Recovery

Audra Flow on AWS leverages RDS automated backups and Multi-AZ replication. The following table summarises the recovery options by scenario:

| Scenario                               | Recovery Method                | Estimated Time |
| -------------------------------------- | ------------------------------ | -------------- |
| Instance failure (Multi-AZ)            | Automatic failover to standby  | < 5 minutes    |
| Data corruption or accidental deletion | Point-in-time restore          | 1–2 hours      |
| Complete database loss                 | Snapshot restore               | 2–4 hours      |
| Region-level failure                   | Cross-region snapshot restore  | 4–8 hours      |

Point-in-Time Restore

RDS retains continuous backups for the configured retention window (seven days for production). To recover to a specific moment:

  1. Identify the target timestamp — typically just before the incident occurred.
  2. Create a restored instance from the RDS console or CLI using the point-in-time restore feature.
  3. Verify data integrity on the restored instance (check record counts, referential integrity, and the latest audit timestamps).
  4. Swap the restored instance into the primary role by updating the connection string in Secrets Manager, then restart the ECS services to pick up the new endpoint.
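Step 2 can be scripted with the AWS CLI's restore-db-instance-to-point-in-time command. A sketch that assembles the call for review — the instance identifiers and timestamp are placeholders, not values from this deployment:

```shell
# Placeholders — substitute your actual RDS identifiers and incident time.
SOURCE_DB="audraflow-prod"
TARGET_DB="audraflow-prod-restored"
RESTORE_TIME="2024-06-01T11:55:00Z"   # just before the incident, in UTC

# Assemble the restore command; inspect it before running.
restore_cmd="aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier $SOURCE_DB \
  --target-db-instance-identifier $TARGET_DB \
  --restore-time $RESTORE_TIME"

echo "$restore_cmd"
```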

Data Integrity Verification

After any recovery, run the following checks against the restored database:

  • Record counts — compare user, project, and audit-log totals against the last known-good values.
  • Referential integrity — query for orphaned records (e.g., project members referencing deleted users).
  • Application smoke tests — exercise the health-check and critical API endpoints to confirm the application layer is connected and functioning.
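The record-count comparison can be scripted once the counts are captured. A minimal sketch with example numbers — in practice each count would come from psql (e.g. `psql "$DATABASE_URL" -t -c 'SELECT count(*) FROM users'`):

```shell
# Compare a restored record count against the last known-good baseline.
# The numbers here are illustrative; capture real values via psql.
baseline_users=1250
restored_users=1248

if [ "$restored_users" -lt "$baseline_users" ]; then
  echo "WARNING: restored user count ($restored_users) is below baseline ($baseline_users)"
fi
```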

Deployment Rollback

Quick Rollback (Application Only)

If a newly deployed version introduces regressions, roll back to the previous ECS task definition revision. ECS will drain the current tasks and launch tasks using the previous container image.
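With the AWS CLI, the rollback is a single update-service call pointing at the prior revision. A sketch that assembles the call — the cluster, service, and revision names are placeholders:

```shell
# Placeholders — substitute your actual cluster, service, and the
# last known-good task definition revision.
CLUSTER="audraflow-prod"
SERVICE="audraflow-web"
PREVIOUS_TASK_DEF="audraflow-web:41"

rollback_cmd="aws ecs update-service \
  --cluster $CLUSTER \
  --service $SERVICE \
  --task-definition $PREVIOUS_TASK_DEF"

echo "$rollback_cmd"
```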

Full Infrastructure Rollback

For configuration-level regressions (Terraform changes), check out the previous commit of the infrastructure code and re-apply:

git checkout <previous-commit> -- infrastructure/
terraform apply -var-file=environments/prod.tfvars

Database Migration Rollback

Prisma does not support automatic migration rollback. If a migration must be reversed, write a compensating migration that undoes the schema changes and apply it as a new migration. Always test migrations against a staging database before running them in production.

Health-Check Endpoints

Each Audra Flow service exposes a health endpoint that the load balancer and monitoring systems poll continuously:

| Service    | Endpoint    | Expected Response               |
| ---------- | ----------- | ------------------------------- |
| Web API    | /api/health | 200 OK with {"status":"healthy"} |
| AI Service | /health     | 200 OK with {"status":"healthy"} |

If a health check fails, the ALB stops routing traffic to the unhealthy task, and ECS automatically replaces it.

Frequently Asked Questions

How do I reset my local database without losing Docker images?

Run make clean followed by make dev. This removes Docker volumes (database data, Redis cache) but preserves built images.

The AI Service is not responding locally. What should I check?

Confirm that your OPENAI_API_KEY is set in the .env file and that the AI Service container is running on port 8000. Check its logs with docker logs audraflow-ai-service-dev.
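A quick way to confirm the key is visible to your shell without printing its value — a small sketch:

```shell
# Report whether OPENAI_API_KEY is set without echoing the secret itself.
check_openai_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY: missing"
  else
    echo "OPENAI_API_KEY: present"
  fi
}

check_openai_key
```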

How do I verify that my cloud deployment is healthy?

Curl the health endpoint of the load balancer URL:

curl https://<your-alb-url>/api/health
# Expected: {"status":"healthy"}

Additionally, check the ECS console to confirm that the running task count matches the desired count and that the service status is ACTIVE.

A deployment is stuck. What do I do?

Inspect the ECS service events for error messages. Common causes include health-check timeouts (the application takes too long to start) and container exit errors (missing environment variables or failed database migrations). Review the CloudWatch logs for the most recently stopped task to pinpoint the failure.

How do I roll back if a production deployment goes wrong?

Update the ECS service to use the previous task definition revision. ECS will perform a rolling replacement, draining the faulty tasks and launching containers from the last known-good image. If infrastructure changes are involved, revert the Terraform code and re-apply.

Can I run a production-like build locally?

Yes. Use make up to launch the production Docker Compose profile locally. This builds optimised, multi-stage images identical to what runs in AWS, allowing you to catch build or runtime issues before deploying.