Troubleshooting
This page covers the most common issues teams encounter when developing, deploying, and operating Audra Flow, along with step-by-step remediation guidance.
Common Development Issues
Port Conflicts
Symptom: A service fails to start with an “address already in use” error.
Solution: Identify and stop the process occupying the port:
```shell
# Check which process is using port 3000
lsof -i :3000

# Check PostgreSQL default port
lsof -i :5432
```

Stop the conflicting process, or configure Audra Flow to use an alternative port via the .env file.
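The lookup above can be wrapped in a small helper that reports whether a port is in use before you start the stack. This is a sketch: free_port is a hypothetical name, and the kill step is left commented out so nothing is stopped by accident.

```shell
#!/bin/sh
# free_port PORT — report the process (if any) listening on PORT.
free_port() {
  port="$1"
  # -t prints only the PID; suppress errors if nothing is listening
  pid=$(lsof -ti :"$port" 2>/dev/null)
  if [ -z "$pid" ]; then
    echo "port $port is free"
  else
    echo "port $port is held by PID $pid"
    # kill "$pid"   # uncomment to stop the conflicting process
  fi
}

free_port 3000   # web app
free_port 5432   # PostgreSQL
```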
Database Connection Errors
Symptom: The web app or AI service cannot connect to PostgreSQL on startup.
Checklist:
- Verify that the database container is running: docker ps.
- Confirm that DATABASE_URL in your .env file matches the Docker Compose service name and port.
- If using the “dependencies only” profile, ensure you started it before the application: make deps.
- Reset the database if it is in a corrupt state:

```shell
make down   # stop all services
make clean  # remove volumes
make deps   # restart fresh
```
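To confirm that DATABASE_URL actually points at the running container, one option is to pull the host and port out of the URL and probe them. This sketch assumes the common postgres://user:pass@host:port/db shape; the example URL values are illustrative, not Audra Flow defaults.

```shell
#!/bin/sh
# Extract "host:port" from a postgres:// connection URL.
db_hostport() {
  # postgres://user:pass@host:port/dbname -> host:port
  echo "$1" | sed -E 's#.*@([^/]+)/.*#\1#'
}

DATABASE_URL="postgres://audra:secret@localhost:5432/audraflow"  # example value
hostport=$(db_hostport "$DATABASE_URL")
echo "checking $hostport"
# pg_isready -h "${hostport%:*}" -p "${hostport#*:}"   # requires postgres client tools
```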
Redis Connection Errors
Symptom: Authentication or caching fails with a Redis connection error.
Solution: Confirm Redis is running and that REDIS_URL in your .env file points to the correct host and port (default localhost:6379 for local development). Restart the Redis container if necessary:
```shell
docker restart audraflow-redis-dev
```

Docker Troubleshooting
Containers Fail to Build
Symptom: docker compose up --build exits with errors during the build stage.
Solution: Rebuild without cache to rule out stale layers:
```shell
make rebuild
```

Viewing Service Logs
If a container starts but the application inside it crashes, inspect the logs:
```shell
# All services
make logs

# Single service
docker compose -f docker-compose.dev.yml logs -f web
docker compose -f docker-compose.dev.yml logs -f ai-service
```

Stale Volumes
Symptom: Database migrations or seed data appear outdated after pulling new code.
Solution: Remove existing volumes and recreate them:
```shell
make clean   # stops services and removes volumes
make dev     # fresh start
```

Database Recovery
Audra Flow on AWS leverages RDS automated backups and Multi-AZ replication. The following table summarises the recovery options by scenario:
| Scenario | Recovery Method | Estimated Time |
|---|---|---|
| Instance failure (Multi-AZ) | Automatic failover to standby | < 5 minutes |
| Data corruption or accidental deletion | Point-in-time restore | 1 – 2 hours |
| Complete database loss | Snapshot restore | 2 – 4 hours |
| Region-level failure | Cross-region snapshot restore | 4 – 8 hours |
Point-in-Time Restore
RDS retains continuous backups for the configured retention window (seven days for production). To recover to a specific moment:
- Identify the target timestamp — typically just before the incident occurred.
- Create a restored instance from the RDS console or CLI using the point-in-time restore feature.
- Verify data integrity on the restored instance (check record counts, referential integrity, and the latest audit timestamps).
- Swap the restored instance into the primary role by updating the connection string in Secrets Manager, then restart the ECS services to pick up the new endpoint.
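Steps 2 and 4 above can be driven from the AWS CLI. In this sketch the instance identifiers and timestamp are placeholders, and the command is only echoed so nothing runs until you have confirmed the target time.

```shell
#!/bin/sh
# Placeholders — substitute your real instance names and incident time.
SOURCE_DB="audraflow-prod"
RESTORED_DB="audraflow-prod-restore"
TARGET_TIME="2024-05-01T12:00:00Z"   # just before the incident

restore_cmd="aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier $SOURCE_DB \
  --target-db-instance-identifier $RESTORED_DB \
  --restore-time $TARGET_TIME"

echo "$restore_cmd"
# eval "$restore_cmd"   # run only after double-checking TARGET_TIME
```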
Data Integrity Verification
After any recovery, run the following checks against the restored database:
- Record counts — compare user, project, and audit-log totals against the last known-good values.
- Referential integrity — query for orphaned records (e.g., project members referencing deleted users).
- Application smoke tests — exercise the health-check and critical API endpoints to confirm the application layer is connected and functioning.
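The referential-integrity check can be expressed as a handful of LEFT JOIN queries. The table and column names here (project_members, users) are assumptions about the schema, so adapt them to yours; the query is echoed rather than executed.

```shell
#!/bin/sh
# Count project members whose user no longer exists (expected result: 0).
# Table/column names are hypothetical — match them to the real schema.
ORPHAN_SQL="SELECT count(*) AS orphaned
FROM project_members pm
LEFT JOIN users u ON u.id = pm.user_id
WHERE u.id IS NULL;"

echo "$ORPHAN_SQL"
# psql "$DATABASE_URL" -c "$ORPHAN_SQL"   # run against the restored instance
```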
Deployment Rollback
Quick Rollback (Application Only)
If a newly deployed version introduces regressions, roll back to the previous ECS task definition revision. ECS will drain the current tasks and launch tasks using the previous container image.
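The rollback can be issued with the AWS CLI; cluster, service, and task-definition names below are placeholders, and the command is echoed first so it can be reviewed before execution.

```shell
#!/bin/sh
# Placeholders — substitute your real cluster, service, and revision.
CLUSTER="audraflow-prod"
SERVICE="web"
PREVIOUS_TASK_DEF="audraflow-web:41"   # family:revision of the last good deploy

rollback_cmd="aws ecs update-service \
  --cluster $CLUSTER \
  --service $SERVICE \
  --task-definition $PREVIOUS_TASK_DEF"

echo "$rollback_cmd"
# eval "$rollback_cmd"
```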
Full Infrastructure Rollback
For configuration-level regressions (Terraform changes), check out the previous commit of the infrastructure code and re-apply:
```shell
git checkout <previous-commit> -- infrastructure/
terraform apply -var-file=environments/prod.tfvars
```

Database Migration Rollback
Prisma does not support automatic migration rollback. If a migration must be reversed, write a compensating migration that undoes the schema changes and apply it as a new migration. Always test migrations against a staging database before running them in production.
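A compensating migration is simply a new migration whose SQL undoes the earlier change. Assuming the bad migration added a priority column to a Project table (hypothetical names), the pattern looks like this; the Prisma commands are shown as comments since they need a live schema to run against.

```shell
#!/bin/sh
# 1. Create an empty migration without applying it:
#    npx prisma migrate dev --name revert_add_priority --create-only
# 2. Hand-write the compensating SQL in the generated migration.sql:
COMPENSATING_SQL='ALTER TABLE "Project" DROP COLUMN IF EXISTS "priority";'
echo "$COMPENSATING_SQL"
# 3. Apply it like any other migration (test on staging first):
#    npx prisma migrate deploy
```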
Health-Check Endpoints
Each Audra Flow service exposes a health endpoint that the load balancer and monitoring systems poll continuously:
| Service | Endpoint | Expected Response |
|---|---|---|
| Web API | /api/health | 200 OK with {"status":"healthy"} |
| AI Service | /health | 200 OK with {"status":"healthy"} |
If a health check fails, the ALB stops routing traffic to the unhealthy task, and ECS automatically replaces it.
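In deployment scripts it can be handy to wait for the health endpoint to come up rather than checking it once. This is a sketch with illustrative retry counts and timings, matching the response shape described above.

```shell
#!/bin/sh
# wait_for_healthy URL [ATTEMPTS] — poll until the endpoint reports healthy.
wait_for_healthy() {
  url="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS "$url" 2>/dev/null | grep -q '"status":"healthy"'; then
      echo "healthy"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "unhealthy after $attempts attempts"
  return 1
}

# wait_for_healthy "https://<your-alb-url>/api/health"
```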
Frequently Asked Questions
How do I reset my local database without losing Docker images?
Run make clean followed by make dev. This removes Docker volumes (database data, Redis cache) but preserves built images.
The AI Service is not responding locally. What should I check?
Confirm that your OPENAI_API_KEY is set in the .env file and that the AI Service container is running on port 8000. Check its logs with docker logs audraflow-ai-service-dev.
How do I verify that my cloud deployment is healthy?
Curl the health endpoint of the load balancer URL:
```shell
curl https://<your-alb-url>/api/health
# Expected: {"status":"healthy"}
```

Additionally, check the ECS console to confirm that the running task count matches the desired count and that the service status is ACTIVE.
A deployment is stuck. What do I do?
Inspect the ECS service events for error messages. Common causes include health-check timeouts (the application takes too long to start) and container exit errors (missing environment variables or failed database migrations). Review the CloudWatch logs for the most recently stopped task to pinpoint the failure.
How do I roll back if a production deployment goes wrong?
Update the ECS service to use the previous task definition revision. ECS will perform a rolling replacement, draining the faulty tasks and launching containers from the last known-good image. If infrastructure changes are involved, revert the Terraform code and re-apply.
Can I run a production-like build locally?
Yes. Use make up to launch the production Docker Compose profile locally. This builds optimised, multi-stage images identical to what runs in AWS, allowing you to catch build or runtime issues before deploying.