Detailed Setup
Everything beyond the Quick Start: environment configuration, TLS, storage, scaling, backup, retention, and the production-ready checklist. The in-platform docs go deeper still — this is the marketing-tier reference.
Environment configuration
All services read configuration from environment files in deploy/env/. The compose files mount each component's env file as the container's environment. For Kubernetes, the same values become a ConfigMap + Secret pair (see the Helm values.yaml).
api-server.env
| Variable | Default | Notes |
|---|---|---|
| DATABASE_DSN | postgres://aethonlog:aethonlog@postgres:5432/aethonlog?sslmode=disable | Use strong creds + sslmode=require in production |
| BROKER_BROKERS | redpanda:9092 | Comma-separated list for HA |
| OPENSEARCH_URL | http://opensearch:9200 | HTTPS + auth in prod |
| LISTEN_HTTP | :8080 | HTTP listen address |
| PUBLIC_ADDR | localhost:8080 | Used by agent install scripts; set to your public hostname |
| GRPC_PUBLIC_ADDR | localhost:9091 | Agents reach this for streaming logs |
| UI_PATH | /srv/ui | Path to compiled React assets |
| DATABASE_MIGRATIONS_PATH | /srv/migrations | SQL migrations run at startup |
| SMTP_HOST + port/user/pass/from | empty | For invite emails & alert notifications |
| LOG_LEVEL | info | debug for verbose; warn for quieter |
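The production notes in the table above can be checked mechanically before a deploy. The following is a minimal sketch, assuming the env files use plain KEY=VALUE lines; the helper names parse_env and production_warnings are hypothetical and not part of the platform.

```python
from urllib.parse import parse_qs, urlsplit


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so DSN query strings stay intact.
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def production_warnings(env: dict[str, str]) -> list[str]:
    """Flag api-server values that still look like evaluation defaults."""
    warnings = []
    dsn = urlsplit(env.get("DATABASE_DSN", ""))
    sslmode = parse_qs(dsn.query).get("sslmode", [""])[0]
    if sslmode != "require":
        warnings.append("DATABASE_DSN: sslmode is not 'require'")
    # "aethonlog" is the password in the shipped default DSN.
    if dsn.password in (None, "", "aethonlog"):
        warnings.append("DATABASE_DSN: password is empty or the shipped default")
    if not env.get("OPENSEARCH_URL", "").startswith("https://"):
        warnings.append("OPENSEARCH_URL: not HTTPS")
    if env.get("PUBLIC_ADDR", "localhost:8080").startswith("localhost"):
        warnings.append("PUBLIC_ADDR: still points at localhost")
    return warnings
```

Running production_warnings(parse_env(open("deploy/env/api-server.env").read())) in a CI step is one way to catch a forgotten default; an empty list means none of these four checks fired, not that the file is fully hardened.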
ingest-gateway.env
| Variable | Default |
|---|---|
| BROKER_BROKERS | redpanda:9092 |
| LISTEN_GRPC | :9091 |
| LISTEN_HTTP | :8081 |
syslog-gateway.env
| Variable | Default |
|---|---|
| BROKER_BROKERS | redpanda:9092 |
| LISTEN_UDP | :1514 |
| LISTEN_TCP | :1514 |
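To confirm the syslog gateway is reachable on its default port, a test datagram can be sent from any host. This is a rough sketch, assuming the gateway accepts RFC 5424-style messages over UDP (the source does not state which syslog formats are supported); format_syslog and send_udp are hypothetical helper names.

```python
import socket
from datetime import datetime, timezone


def format_syslog(message, *, hostname="test-host", app="smoke-test",
                  facility=1, severity=6, now=None) -> bytes:
    """Build an RFC 5424-style line; PRI is facility * 8 + severity."""
    now = now or datetime.now(timezone.utc)
    pri = facility * 8 + severity  # user (1) + informational (6) gives 14
    # The three "-" fields are nil PROCID, MSGID, and STRUCTURED-DATA.
    line = f"<{pri}>1 {now.isoformat()} {hostname} {app} - - - {message}"
    return line.encode("utf-8")


def send_udp(payload: bytes, host: str = "localhost", port: int = 1514) -> None:
    """Fire-and-forget send; UDP gives no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

Calling send_udp(format_syslog("hello from smoke test")) and then searching for the message in the UI closes the loop. Because UDP is unacknowledged, a missing event can mean either a network drop or a parsing mismatch.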
TLS / HTTPS
The platform itself listens HTTP-only for simplicity; production deployments terminate TLS in front. Two common patterns:
- Reverse proxy (recommended): Caddy, Traefik, or nginx in front of the API server. Caddy auto-provisions Let's Encrypt certs; Traefik does the same when paired with a cert resolver. Forward to http://api-server:8080.
- Direct TLS in api-server: set LISTEN_HTTPS=:8443 and mount cert + key as TLS_CERT_FILE / TLS_KEY_FILE. Disable plain HTTP with LISTEN_HTTP= (empty).
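For the direct-TLS pattern, the three settings have to agree with each other. Here is a small sketch of that consistency check, assuming the environment has already been loaded into a dict; check_listeners is a hypothetical helper, and the :8080 fallback mirrors the LISTEN_HTTP default listed for api-server.env.

```python
def check_listeners(env: dict[str, str]) -> list[str]:
    """Return problems with the direct-TLS settings, or an empty list."""
    problems = []
    if env.get("LISTEN_HTTPS", ""):
        # Both files are needed once the HTTPS listener is switched on.
        for var in ("TLS_CERT_FILE", "TLS_KEY_FILE"):
            if not env.get(var):
                problems.append(f"{var} must be set when LISTEN_HTTPS is enabled")
        # An unset LISTEN_HTTP falls back to the documented default, :8080.
        if env.get("LISTEN_HTTP", ":8080"):
            problems.append("LISTEN_HTTP is still enabled; set it to empty")
    return problems
```

An environment with no LISTEN_HTTPS passes untouched, which matches the reverse-proxy pattern where the API server stays on plain HTTP behind the proxy.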
For Kubernetes, the Helm chart includes an Ingress definition; pair it with cert-manager + ClusterIssuer (Let's Encrypt or your internal CA) for auto-rotating certs.
Storage
PostgreSQL
Stores tenants, users, alert rules, audit log, and saved searches. Footprint stays small (typically < 10 GB even on large deployments). Production recommendations:
- Use a managed DB (RDS, Cloud SQL, etc.) or PG 16 with streaming replication
- Enable WAL archiving + point-in-time recovery
- Take a logical dump (pg_dump) nightly; retain per your compliance window
- Strong password + sslmode=require on the DSN
OpenSearch
The bulk of your storage — every log event is indexed here. Plan ~1 GB on disk per 1 GB of ingested logs at default replica settings (1 primary + 1 replica).
- Single-node default is fine for evaluation, not for prod
- Three-node minimum for resilience; enable cross-zone replication
- Snapshots to S3 or compatible object storage (MinIO, GCS) — daily, retained per retention policy
- Tune JVM heap via OPENSEARCH_JAVA_OPTS (half of available RAM, capped at 31 GB)
- Use index lifecycle management to roll old indices to slower storage tiers
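The sizing rules in this section reduce to simple arithmetic. The sketch below assumes a single uniform retention period and no headroom for merges or snapshots, so treat its output as a floor rather than a purchase order; the function names are illustrative only.

```python
def opensearch_disk_gb(daily_ingest_gb: float, retention_days: int,
                       disk_per_ingested_gb: float = 1.0) -> float:
    """Disk needed at the ~1 GB per ingested GB rule of thumb."""
    return daily_ingest_gb * retention_days * disk_per_ingested_gb


def jvm_heap_gb(ram_gb: float) -> int:
    """Half of available RAM, capped at 31 GB, rounded down to whole GB."""
    return int(min(ram_gb / 2, 31))


def java_opts(ram_gb: float) -> str:
    """Value for OPENSEARCH_JAVA_OPTS with matching min and max heap."""
    heap = jvm_heap_gb(ram_gb)
    return f"-Xms{heap}g -Xmx{heap}g"
```

For example, 1 TB/day (1000 GB) kept for 30 days gives opensearch_disk_gb(1000, 30) = 30000 GB across the cluster, and a node with 64 GB of RAM gets java_opts(64) = "-Xms31g -Xmx31g" because half of 64 exceeds the cap.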
Redis
Holds sessions, rate-limit counters, and query result cache. Production:
- Redis 7+ with AOF persistence enabled (appendonly yes)
- Set a password via requirepass (consumed by api-server via REDIS_PASSWORD)
- Replicated setup (replicaof) optional but recommended for > 1 instance
Scaling
Every service except the three stateful systems (Postgres, OpenSearch, Redis) is stateless and scales horizontally.
| Component | Scale signal | Notes |
|---|---|---|
| api-server | Request latency, CPU | Stateless; put behind a load balancer; sticky sessions not required |
| ingest-gateway | Backpressure on Redpanda | Stateless; scale to ingest peak |
| syslog-gateway | UDP packet loss | UDP is per-instance; consider keepalived VIPs for HA |
| parser-worker | Consumer lag | Scale on Redpanda consumer group lag |
| routing-worker | Consumer lag | Scale with per-tenant routing rule complexity; usually fewer instances than parser-workers |
| sink-connector | OpenSearch bulk queue depth | Match to OpenSearch write capacity |
On Kubernetes, set HPAs on every workload component. Sane starting values for a 1 TB/day deployment: 3 api-servers, 2 ingest-gateways, 4 parser-workers, 2 routing-workers, 3 sink-connectors.
Backup & disaster recovery
Three things to back up, each on its own schedule:
- PostgreSQL: daily pg_dump + continuous WAL archiving. Restore drills quarterly.
- OpenSearch: daily snapshots to off-site object storage. Test a restore each release cycle.
- Configuration: the env files + any custom parsers, alert rules, and SAML metadata. Keep in version control alongside your infra.
Redis is cache + ephemeral state — no backup needed.
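The daily PostgreSQL dump is straightforward to script. The following sketch builds the pg_dump invocation and picks out dumps that have aged past the retention window; the aethonlog-YYYY-MM-DD.dump file naming and both function names are assumptions made for illustration, and nothing here handles WAL archiving or the off-site copy.

```python
from datetime import date, timedelta
from pathlib import Path


def pg_dump_command(dsn: str, backup_dir: str, day: date) -> list[str]:
    """Argument list for a custom-format dump named after the day."""
    outfile = Path(backup_dir) / f"aethonlog-{day.isoformat()}.dump"
    return ["pg_dump", "--format=custom", f"--file={outfile}", f"--dbname={dsn}"]


def dumps_to_prune(filenames: list[str], today: date, retain_days: int) -> list[str]:
    """Names of dumps older than the retention window; other files are ignored."""
    cutoff = today - timedelta(days=retain_days)
    old = []
    for name in filenames:
        stamp = name.removeprefix("aethonlog-").removesuffix(".dump")
        try:
            taken = date.fromisoformat(stamp)
        except ValueError:
            continue  # not one of our dump files
        if taken < cutoff:
            old.append(name)
    return old
```

The command list can be passed to subprocess.run(cmd, check=True) from a cron job or a Kubernetes CronJob. Custom format is used here because pg_restore can restore selectively from it, which helps during restore drills.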
Retention policies
Configured per tenant in Admin → Retention Policies. Each policy is a query filter + a TTL. Common pattern:
- level:debug → 7 days
- level:info → 30 days
- level:error OR audit:* → 365 days (compliance window)
OpenSearch's index lifecycle policies (Index State Management, ISM) enforce the TTL by moving indices into a delete state. Compliance presets (HIPAA, SOC 2, PCI-DSS, FedRAMP) ship as templates.
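The filter-plus-TTL model can be illustrated with a small sketch. It stands in Python predicates for the platform's query filters, and it makes two assumptions the source does not confirm: when several policies match, the longest TTL wins, and events matching no policy fall back to 30 days. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Event = dict[str, str]


@dataclass(frozen=True)
class RetentionPolicy:
    name: str
    matches: Callable[[Event], bool]  # stand-in for the query filter
    ttl_days: int


# Mirrors the common pattern: debug 7 days, info 30, error or audit 365.
POLICIES = [
    RetentionPolicy("debug", lambda e: e.get("level") == "debug", 7),
    RetentionPolicy("info", lambda e: e.get("level") == "info", 30),
    RetentionPolicy(
        "error-or-audit",
        lambda e: e.get("level") == "error" or "audit" in e,  # audit:* = field present
        365,
    ),
]


def ttl_days(event: Event, policies=POLICIES, default_days: int = 30) -> int:
    """Longest TTL among matching policies (assumed tie-break rule)."""
    matched = [p.ttl_days for p in policies if p.matches(event)]
    return max(matched, default=default_days)
```

Under these assumptions an info-level event that also carries an audit field is kept for 365 days, which is the conservative choice for a compliance window. Check the in-platform docs for the actual precedence rule before relying on overlapping filters.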
Security & compliance
Authentication
- Local accounts: bcrypt-hashed passwords, optional TOTP MFA
- SAML 2.0: tested with Keycloak, Okta, Azure AD
- OIDC: any conforming provider
- API tokens: per-user, scoped, optional expiry
Authorization
- RBAC with predefined roles (viewer, editor, admin) + custom roles
- Per-tenant data isolation enforced at OpenSearch index level + API layer
- Platform-admin role for global operations (tenant management, system config)
Data protection
- PII redaction patterns applied at ingest (configurable regex + named patterns)
- Audit trail in append-only Postgres table, mirrored to OpenSearch for search
- Optional field-level encryption for highly sensitive log fields
Production checklist
- ☐ Strong DB password set; sslmode=require
- ☐ Reverse proxy with valid TLS in front of api-server
- ☐ SMTP configured for invite + alert emails
- ☐ Backups scripted (Postgres + OpenSearch) with off-site copy
- ☐ Retention policies defined per tenant
- ☐ SSO wired (SAML/OIDC) — disable local signup after bootstrap if compliance requires
- ☐ Resource limits set in compose/Helm
- ☐ Monitoring: Prometheus scraping /metrics on every service
- ☐ Alertmanager rules for AethonLog itself (capacity, lag, error rate)
- ☐ At least one DR drill completed
The full operational manual lives in the installable docs: every env var, every API endpoint, the complete aethonctl CLI, the parser DSL grammar, the alert rule reference, and the Helm values schema. After install, browse http://YOUR_SERVER:8080/docs/.