Detailed Setup

Everything beyond the Quick Start: environment configuration, TLS, storage, scaling, backup, retention, and the production-ready checklist. The in-platform docs go deeper still — this is the marketing-tier reference.

Environment configuration

All services read configuration from environment files in deploy/env/. The compose files mount each component's env file as the container's environment. For Kubernetes, the same values become a ConfigMap + Secret pair (see the Helm values.yaml).

api-server.env

Variable	Default	Notes
`DATABASE_DSN`	`postgres://aethonlog:aethonlog@postgres:5432/aethonlog?sslmode=disable`	Use strong creds + `sslmode=require` in production
`BROKER_BROKERS`	`redpanda:9092`	Comma-separated list for HA
`OPENSEARCH_URL`	`http://opensearch:9200`	HTTPS + auth in prod
`LISTEN_HTTP`	`:8080`	HTTP listen address
`PUBLIC_ADDR`	`localhost:8080`	Used by agent install scripts; set to your public hostname
`GRPC_PUBLIC_ADDR`	`localhost:9091`	Agents reach this for streaming logs
`UI_PATH`	`/srv/ui`	Path to compiled React assets
`DATABASE_MIGRATIONS_PATH`	`/srv/migrations`	SQL migrations run at startup
`SMTP_HOST` + port/user/pass/from	empty	For invite emails & alert notifications
`LOG_LEVEL`	`info`	`debug` for verbose; `warn` for quieter
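
The production notes in the table above can be checked mechanically before a deploy. The following is a minimal sketch, not part of AethonLog: the function names `parse_env` and `production_warnings` are hypothetical, and it assumes the env file uses plain `KEY=VALUE` lines with `#` comments.

```python
from urllib.parse import urlsplit, parse_qs


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so DSN query strings stay intact.
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def production_warnings(env: dict[str, str]) -> list[str]:
    """Return warnings for values the table marks as development defaults."""
    warnings = []
    dsn = urlsplit(env.get("DATABASE_DSN", ""))
    sslmode = parse_qs(dsn.query).get("sslmode", [""])[0]
    if sslmode != "require":
        warnings.append("DATABASE_DSN: sslmode should be 'require'")
    if dsn.password == "aethonlog":
        warnings.append("DATABASE_DSN: default password in use")
    if not env.get("OPENSEARCH_URL", "").startswith("https://"):
        warnings.append("OPENSEARCH_URL: should use HTTPS")
    for key in ("PUBLIC_ADDR", "GRPC_PUBLIC_ADDR"):
        if env.get(key, "localhost").startswith("localhost"):
            warnings.append(f"{key}: still points at localhost")
    return warnings


if __name__ == "__main__":
    with open("deploy/env/api-server.env", encoding="utf-8") as fh:
        for warning in production_warnings(parse_env(fh.read())):
            print(warning)
```

Run against the shipped defaults, this would flag every variable the table calls out for production hardening; an empty result only means these specific checks passed.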

ingest-gateway.env

Variable	Default
`BROKER_BROKERS`	`redpanda:9092`
`LISTEN_GRPC`	`:9091`
`LISTEN_HTTP`	`:8081`

syslog-gateway.env

Variable	Default
`BROKER_BROKERS`	`redpanda:9092`
`LISTEN_UDP`	`:1514`
`LISTEN_TCP`	`:1514`
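
To confirm the syslog gateway is reachable on its default UDP port, a test message can be sent from any host. This sketch assumes the gateway accepts RFC 5424 formatted messages, which this page does not state; the host and app names are placeholders.

```python
import socket
from datetime import datetime, timezone


def syslog_message(text: str, facility: int = 16, severity: int = 6,
                   host: str = "test-host", app: str = "smoke-test") -> bytes:
    """Build an RFC 5424 message. PRI is facility * 8 + severity."""
    pri = facility * 8 + severity  # local0 (16) + info (6) gives 134
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Layout: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG
    # "-" is the nil value for PROCID, MSGID, and structured data.
    return f"<{pri}>1 {ts} {host} {app} - - - {text}".encode("utf-8")


def send_udp(message: bytes, addr: tuple[str, int]) -> None:
    """Send one datagram; UDP gives no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, addr)


# Example (assumes the gateway is reachable on localhost):
# send_udp(syslog_message("hello from setup check"), ("localhost", 1514))
```

Because UDP is fire-and-forget, success has to be verified by searching for the message in the platform rather than from the sender's exit status.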

TLS / HTTPS

By default the platform listens on plain HTTP; production deployments should serve TLS, either by terminating it in front of the API server or by enabling it in api-server directly. Two common patterns:

Reverse proxy (recommended) — Caddy, Traefik, or nginx in front of the API server. Caddy auto-provisions Let's Encrypt certs; Traefik does the same when paired with a cert resolver. Forward to http://api-server:8080.
Direct TLS in api-server — set LISTEN_HTTPS=:8443 and mount cert + key as TLS_CERT_FILE / TLS_KEY_FILE. Disable plain HTTP with LISTEN_HTTP= (empty).

For Kubernetes, the Helm chart includes an Ingress definition; pair it with cert-manager + ClusterIssuer (Let's Encrypt or your internal CA) for auto-rotating certs.

Storage

PostgreSQL

Stores tenants, users, alert rules, audit log, and saved searches. Footprint stays small (typically < 10 GB even on large deployments). Production recommendations:

Use a managed database (RDS, Cloud SQL, etc.) or PostgreSQL 16 with streaming replication
Enable WAL archiving + point-in-time recovery
Take a logical dump (pg_dump) nightly; retain per your compliance window
Strong password + sslmode=require on the DSN

OpenSearch

The bulk of your storage — every log event is indexed here. Plan ~1 GB on disk per 1 GB of ingested logs at default replica settings (1 primary + 1 replica).

Single-node default is fine for evaluation, not for prod
Three-node minimum for resilience; spread nodes across availability zones so a primary and its replica never share a zone
Snapshots to S3 or compatible object storage (MinIO, GCS) — daily, retained per retention policy
Tune JVM heap via OPENSEARCH_JAVA_OPTS (half of available RAM, capped at 31 GB)
Use Index State Management (ISM), OpenSearch's index lifecycle feature, to roll old indices to slower storage tiers
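
The sizing rules above reduce to simple arithmetic. The sketch below applies the ~1 GB on disk per 1 GB ingested ratio and the heap rule (half of RAM, capped at 31 GB). It assumes a single uniform retention period, which real deployments with per-level retention will not have, so treat the disk figure as an upper-bound estimate. The function names are illustrative only.

```python
def opensearch_disk_gb(daily_ingest_gb: float, retention_days: int,
                       disk_per_ingested_gb: float = 1.0) -> float:
    """Estimate total disk for retained logs.

    disk_per_ingested_gb defaults to the ~1:1 ratio quoted for the
    default replica settings (1 primary + 1 replica).
    """
    return daily_ingest_gb * retention_days * disk_per_ingested_gb


def jvm_heap_gb(node_ram_gb: int, cap_gb: int = 31) -> int:
    """Half of the node's RAM, capped at cap_gb."""
    return min(node_ram_gb // 2, cap_gb)


def java_opts(node_ram_gb: int) -> str:
    """Value for OPENSEARCH_JAVA_OPTS with equal min and max heap."""
    heap = jvm_heap_gb(node_ram_gb)
    return f"-Xms{heap}g -Xmx{heap}g"


if __name__ == "__main__":
    # 1 TB/day kept for 30 days.
    print(opensearch_disk_gb(1000, 30))
    # A 64 GB node hits the 31 GB cap.
    print(java_opts(64))
```

Add headroom on top of the disk estimate for merges, snapshots in flight, and growth; the ratio itself should be re-measured against your own data once ingest is running.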

Redis

Holds sessions, rate-limit counters, and query result cache. Production:

Redis 7+ with AOF persistence enabled (appendonly yes)
Set a password via requirepass (consumed by api-server via REDIS_PASSWORD)
Replication (replicaof) is optional, but recommended once you run more than one instance

Scaling

Every AethonLog service is stateless and scales horizontally. State lives in the three storage systems (Postgres, OpenSearch, Redis) and in the Redpanda broker, which scale on their own terms.

Component	Scale signal	Notes
api-server	Request latency, CPU	Stateless; put behind a load balancer; sticky sessions not required
ingest-gateway	Backpressure on Redpanda	Stateless; scale to ingest peak
syslog-gateway	UDP packet loss	UDP is per-instance; consider keepalived VIPs for HA
parser-worker	Consumer lag	Scale on Redpanda consumer group lag
routing-worker	Consumer lag	Scale with per-tenant routing rule complexity; usually fewer than parsers
sink-connector	OpenSearch bulk queue depth	Match to OpenSearch write capacity

On Kubernetes, set HPAs on every workload component. Sane starting values for a 1 TB/day deployment: 3 api-servers, 2 ingest-gateways, 4 parser-workers, 2 routing-workers, 3 sink-connectors.
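
For intuition about how an HPA reacts to a signal such as consumer lag, the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler is ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default). The sketch below reproduces that formula; the lag targets and replica bounds in the example are hypothetical, not AethonLog recommendations.

```python
import math


def desired_replicas(current: int, metric: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """Replica count an HPA would aim for, clamped to [min, max]."""
    ratio = metric / target
    if abs(ratio - 1.0) <= tolerance:
        wanted = current  # close enough to target: no scaling
    else:
        wanted = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 4 parser-workers, lag at 3x the (hypothetical) target of 10,000 messages.
    print(desired_replicas(current=4, metric=30_000, target=10_000))
```

The starting values listed above make natural `minReplicas` settings, with the upper bound chosen to match what OpenSearch and Redpanda can absorb.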

Backup & disaster recovery

Three things to back up, each on its own schedule:

PostgreSQL — daily pg_dump + continuous WAL archiving. Restore drills quarterly.
OpenSearch — daily snapshots to off-site object storage. Test a restore each release cycle.
Configuration — the env files + any custom parsers, alert rules, and SAML metadata. Keep in version control alongside your infra.

Redis is cache + ephemeral state — no backup needed.
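
The daily PostgreSQL dump could be scripted along these lines. This is a sketch under stated assumptions: `pg_dump` is on the PATH, the DSN is passed in by the caller, and the output directory and file naming are placeholders. It does not cover WAL archiving or the off-site copy.

```python
import subprocess
from datetime import date
from pathlib import Path


def pg_dump_command(dsn: str, out_dir: Path, day: date) -> list[str]:
    """Build the pg_dump invocation for one dated, custom-format dump."""
    target = out_dir / f"aethonlog-{day.isoformat()}.dump"
    return [
        "pg_dump",
        "--format=custom",   # compressed archive, restorable with pg_restore
        f"--file={target}",
        f"--dbname={dsn}",   # --dbname accepts a full connection URI
    ]


def run_backup(dsn: str, out_dir: Path) -> None:
    """Create the output directory and run pg_dump, raising on failure."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pg_dump_command(dsn, out_dir, date.today()), check=True)


# Example (hypothetical path; take the DSN from your secret store):
# run_backup(os.environ["DATABASE_DSN"], Path("/var/backups/aethonlog"))
```

Passing the arguments as a list avoids shell quoting problems with special characters in the DSN, and `check=True` makes a failed dump fail the job instead of silently leaving a gap in the backup history.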

Retention policies

Configured per tenant in Admin → Retention Policies. Each policy is a query filter + a TTL. Common pattern:

level:debug → 7 days
level:info → 30 days
level:error OR audit:* → 365 days (compliance window)

OpenSearch's Index State Management (ISM) enforces the TTL by transitioning indices to a delete state. Compliance presets (HIPAA, SOC 2, PCI-DSS, FedRAMP) ship as templates.
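
To make the TTL mechanism concrete: OpenSearch's lifecycle plugin is called Index State Management (ISM), and a TTL corresponds to an ISM policy that moves an index to a delete state once it reaches a minimum age. The sketch below builds such a policy document. How AethonLog maps a retention filter to indices is not described here, so the index pattern in the example is purely hypothetical.

```python
import json


def ism_delete_policy(index_pattern: str, ttl_days: int,
                      priority: int = 100) -> dict:
    """Build an ISM policy that deletes matching indices after ttl_days."""
    return {
        "policy": {
            "description": f"Delete {index_pattern} after {ttl_days} days",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [
                        {
                            "state_name": "delete",
                            "conditions": {"min_index_age": f"{ttl_days}d"},
                        }
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            # Auto-attach the policy to new indices matching the pattern.
            "ism_template": {
                "index_patterns": [index_pattern],
                "priority": priority,
            },
        }
    }


if __name__ == "__main__":
    # Hypothetical index naming; the platform's real scheme may differ.
    print(json.dumps(ism_delete_policy("tenant-acme-debug-*", 7), indent=2))
```

In a stock OpenSearch cluster a document like this is created with a PUT to `_plugins/_ism/policies/<policy_id>`. The platform manages its own policies from the Admin UI, so this is for understanding and troubleshooting rather than something to apply by hand.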

Security & compliance

Authentication

Local accounts: bcrypt-hashed passwords, optional TOTP MFA
SAML 2.0: tested with Keycloak, Okta, Azure AD
OIDC: any conforming provider
API tokens: per-user, scoped, optional expiry

Authorization

RBAC with predefined roles (viewer, editor, admin) + custom roles
Per-tenant data isolation enforced at OpenSearch index level + API layer
Platform-admin role for global operations (tenant management, system config)

Data protection

PII redaction patterns applied at ingest (configurable regex + named patterns)
Audit trail in append-only Postgres table, mirrored to OpenSearch for search
Optional field-level encryption for highly sensitive log fields
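
As an illustration of what redaction at ingest means in practice, the sketch below applies a set of named patterns plus custom regexes to a log message. The pattern names, the regexes, and the replacement format are all assumptions for the example; the platform's built-in patterns and syntax are documented in the installable docs.

```python
import re

# Hypothetical named patterns; deliberately simple, not exhaustive.
NAMED_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(message: str, enabled: list[str],
           custom: tuple[str, ...] = ()) -> str:
    """Replace matches of enabled named patterns, then custom regexes."""
    for name in enabled:
        message = NAMED_PATTERNS[name].sub(f"[REDACTED:{name}]", message)
    for pattern in custom:
        message = re.sub(pattern, "[REDACTED]", message)
    return message


if __name__ == "__main__":
    print(redact("login alice@example.com from 10.0.0.5", ["email", "ipv4"]))
    # prints: login [REDACTED:email] from [REDACTED:ipv4]
```

Redacting before the event is indexed means the original value never reaches OpenSearch or its snapshots, which is why the pattern set is worth testing against sample logs before enabling it for a tenant.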

Production checklist

☐ Strong DB password set; sslmode=require
☐ Reverse proxy with valid TLS in front of api-server
☐ SMTP configured for invite + alert emails
☐ Backups scripted (Postgres + OpenSearch) with off-site copy
☐ Retention policies defined per tenant
☐ SSO wired (SAML/OIDC) — disable local signup after bootstrap if compliance requires
☐ Resource limits set in compose/Helm
☐ Monitoring: Prometheus scraping /metrics on every service
☐ Alertmanager rules for AethonLog itself (capacity, lag, error rate)
☐ At least one DR drill completed

This is the marketing-tier reference. The full operational manual — every env var, every API endpoint, the complete aethonctl CLI, the parser DSL grammar, the alert rule reference, the Helm values schema — lives in the installable docs. After install, browse http://YOUR_SERVER:8080/docs/.