In regulated systems — insurance, healthcare, FinTech — the audit log isn't a debugging tool. It's a product. Treat it accordingly, or pay for it later when a regulator asks a question you can't answer.
Most systems treat audit logging the way most teams treat documentation: as something they'll get to later, after the "real" work is done. Both attitudes produce the same result — the artifact that matters most when things go sideways is the one that's least usable.
What "audit trail" actually means
An audit trail isn't your application log. It's a separate, structured event stream that records:
- Who performed an action — including impersonation context if a support engineer was acting on behalf of a customer.
- What they did — read, created, modified, deleted, exported, transmitted.
- When — to the second, with timezone unambiguous.
- Where — IP address, device, application surface (API, web, mobile).
- Why — when relevant, the business reason recorded as part of the action (e.g., "claim adjustment per ticket #1234").
- What changed — old value and new value for any modification.
If your audit log can't answer all six questions for a given event, it's not an audit log. It's a debug log with delusions of grandeur.
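One way to keep the six questions honest is to make them required fields of the event itself. The following is a minimal sketch, not a reference schema: the `AuditEvent` class and every field name in it are assumptions chosen for illustration, and the sample values reuse the policy example from this article.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional
import json


@dataclass(frozen=True)
class AuditEvent:
    # Who: the acting user, plus the customer they acted for, if any.
    actor_id: str
    on_behalf_of: Optional[str]
    # What: read, created, modified, deleted, exported, transmitted.
    action: str
    record_id: str
    # When: a timezone-aware timestamp.
    occurred_at: datetime
    # Where: IP address, device, application surface (API, web, mobile).
    ip_address: str
    device: str
    surface: str
    # Why: the business reason, when relevant.
    reason: Optional[str]
    # What changed: old and new value for any modification.
    old_value: Optional[str]
    new_value: Optional[str]

    def __post_init__(self):
        # Reject events that cannot answer "when" or "what changed".
        if self.occurred_at.tzinfo is None:
            raise ValueError("occurred_at must be timezone-aware")
        if self.action == "modified" and (
            self.old_value is None or self.new_value is None
        ):
            raise ValueError("modifications must record old and new values")

    def to_json(self) -> str:
        # Serialize with the timestamp normalized to UTC in ISO 8601.
        data = asdict(self)
        data["occurred_at"] = self.occurred_at.astimezone(timezone.utc).isoformat()
        return json.dumps(data, sort_keys=True)


event = AuditEvent(
    actor_id="adjuster-maria",
    on_behalf_of=None,
    action="modified",
    record_id="MK-2026-0048",
    occurred_at=datetime(2026, 3, 1, 9, 30, 0, tzinfo=timezone.utc),
    ip_address="203.0.113.7",
    device="laptop",
    surface="web",
    reason="coverage limit change per CR-1241",
    old_value="50000 EUR",
    new_value="100000 EUR",
)
print(event.to_json())
```

Validating in the constructor means an incomplete event fails loudly at write time instead of turning up as a gap during a regulator request.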
Three common mistakes
Mistake 1: Mutable logs. If you can DELETE or UPDATE rows in your audit table, it's not an audit log. The simplest fix: make the table append-only, enforced by database permissions (no DELETE/UPDATE grant for the application user) and verified by automated checks. Stronger fix: write to an append-only store such as an event-sourcing log or an immutable database.
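Here is a small sketch of enforcement plus an automated check. It runs on SQLite so it is self-contained, and SQLite has no GRANT/REVOKE, so triggers stand in for the permission-based approach described above. On a server database such as PostgreSQL you would instead revoke UPDATE and DELETE from the application role. The table layout and the `is_rejected` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    actor_id TEXT NOT NULL,
    action TEXT NOT NULL,
    record_id TEXT NOT NULL,
    occurred_at TEXT NOT NULL
);
-- Stand-ins for missing UPDATE/DELETE grants: abort any mutation.
CREATE TRIGGER audit_log_no_update BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
CREATE TRIGGER audit_log_no_delete BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
""")

# Appending is still allowed.
conn.execute(
    "INSERT INTO audit_log (actor_id, action, record_id, occurred_at) "
    "VALUES (?, ?, ?, ?)",
    ("user-7392", "modified", "record-14182", "2026-03-01T09:30:00+00:00"),
)


def is_rejected(sql: str) -> bool:
    """Return True if the database refuses to run the statement."""
    try:
        conn.execute(sql)
    except sqlite3.DatabaseError:
        return True
    return False


# The automated check: both kinds of mutation must be refused,
# and the original row must still be there afterwards.
update_blocked = is_rejected("UPDATE audit_log SET action = 'read'")
delete_blocked = is_rejected("DELETE FROM audit_log")
row_count = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
print(update_blocked, delete_blocked, row_count)
```

A check like this can run in CI or on a schedule against the application's own credentials, so a misapplied grant is caught before it matters.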
Mistake 2: Missing business context. "User 7392 modified record 14182" is debug-level information. "Claims adjuster Maria modified policy MK-2026-0048's coverage limit from €50,000 to €100,000 per change request CR-1241" is auditable. The difference is whether the log captures the business event, not just the technical operation.
Mistake 3: No retention policy. Audit logs that are pruned too early and audit logs that are kept forever in expensive storage both fail. The right retention is dictated by the regulation: 5-10 years for accounting-adjacent records under Romanian law, longer for some healthcare records. Design for the longest applicable horizon and budget accordingly.
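"Design for the longest applicable horizon" can be written down as a rule rather than left as tribal knowledge. The sketch below assumes a hypothetical mapping from record category to retention years. The numbers are placeholders, not legal guidance, and the real values have to come from the regulations that apply to you.

```python
from datetime import date

# Placeholder horizons in years. These are illustrative values only;
# take the real ones from the applicable regulation.
RETENTION_YEARS = {
    "general": 5,
    "accounting_adjacent": 10,
    "healthcare": 30,
}


def retention_years(categories) -> int:
    """Longest horizon among all categories that apply to a record."""
    return max(RETENTION_YEARS[category] for category in categories)


def earliest_purge_date(created: date, categories) -> date:
    """First date on which the record may be pruned."""
    target_year = created.year + retention_years(categories)
    try:
        return created.replace(year=target_year)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year.
        return date(target_year, 3, 1)


print(earliest_purge_date(date(2026, 3, 1), ["general", "accounting_adjacent"]))
```

A record that falls under several categories inherits the longest horizon, which is the conservative reading of the rule above.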
Three layers
We split logging in regulated systems into three streams, each with different retention and access:
- Application log. Debug traces, errors, performance metrics. High volume, short retention (30-90 days), accessible to all engineers.
- Business event log. Domain events: orders placed, claims approved, prescriptions issued. Medium volume, multi-year retention, accessible to product analytics.
- Security audit log. Access to personal data, privilege changes, exports, deletions. Lower volume, longest retention, restricted access — and itself audited.
The mistake is conflating these. When the security audit log lives in the same place as the application log, retention and access are wrong for one or the other.
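As a rough illustration of keeping the streams apart, the sketch below gives each one its own logger and its own sink using Python's standard `logging` module. The logger names are assumptions, and the in-memory `StringIO` sinks stand in for real destinations that would each carry their own retention and access rules.

```python
import io
import logging


def make_stream(name: str):
    """Create a logger that writes only to its own sink."""
    sink = io.StringIO()
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Do not bubble up to the root logger, or security events
    # would also land in the application log.
    logger.propagate = False
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(sink))
    return logger, sink


app_log, app_sink = make_stream("app")
business_log, business_sink = make_stream("business_events")
security_log, security_sink = make_stream("security_audit")

app_log.info("claims list rendered in 182 ms")
business_log.info("claim approved: claim_id=C-1")
security_log.info("personal data exported: actor=user-7392 record=14182")

print(app_sink.getvalue(), end="")
```

The `propagate = False` line carries the point: without it, every stream would also reach whatever handlers are attached to the root logger, which is exactly the conflation described above.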
Storage choices
For the security audit log specifically, you want:
- Append-only by enforcement, not convention. No DELETE or UPDATE permissions on the table for the application user. Some teams use a managed append-only pipeline (DynamoDB Streams, Azure Event Hubs with Capture) for this, archiving its output to long-term storage.
- Cryptographic chaining for the highest-stakes systems — each event's hash includes the previous event's hash, so tampering is detectable. Overkill for most cases, but standard in some financial systems.
- Separate access controls. The application engineers who write to the log shouldn't have read access to it. The compliance team that reads it shouldn't have write access. Each privilege escalation to access the log is itself an audit event.
- Backup and integrity verification. Back up the audit log to a separate system. Verify the backup integrity quarterly. If you can't reconstruct the audit trail from your backups, you don't have an audit trail.
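The cryptographic chaining mentioned in the list above can be sketched in a few lines. This is a minimal illustration using SHA-256 over a canonical JSON encoding. The entry layout, the all-zero genesis value, and the function names are assumptions, not a standard format.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed starting value for the first entry


def chain_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list, payload: dict) -> None:
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": chain_hash(prev_hash, payload),
    })


def verify_chain(chain: list) -> bool:
    """Recompute every hash from the start; any edit breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != chain_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_event(chain, {"actor": "user-1", "action": "read", "record": "p-1"})
append_event(chain, {"actor": "user-2", "action": "modified", "record": "p-1"})
intact = verify_chain(chain)

# Tamper with the first event after the fact.
chain[0]["payload"]["action"] = "deleted"
tampered_detected = not verify_chain(chain)
print(intact, tampered_detected)
```

A chain like this only detects tampering if the most recent hash is also recorded somewhere the attacker cannot rewrite, for example in the separate backup system. Otherwise the whole chain can simply be recomputed.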
Querying patterns
The audit log gets queried in three modes, each with different performance needs:
- Per-record history. "Show me everything that happened to patient X." Index by record_id.
- Per-user activity. "Show me everything user Y did this month." Index by actor_id + timestamp.
- Per-event-type aggregation. "How many claim approvals happened last quarter?" This is analytics, not strict audit — but the data lives in the same place.
If your audit log can answer all three within seconds, you've designed it well. If "show me everything that happened to this person" requires running a 20-minute query, you'll regret it during the first regulator request.
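To make the three modes concrete, here is a sketch against an in-memory SQLite table. The schema, index names, and sample rows are made up for illustration. The per-record index also includes the timestamp, which is an assumption on top of the "index by record_id" advice so that history comes back in time order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY,
    record_id TEXT NOT NULL,
    actor_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    occurred_at TEXT NOT NULL  -- ISO 8601 UTC, so string order is time order
);
CREATE INDEX idx_audit_record ON audit_log (record_id, occurred_at);
CREATE INDEX idx_audit_actor ON audit_log (actor_id, occurred_at);
""")
conn.executemany(
    "INSERT INTO audit_log (record_id, actor_id, event_type, occurred_at) "
    "VALUES (?, ?, ?, ?)",
    [
        ("patient-X", "user-Y", "record_read", "2026-01-10T08:00:00Z"),
        ("patient-X", "user-Z", "claim_approved", "2026-02-03T11:15:00Z"),
        ("patient-W", "user-Y", "claim_approved", "2026-02-20T16:40:00Z"),
        ("patient-W", "user-Y", "record_exported", "2026-04-02T09:05:00Z"),
    ],
)


def record_history(record_id: str) -> list:
    """Per-record history: everything that happened to one record."""
    return conn.execute(
        "SELECT actor_id, event_type, occurred_at FROM audit_log "
        "WHERE record_id = ? ORDER BY occurred_at",
        (record_id,),
    ).fetchall()


def user_activity(actor_id: str, start: str, end: str) -> list:
    """Per-user activity in the half-open window [start, end)."""
    return conn.execute(
        "SELECT record_id, event_type, occurred_at FROM audit_log "
        "WHERE actor_id = ? AND occurred_at >= ? AND occurred_at < ? "
        "ORDER BY occurred_at",
        (actor_id, start, end),
    ).fetchall()


def count_by_type(event_type: str, start: str, end: str) -> int:
    """Per-event-type aggregation over the window [start, end)."""
    return conn.execute(
        "SELECT COUNT(*) FROM audit_log "
        "WHERE event_type = ? AND occurred_at >= ? AND occurred_at < ?",
        (event_type, start, end),
    ).fetchone()[0]


print(record_history("patient-X"))
print(user_activity("user-Y", "2026-02-01T00:00:00Z", "2026-03-01T00:00:00Z"))
print(count_by_type("claim_approved", "2026-01-01T00:00:00Z", "2026-04-01T00:00:00Z"))
```

The aggregation query has no dedicated index here. On a large table it would probably belong on a replica or an analytics copy rather than on the audit store itself.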
Building it from day one is cheap
Adding proper audit logging to a system that wasn't designed for it costs roughly 4-8 weeks of engineering work, plus the risk of missing edge cases. Building it in from day one costs maybe 2-3 days at the start of the project, plus the discipline that "every domain event emits an audit event."
The cheapest moment to build audit infrastructure is when you have one event type. Each new event type added later is a small extension. By the time your system has 50 event types and no audit infrastructure, you have a 6-month retrofit ahead of you.
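The "every domain event emits an audit event" discipline is easiest to keep when it isn't optional. One possible shape, sketched here with a hypothetical in-process `EventBus`, is to make publishing the only way to raise a domain event and have publishing write the audit entry itself. The class, its method names, and the in-memory list used as the audit sink are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EventBus:
    audit_sink: list = field(default_factory=list)
    handlers: dict = field(default_factory=dict)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self.handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type: str, actor_id: str, payload: dict) -> None:
        # Audit first: an exception here stops the handlers from running.
        self.audit_sink.append({
            "event_type": event_type,
            "actor_id": actor_id,
            "payload": dict(payload),  # copy, so handlers cannot alter it
        })
        for handler in self.handlers.get(event_type, []):
            handler(payload)


bus = EventBus()
approved_claims = []
bus.subscribe("claim_approved", approved_claims.append)

bus.publish("claim_approved", "adjuster-maria", {"claim_id": "C-1"})
# A new event type needs no audit-specific code, even with no handlers yet.
bus.publish("policy_renewed", "system", {"policy_id": "MK-2026-0048"})

print(len(bus.audit_sink), approved_claims)
```

With this shape, adding the 50th event type costs the same as adding the second, which is the small-extension property described above.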
What you can take from this
If you're building or auditing a regulated system, the audit log deserves the same design rigor as the user-facing product:
- Decide who, what, when, where, why, and what-changed before writing the first line of code.
- Make it append-only by enforcement.
- Capture business context, not just technical operations.
- Plan retention for the longest applicable regulatory horizon.
- Audit access to the audit log.
It's not glamorous. Nobody demos audit logs on sales calls. But when a regulator or a customer asks "who did this?", it's the only artifact that matters.
