Last Updated: April 18, 2026 at 16:30

Event Sourcing in Microservices: Storing History Instead of State

A practical guide to how event sourcing works, why it improves auditability and debugging, and when the added complexity is worth it

Most systems store what is true now, but event sourcing captures how truth came to be by recording every change as an immutable event. This article explains how that shift enables precise debugging, full auditability, and the ability to replay system behaviour at any point in time. It also examines the real trade-offs: eventual consistency, projection complexity, and the operational burden most teams underestimate. The key insight is that in event-sourced systems, correctness is no longer about current state being accurate — it is about history being complete.

Introduction to Event Sourcing

Event sourcing is an architectural pattern where every change to a system is recorded as an immutable sequence of events, rather than overwriting the current state. Instead of storing what is true now, the system stores how truth came to be — a complete, ordered history of decisions. This matters because when systems grow distributed and complex, the hardest problems are no longer about storing data, but about understanding what actually happened. Event sourcing provides that clarity by making history the source of truth, enabling precise debugging, full auditability, and the ability to reconstruct state at any point in time.

The failure that motivates everything

Imagine you are investigating a production incident.

An order shows as PAID in the order service. The inventory service shows stock RESERVED for that order. But the shipping service shows NOT CREATED. The customer is asking why their order has not shipped.

Everything looks valid in isolation. Each service's state is internally consistent. But globally, the system is broken. You cannot tell what actually happened.

You check the logs. You see scattered messages and timestamps, but you cannot reconstruct the sequence of events with certainty. Did payment happen before inventory reservation? Did the shipping service ever receive the order? Did it fail silently?

You realise something uncomfortable: you stored only the final states. The sequence of decisions that led here is gone.

When state diverges across services, you cannot reconstruct what actually happened. The history is lost. You are left with fragments and guesses.

This is the problem event sourcing is designed to solve.

The core reframe: state is a derived view

Traditional systems store current state. You have an orders table. When an order status changes, you update the row. The old value is gone. You know what is true now. You cannot know what was true at any other point in time unless you explicitly logged it separately.

Event sourcing takes a different approach.

Stop storing the current state. Store the history of decisions.

Every change to state is captured as an immutable event in an append-only log. The current state is not stored directly — it is derived by replaying those events.

This is the anchor insight: In traditional systems, correctness depends on the current state being right. In event-sourced systems, correctness depends on the history being complete.

Think of it this way. State is a snapshot — a single frame from a film. History is the entire movie. A snapshot tells you where things are. The movie tells you how they got there. When something goes wrong, the snapshot rarely tells you why. The movie shows you exactly what happened, step by step.

Most systems throw away the movie and keep a single frame. Event sourcing keeps the entire film — at the cost of having to replay it whenever you need to understand the present.

The vocabulary you need

Before going further, let us define the key terms precisely. These will appear throughout the article, so a clear definition now saves confusion later.

Event — An immutable fact about something that happened. OrderPlaced. PaymentCaptured. InventoryReserved. OrderShipped. Events are always written in past tense because they describe something that has already occurred. You cannot change an event after it is written.

Event store — An append-only log where events are stored. You can only add events. You never update or delete them. This log is the single source of truth.

Projection — A read model built by consuming events. Projections are shaped for specific query needs. An order summary screen uses a projection built from OrderPlaced and OrderShipped events, pre-computed and ready to query fast.

Command — An instruction to do something. PlaceOrder. AuthorizePayment. Commands are validated and, if accepted, produce events. Commands can fail. Events cannot — they have already happened.

Here is the definition to hold onto throughout: Every change to state is captured as an immutable event in an append-only log, and current state is derived by replaying those events.
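The four terms can be made concrete in a few lines. This is an illustrative sketch, not tied to any particular event-sourcing framework; the names (Event, EventStore, append) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any

# An event is an immutable fact, named in past tense. frozen=True makes
# instances read-only: you cannot change an event after it is written.
@dataclass(frozen=True)
class Event:
    type: str                # e.g. "OrderPlaced"
    data: dict[str, Any]

# The event store is append-only: events are added, never updated or deleted.
class EventStore:
    def __init__(self) -> None:
        self._log: list[Event] = []

    def append(self, event: Event) -> None:
        self._log.append(event)

    def events(self) -> list[Event]:
        return list(self._log)   # hand out a copy; the log itself stays sealed

store = EventStore()
store.append(Event("OrderPlaced", {"orderId": "abc"}))
store.append(Event("OrderShipped", {"orderId": "abc"}))
```

A command, by contrast, would be ordinary mutable input that is validated first and, only if accepted, turned into one of these immutable events.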

A concrete look at how it works

Let us trace a real flow so the vocabulary connects to something tangible.

A user places an order. A PlaceOrder command arrives at the Order service. The service validates: does the customer exist? Are the items in stock? Is the payment method valid? If all checks pass, the service emits events.

Event 1: OrderPlaced { orderId: "abc", customerId: "123", items: [...], timestamp: ... }

Event 2: InventoryReserved { orderId: "abc", warehouseId: "W1", timestamp: ... }

Both events are appended to the event store. Nothing else is written. There is no orders table being updated. The write side is done.

Later, a projection consumes those events. It builds a denormalised read model — a table optimised for the order summary screen, with one row per order and all the fields the UI needs. When the customer loads their order history, the query hits this projection, not the event store.

When the customer support team needs the full history of a specific order, they replay that order's event stream:

OrderPlaced → status: PLACED

PaymentAuthorized → status: PAID

InventoryReserved → status: RESERVED

OrderShipped → status: SHIPPED

The current state is the final result of that replay. At any point, you can also ask: what did this order look like after the second event? Just stop the replay there.
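That replay is just a left fold over the event stream. A minimal sketch, assuming the simple event-to-status mapping used in the trace above:

```python
# Assumed mapping for this example: each event type implies an order status.
STATUS_BY_EVENT = {
    "OrderPlaced": "PLACED",
    "PaymentAuthorized": "PAID",
    "InventoryReserved": "RESERVED",
    "OrderShipped": "SHIPPED",
}

def replay(events, upto=None):
    """Derive current state by replaying events in order.
    Pass `upto` to stop the replay early and see a past state."""
    state = {"status": None}
    for event_type in events[:upto]:
        state["status"] = STATUS_BY_EVENT[event_type]
    return state

history = ["OrderPlaced", "PaymentAuthorized", "InventoryReserved", "OrderShipped"]
current = replay(history)            # {'status': 'SHIPPED'}, the current state
after_two = replay(history, upto=2)  # {'status': 'PAID'}, state after event 2
```

Stopping the fold early is all "time travel" means here: the state at any point in history is the replay truncated at that point.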

The fundamental trade-off

Traditional systems optimise for easy writes. You update a row. You are done. Fast, simple, familiar.

Event sourcing optimises for perfect reconstruction. You append an event — that write is also fast, because appends are sequential. But the cost shifts to reads. To know current state, you replay events. To query efficiently, you build and maintain projections.

Event sourcing trades simplicity of mutation for determinism of history.

You can always reconstruct exactly what happened. You can replay the past. You can debug by re-running history in a development environment. But you pay for this power in operational complexity, storage growth, and projection management.

This is not a free lunch. Accept it only when the determinism of history is worth more to you than the simplicity of mutation.

Why event sourcing almost forces eventual consistency

The causal chain

Event store writes are immediate and strongly consistent. You append an event — that write either succeeds or fails, right now.

But reads depend on projections, and projections consume events asynchronously. They update read models in the background. There is always a delay — even if small — between an event being appended and a projection reflecting that event.

Therefore, reads are eventually consistent with writes. A user might place an order and immediately view their order list. The order might not appear yet because the projection has not caught up.

Why you cannot fully escape this

Even if you use the same database for events and projections, updating the projection is a separate write operation that happens after the event append. There is a window of inconsistency.

You could update projections synchronously within the same database transaction. In theory, this achieves consistency. In practice, at any meaningful scale, it destroys the performance benefits of event sourcing. Synchronous projections and high write throughput are in tension.

How to handle it in practice

You need explicit strategies rather than hoping users do not notice.

Read-your-own-writes: When a user writes something, ensure their immediate subsequent reads reflect it. One approach is to return the updated read model directly from the write operation. Another is to read from the write side of the database for that specific user, bypassing the projection temporarily.

UI hints: A message like "Your order has been placed. It will appear in your order history shortly" manages expectations honestly. Users understand small delays when they are acknowledged.

Polling or push updates: The UI polls for updates or holds an open connection. When the projection catches up, the UI updates automatically without a manual refresh.

Critical paths fall back to the event store: For operations where consistency is non-negotiable, bypass the projection and query the event store directly. Accept the performance cost for that specific path.
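The first strategy, read-your-own-writes, can be sketched as a write path that returns the freshly derived state directly, so the writing user never waits for the projection. The class and method names here are assumptions for illustration.

```python
class OrderService:
    """Illustrative only: the projection deliberately lags the event log."""

    def __init__(self):
        self.event_log = []     # write side: strongly consistent
        self.projection = {}    # read side: updated asynchronously, may lag

    def place_order(self, order_id):
        self.event_log.append(("OrderPlaced", order_id))
        # Read-your-own-writes: derive the result from the write side and
        # return it, instead of making the caller wait for the projection.
        return {"orderId": order_id, "status": "PLACED"}

    def get_order(self, order_id):
        # Normal reads hit the projection, and may briefly see nothing.
        return self.projection.get(order_id)

svc = OrderService()
placed = svc.place_order("abc")   # visible immediately to the writer
stale = svc.get_order("abc")      # None until the projection catches up
```

The gap between `placed` and `stale` is exactly the inconsistency window the UI hints and polling strategies are designed to paper over.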

The honest truth

Eventual consistency is not a bug in event sourcing. It is a feature of the trade-off. You accept stale reads for higher write throughput and complete history. But you must design for it explicitly. You cannot ignore it and hope for the best.

Projections: where systems actually fail

The write side of event sourcing is easy. Appending events? Simple. Replaying them? Straightforward.

Projections are where difficulty lives — and where most event-sourced systems eventually struggle.

What a projection does

A projection listens for events and updates a read model — a data shape built for a specific query. For example, an order summary projection listens for OrderPlaced, PaymentAuthorized, and OrderShipped. It maintains a denormalised row per order with every field the summary screen needs. No joins at read time. Just fast queries.

Types of projections

Simple projections map events directly to database rows. An OrderPlaced event creates a row. An OrderShipped event updates the status on that row. Straightforward to implement.

Aggregating projections compute summaries. An hourly order volume projection listens for OrderPlaced events and increments counters per hour. More complex to build and rebuild correctly.

Search projections build and maintain search indexes. A product search projection listens for ProductCreated and ProductUpdated events and keeps the search index current.
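A projection of the first, simple kind can be sketched as an event handler maintaining a denormalised dict keyed by order ID, an in-memory stand-in for the read-model table:

```python
# The read model: one denormalised record per order, shaped for the
# order summary screen. A dict stands in for a database table here.
order_summary = {}

def project(event_type, data):
    """Apply one event to the read model."""
    if event_type == "OrderPlaced":
        order_summary[data["orderId"]] = {
            "status": "PLACED",
            "items": data.get("items", []),
        }
    elif event_type == "OrderShipped":
        order_summary[data["orderId"]]["status"] = "SHIPPED"

project("OrderPlaced", {"orderId": "abc", "items": ["book"]})
project("OrderShipped", {"orderId": "abc"})
```

After both events, `order_summary["abc"]` holds everything the screen needs in one record: no joins at read time, just a lookup.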

Projection failure modes

What happens when a projection fails?

A projection processes events in order, one by one. If event #42 fails to process — maybe the data is malformed, maybe a database connection drops — the projection cannot move on to event #43. It's stuck. It cannot skip the failed event and continue, because that would create an incomplete read model.

So the projection stops. Completely.

The read model freezes at whatever state event #41 left it. New events keep arriving, piling up in a queue, but the projection never reaches them. Lag grows. The order summary screen shows data from six hours ago. Meanwhile, the rest of the system — the write side, other projections — carries on normally, unaware that this one projection is dead.

This is the danger: projections fail silently. No error reaches the end user. They just see stale data and don't know why.

What you need

Three capabilities are non-negotiable:

  1. Monitoring — Track projection lag. If a projection falls behind, alert immediately.
  2. Checkpoint restart — The projection must remember the last successfully processed event (the checkpoint). When you fix the issue, restart from there, not from the beginning.
  3. Rebuild from scratch — Sometimes you need to replay all events and rebuild the read model completely. This takes time. Design for it.

Without these, one bad event brings down your read model forever.
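The checkpoint-restart capability can be sketched as a consumer loop that persists the position of the last success and stops, rather than skips, on failure. The dict standing in for a persisted checkpoint record is an assumption of this example.

```python
checkpoint = {"position": 0}   # stand-in for a persisted checkpoint record

def run_projection(events, handle):
    """Process events in order, remembering the last success.
    On failure the projection stops; after a fix, rerun from the checkpoint."""
    for position in range(checkpoint["position"], len(events)):
        try:
            handle(events[position])
        except Exception:
            # Do NOT skip: skipping would leave the read model incomplete.
            # Stop here; monitoring should alert on the growing lag.
            return position               # the event that needs attention
        checkpoint["position"] = position + 1
    return None                           # fully caught up

processed = []

def handle(event):
    if event == "bad":
        raise ValueError("malformed event")
    processed.append(event)

stuck_at = run_projection(["e1", "bad", "e3"], handle)
# stuck_at == 1: the projection halted at event #1 and never reached "e3"
```

After the malformed event is repaired, calling `run_projection` again resumes from `checkpoint["position"]`, not from the beginning.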

Rebuilding projections

You fix a bug in your projection. But the existing read model still has bad data from before the fix. What do you do?

Two options:

  1. Fix only new data — Future events are correct. Old data stays wrong. Usually not acceptable.
  2. Rebuild everything — Delete the bad read model. Replay every event from the beginning using the fixed code. This gives you a clean, correct model.

Most systems need option 2.

The problem: Replaying millions of events takes time — sometimes hours.

The solution in five steps:

  1. Don't delete the old model yet — Build the new one alongside it. Queries keep working during the rebuild.
  2. Use checkpoints — Track which event you last processed. If the rebuild crashes, resume from there instead of starting over.
  3. Process in batches — Handle 1,000 events at a time. This prevents memory issues and lets you see progress.
  4. Handle new events during rebuild — New events keep arriving. You can either pause them until rebuild finishes (simpler but causes downtime) or run both old and new projections side by side, then switch when done (complex but zero downtime).
  5. Test rebuilds regularly — Run a test rebuild once a week. Know how long it takes before something actually breaks.
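Steps 1 to 3 can be sketched together: build the new model alongside the old one, in batches, tracking a checkpoint so a crashed rebuild resumes rather than restarts. Names and the toy event shape are illustrative assumptions.

```python
def rebuild(all_events, apply_event, batch_size=1000, start_from=0):
    """Rebuild a read model from scratch in batches.
    The old model is untouched; callers switch over when the rebuild is done."""
    new_model = {}
    checkpoint = start_from
    while checkpoint < len(all_events):
        batch = all_events[checkpoint:checkpoint + batch_size]
        for event in batch:
            apply_event(new_model, event)
        # Persist this checkpoint in a real system: a crash resumes here,
        # and per-batch progress is visible to operators.
        checkpoint += len(batch)
    return new_model

def apply_event(model, event):
    event_type, order_id = event
    if event_type == "OrderPlaced":
        model[order_id] = {"status": "PLACED"}
    elif event_type == "OrderShipped":
        model[order_id]["status"] = "SHIPPED"

events = [("OrderPlaced", f"o{i}") for i in range(2500)]
model = rebuild(events, apply_event, batch_size=1000)
# 2,500 orders rebuilt in three batches of 1,000, 1,000, and 500
```

Handling events that arrive mid-rebuild (step 4) sits on top of this loop: either pause the feed, or keep the old projection consuming live events until the switch.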

The bottom line: A rebuild is not an emergency. It's normal maintenance. Design for it. Practice it.

Introducing a new projection

When you add a new projection, it needs to be populated with historical data. Two approaches work:

Replay from the event store — clean and complete, but slow for large event histories.

Backfill from current state — query the write database and insert into the projection. Faster, but you lose any intermediate states if the projection needs full history.

Choose based on whether your projection requires the complete history or only the current snapshot.

Event sourcing compared to CRUD and Change Data Capture

To make the trade-offs concrete, here is how three common approaches compare.

CRUD (Create, Read, Update, Delete) stores only current state. Once you update a row, the old value is gone. You cannot reconstruct history without separate audit logging. Complexity is low. Best for simple applications with no audit requirements and predictable scale.

Change Data Capture (CDC) captures changes from database transaction logs after they happen. You see that a column changed, but not the business intent behind the change. Moderate complexity. Best for streaming changes to other systems, analytics pipelines, or replication.

Event sourcing stores the intent behind every change, not just the fact that something changed. Complexity is high. Best for systems with audit requirements, complex business logic, temporal queries, or the need for deterministic replay.

The key distinction between CDC and event sourcing: CDC captures what changed. The database log shows that the order_status column changed from PLACED to PAID.

Event sourcing captures why it changed. The event store contains a PaymentAuthorized event recording the payment ID, amount, timestamp, and authorisation code — the business decision that caused the change.

Knowing what changed is useful. Knowing why it changed is transformative for audit, debugging, and business intelligence.

The hard parts about event sourcing

Let us be direct about the difficulties. Event sourcing introduces real operational complexity.

Event versioning

Events are immutable. Once stored, you cannot change them. But your understanding of events evolves. You might later realise that OrderPlaced should have included a discount code field.

Because existing events cannot be changed, you have two options. Create a versioned event — OrderPlacedV2 — that includes the new field. Or handle missing fields defensively in your processing code using default values. Both approaches work; both require discipline to maintain over time.
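The defensive-defaults option is sometimes called upcasting: older events are lifted to the current shape as they are read, leaving the stored events untouched. A minimal sketch, reusing the hypothetical discount code field from the example above:

```python
def upcast_order_placed(event_data):
    """Lift any stored OrderPlaced event to the current shape.
    Events written before the change lack `discountCode`, so supply
    an explicit default rather than letting consumers crash on a KeyError."""
    upgraded = dict(event_data)                 # never mutate the stored event
    upgraded.setdefault("discountCode", None)   # field added after v1
    return upgraded

old_event = {"orderId": "abc", "items": ["book"]}   # written before the change
new_event = {"orderId": "def", "items": [], "discountCode": "SAVE10"}

upcast_order_placed(old_event)["discountCode"]   # None: safe default applied
upcast_order_placed(new_event)["discountCode"]   # "SAVE10": value preserved
```

The versioned-event alternative, an `OrderPlacedV2` type, shifts the same burden from read time to the consumer's type handling; either way, every consumer must cope with both shapes forever.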

Schema evolution

This is the hardest long-term problem. Over months and years, event schemas will change. Fields are added, renamed, deprecated. Event types are split or merged.

The solution is backward compatibility. Consumers must handle multiple event versions.

Idempotency

Events may be delivered more than once. Message brokers retry on failure. Consumers restart. Network issues cause duplicates.

Your event consumers must be idempotent: processing the same event twice must produce the same result as processing it once. A practical approach is to store the ID of each processed event in a small table, and skip any event whose ID is already present. This sounds simple but becomes a discipline issue at scale — every consumer needs it, not just the ones you remember to handle carefully.
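The processed-IDs approach can be sketched with a set standing in for the tracking table:

```python
processed_ids = set()     # stand-in for a small "processed events" table
counter = {"orders": 0}   # some side effect the consumer maintains

def handle_once(event_id, event_type):
    """Idempotent consumer: a duplicate delivery is detected and skipped,
    so processing an event twice has the same effect as processing it once."""
    if event_id in processed_ids:
        return False                  # duplicate: already handled, do nothing
    if event_type == "OrderPlaced":
        counter["orders"] += 1
    processed_ids.add(event_id)       # in real systems, record this atomically
    return True                       # with the side effect (same transaction)

handle_once("evt-1", "OrderPlaced")   # True: processed
handle_once("evt-1", "OrderPlaced")   # False: duplicate, no double count
```

The subtle part is the comment about atomicity: if the side effect commits but recording the ID fails, a redelivery still double-counts, which is why the ID insert and the effect belong in one transaction.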

Snapshots

Replaying ten million events to reconstruct current state is not acceptable in a request-response API — it could take minutes. The solution is snapshots.

What is a snapshot?

A snapshot is a saved copy of an entity's state at a specific point in time, stored alongside the position in the event log.

Instead of replaying every event since the beginning of time, you do this:

  1. Load the latest snapshot (e.g., "order abc as of event #42,000")
  2. Replay only the events that happened after that snapshot (events #42,001 to #42,500)

If you snapshot every 1,000 events, reconstruction loads one snapshot and replays at most 1,000 events — fast enough for most practical use cases.
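The two steps can be sketched with a snapshot that stores the derived state alongside its position in the log. The structures and the toy reducer are illustrative assumptions.

```python
def load_state(snapshot, event_log, apply_event):
    """Reconstruct current state from the latest snapshot plus the event tail."""
    state = dict(snapshot["state"])                  # 1. start from the saved copy
    for event in event_log[snapshot["position"]:]:   # 2. replay only newer events
        apply_event(state, event)
    return state

def apply_event(state, event):
    # Toy reducer for illustration: just count events applied.
    state["count"] = state.get("count", 0) + 1

event_log = ["e"] * 42500
snapshot = {"position": 42000, "state": {"count": 42000}}

state = load_state(snapshot, event_log, apply_event)
# state["count"] == 42500: 42,000 from the snapshot plus 500 replayed events
```

Note that the snapshot is a pure optimisation: deleting it loses nothing, because the same state is always recoverable by replaying the full log.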

How often should you snapshot?

You have three options:

  1. Every N events — Simple and predictable. Snapshot after every 1,000 or 5,000 events.
  2. Every N minutes — Good for entities that are active in bursts.
  3. At specific milestones — For example, after an order is delivered (no more events will ever arrive for that order, so snapshot once and forget it).

The trade-off: More snapshots = faster reads + more storage. Fewer snapshots = slower reads + less storage. Tune based on your event volume and how fast your reads need to be.

The human cost

Event sourcing is not intuitive for most developers. The mental model is genuinely different from what most engineers learn. New team members will struggle. Debugging requires understanding the event stream, not just inspecting a database row. A team that does not deeply understand event sourcing will drift toward anti-patterns without realising it.

When NOT to use event sourcing

This section is as important as everything above. Event sourcing is genuinely the wrong choice in many common situations.

Do not use it for simple CRUD systems. If your application is mostly creating, reading, updating, and deleting records with no complex business logic and no audit requirements, event sourcing adds enormous complexity for almost no benefit. A traditional database is the right tool.

Do not use it without genuine history requirements. If you do not need to know what happened at every point in time, if you have no need to replay past states, if there is no compliance or audit obligation — event sourcing is overkill.

Do not use it at low scale. If your write volume is handled easily by a traditional database and you are not experiencing lock contention, event sourcing solves problems you do not have.

Do not introduce it to a team unfamiliar with the pattern. The learning curve is steep. The mistakes are costly and often subtle. Gain experience with simpler patterns first.

Do not use it as your first microservices pattern. If you are new to microservices, add event sourcing only when you have concrete, identified requirements that it addresses.

A simple test to apply: "Do I need to know what happened, or only what is?"

If you only need what currently is, do not use event sourcing. If you need what happened — for audit, debugging, compliance, or temporal queries — event sourcing may be the right fit. Then ask a second question: "Can my team sustain the complexity?" Both questions need a confident yes before you proceed.

The mental model to carry forward

Most systems store what is true now. Event-sourced systems store how truth came to be. And when systems fail, it is not the current state you need — it is the history.

Events are facts. Facts are immutable. Immutability is the foundation of trust.

Start simple. Introduce event sourcing only where you have identified requirements for history. Evolve gradually rather than applying it system-wide from the start.

Summary

The core reframe: State is a derived view. Events are the source of truth. Correctness depends on history being complete, not current state being correct.

The fundamental trade-off: Event sourcing trades simplicity of mutation for determinism of history. Traditional systems optimise for easy writes. Event sourcing optimises for perfect reconstruction.

Eventual consistency is unavoidable in practice: Writes are immediate. Reads depend on asynchronous projections that lag. Design explicitly for this with read-your-own-writes patterns, UI hints, and fallbacks.

What it removes and introduces: Event sourcing removes synchronous coordination between services but introduces temporal coordination constraints. You trade blocking for ordering requirements.

Projections are a first-class engineering concern: The event store is easy. Projections are where systems fail. Lag monitoring, rebuild strategies, and backfill planning are not optional.

CDC versus event sourcing: CDC tells you what changed. Event sourcing tells you why it changed. That distinction matters for audit, compliance, and debugging.

When to say no: Simple CRUD, no history requirements, low scale, unfamiliar teams, and early-stage microservices projects are all strong signals to choose a simpler approach.

About N Sharma

Lead Architect at StackAndSystem

N Sharma is a technologist with over 28 years of experience in software engineering, system architecture, and technology consulting. He holds a Bachelor’s degree in Engineering, a DBF, and an MBA. His work focuses on research-driven technology education—explaining software architecture, system design, and development practices through structured tutorials designed to help engineers build reliable, scalable systems.

Disclaimer

This article is for educational purposes only. Assistance from AI-powered generative tools was taken to format and improve language flow. While we strive for accuracy, this content may contain errors or omissions and should be independently verified.
