
Event Sourcing = everything that happens is an event. Save all the events and you can always get to the latest state, as well as what things look like at any time in the past. What an "event" is depends on your business domain and the granularity of processing. It's very common in enterprise apps with complex workflows (like payment processing or manufacturing). Good fit for functional programming techniques and makes business logic easy to reason about since everything is in reaction to some event, and usually emits another event.
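To make that concrete, here is a minimal Python sketch (the event names and inventory domain are made up for illustration): the latest state is just a fold over the saved events, so replaying the log from the start reproduces it.

    from dataclasses import dataclass

    # Hypothetical events for an inventory domain; names are illustrative only.
    @dataclass
    class ItemReceived:
        sku: str
        quantity: int

    @dataclass
    class ItemSold:
        sku: str
        quantity: int

    def apply(state: dict, event) -> dict:
        """Pure function: current state + one event -> next state."""
        new_state = dict(state)
        if isinstance(event, ItemReceived):
            new_state[event.sku] = new_state.get(event.sku, 0) + event.quantity
        elif isinstance(event, ItemSold):
            new_state[event.sku] = new_state.get(event.sku, 0) - event.quantity
        return new_state

    def replay(events) -> dict:
        """The latest state is a left fold over the full event log."""
        state = {}
        for event in events:
            state = apply(state, event)
        return state

    log = [ItemReceived("sku-1", 5), ItemSold("sku-1", 2)]
    print(replay(log))  # {'sku-1': 3}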

With global ordering, you can have a single transaction ID that points to a snapshot of the entire system. The actual communication is usually handled by "service bus" style messaging like RabbitMQ that can support ordering, routing, acknowledgements, and retries. Kafka or an RDBMS can also be used, but that requires frameworks on top.

This concept is used pretty much everywhere. Redux is event sourcing front-end state for React. Most database replication is event sourcing, using a write-ahead log as the stream of mutations to apply. All that being said, I completely agree with this article that it's the wrong solution for most cases and creates far more problems and limitations than it solves.



> Save all the events and you can always get to the latest state

This isn't actually true, and good event sourcing guides will point out why. Event-based systems are naturally racey: on a rerun of a stream of events, the order in which the events are processed may change, and you might therefore get a different result than the first time.

For example, if you have 1 item remaining in inventory but two people attempt to purchase it at the same time, that is a natural race and only 1 of them can succeed. If you are operating at the sort of scale where you require this kind of horizontal scaling, you may be in the territory where these sorts of conflicts become common.


Event systems are not racey. An event is historical and immutable. The event source is a list of things that categorically happened. Commands can race each other, but the outcome (OrderPlaced vs OrderFailedBecauseItemOutOfStock) is definitive.

A common confusion is between commands (which can fail) and events (which have already happened). Replaying events should always result in the same state because they only reflect things that actually happened.

You can't replay commands because they have side effects.
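A rough Python sketch of the distinction (the command and event names are hypothetical): the command handler is where the race gets resolved, and only the resulting events go into the log, so replay is deterministic.

    from dataclasses import dataclass

    # Commands express intent and can be rejected; events record what happened.
    @dataclass
    class PlaceOrder:                   # command (hypothetical)
        order_id: str
        sku: str

    @dataclass
    class OrderPlaced:                  # event
        order_id: str
        sku: str

    @dataclass
    class OrderRejectedOutOfStock:      # event
        order_id: str
        sku: str

    def handle(command: PlaceOrder, stock: dict, log: list):
        """Decide once, at write time. The race (who gets the last item) is
        resolved here; the resulting event is immutable and replay-safe."""
        if stock.get(command.sku, 0) > 0:
            stock[command.sku] -= 1
            log.append(OrderPlaced(command.order_id, command.sku))
        else:
            log.append(OrderRejectedOutOfStock(command.order_id, command.sku))

    stock, log = {"sku-1": 1}, []
    handle(PlaceOrder("a", "sku-1"), stock, log)
    handle(PlaceOrder("b", "sku-1"), stock, log)
    # log now records exactly which order won; replaying it gives the same
    # answer every time.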


They kind of go hand in hand so I find it a bit amusing to make the distinction and say that therefore event sourcing isn't racey. Fine, event sourcing by itself isn't, but issuing the events is racey if anything listening to the events issues commands.

I am quite confident every implementation of event sourcing is going to couple commands to the event stream unless it is merely making an audit log. In which case I guess it isn't really an architecture so much as an add on.

You will replay commands as part of testing and debugging and that is where the fun comes in with race conditions.


There should be 1 point in the system that says "user x has ordered item a". That then becomes an event. The commands from users x and y to order item a might race to see which gets processed first, but once one has been turned into an event it's immutable and without race conditions. You should never replay commands.


Event systems do not require deduplication and/or strong ordering on the ingress. You can also solve the problem on the consumption side and there are often reasons to do this.

Plus you should be more forgiving of examples. Let us strengthen the example by saying that the inventory system lied and told us there were 2 items when there was only one. So you issue 2 commands, but when the events arrive at the fully automated warehouse, one randomly receives a null pointer exception and auto-heals into a refund, but which one?


Here, you should have two order_issued events, one order_succeeded, and one order_failed (hopefully not with an actual NPE, that would be pretty bad). So much like an SSTable-based system, there are race conditions at the level of deciding what to write to the logs/event stream, but there should not be race conditions in replaying the event stream (the same customer would get a failure every time).


Never done event sourcing architecture, but I guess this kind of subtlety is what makes a system a nightmare or a breeze to work with.

Which makes me wonder if there is some kind of exercise book to practice "good architectural design", just like with algorithms.


Preventing race conditions in message-queue based architectures is a neat challenge. Depending on your problem domain, there are some approaches that work really well. It is likely that these don't work for all cases though.

The approach I've seen work well is to divide your messages into categories, such that events that belong to the same category MUST be handled in order, while events in different categories can be handled in any order.

Say you have ordered updates to USERs. Say you have 10 event processors. Each processor handles events for 1/10th of the users in your system. You come up with a way of partitioning your users: user_id % 10, for example. For any given user instance, only ONE processor would process ALL the events for that user. They would be processed in order based on some ordering key (e.g. timestamp, event ID).
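Roughly, in Python (illustrative only; a real setup would use Kafka partitions or one queue per worker rather than in-memory lists):

    # Assumed setup: 10 workers, events carry a user_id and an ordering key.
    NUM_PARTITIONS = 10

    def partition_for(user_id: int) -> int:
        """All events for one user land in the same partition, so a single
        consumer sees them and can apply them in order."""
        return user_id % NUM_PARTITIONS

    def dispatch(events, queues):
        # queues is a list of 10 per-partition FIFO queues (stand-ins for
        # Kafka partitions or per-worker message queues).
        for event in sorted(events, key=lambda e: e["event_id"]):
            queues[partition_for(event["user_id"])].append(event)

    queues = [[] for _ in range(NUM_PARTITIONS)]
    dispatch([{"user_id": 42, "event_id": 1},
              {"user_id": 42, "event_id": 2},
              {"user_id": 7, "event_id": 3}], queues)
    # Both of user 42's events go to partition 2 and stay ordered relative
    # to each other; user 7's event goes to partition 7.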

What is tricky is that this partitioning requires first-order attention. If it gets messed up, everything crumbles and you are in for long nights and working weekends. When it works, you have a hugely efficient event processing machine that can handle anything the pipe (message queue) can throw at it.


We dealt with a similar problem where I work.

At first we had ideas to do exactly what you described. We're using RabbitMQ, so partitioning is not built-in like with Kafka. We did some experiments with RabbitMQ's consistent hash exchange. Although the plugin does properly partition, we'd need to build our own stuff for rebalancing the partitions over workers as workers join and leave the pool, which happens often in our case because we sometimes process massive spikes and automatically scale up.

In the end we went with something more stupid. Each event gets a timestamp and we acquire a lock in Redis when we start processing the event. If the lock is already acquired, the event is rejected and put back into the queue with a delay. Before processing an event (and after acquiring the lock) we make sure that the timestamp of the event is newer than what we already processed. If it's older, we drop the event.

This approach trades some efficiency for correctness. Depending on your problem domain, this can work very well.
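Something along these lines, sketched with redis-py (the key names, the 30-second TTL, and the handle() function are placeholders rather than the actual implementation):

    import redis  # assumes redis-py and a reachable Redis instance

    r = redis.Redis()

    def process(event: dict) -> str:
        """event is assumed to carry an aggregate key and a timestamp,
        e.g. {"user_id": 42, "ts": 1700000000.0, ...}."""
        lock_key = f"lock:{event['user_id']}"
        seen_key = f"last_ts:{event['user_id']}"

        # NX set acts as the lock; EX keeps a crashed worker from holding
        # it forever.
        if not r.set(lock_key, "1", nx=True, ex=30):
            return "requeue_with_delay"   # another worker holds the lock

        try:
            last = r.get(seen_key)
            if last is not None and float(last) >= event["ts"]:
                return "drop"             # older than what we already processed
            handle(event)                 # the actual business logic
            r.set(seen_key, event["ts"])
            return "ack"
        finally:
            r.delete(lock_key)

    def handle(event):  # placeholder for real processing
        pass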


There is only a request, no "customer X buys the last item" event unless the system proves that the item is actually available (and other conditions). An easy way is putting requests in a queue, to ensure one of the buying attempts comes after the other and sees the item unavailable.


Life has race conditions. You'll need these transactions somewhere in any kind of application to handle these situations.

Event-sourcing just provides more control and explicit ways to handle it since it's built around a stream of events. You can leave it to the messaging layer which can timestamp the messages, or use explicit partitions with sequences and strict ordering, or have consumers process messages with external atomic transactions in the database. What works depends on what you need.


What about when event structures change? Now you’re having to push versions into your events and keep every version of your serialisation format.

Redux often does not have to keep track of versions, because the event stream is consistent for that session.


The most common approach is to add versions to the events. The good thing is that with event sourcing, the exact cutover and lifetimes of these schema versions can be known (and even recorded as events themselves).

Downstream apps and consumers that don't need to be compatible with the entire timeline can then migrate code over time and only deal with the latest version. You have to deal with schemas anytime you have distributed communications anyway, but event sourcing provides a framework for explicit communication and this is one area where it can make things easier.
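For example, a small upcasting step in Python (the field names and the v1 -> v2 change are invented for illustration) can bring stored events up to the latest schema before consumers see them:

    # Hypothetical: v1 events stored "name" as one string, v2 splits it.
    def upcast(event: dict) -> dict:
        """Bring any stored event up to the latest schema before handing it
        to consumers, so new code only deals with the current version."""
        if event.get("version", 1) == 1:
            first, _, last = event["name"].partition(" ")
            event = {"version": 2, "first_name": first, "last_name": last,
                     **{k: v for k, v in event.items()
                        if k not in ("version", "name")}}
        return event

    old = {"version": 1, "name": "Ada Lovelace", "user_id": 7}
    print(upcast(old))
    # {'version': 2, 'first_name': 'Ada', 'last_name': 'Lovelace', 'user_id': 7}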


For me an issue, not usually made explicit, is that those benefits seem to require a pretty stable business and a stable application landscape.

Why? Because a changing business requires changes to the Events. Since a change in an Event requires all consumers to be updated, immediately or when the schema version is deprecated, the cost of change seems to increase faster than in an application landscape without event sourcing. At the same time, the cost of knowing who needs to be changed also grows, since "any" application can consume any Event.

A stable application landscape also seems to be required: if the number of consumers of an Event grows quickly, the ability to update it and deprecate old schemas becomes tied to the number of Event consumers that need updating.


If your org is anything like mine, things (data) are mostly "additive" onto the existing structure. When you want to deprecate something, you can notify all the consumers, like a third party would, if you were going to change something enough. But the latter happens much more rarely for us, though it tends to leave traces of technical debt...


> What about when event structures change? Now you’re having to push versions into your events and keep every version of your serialisation format.

Sure. Events are like filled-out forms, and there is a reason forms that are significant, modified over time, and whose historical meaning is an expected business need tend to be versioned (often by form publication date). If you ever need to reconstruct/audit history (whatever your data model), you need a log of what happened when that faithfully matches what was recorded at the time, and you need definitions of the semantics as to how the aggregation of all that should map to a final state. Event sourcing is a pretty direct reification of that need, and, yes, versioning is part of what is needed to make that work.


> Now you’re having to push versions into your events

Most distributed systems already do this, especially if they use Thrift/Protobuf etc. It's par for the course.


The protocol version only handles the version of the structure. It does not change when you change the meaning of the field. For example "this is still foo-index, but from now on it's an index in system X, not system Y". (Yeah, bad example, you could change the field name here, but sometimes it's not that clear)


I have actually written a book on this.

It's not that hard to deal with. Dealing with it through an ad-hoc "we will come up with something as we go along" approach can be a bit of a pain though. It is in fact fairly trivial to handle if some thought is put into it.


Any reference to the book you wrote?



Typically you write an in-place migration for every version, yes. Or you snapshot your working-copy database and archive the event stream up to that point, and you play it back using the appropriate version of the processing code.

It kinda sucks and there isn't a great answer. But there are a lot of use cases where there are real benefits to having that log go back to the start.


There is no easy way (in any platform, as far as I'm aware) of discovering and updating all clients that depend on some data structure if you plan to change the data structure in an incompatible way.

Better to "grow" your data structures than "change" them.



