Ruminations of J.net idle rants and ramblings of a code monkey

Output Adapter –> Input Adapter Communications : Follow up

StreamInsight

Just a quick note to follow up on my previous post on Output Adapter –> Input Adapter Communications : Event Shapes, specifically about Edge output to Edge input scenarios. While this scenario works just fine in an ideal world, we all know that we don’t live in an ideal world. Instead, there are potential communication breakdowns between the StreamInsight servers from things like reboots (for whatever reason … like Windows Updates), network outages or, if you are using an unreliable protocol like UDP, dropped packets and messages.

In an edge-to-edge scenario, it is possible for the hub StreamInsight server to get a start edge … but never an end. In this case, you have an event that is in the engine and participating in analysis, joins, unions, aggregates, etc. even though it is no longer valid. But … since the end never “came in”, you have no way of knowing that the event is no longer valid, and its end date is, essentially, the end of time. Over a long-running process, these orphaned starts accumulate.

On the other end of the spectrum, you could get an end event without a corresponding start. StreamInsight won’t let you enqueue such a beastie (it will raise an exception) but the problem is deeper than that. As with the never-ending start, you’ll have data consistency issues. In this case, rather than having an event in your analysis that shouldn’t be there, you are missing an event that should be. Again, the result is inconsistent output.
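To make the two failure modes concrete, here is a minimal sketch of the bookkeeping a receiving side could do. It is plain Python, not StreamInsight code: `EdgeTracker`, its method names and the integer timestamps are all made up for illustration, and it assumes each event carries a key that pairs a start with its end.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeTracker:
    """Hypothetical helper: pairs start edges with end edges by key."""
    open_starts: dict = field(default_factory=dict)    # key -> start time
    unmatched_ends: list = field(default_factory=list)  # (key, end time)

    def on_start(self, key, start_time):
        # Remember the start until its end edge arrives.
        self.open_starts[key] = start_time

    def on_end(self, key, end_time):
        # An end with no recorded start is the "missing event" case.
        start_time = self.open_starts.pop(key, None)
        if start_time is None:
            self.unmatched_ends.append((key, end_time))
            return None
        return (key, start_time, end_time)

    def orphaned_starts(self, now, max_age):
        # Starts that have been open longer than max_age are suspect:
        # their end edge may have been lost in transit.
        return sorted(k for k, t in self.open_starts.items() if now - t > max_age)


tracker = EdgeTracker()
tracker.on_start("sensor-1", 0)
tracker.on_start("sensor-2", 5)
tracker.on_end("sensor-1", 10)   # normal pair
tracker.on_end("sensor-3", 12)   # end without a start
print(tracker.unmatched_ends)
print(tracker.orphaned_starts(now=100, max_age=60))
```

Note that this only detects the problem. Deciding what to do with an orphaned start (close it, drop it, alert someone) is a policy question that the sketch doesn't answer.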

Some of this … particularly issues with reboots … can be handled, to some extent, with checkpointing and adapters that understand and properly handle high-water marks. But checkpointing does nothing for communications outages or dropped/undelivered packets.
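As a rough picture of what "handling a high-water mark" means on the input side, the sketch below replays a source after a restart and skips everything the checkpoint already covers. It is plain Python with hypothetical names (`replay_after`, tuples of timestamp and payload), and treating the boundary as strictly-after is an assumption, not the actual adapter contract.

```python
def replay_after(events, high_water_mark):
    """Yield only events stamped strictly after the checkpointed mark.

    Assumption: events are (timestamp, payload) tuples and the source
    can be re-read from the beginning after a restart.
    """
    for timestamp, payload in events:
        if timestamp > high_water_mark:
            yield timestamp, payload


source = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
# Pretend the last checkpoint covered everything up to timestamp 2.
print(list(replay_after(source, high_water_mark=2)))
```

This only helps when the source can be re-read. A packet that was never delivered isn't there to replay.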

Translating these incoming events to points simplifies these issues but doesn’t completely resolve all of them. If you enqueue a point on start, you can use the ToSignal() macro that’s in the LINQPad samples with a timeout of TimeSpan.MaxValue to get the same effect (Edge Start/End) in your output. And, while you can still have events living longer than they should, they will only live until you get an updated value for the item rather than living forever, which minimizes the impact and prevents orphaned starts from building up. Whether you enqueue points for only the starts or only the ends, you may still miss some events, but that is a potential problem regardless of your event shapes.
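The point-to-signal idea can be imitated outside the engine to show why a lost message hurts less. The sketch below is plain Python, not the ToSignal() macro itself: the function name `to_signal`, the tuple layout and the use of `math.inf` in place of TimeSpan.MaxValue are all assumptions for illustration. Each point lasts until the next point for the same key, or until the timeout, whichever comes first.

```python
import math
from collections import defaultdict


def to_signal(points, timeout=math.inf):
    """Turn (key, time, value) points into (key, start, end, value) intervals."""
    by_key = defaultdict(list)
    for key, t, value in sorted(points, key=lambda p: p[1]):
        by_key[key].append((t, value))

    intervals = []
    for key, series in by_key.items():
        for i, (t, value) in enumerate(series):
            # The next point for the same key clips this one.
            next_t = series[i + 1][0] if i + 1 < len(series) else math.inf
            intervals.append((key, t, min(next_t, t + timeout), value))
    return sorted(intervals, key=lambda iv: (iv[1], iv[0]))


points = [("a", 0, 1.0), ("b", 2, 7.0), ("a", 5, 2.0)]
for interval in to_signal(points):
    print(interval)
```

If the point for "a" at time 5 had been dropped, the first interval would simply run on until the next update for "a" arrived. That is stale for a while, but it heals itself, which an orphaned start edge never does.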

So … the edge output to edge input scenario isn’t quite as simple as it appeared at first blush. In a test/lab scenario, it will usually work just fine, especially when following the “happy path”. In a real-world scenario where things go wrong, though, lost starts and ends make edge-to-edge considerably harder to get right. At the end of it, point inputs, regardless of the source event shape, provide the simplest use case and present only those challenges that are due to the very nature of a distributed system. Using something like MSMQ for the transport would resolve a lot of this as well … but it comes at a (pretty significant) cost in throughput and latency.
