About the author

J Sawyer is a developer based in Houston, TX who absolutely loves to write code. After spending 9 years at Microsoft, he moved on to other things and is currently the Lead Developer for the RealTime Data Management team at Logica US. He spends his days building Really Cool Things around StreamInsight and having a blast doing it.

He has been involved with HDNUG, one of the oldest and largest .NET-focused user groups in the US, since its inception in 2001 and has watched it grow from 5-10 technologists meeting around a conference table to a thriving community of over 5000 with regular meeting attendance averaging 100 attendees. He currently serves as the Vice President. You can join him at HDNUG on the second Thursday of every month at the Houston Microsoft office.

He also loves to ride his Yamaha FZ1. And sometimes his Ninja 650. And also his Honda XR-400 dirt bike. But he doesn't code and ride at the same time. That would be bad.

Output Adapter –> Input Adapter Communications : Follow up

December 22, 2011 12:41 PM

Just a quick note to follow up on my previous post on Output Adapter –> Input Adapter Communications : Event Shapes – specifically about Edge output to Edge input scenarios. While this scenario works just fine in an ideal world, we all know that we don’t live in an ideal world. Instead, there are potential communication breakdowns between the StreamInsight servers from things like reboots (for whatever reason … like Windows Updates), network outages or – if you are using an unreliable protocol like UDP – dropped packets and messages.

In an edge-to-edge scenario, it is possible for the hub StreamInsight server to get a start edge … but never an end. In this case, you have an event that is in the engine and participating in analysis, joins, unions, aggregates, etc. that is no longer valid. But … since the end never “came in”, you have no way of knowing that the event is no longer valid and its end date is, essentially, the end of time. Over a long-running process, these orphaned events accumulate if you have several starts without a corresponding end.

On the other end of the spectrum, you could get an end event without a corresponding start. StreamInsight won’t let you enqueue such a beastie – it will raise an exception – but the problem is deeper than that. As with the never-ending start, you’ll have data consistency issues. In this case, rather than having an event that is part of your analysis, you are missing an event that should be a part of your analysis. Again, the result is that you have inconsistent output.
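To make the two failure modes concrete, here is a minimal sketch in plain Python. It is a simulation only, not StreamInsight code: the `Edge` type, the `replay` function and the `pump-*` keys are all hypothetical names made up for illustration. It replays the edges that survive a lossy transport and reports which starts never got an end and which ends arrived without a start.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Edge:
    kind: str   # "start" or "end"
    key: str    # hypothetical item identifier
    time: int   # application timestamp


def replay(edges: List[Edge]) -> Tuple[Dict[str, int], List[Edge]]:
    """Return (starts still open after replay, ends with no matching start)."""
    open_starts: Dict[str, int] = {}
    rejected_ends: List[Edge] = []
    for e in edges:
        if e.kind == "start":
            # Simplification: a repeated start for the same key overwrites the old one.
            open_starts[e.key] = e.time
        elif e.key in open_starts:
            del open_starts[e.key]      # matched pair, the event is closed
        else:
            rejected_ends.append(e)     # an end that has nothing to close
    return open_starts, rejected_ends


sent = [
    Edge("start", "pump-1", 1), Edge("end", "pump-1", 5),
    Edge("start", "pump-2", 2), Edge("end", "pump-2", 6),
    Edge("start", "pump-3", 3), Edge("end", "pump-3", 7),
]

# Simulate the transport dropping the end for pump-2 and the start for pump-3.
dropped = {("end", "pump-2"), ("start", "pump-3")}
received = [e for e in sent if (e.kind, e.key) not in dropped]

orphaned, rejected = replay(received)
print("never-ending starts:", orphaned)
print("ends without a start:", rejected)
```

In this sketch `pump-2` stays open indefinitely and the end for `pump-3` has nothing to close, which mirrors the two consistency problems described above.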

Some of this … particularly issues with reboots … can be handled, to some extent, with checkpointing and adapters that understand and properly handle high water marks. But there’s nothing that you can do about communications outages or dropped/undelivered packets.

Translating these incoming events to points simplifies these issues but doesn’t completely resolve all of them. If you enqueue a point on start, you can use the ToSignal() macro that’s in the LINQPad samples with a timeout of TimeSpan.MaxValue to get the same effect (Edge Start/End) in your output. And, while you can still have events living longer than they should, they will only live until you get an updated value for the item rather than living forever, which minimizes the impact and prevents orphaned starts from building up. Whether you are enqueuing only starts or only ends, you may still miss some events, but that is a potential problem regardless of your event shapes.
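The point-to-signal idea can be sketched outside of StreamInsight as well. The following plain Python is an illustration of the concept only and is not the ToSignal() macro from the samples: `to_signal`, the tuple layouts and the `tag-a` key are assumptions made for this example, and an open-ended interval is represented with `None` rather than an infinite timeout. Each point is stretched until the next point for the same key arrives.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, str, float]                    # (time, key, value)
Interval = Tuple[str, float, int, Optional[int]]  # (key, value, start, end); None = still open


def to_signal(points: List[Point]) -> List[Interval]:
    """Stretch each point until the next point for the same key arrives."""
    last: Dict[str, Tuple[int, float]] = {}
    out: List[Interval] = []
    for time, key, value in sorted(points):
        if key in last:
            prev_time, prev_value = last[key]
            # The previous value ends exactly when the new one begins.
            out.append((key, prev_value, prev_time, time))
        last[key] = (time, value)
    # Whatever is left is the current value for each key, so it stays open.
    for key, (time, value) in sorted(last.items()):
        out.append((key, value, time, None))
    return out


sent = [(1, "tag-a", 10.0), (4, "tag-a", 11.0), (9, "tag-a", 12.0)]
received = [p for p in sent if p[0] != 4]   # the update at t=4 is lost in transit

print(to_signal(sent))
print(to_signal(received))
```

With the update at t=4 lost, the value 10.0 lives from t=1 to t=9 instead of t=1 to t=4. It lives longer than it should, but only until the next update arrives, so nothing is left open forever except the current value.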

So … the edge output to edge input scenario isn’t quite as simple as it appeared at first blush. In a test/lab scenario, it will usually work just fine, especially when following the “happy path”. However, in a real-world scenario where things go wrong, other challenges come into play and, with those in mind, the edge-to-edge scenario is considerably harder to get right. At the end of it, point inputs, regardless of the source event shape, provide the simplest use case and present only those challenges that are due to the very nature of a distributed system. Using something like MSMQ for the transport would resolve a lot of this as well … but it comes at a (pretty significant) cost in throughput and latency.

Tags:

StreamInsight