Protoshell and Thriftshell update

As part of FsShelter development I had to implement the multilang serializers for Storm, originally building against the then-current Storm 0.10.0.

I started with Thrift, thinking that since it’s already a part of the Storm runtime it would make adoption easier compared to protobuf and that, given similar characteristics, the downgrade in performance would not be noticeable. After some testing it turned out that Thrift performed nearly at the speed of JSON (better with some payloads, worse with others), which might require some explanation.

Unlike monolithic protobuf, Thrift has a pluggable model for pretty much everything. So when people say Thrift they should qualify at least two things: Transport and Protocol. Thrift looks comparable to protobuf only when the Compact protocol is used. Compact, however, has a caveat: it doesn’t work with streaming transports unless you implement custom framing logic to achieve something similar to protobuf’s ParseDelimitedFrom functionality. And Storm is all about streaming, which is why I’m deprecating the support for Thrift. Unless someone wants to maintain it, I’ll be removing Thrift support from future releases of FsShelter.
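To make the "custom framing logic" concrete, here is a minimal Python sketch of length-prefixed framing, the same general idea protobuf's delimited format relies on: each message is preceded by its size as a base-128 varint, so a reader can tell where one message ends on a continuous stream. This is an illustration only, not code from Thriftshell or FsShelter; the names encode_varint, write_frame and read_frame are hypothetical, and the reader assumes a buffered stream whose read(n) returns n bytes unless it hits end of stream.

```python
import io


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low 7 bits first)."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def write_frame(stream, payload: bytes) -> None:
    """Write one frame: varint length prefix, then the payload bytes."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_frame(stream):
    """Read one frame; return None on a clean end of stream."""
    length = 0
    shift = 0
    first = True
    while True:
        b = stream.read(1)
        if not b:
            if first:
                return None  # no more frames
            raise EOFError("stream ended inside a length prefix")
        first = False
        length |= (b[0] & 0x7F) << shift
        if not (b[0] & 0x80):
            break
        shift += 7
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("stream ended inside a frame")
    return payload


# Round trip over an in-memory stream (stands in for a pipe or socket).
buf = io.BytesIO()
messages = [b"first", b"", b"x" * 300]
for m in messages:
    write_frame(buf, m)
buf.seek(0)

decoded = []
while (frame := read_frame(buf)) is not None:
    decoded.append(frame)
```

With framing like this in place, any serializer that produces a byte string per message can share one stream; without it, the reader has no way of knowing where one message stops and the next begins.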

Protoshell, on the other hand, gets an update – Storm 1.0 has been released and some packages have been renamed. The new 1.0.1 release of Protoshell is now available and has been tested to work with the latest Storm, so FsShelter can now benefit from the massive performance improvements made in Storm.

FsShelter does not require a new build to benefit from this release. All one needs to start running FsShelter components against Storm 1.0.1 is the new server-side serializer implementation, which can be referenced directly from GitHub as a Paket dependency and included with the topology for deployment.

An easy way to try FsShelter

Thanks to Docker, trying something out without having to figure out all the dependencies and pollute your system has become really easy. This is how I started with Storm, and this is the way we now help others try FsShelter as well – the fsshelter-samples container.

The container includes an installation of Storm, Mono, F# and a pre-built clone of the FsShelter repo. The original build of Mono (4.2.1) caused processes to crash now and then, and was an interesting study in how Storm deals with failures and what it means for processing guarantees. The current version (4.2.3) runs solid and may deprive you of witnessing Storm restarting all the components… you may have to crash them yourself 🙂

FsShelter: a Storm shell for F#

About a year ago Prolucid adopted Apache Storm as our platform of choice for event stream processing and F# as our language of choice for all of our “cloud” development.

FsStorm was an essential part that let us iterate, scale and deliver quickly, but even from the earliest days it was obvious that the developer experience could be improved. Unfortunately, it meant a complete rewrite of FsStorm:

  • FsStorm DSL is a really thin layer on top of Nimbus API model:
    • has explicit IDs when describing components in a topology
    • uses strings in all the names
    • matching of inputs/outputs is not guaranteed
  • FsStorm uses Json AST as its public API:
    • messages, tuples, configuration
    • serialization mechanism is hard-baked into the API

We’ve worked around some of the problems, usually by writing more code.

It actually makes sense that Storm itself doesn’t care about the type of the tuples/fields. It runs on the JVM, which is very much typed, and it relies on sub-class polymorphism to make things tick. However, the public API for the tuples looks like an afterthought in every language. But we figured, there is this “compiler” that can do “type checking” for us, let’s make it work! Maybe we can even make it faster if we replace Json with Protobuf?

Coming up with the new DSL that would allow the components to consume and emit tuples of various (static) types on multiple streams was an interesting experience and led to some strange places. A lot has been written on F# DSLs, but none of that applied directly. Can I use “just functions”? Do I need a type provider? A computation expression? A compiler as a service?

After a few false starts I found the desired paradigm that could be expressed in F# succinctly. As usually happens, once I gave up on certain notions (building an “any purpose” graph from a single source in this case), the result was pretty simple. And so, after a few weeks of journey and discovery, we are releasing FsShelter: a way to program Storm with F# in a statically typed fashion.

Many thanks to Tomas Petricek, Scott Wlaschin, Andrew Cherry and Eirik Tsarpalis; without them FsShelter wouldn’t have been possible.

FsShelter is currently in beta and any feedback is welcome and appreciated.

Real-time analytics with Apache Storm – now in F#

Over the past several months I’ve been prototyping various aspects of an IoT platform – or more specifically, exploring the concerns of “soft” real-time handling of communications with potentially hundreds of thousands of devices.

Up to this point, being in the .NET ecosystem, I’ve been building distributed solutions with a most excellent lightweight ESB – MassTransit, but for IoT we wanted to be a little closer to the wire. Starting with a clean slate and having discovered Apache Storm and Nathan’s presentation, I realized that it addresses exactly the challenges we have.

It appears to be the ultimate reactive microservices platform for lambda architecture: it is fairly simple, fault tolerant overall, yet embracing fire-n-forget and “let it fail” on the component level.

While Storm favours the JDK for development, has extensive component support for Java developers and heavily optimizes for the execution of JRE components, it also supports “shell” components via its multilang protocol, which is what, unlike Spark, makes it interesting for a .NET developer.

Looking for a .NET library to implement Storm components, I found Microsoft’s implementation. Unfortunately, components in C# end up looking rather verbose, and it happens to work exclusively with HDInsight/Azure, which is a deal breaker for us, as we want our customers to be able to run it anywhere. Fortunately though, further search revealed the recently open-sourced FsStorm, announced on Faisal’s blog, and I liked it at first sight: the concise F# syntax for components and the DSL for defining topologies make authoring with it a simple and enjoyable process.

FsStorm components can be just a couple of lines of F#, are mostly statically verified, and have a clear lifecycle and an easy-to-grasp concurrency story. And with F# enjoying first-class support on Mono, we are able to run Storm components effectively on both dev Windows boxes and distributed Linux clusters while capitalizing on the productivity and wealth of the .NET ecosystem.

It is now available under the FsStorm umbrella as a NuGet package, with CI, a gitter chatroom and a bit of documentation.

While it is still in its early days, with significant changes on the horizon (something I want to tackle soon is static schema definitions for streams and pluggable serialization, with Protobuf by default), I believe it is ready for production, so go forth and “fork me on GitHub”!
