A few years ago I created a validation framework that I felt was uniquely capable of addressing the complex validation scenarios I've come across. The following is a brief look at how it came to be and where it's at.
If you're familiar with DDD, you'll remember the pattern for expressing business rules via specifications. While simple and powerful, I found it insufficient: what we want to know in real-world scenarios is not only whether something satisfies a given criterion, but also what it means when it does or doesn't.
Here's an example: given an entity E, you need to ensure that a field (X) is of a certain length. You can have a rule, here expressed as a function: e => e.X.Length < MAX_LENGTH.
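To make that concrete, here's a minimal sketch of the rule wrapped in the classic specification shape. The entity, the class name and the constant are hypothetical, there only to illustrate:

```csharp
// A hypothetical entity, just to make the example concrete.
public class E
{
    public string X { get; set; }
}

// The same rule, encapsulated in the classic DDD specification shape:
// it has a name, it lives in the domain, and it's trivially testable.
public class MaxLengthSpec
{
    private const int MAX_LENGTH = 50; // the rule's own constraint

    public bool IsSatisfiedBy(E e)
    {
        return e.X.Length < MAX_LENGTH;
    }
}
```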
Great, you've got the logic, nicely encapsulated inside your domain. It can be referenced by name and it's testable. Now what? What are we going to do with the fact that the rule evaluated to false?
What all of it lacks is context. What is the context of this evaluation? Why are we doing this? To throw an exception? No, we need more information even if we do throw an exception, and even having composable rules (named Specifications by Evans) isn't going to help us tell the user what we think is wrong.
Here's an example of the context: a user is entering some text into field X on a web form. Joel Spolsky would have a field day ridiculing your UX, but you want to tell the user that what they're entering is too long.
Another example using the same rule but in a different context: your API is called to import a list of Es and you want to collect all the errors to respond with once you're done processing the batch.
We most certainly want to reuse the rule, but what we do with the outcome of the evaluation is completely different. How about an entirely different UI? Or another application that uses your domain internally? They may all want to do something different with the outcome!
I dislike the idea of validation as a cross-cutting concern (addressed with attributes) for the same reason – it makes no effort to incorporate the context.
Here's another example of context. Say the entity E contains two complex properties of the same type: A1 and A2 of type A, which contains property B. Your rule is to permit value b1 for the A1.B property and b2 for the A2.B property. You see the context (the full path to property B), but your attributes don't. It's very hard to make sense of things looking from the inside out.
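To illustrate, here's a sketch of the situation; the attribute and classes are hypothetical, defined only to show the problem:

```csharp
using System;

// A hypothetical validation attribute, defined only to illustrate the point.
[AttributeUsage(AttributeTargets.Property)]
public class PermittedValueAttribute : Attribute
{
    public PermittedValueAttribute(string value) { Value = value; }
    public string Value { get; private set; }
}

public class A
{
    // Which value do we declare here, "b1" or "b2"? From inside A there is
    // no way to know whether this B sits on the E.A1.B path or the E.A2.B path.
    [PermittedValue("b1")]
    public string B { get; set; }
}

public class E
{
    public A A1 { get; set; } // rule: A1.B must be "b1"
    public A A2 { get; set; } // rule: A2.B must be "b2"
}
```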
This leads me to my proposition: validation is best performed and described from the outside, looking in.
Everything is easier when you have access to all sorts of context: the use case, the user, dependencies, etc. It's easier to implement and easier to understand.
So what do we need, other than the rules?
We'll need to redefine what a Specification is: we'll use Specifications to associate Rules with outcomes.
We'll also need to express Interpretations of the rules in a manner that lets us carry out contextual parametrization (reporting the offending value, the constraints of the rule itself, localization, etc.).
For maximum flexibility we'll need factories of Interpretations and Actions to carry out for a given outcome; enter Bindings.
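Put together, one possible shape for these pieces looks like the sketch below. This is only my illustrative reading of the idea, not the actual API of the framework:

```csharp
using System;

// A Rule is just the predicate, same as before.
public interface IRule<T>
{
    bool IsSatisfiedBy(T candidate);
}

// An Interpretation carries the contextual details of an outcome:
// the offending value, the rule's constraints, a localizable message key, etc.
public interface IInterpretation
{
    string MessageKey { get; }
    object OffendingValue { get; }
}

// A Specification associates a Rule with what its outcome means.
public class Specification<T>
{
    private readonly IRule<T> rule;
    private readonly Func<T, IInterpretation> whenViolated;

    public Specification(IRule<T> rule, Func<T, IInterpretation> whenViolated)
    {
        this.rule = rule;
        this.whenViolated = whenViolated;
    }

    // Returns null when the rule is satisfied, an Interpretation otherwise.
    public IInterpretation Evaluate(T candidate)
    {
        return rule.IsSatisfiedBy(candidate) ? null : whenViolated(candidate);
    }
}

// A Binding decides which Action to carry out for a given outcome:
// show it in the UI, collect it for a batch response, throw, and so on.
public interface IBinding
{
    void Apply(IInterpretation interpretation);
}
```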
The framework was very successful and ended up being used in several Novell products (of PlateSpin fame), so I've decided to do an open-source rewrite. That's all there is to the microframework I posted on GitHub.
At the moment I've got the basics set up: the build, the tests and the major components. The library is .NET portable, so I'd like to have samples in MVC, WPF, Silverlight and WP.
No matter how fast and vast your infrastructure, at some point you find that it's not enough or not used efficiently. The messages make sense and the endpoints are configured in an optimal way; the problem becomes the updates themselves. It's expensive to carry out many little updates to the state without sacrificing the ACID properties.
No matter how fast an endpoint can process the messages, it’s possible to get more messages than you can process - the messages start to pile up. It’s OK if you know a reprieve is coming, but what if it’s not?
Batching to the rescue: we have to make effective use of the bandwidth we have on any given endpoint, and sometimes that means repackaging the messages into something that carries the summarized bulk down to the actual updates.
The approach we’ve found simple and effective is to use some kind of persistent Event Store to keep all the message information, then, based on a trigger (timer or otherwise) group and aggregate the events before proceeding with the domain model updates.
We freed up the endpoint and the storage, and avoided introducing the kind of conflict resolution inherent to eventual-consistency models for dealing with updates at scale.
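Here's a rough sketch of the shape this takes; the event, the store and the repository types are hypothetical stand-ins, not types from any particular library:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical event and stores, standing in for whatever the endpoint uses.
public class UsageRecorded
{
    public string AccountId { get; set; }
    public int Amount { get; set; }
}

public interface IEventStore
{
    IEnumerable<UsageRecorded> DequeuePending(); // events persisted as messages arrive
}

public interface IAccountRepository
{
    void ApplyUsage(string accountId, int total);
}

public class UsageBatcher
{
    private readonly IEventStore store;
    private readonly IAccountRepository accounts;

    public UsageBatcher(IEventStore store, IAccountRepository accounts)
    {
        this.store = store;
        this.accounts = accounts;
    }

    // Called on a trigger (a timer or otherwise): group and aggregate the
    // pending events, then touch each aggregate once with the summarized bulk.
    public void Flush()
    {
        var totals = store.DequeuePending()
                          .GroupBy(e => e.AccountId)
                          .Select(g => new { AccountId = g.Key, Total = g.Sum(e => e.Amount) });

        foreach (var t in totals)
            accounts.ApplyUsage(t.AccountId, t.Total); // one update instead of many little ones
    }
}
```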
Having built a system in Event-Driven SOA fashion I’ve come to realize that the moniker applied to this style of architecture actually misses the point.
I'd argue that the focus of such an architecture shouldn't be services. Depending on the case, only some of the actors involved will actually be Services; you'll also have Distributors, Workers, Sagas, etc. The important distinguishing characteristic in each case is the type of message these actors are meant to handle.
Should a message be a Notification, a Command or something else?
When I had to work with FIX to implement a trading system I didn’t really appreciate how well the protocol captures the ideas of Event-Driven architecture in its message types. Now I do – well-designed messages are the primary outcome of such an architectural approach.
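For illustration, a minimal way of making that distinction explicit in code is marker interfaces of my own naming, in the spirit of what messaging frameworks do; the point is that the message type itself carries the intent:

```csharp
public interface ICommand { }   // "do this" - addressed to a single logical owner
public interface IEvent { }     // "this happened" - published to whoever subscribes

public class PlaceOrder : ICommand
{
    public string OrderId { get; set; }
}

public class OrderPlaced : IEvent
{
    public string OrderId { get; set; }
}
```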
With NuGet finally bringing .NET dependency management into the 21st century, an outstanding problem for anyone who does custom builds of dependencies is where to store them.
I like my dependencies resolved at build time, so that they don't burden my source code repository. That's fine; it's easy to include custom steps to bring them in. However, if I have to modify a dependency (which happens fairly often with open source) but want to keep managing it through NuGet, I have a problem: I can't upload this dependency to the public NuGet gallery (and it wouldn't make sense even if I could).
Custom NuGet gallery to the rescue!
With the NuGet gallery being open source, we were able to set up a custom build that allows updating the packages. This lets us keep the version number of the dependency while changing anything we need in it, which is a big deal when dealing with modern-day dependency trees.
The workflow we use:
- Disable public gallery
- Temporarily enable the public gallery to bring in a dependency
- Push it to the private gallery
- Modify the package if needed and update the private gallery
- Enjoy your unbroken dependency chain
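Roughly, in NuGet command-line terms (the feed URL, package id, version and key are placeholders, and this is a sketch of the workflow rather than a verified script):

```
# the public gallery stays disabled by default
nuget sources disable -Name nuget.org
# temporarily enable it to bring the dependency in
nuget sources enable -Name nuget.org
nuget install Some.Dependency -Version 1.2.3
nuget sources disable -Name nuget.org
# push the package to the private gallery
nuget push Some.Dependency.1.2.3.nupkg -Source http://nuget.example.internal/ -ApiKey <key>
# modify the package contents if needed, repack with the same version, and push again
```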
I wonder how other people are doing this…
I sometimes see debates spark up over smart UIs and where domain logic should go. My view has always been that your domain logic is everywhere: user experience is as much a part of your domain as the logical structure of your database. But what I've noticed is that there's an interesting fluidity that work on scalability exposes: certain domain logic is sometimes a Presentation concern, sometimes an Application Services concern, and sometimes both.
A good counter-example is validation: you really want to give the user feedback early and often, so is it a Presentation concern then? Where do error messages get defined? The answer in both cases is that it belongs in the Domain layer, but it has to be sufficiently abstract to allow for the flexibility that Presentation requires. I see this done wrong all the time, with fancy frameworks and a lot of pride taken in doing it, but that's for another post.
Going back to the original point, scalability requires that you do as much work as possible when the user is not looking. That tends to shift a lot of what would normally go into Presentation into the Application Services layer and onto a different tier. That means having a separate "bounded context" within your domain layer to keep state for the Presentation layer ready for immediate, fast consumption.
This is not unexpected, but you don’t think about it until you have to serve a lot of domain logic quickly.
So the answer is – it’s everywhere, with different aspects captured in different layers.
Over the years since finding out about Domain-Driven Design I've gone through several iterations of grokking it, and I've seen other people completely miss the point of "tackling the complexity at the heart". Here are the mistakes I've made myself and see others going through as well:
- I’ve dismissed it as nothing but rehashing of OOD principles
- I’ve overdone “IDs are the impedance artifacts, an instance reference is my ID”
- I’ve overdone “everything is an object therefore every class will manage its own behavior”
- I’ve modeled my entities and called it my domain model
I think Evans's book is partly to blame; the man just wasn't a writer. Partly, the reason is that what we seek is like the shadows in Plato's cave: we may be able to glean the shape of the ultimate form, but it's only an approximation of the real thing, lacking the detail.
Here’s my current understanding:
- It’s the principles!
We use patterns and principles in our solutions all the time. OOD is great, but it is too abstract to be of any value by itself. SOLID is a good starting set, but it doesn't go into domain modelling. There are lots of other principles that are suitable in one case or another. DDD is just like that: it's a way to capture the desired characteristics of your problem domain in your code. Bringing any kind of dependency, especially a large framework, into the domain model deserves a frown. The focus of the modelling and the principles applied come at a cost, and that investment is wasted if something is sacrificed for a framework.
- It’s about the code!
We have models everywhere: some are for communicating with the database, some for communicating to/from external services, and some for talking to people. DDD is about the code at the core of the problem you are working to solve. There may be a need for a different model for each of these purposes. In CQRS, for example, you are expected to have two separate models within the same app!
- It’s about modelling the behavior!
Which leads me to the ultimate purpose of domain modelling: capturing the behavior as close to the way the business treats it as possible. Have you heard about coding dojos where you have to try and solve a problem without ever using setters? Well, it's kind of like that, only with a real purpose. Immutability, for example, is of paramount importance when modelling the domain, because it's one of the few ways to express the meaning and the intent of a particular interaction (there's a small sketch after this list).
- It’s about testability!
Presumably, there's a significant cost attached to a model that permits wrong behavior. Keep the model small and abstract, and inject all the infrastructure. We shouldn't need a complex setup to test a theory or modify a behavior, and we should test continuously. The model and the behavior-driven tests are our documentation; they're how the coder captures and understands the requirements.
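Here's a tiny sketch of the last two points taken together: behavior expressed without setters, and a test that needs no infrastructure. The domain and the names are hypothetical, and the test is written without any framework just to keep it self-contained:

```csharp
using System;

// Behavior, not data: the intent is in the method name, and no setter
// could express "reserve". Each interaction yields a new immutable value.
public class CreditLimit
{
    public decimal Available { get; private set; }

    public CreditLimit(decimal available)
    {
        Available = available;
    }

    public CreditLimit Reserve(decimal amount)
    {
        if (amount > Available)
            throw new InvalidOperationException("Insufficient credit.");
        return new CreditLimit(Available - amount);
    }
}

public class CreditLimitTests
{
    // No database, no container, no complex setup - the model tests as-is.
    public void Reserving_reduces_the_available_credit()
    {
        var limit = new CreditLimit(100m).Reserve(30m);
        if (limit.Available != 70m) throw new Exception("Expected 70.");
    }
}
```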
One of the things that becomes obvious when tackling Scalability is that calculating certain things on the fly takes too long to be practical. Another obvious thing is that the logic that deals with data needs to be executed somewhere close to the data.
By structuring the data in such a way that we can hold on to the results of the calculation, we can take advantage of the cloud processing capabilities we have on the backend. We end up with copies of many things, but by partitioning the data into Aggregates we are free to modify any bits without any locking issues. It also opens the door to further distribution: if you have your own copy, it doesn't matter where you work on it. The interested parties, such as the UI, will eventually become consistent, all the while being served a cached copy of the data.
Introducing copies of data means we need to know when to update them. By communicating via messages that represent domain events taking place, we let our services work within their narrow scope with their own copy of the data. Once they modify their little part of the domain, all they have to do is notify the parties that depend on it with particulars of what was done.
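A rough sketch of what one such service might look like; the bus interface and the event types are hypothetical stand-ins rather than any particular library:

```csharp
using System.Collections.Generic;

// A domain event coming from elsewhere in the system.
public class PriceChanged
{
    public string ProductId { get; set; }
    public decimal NewPrice { get; set; }
}

// The event this service publishes once its part of the work is done.
public class ProductPriceUpdated
{
    public string ProductId { get; set; }
    public decimal Price { get; set; }
}

public interface IBus
{
    void Publish(object domainEvent);
}

public class PriceProjection
{
    // This service's own copy of the data - no shared database, no locking.
    private readonly Dictionary<string, decimal> prices = new Dictionary<string, decimal>();
    private readonly IBus bus;

    public PriceProjection(IBus bus)
    {
        this.bus = bus;
    }

    // React to a domain event, update the local copy, and tell whoever
    // depends on this projection exactly what was done.
    public void Handle(PriceChanged e)
    {
        prices[e.ProductId] = e.NewPrice;
        bus.Publish(new ProductPriceUpdated { ProductId = e.ProductId, Price = e.NewPrice });
    }

    public decimal PriceOf(string productId)
    {
        return prices[productId];
    }
}
```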
Push notifications for UI
The UI becomes just another publisher and subscriber of business events, triggering changes and minimizing reads. The delay between a change taking place and the UI reflecting it has to be kept an eye on, but by computer standards humans are slow. We read and write at a glacial pace, and while computers carry out all this eventing, processing and copying of data, a human will barely make a click or two.
Taking a lot of data in and promising to get back with the results via asynchronous means is another thing made possible once you embrace fire-and-forget communication. By looking at a batch, we can employ more intelligent strategies for resource acquisition and aggregate the events, enabling all the parties involved to do their thing more efficiently.
Putting it all together, we can take our scale-out efforts pretty far: if a particular calculation is very demanding, we can put the service carrying it out on a separate machine and it won't affect anything else. This is very powerful, but eventually we'll hit a wall again: even within a narrow scope we'll accumulate too much data. The data we have partitioned "vertically" will have to be partitioned "horizontally". It's a big challenge, but it's also the "holy grail" of scalability; we have some ideas about the approach, and maybe one day I'll be able to tell that story as well.