Understanding migrations

Reasoning about changes

A domain model changes over time. The cause of a change can be new requirements, deeper insight or a minor refactoring. There are a few basic ways to change a model:

  • adding new fields/objects
  • removing existing fields/objects
  • renaming/changing fields/objects
  • moving data around

When we reason about change, what we usually have in mind can be expressed concisely with a DSL and viewed as a simple diff operation. For example, if we want to add a new field "finished" to our model:

aggregate Task {
  Date started;

Our new model will look like this, with the diff tool showing the field addition as a newly inserted line:

aggregate Task {
  Date started;
+ Date finished;

which maps closely to our understanding of the change. Modeling this way allows us to reason about interactions without writing code. Let's say that on closer inspection we gain the deeper insight that "finished" is actually optional information. Now our change will look like this:

  Date started;
- Date finished;
+ Date? finished;

basically pointing out that the only difference is the ? marking the property as optional. As our model evolves, we want to track this information in more detail and use the actual time (instead of just a date), which would result in something similar to:

aggregate Task {
- Date started;
- Date? finished;
+ Timestamp started;
+ Timestamp? finished;
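
Diffs like these can be produced mechanically from two versions of the model source. A minimal sketch using Python's difflib (the model strings are illustrative inputs, not the Platform's actual diff tool):

```python
import difflib

# Two versions of the same (hypothetical) DSL model source.
old_model = """aggregate Task {
  Date started;
  Date? finished;
}"""
new_model = """aggregate Task {
  Timestamp started;
  Timestamp? finished;
}"""

# unified_diff yields '-' lines for removals and '+' lines for additions,
# mirroring how we reason about model changes; we drop headers and context.
diff = [
    line
    for line in difflib.unified_diff(
        old_model.splitlines(), new_model.splitlines(), lineterm=""
    )
    if line.startswith(("-", "+")) and not line.startswith(("---", "+++"))
]
for line in diff:
    print(line)
```

The same line-level diff is what a migration tool can inspect to detect change patterns.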

Sometimes better understanding of the domain results in better names. Since there are only two difficult problems in programming:

  • naming things
  • cache invalidation
  • and off-by-one errors

it's important to use names close to our domain. Otherwise communication and reasoning about the domain become increasingly difficult, since we are doing a mapping in our heads which is not expressed in the model. Renaming things requires some heuristics, but unless we are doing massive renames all at once, migrations should cope with them:

-aggregate Task {
+aggregate WorkItem {

When we reason about model changes, this is what we usually have in mind. Everything else is just noise and can be inferred from diffs most of the time.

Automating migrations

Usually developers create scripts for database migrations or use some API/DSL to define them. Most of the time, this is just noise and can be automated.

If we add a new aggregate to our model, there is no need to create a script for adding a new aggregate, since the Platform can infer this from the model and create the new aggregate automatically.

When we reference another aggregate using a reference property, there is no need to add a script for creating a foreign key, since the Platform can do this automatically.
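
As a sketch of what such inference amounts to, here is the resulting foreign key simulated with SQLite (the table and column names are assumptions for illustration, not the Platform's actual naming scheme):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# For a model where Task has a reference property pointing at Project,
# the reference implies a foreign key, which migration tooling can emit
# without a hand-written script.
conn.execute("CREATE TABLE Project (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
CREATE TABLE Task (
  id INTEGER PRIMARY KEY,
  projectID INTEGER NOT NULL REFERENCES Project(id)
)""")

conn.execute("INSERT INTO Project VALUES (1, 'demo')")
conn.execute("INSERT INTO Task VALUES (1, 1)")

# A Task pointing at a missing Project is rejected by the inferred key.
try:
    conn.execute("INSERT INTO Task VALUES (2, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```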

When we rename fields or objects, or move them from one module to another, there is no need to create a script for that either. If we change the data type of a field, the Platform will, depending on compatibility, either convert the data or drop the old field and create a new one.

The Platform reasons about changes the same way a developer does. If there is a pattern, it can be detected and the correct migration procedure invoked. While a lot of patterns are currently implemented, they certainly don't cover all use cases.

Let's look at a real-world example, how we reason about it, and what will happen in practice. Say we have a model like this:

aggregate Document {
  string name;
  Message[] messages;
  calculated int unreadMessages 'it => it.messages.Count(m => !m.isRead)';
  calculated int totalMessages 'it => it.messages.Count()';

entity Message {
  bool isRead;
  string text;

snowflake DocumentGrid Document {

where we have a document on which people can write messages.

When we try to use the unreadMessages calculated property to find only documents that are interesting, we could experience a performance slowdown, since a join must be performed to calculate this information (while in reality this particular case would not show a noticeable performance difference, more complicated cases would).

So we decide that we want to persist this calculated property in a table. In the DSL this is expressed intuitively, by adding a "persisted" concept:

- calculated int unreadMessages 'it => it.messages.Count(m => !m.isRead)';
+ calculated int unreadMessages 'it => it.messages.Count(m => !m.isRead)' { persisted; }

When this kind of change is detected during migration, appropriate steps will be taken. What are the appropriate steps? After some thought we will probably arrive at this conclusion:

  1. add an unreadMessages integer field to the Document table
  2. copy the value of the calculated expression into that field
  3. replace uses of the calculated expression with the actual field
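
The three steps above can be sketched as plain SQL, here run against SQLite (the schema and expression are simplified stand-ins for what the Platform would actually generate):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Document (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Message (
  documentID INTEGER REFERENCES Document(id),
  isRead INTEGER,
  text TEXT
);
INSERT INTO Document VALUES (1, 'doc');
INSERT INTO Message VALUES (1, 0, 'hi'), (1, 1, 'read'), (1, 0, 'again');
""")

# Step 1: add the unreadMessages field to the Document table.
conn.execute("ALTER TABLE Document ADD COLUMN unreadMessages INTEGER")

# Step 2: copy the value of the calculated expression into that field.
conn.execute("""
UPDATE Document SET unreadMessages = (
  SELECT COUNT(*) FROM Message m
  WHERE m.documentID = Document.id AND m.isRead = 0
)""")

# Step 3: reads now use the persisted field instead of the join.
value = conn.execute(
    "SELECT unreadMessages FROM Document WHERE id = 1"
).fetchone()[0]
print(value)  # 2 ('hi' and 'again' are unread)
```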

But let's say, for the sake of better understanding, that we don't want to use a calculated field anymore and would rather maintain this value in the backend in some other way. What will happen if we do:

- calculated int unreadMessages 'it => it.messages.Count(m => !m.isRead)';
+ int unreadMessages;

Well, the Platform will decide that you want to create a field in the Document table and from now on maintain this information in some different way. With that in mind it will run the same algorithm again: add the field, copy the calculated values into it, and drop the calculation.

Destructive migrations

As long as our migration is safe (meaning it will not destroy data), the DSL Platform will perform it without much fuss. But if we remove a field, the migration becomes a destructive one, since there will be no way to go back after it is run.

In that case the Platform will require confirmation for these changes, just in case something unexpected is happening. For example, while we can do this migration in one step:

aggregate Task {
-  date started;
-  string[] tags;
+  timestamp startedOn;
+  string[] comments;

a more complicated migration will require two steps. Otherwise something "unexpected" will happen and the Platform will protest:

aggregate Task {
- timestamp started;
- timestamp? finished;
+ timestamp startedOn;
+ timestamp? finishedAt;

While the Platform could use heuristics to guess what we meant (like using the Levenshtein distance to pair new fields with old ones), it currently doesn't do that.
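
Such a heuristic could look roughly like this sketch (not actual Platform behavior): compute edit distances between removed and added names and pair each old field with the closest new one.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                # delete ca
                cur[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),   # substitute ca -> cb
            ))
        prev = cur
    return prev[-1]

removed = ["started", "finished"]
added = ["startedOn", "finishedAt"]

# Greedily pair each removed field with the closest added name.
pairs = {old: min(added, key=lambda new: levenshtein(old, new))
         for old in removed}
print(pairs)  # {'started': 'startedOn', 'finished': 'finishedAt'}
```

A real tool would also need a cutoff distance and tie-breaking rules, which is part of why guessing is risky.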

We have to run this migration in two steps, first renaming one property and then renaming the other. While this shows some of the weaknesses of the approach, there are ways it could be worked around, but they are beyond the scope of this article.

Using expressions to migrate data

Since the Platform can create SQL queries from LINQ expressions, we can leverage this feature to create more complex migrations. The default concept can be used to set a default value for a new property on the server. As long as the field is non-optional, the database migration will execute SQL to populate the new field with the default value.

For example, while we could rely on the type's default value of false for a boolean property:

aggregate Task {
+ bool isDone; // defaults to false as default of bool type

we can also set the default value for the property explicitly:

aggregate Task {
+ bool unfinished { default 'it => true'; }

The lambda expression can be arbitrarily complex, as long as it can be converted to SQL:

aggregate Task {
  timestamp? finishedAt;
+ bool markedDone { default 'it => it.finishedAt != null'; }
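
A migration for this change could translate the lambda into an SQL UPDATE; a simplified SQLite sketch (the Platform's real C#-to-SQL translation is more general than this):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Task (id INTEGER PRIMARY KEY, finishedAt TEXT);
INSERT INTO Task VALUES (1, '2014-01-01 10:00:00'), (2, NULL);
""")

# The new non-optional field gets a placeholder default, then the
# expression 'it => it.finishedAt != null' becomes an UPDATE that
# populates existing rows.
conn.execute(
    "ALTER TABLE Task ADD COLUMN markedDone INTEGER NOT NULL DEFAULT 0"
)
conn.execute("UPDATE Task SET markedDone = (finishedAt IS NOT NULL)")

rows = conn.execute(
    "SELECT id, markedDone FROM Task ORDER BY id"
).fetchall()
print(rows)  # [(1, 1), (2, 0)]
```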

Platform extensibility can be used to add new extensions for the C# -> SQL conversion when required.