This is the third installment in a series presenting configuration management work that comes out of the Drutopia initiative.
- In part 1 of this series, Configuration Providers, we introduced the problem of managing shared configuration in Drupal, starting with how to determine what configuration is available.
- In part 2, Configuration Snapshots, we covered creating snapshots of configuration to facilitate comparison of different configuration states.
- In this installment, we'll look at another key issue when updating shared configuration: how to handle customizations.
To help clarify the problem, take this scenario:
- You install a site from a Drupal distribution.
- One of the pieces of configuration that distribution provides is an event content type.
- You edit the content type to tweak the help text that appears when a user creates a new piece of content.
- Now a new release of the distribution is available.
You want to update to the new version and get any updates it provides, including any changes that were made to the event content type. But - and here's the tricky part - when you update, you don't want to lose the customization you made to the event content type's help text.
Customizations, updates, and merges
As noted in part 2 of this series, there are three different states to consider when we update configuration that's provided by extensions (modules, themes, or the install profile):
- The configuration as it was provided when we first installed or last updated. This is what we store in and read from a snapshot.
- The configuration as it's currently provided by the extension.
- The configuration as it's currently saved in the site's 'active' storage.
To bring in updates while respecting customizations, we can't take any one of these states on its own. If we take just the newly provided version, we would lose the customization we made in the active version. If we stick with the active version, we don't get the changes in the newly provided version. Instead, we need to merge in updates. We want to bring in any changes from the updated version, but only if they don't overwrite or undo any change that's been made in the active configuration. In our merge, we need to consider all three configuration states. In other words, we'll do what's called a three-way merge: a merge of data from three distinct sources.
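To make the three states concrete, here is a minimal sketch using made-up, flattened data for the event content type. The values and the small loop are illustrative assumptions only; the loop handles just changed values in a flat array, not additions, deletions, or nesting.

```php
<?php
// Hypothetical, simplified data; not real exported configuration.
$previous = [ // snapshot: as provided when we installed
    'name' => 'Event',
    'description' => 'An event.',
    'help' => '',
];
$current = [ // as provided by the new release
    'name' => 'Event',
    'description' => 'An event with a date and location.',
    'help' => '',
];
$active = [ // as saved on the site, with customized help text
    'name' => 'Event',
    'description' => 'An event.',
    'help' => 'Remember to add a location.',
];

// The outcome we want: the upstream description plus our own help text.
$wanted = [
    'name' => 'Event',
    'description' => 'An event with a date and location.',
    'help' => 'Remember to add a location.',
];

// Start from the active state and accept an upstream change only where
// the site still holds the snapshotted value.
$merged = $active;
foreach ($current as $key => $value) {
    $changed_upstream = $value !== $previous[$key];
    $customized = $active[$key] !== $previous[$key];
    if ($changed_upstream && !$customized) {
        $merged[$key] = $value;
    }
}
```

Taking `$current` alone would blank out the help text, and taking `$active` alone would miss the new description; only by comparing both against `$previous` do we arrive at `$wanted`.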
Merging data from three or more sources or states is a fairly common computing need.
- In version control software like Git, used for Drupal development, you might need to do a three-way merge to merge changes from two different sources into an original version of a file.
- Similarly, a k-way merge algorithm is used for "taking in multiple sorted lists and merging them into a single sorted list".
Both of these merge strategies are roughly analogous to what we need to do.
The structure of configuration data in Drupal
In merging Drupal configuration data, we need to take into account how configuration data is structured. Drupal configuration data is stored in array format. Since the relevant Drupal code is written in PHP, it's PHP arrays we're dealing with.
As explained on w3schools.com, there are three distinct types of arrays in PHP:
- Indexed arrays, sometimes called sequence or numeric arrays - arrays with a numeric index.
- Associative arrays, sometimes called hashes - arrays with named keys.
- Multidimensional arrays, sometimes called nested arrays - arrays containing one or more arrays.
Drupal's configuration data structure can combine the three types. Our merging strategy needs to handle multidimensional arrays and, in doing so, treat each array item differently depending on whether it's indexed, associative, or multidimensional.
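As a quick illustration, a single configuration item can mix all three kinds of array. The values below are made up for the example and only loosely modeled on a content type's configuration.

```php
<?php
// Illustrative values only; not real exported configuration.
$config = [
    // Associative: named keys at the top level.
    'name' => 'Event',
    'type' => 'event',
    // Multidimensional: an associative array nested under "dependencies"...
    'dependencies' => [
        // ...which in turn holds an indexed array of module names.
        'module' => ['menu_ui', 'text'],
    ],
];

// The indexed array's keys are just positions assigned by PHP.
$module_keys = array_keys($config['dependencies']['module']);
```

Here `$module_keys` is `[0, 1]`: the keys say nothing about which module they point to, which matters when we come to merge indexed arrays.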
PHP has a bunch of functions that work with array data, including two specific to merging: array_merge() and array_merge_recursive().
Drupal core includes some specialized helper methods for merging data arrays.
The config-extra Drush extension provides support for doing configuration merges "in instances where different team members need to change the site's configuration settings at the same time"; see the relevant documentation.
None of these does exactly what we need, but they provide pointers.
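To see why PHP's built-in merge functions fall short here, consider this small sketch. The two arrays are invented for illustration; the function behavior shown is standard PHP.

```php
<?php
// Made-up data: the site's customized state and the newly provided state.
$active = ['help' => 'Customized help text.', 'tags' => ['a']];
$current = ['help' => 'Updated help text.', 'tags' => ['a', 'b']];

// array_merge(): for string keys, later values overwrite earlier ones,
// so the customized help text is lost.
$overwritten = array_merge($active, $current);
// $overwritten['help'] is 'Updated help text.'

// array_merge_recursive(): values under the same string key are collected
// into an array, and nested indexed arrays are appended to one another.
$combined = array_merge_recursive($active, $current);
// $combined['help'] is ['Customized help text.', 'Updated help text.']
// $combined['tags'] is ['a', 'a', 'b']
```

Neither function knows about a previous state, so neither can tell a customization apart from an upstream change.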
The Configuration Merge module's ConfigMerger::mergeConfigItemStates() method takes three arguments as input: the previous (snapshotted), currently-provided, and active states of a configuration item.
The approach is pretty straightforward:
Distinguish between indexed and associative arrays. When exporting configuration, Drupal uses a Symfony method, Inline::isHash(), to determine if an array is indexed or associative. So we do the same.
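For readers who want a feel for what that check involves, here is a plain-PHP sketch in the same spirit. The function `is_hash_sketch()` is a hypothetical helper written for this post, not Symfony's code; it assumes that an array counts as indexed only when its keys are exactly 0, 1, 2, and so on, in order.

```php
<?php
/**
 * Hypothetical helper: returns TRUE if the array should be treated as
 * associative, FALSE if its keys are exactly 0, 1, 2, ... in order.
 */
function is_hash_sketch(array $value): bool
{
    $expected_key = 0;
    foreach ($value as $key => $unused) {
        // Any key that breaks the 0, 1, 2, ... sequence makes it a hash.
        if ($key !== $expected_key++) {
            return true;
        }
    }
    return false;
}

$indexed = is_hash_sketch(['node', 'text']);            // false
$associative = is_hash_sketch(['name' => 'Event']);     // true
$gapped = is_hash_sketch([1 => 'node', 2 => 'text']);   // true
```

On PHP 8.1 and later, the built-in `array_is_list()` answers the inverse question for plain arrays.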
For associative arrays, merge in additions, deletions, and changes and recurse as required.
- Additions are items with array keys present in the current data but not in the previous snapshot.
- Deletions are the inverse: items with array keys present in the previous snapshot but not in the data as currently provided.
- In both cases, we consult the active configuration before bringing in an update. We remove an item only if it hasn't been changed (customized) since the previous snapshot. We add an item only if it hasn't already been added.
- Finally, changes are items with the same key but different values in the snapshot and the currently provided data. As with deletions, we consult the active configuration storage and accept a change only if it hasn't been customized. If the changed value is itself an array, we recurse. That is, we pass the value as an argument to the same ConfigMerger::mergeConfigItemStates() method, so that this array, too, can be appropriately merged.
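The rules for associative arrays can be sketched roughly as follows. This is not the actual ConfigMerger::mergeConfigItemStates() implementation; `merge_assoc_sketch()` is a hypothetical function that assumes every nested array is associative and leaves indexed arrays out entirely.

```php
<?php
/**
 * Hypothetical sketch of a three-way merge for associative arrays only.
 */
function merge_assoc_sketch(array $previous, array $current, array $active): array
{
    $merged = $active;

    // Additions: keys provided now but absent from the snapshot.
    foreach (array_diff_key($current, $previous) as $key => $value) {
        // Add only if the site hasn't already added this key itself.
        if (!array_key_exists($key, $active)) {
            $merged[$key] = $value;
        }
    }

    // Deletions: keys in the snapshot that are no longer provided.
    foreach (array_diff_key($previous, $current) as $key => $value) {
        // Remove only if the site still holds the snapshotted value.
        if (array_key_exists($key, $active) && $active[$key] === $value) {
            unset($merged[$key]);
        }
    }

    // Changes: keys in both states whose values differ.
    foreach (array_intersect_key($current, $previous) as $key => $value) {
        if ($value === $previous[$key] || !array_key_exists($key, $active)) {
            continue;
        }
        if (is_array($value) && is_array($previous[$key]) && is_array($active[$key])) {
            // Recurse so the nested array is merged key by key.
            $merged[$key] = merge_assoc_sketch($previous[$key], $value, $active[$key]);
        }
        elseif ($active[$key] === $previous[$key]) {
            // Not customized, so accept the upstream change.
            $merged[$key] = $value;
        }
    }

    return $merged;
}

// Made-up example data.
$previous = ['name' => 'Event', 'help' => '', 'settings' => ['preview' => 1, 'legacy' => true]];
$current = ['name' => 'Event', 'help' => 'Add details.', 'settings' => ['preview' => 2], 'description' => 'An event.'];
$active = ['name' => 'Event', 'help' => 'Custom help.', 'settings' => ['preview' => 1, 'legacy' => true]];

$merged = merge_assoc_sketch($previous, $current, $active);
```

In this example the customized help text survives, the uncustomized nested `preview` value is updated, the dropped `legacy` key is removed, and the new `description` key is appended at the end.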
For indexed arrays, merge in additions and deletions. For indexed arrays, we do much the same as for associative ones, but with an important difference.
- Because the array keys are assigned sequentially, they don't have a fixed relationship with their values. This means that while we can say what's been added or deleted, we can't really say that an existing array item has changed.
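A simplified sketch of the indexed case follows. Again, `merge_indexed_sketch()` is a hypothetical function, not the module's code: it compares items by value rather than by key, and it simply appends additions at the end instead of trying to preserve their position.

```php
<?php
/**
 * Hypothetical sketch of a three-way merge for indexed arrays.
 * Items are matched by value because positions carry no meaning.
 */
function merge_indexed_sketch(array $previous, array $current, array $active): array
{
    $merged = $active;

    // Deletions: values in the snapshot that are no longer provided.
    foreach ($previous as $value) {
        if (!in_array($value, $current, true)) {
            $position = array_search($value, $merged, true);
            if ($position !== false) {
                unset($merged[$position]);
            }
        }
    }

    // Additions: newly provided values the site doesn't already have.
    foreach ($current as $value) {
        if (!in_array($value, $previous, true) && !in_array($value, $merged, true)) {
            $merged[] = $value;
        }
    }

    // Renumber keys so the result is a sequential list again.
    return array_values($merged);
}

// Made-up example: the site added "views"; upstream swapped "text" for "datetime".
$merged = merge_indexed_sketch(
    ['node', 'text'],
    ['node', 'datetime'],
    ['node', 'text', 'views']
);
// $merged is ['node', 'views', 'datetime']
```

Note that there is no "changes" branch: with no stable key to compare on, swapping `text` for `datetime` can only be seen as one deletion plus one addition.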
Using this merge algorithm, Configuration Merge makes it possible to calculate configuration updates that don't overwrite what's been customized on a site.
While Configuration Merge applies when merging in configuration updates from installed extensions, it can also apply to the configuration staging workflow that's built into Drupal core.
When it comes to staging and handling configuration, one of the most significant developments in the Drupal contributed modules space is Config Filter. The Config Filter module makes it possible to modify configuration as it's being staged or imported from one environment to another.
The original use case of Config Filter is Configuration Split, a module that makes it possible to "define sets of configuration that will get exported to separate directories when exporting, and get merged together when importing".
There are two Drupal modules that use Config Filter to protect customizations from being lost when staging configuration.
- The Config Ignore module provides a configuration filter that allows you to specify a list of configuration items that will be ignored (skipped) when you synchronize configuration from one environment to another.
- The Configuration Overlay module provides a configuration filter to turn the configuration export into an overlay of the shipped configuration of all installed extensions. Installing this module and subsequently exporting configuration will leave only those configuration files in your export directory that have been added or modified relative to the shipped configuration of modules and installation profiles.
Both are useful, but they don't cover the specific use case of merging in configuration updates.
To fill this gap, Config Merge has a sub-module, Config Merge Filter, that provides a Config Filter plugin that safely merges configuration updates into the site's active configuration, using the ConfigMerger::mergeConfigItemStates() method. For this three-way merge, the three states used are:
- Core's configuration snapshot, representing the configuration as last staged.
- The configuration as it's currently provided in the sync storage, with all previous configuration filters applied.
- The configuration as it's currently saved in the site's 'active' storage.
If you install Config Merge Filter and then run core's configuration synchronization, updates will be merged into the active storage, retaining customizations.
While the current algorithm used in Configuration Merge covers the basics, there's plenty of room for improvement.
For one thing, we're currently playing loose with ordering when merging arrays. For associative array additions, we're just adding new keys to the end of existing arrays. For indexed arrays, we're doing some very rough handling of ordering by keeping track of deletions and swapping in additions. In both cases, we may end up with insignificant array ordering differences that show up as later differences between array states. An improvement would be to retain sort order when merging in array additions; see this issue.
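The ordering problem is easy to reproduce in plain PHP. The arrays below are made up; the point is only that the same key/value pairs in a different order count as equal under a loose comparison but not under a strict one, and they serialize differently.

```php
<?php
// As provided by the extension.
$provided = ['label' => 'Event', 'weight' => 0, 'status' => true];
// Same data after a merge that appended "weight" at the end.
$merged = ['label' => 'Event', 'status' => true, 'weight' => 0];

// Loose comparison ignores key order.
$same_pairs = ($provided == $merged);   // true
// Strict comparison requires the same order as well.
$identical = ($provided === $merged);   // false
// Serialized forms differ too, since key order is preserved.
$same_json = (json_encode($provided) === json_encode($merged)); // false
```

Whether a given comparison flags the difference depends on how it is done, which is why retaining the original sort order when merging would be the more robust fix.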
More broadly, in Drupal 8, modules can provide configuration schemas that define the configuration they provide; see the documentation on configuration schema/metadata. A merge algorithm based on the configuration schema could be way more robust. See the issue Use config schema to ensure valid merging.
Related issue on Drupal core
Stay tuned for the next post in this series: Configuration Alters.