Recently, I've been working on a number of performance improvements in the bootstrap phase of Drupal, which is especially relevant for making cached responses from the Internal Page Cache and Dynamic Page Cache modules even faster.
Some of these are specific to sites using the Redis module, but various changes are happening in Drupal core that should benefit everyone.
Background & Concepts
Delete and invalidate operations
The Drupal cache backend supports both deletions and invalidations. The difference is subtle: a deletion is final, and the cache backend must not return that item afterwards. For invalidations, it is possible to set the $allow_invalid
parameter to TRUE, in which case invalidated items may still be returned. A typical use case is the ability to keep returning an invalidated item for a certain time while it is rebuilt in the background, or to combine it with a lock so that only a single process rebuilds. Both delete and invalidate come in three variants: a single item, multiple items, or everything in a given cache bin.
Recommendation: Always use delete unless there is a specific use case that uses $allow_invalid.
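To make the $allow_invalid use case more concrete, here is a minimal sketch of the lock pattern described above, using Drupal's cache and lock APIs. The cache bin, the cache ID and mymodule_rebuild_data() are hypothetical placeholders; this is an illustration of the pattern, not code from core.

```php
<?php

/**
 * Sketch of the $allow_invalid + lock pattern described above.
 *
 * The cache ID and mymodule_rebuild_data() are hypothetical.
 */
function mymodule_get_expensive_data() {
  $cache = \Drupal::cache('default');
  $cid = 'mymodule:expensive_data';

  // Pass TRUE as the second argument to also receive invalidated items.
  $item = $cache->get($cid, TRUE);
  if ($item && $item->valid) {
    return $item->data;
  }

  // Try to be the single process that rebuilds. Other processes keep
  // serving the stale (invalidated) item in the meantime.
  if (\Drupal::lock()->acquire($cid)) {
    $data = mymodule_rebuild_data();
    $cache->set($cid, $data);
    \Drupal::lock()->release($cid);
    return $data;
  }

  // Could not get the lock: fall back to stale data if there is any.
  return $item ? $item->data : NULL;
}
```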
Cache Tags
The cache API in Drupal can efficiently invalidate a possibly vast number of cache items. Each cache item is tagged with the things it depends on; if such a thing changes, all cache items carrying that tag are invalidated, across multiple cache bins. The invalidate operation is very fast; the tradeoff is that every single cache read must check whether the item's cache tags have been invalidated. There are ongoing efforts to evaluate that tradeoff and to use fewer, or fewer but more general, cache tags for infrequently invalidated things.
The cache tag validation implementation is quite optimized, with a static cache and support for checking multiple cache tags at once. Still, each cache item that contains at least one cache tag not yet seen during the current request results in an additional lookup.
The actual implementations in core and Redis rely on incrementing a counter for each cache tag on invalidation; each cache item then stores the sum of all invalidations of its tags at write time. If that sum has changed, the item is no longer valid.
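The counter mechanism can be illustrated with a small self-contained simulation; this is just the idea, not the actual core or Redis code:

```php
<?php

// Self-contained simulation of the counter-based cache tag checksum
// described above; not the actual core/Redis implementation.

// One counter per cache tag, incremented on invalidation.
$counters = ['node:1' => 0, 'node_list' => 0];

// Checksum of a set of tags: the sum of their invalidation counters.
function checksum(array $counters, array $tags): int {
  $sum = 0;
  foreach ($tags as $tag) {
    $sum += $counters[$tag] ?? 0;
  }
  return $sum;
}

// Write: store the item together with the current checksum of its tags.
$tags = ['node:1', 'node_list'];
$item = ['data' => 'rendered node', 'checksum' => checksum($counters, $tags)];

// Read: the item is valid while the checksum still matches.
var_dump($item['checksum'] === checksum($counters, $tags)); // bool(true)

// Invalidating a tag increments its counter, which changes the sum and
// therefore invalidates every item that carries the tag.
$counters['node:1']++;
var_dump($item['checksum'] === checksum($counters, $tags)); // bool(false)
```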
Difference between Redis/Memcache and the database (SQL) cache backend
The default database cache backend maintains a separate cache table for each cache bin (a separately configurable collection of cache items). This allows fairly efficient bulk operations to invalidate or delete all cache items in a given bin.
Redis and Memcache don't really have a comparable concept, so these backends need alternative implementations to support invalidate all and delete all. I'm only familiar with Redis, which uses a separate last-delete-all timestamp for deletions, while invalidate all is implemented as a cache tag added to all cache items of a bin. This means that each additional bin that items are requested from has an upfront cost to check those two flags, something the database backend does not have.
ChainedFast Backend
Some caches are very small and/or are requested extremely frequently. To handle that, the Drupal cache system provides a so-called ChainedFast implementation, which stores caches in a local, fast in-memory backend (APCu) while making sure that changes in the slower, persistent backend (database/Redis/Memcache) are respected. This is done by storing a last-write timestamp in the persistent backend, which keeps multiple web servers, the web server processes (Apache, php-fpm) and the CLI (Drush) in sync. It requires one cache get operation against the slow persistent backend per request and bin to fetch that timestamp.
Drupal by default uses the ChainedFast backend for the bootstrap (early and frequently used caches), config and discovery (various plugin definitions) cache bins.
Note: Redis supports the alternative, proprietary Relay extension, which comes with a built-in equivalent of the ChainedFast backend that relies on a persistent connection to Redis and doesn't need that extra query. I have very limited real-world experience with it.
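The timestamp check can be modeled with a toy self-contained example: the fast local copy is only trusted if it was written at or after the persistent backend's last-write timestamp. This is a simplification for illustration, not the real ChainedFastBackend code.

```php
<?php

// Toy model of the ChainedFast idea described above. Not the real
// implementation, just the synchronization mechanism.

$persistent = ['last_write' => 100, 'items' => ['foo' => 'old value']];
$fast = []; // Stands in for APCu, local to each web server.

function chained_get(array &$fast, array $persistent, string $cid) {
  // One lookup against the slow backend per request and bin.
  $last_write = $persistent['last_write'];
  if (isset($fast[$cid]) && $fast[$cid]['written'] >= $last_write) {
    return $fast[$cid]['data'];
  }
  // Fast copy is missing or stale: fall back and repopulate.
  $data = $persistent['items'][$cid] ?? NULL;
  $fast[$cid] = ['data' => $data, 'written' => $last_write];
  return $data;
}

echo chained_get($fast, $persistent, 'foo'), "\n"; // old value

// Another web server writes and bumps last_write, so every server's
// local copy is considered stale on the next request.
$persistent['items']['foo'] = 'new value';
$persistent['last_write'] = 200;
echo chained_get($fast, $persistent, 'foo'), "\n"; // new value
```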
How it started...
Initially, I was looking into adding caching for the current languages; this is an early-bootstrap query that runs for almost every request on multilingual websites, and it's always exactly the same query, so it seemed like a good candidate to cache, even if it's just one fast query. By using the bootstrap cache bin, it doesn't require an additional persistent cache lookup.
However, the result can change when languages are added or removed and can have multiple variations due to translations of those languages, so the initial implementation uses multiple cache items and a cache tag.
This produced the expected output in the new performance tests, which assert the specific number of queries, cache operations and other performance-related metrics in common scenarios: one regular query removed, one more cache get (not an actual database query) and one cache tag lookup (an actual database query). This started a discussion about whether the caching is actually useful.
It also reminded me of an earlier issue that proposes loading frequently used cache tags upfront in a single query. The idea is that this saves several database queries on many page requests, with the tradeoff of checking a few cache tags that might not actually be needed. It should be a very fast operation, as it's a single lookup (an IN condition in the database, MGET for Redis).
I reviewed and tested that a bit using the Redis MONITOR command, which reminded me of some known issues I had been aware of, but I also saw several surprising things, which led to various investigations and issues down a pretty deep rabbit hole of performance optimizations.
Intermezzo: The MONITOR command
MONITOR streams all commands that are currently being run against the Redis server, which allows you to inspect exactly what's going on.
To use it, start a Redis CLI session and execute the command. It initially responds with OK, and as soon as other operations happen, they show up.
With ddev and the ddev/ddev-redis add-on, it's as simple as ddev redis-cli MONITOR.
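For example (the sample output below is illustrative; timestamps, client address and the key prefix will differ per site):

```shell
# Start watching all commands hitting the Redis server.
ddev redis-cli MONITOR

# Typical output once requests come in:
# 1700000000.000000 [0 172.18.0.5:56789] "GET" "prefix:bootstrap:_redis_last_delete_all"
# 1700000000.000001 [0 172.18.0.5:56789] "HGETALL" "prefix:bootstrap:system.module.files"
```

Press Ctrl+C to stop the stream; note that MONITOR has a performance cost of its own, so avoid running it against production servers for long.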
ChainedFast last write timestamp checks (redis specific)
The concepts mentioned above combine to cause considerable overhead in specific situations; the most obvious one is the last-write timestamp check of the ChainedFast backend. As mentioned, it's supposed to be a single cache operation. However, combined with the requirement to support invalidateAll() and deleteAll(), it actually results in three:
"HGETALL" "prefix:config:last_write_timestamp_cache_config"
"GET" "prefix:cachetags:x-redis-bin:config"
"GET" "prefix:config:_redis_last_delete_all"
The first is the actual cache lookup for the timestamp; then the Redis cache backend has to check whether there was a recent invalidateAll() for that cache bin (the cache tag), and finally whether there was a deleteAll() operation.
I created an issue to improve this; the fix is already part of the recent 8.x-1.9 release of the Redis project.
It detects these specific cache identifiers, bypasses the usual cache lookup and validation, and just reads and writes a plain value (GET instead of a hash get). This is possible because the ChainedFast backend updates this timestamp on any invalidateAll() or deleteAll() operation anyway, so those operations increase it too.
The result:
"GET" "primer:last_write_timestamp_cache_config"
This saves 3 x 2 = 6 Redis operations (two per ChainedFast bin) on basically all requests that aren't internal page cache hits.
A possible alternative for this is to change the ChainedFast implementation to use cache tags instead of timestamps, together with preloading common cache tags, this could completely remove separate persistent cache lookups per bin for the 3 common ones.
Disable/deprecation of invalidateAll()
The invalidateAll() check is still necessary on all other bins; on an internal page cache hit, that's two (container and page).
I believe there are no real-world use cases for invalidateAll() (or at least none that outweigh its cost); it was added just for consistency with the delete operations.
I'm proposing to deprecate and remove it completely. But the Redis cache backend will only be able to remove that check once it requires a release of Drupal core that has it fully removed, which is very far in the future. As an intermediate step, a setting was added to treat invalidateAll() like deleteAll().
Before
"HGETALL" "prefix:container:service_container......"
"GET" "primer:cachetags:x-redis-bin:container"
"GET" "primer:container:_redis_last_delete_all"
With $settings['redis_invalidate_all_as_delete'] = TRUE;
"HGETALL" "prefix:container:service_container......"
"GET" "primer:container:_redis_last_delete_all"
This only saves one Redis operation on internal page cache requests, even though it's two bins. The reason for that is also why I originally chose to use cache tags: if the first cache item requested from a bin has cache tags, then the invalidate-all check is (almost) free, as it's just one more cache tag loaded in the same operation.
Theme extension list cache
Something else that I noticed in the list of Redis operations was this bit:
"HGETALL" "primer:default:core.extension.list.theme"
"GET" "primer:cachetags:x-redis-bin:default"
"GET" "primer:default:_redis_last_delete_all"
This showed up as part of resolving the dynamic page cache and turned out to be triggered by the current theme cache context and the resulting theme negotiation. The cache was stored in the default cache bin and has now been moved to the bootstrap bin; this is already committed and will be in the upcoming Drupal 11.2 release.
With that, those three operations vanish completely, as the bootstrap cache is already used earlier in the request and the last-write timestamp has already been read. This also benefits anyone using a different cache backend.
Access policy caching
Another cache lookup that results in even more Redis operations is the new Access Policy API cache, in some cases up to five:
"HGETALL" "core:access_policy:access_policies:drupal:[user.is_super_user]=1:[user.roles]=authenticated,administrator"
"GET" "core:cachetags:x-redis-bin:access_policy"
"GET" "core:access_policy:_redis_last_delete_all"
"HGETALL" "core:access_policy:access_policies:drupal:[languages:language_interface]=en:[user.is_super_user]=1:[user.roles]=authenticated,administrator"
"MGET" "core:cachetags:config:user.role.authenticated" "core:cachetags:config:user.role.administrator" "core:cachetags:access_policies"
This is a combination of a few things that make the resolution more complex:
- The Access Policy API allows more complex access systems, such as the Group module, to integrate neatly into Drupal's existing permission and access systems. Those more advanced use cases may very well benefit from additional caching and might also have enough variations that they do not really fit into the ChainedFast system (each user can have different groups)
- However, the default implementation of this is based on the regular user role config entities, which are already cached in the ChainedFast config cache bin. That means fast in-memory caches are being cached again in slow persistent caches
- Additionally, the use of the VariationCache and its cache redirect system breaks the invalidate-all cache tag optimization: the first cache lookup has no tags, so it results in two separate cache tag lookups. This does not happen when using the mentioned invalidateAll() optimization.
The current proposal to improve this introduces the ability for access policy implementations to declare that they are not worth caching. If none are, the persistent cache is skipped. This needs reviews and feedback.
Preloading cache tags
Time to go back to the teased topic of preloading cache tags.
As mentioned, cache tags are loaded the first time they are discovered in a cache item being loaded. While cached render elements for nodes, pages, blocks and other elements often have many, highly variable cache tags such as per-entity tags, regular caches often have relatively few and stable cache tags.
The core issue discusses different approaches for collecting useful cache tags, the cost of loading too many, and so on. It might take some time until a default implementation lands in core. However, it is relatively easy to implement a custom version of the request subscriber I proposed, optimized for your project.
To see those cache tag lookups with Redis for your project, combine the MONITOR command with a grep on cachetags: ddev redis-cli MONITOR | grep :cachetags:
The output will look like this:
"GET" "primer:cachetags:route_match" (route lookup)
"GET" "primer:cachetags:entity_types" (entity type definitions)
"MGET" "primer:cachetags:config:block_list" + 100 more (dynamic page cache)
Focus on lookups with few and stable cache tags and ignore the long lists from dynamic page cache and page cache; there is only a benefit if a specific lookup of one or multiple cache tags can be fully preloaded. Examples include info/discovery cache tags such as entity_types and token_info, the per-entity-type _values cache tags (such as node_values) for entity types that are loaded on many pages, as well as config cache tags used on many pages.
<?php

namespace Drupal\YOURMODULE\EventSubscriber;

use Drupal\Core\Cache\CacheTagsChecksumInterface;
use Drupal\Core\Site\Settings;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;
use Symfony\Component\HttpKernel\Event\RequestEvent;
use Symfony\Component\HttpKernel\KernelEvents;

/**
 * Preload common cache tags.
 *
 * @todo Remove when https://www.drupal.org/project/drupal/issues/3436146 is
 *   done.
 */
class CacheTagPreloadSubscriber implements EventSubscriberInterface {

  public function __construct(protected CacheTagsChecksumInterface $cacheTagsChecksum) {
  }

  /**
   * Preloads common cache tags.
   *
   * @param \Symfony\Component\HttpKernel\Event\RequestEvent $event
   *   The request event.
   */
  public function onRequest(RequestEvent $event): void {
    if ($event->isMainRequest()) {
      $default_preload_cache_tags = array_merge([
        'route_match',
        'access_policies',
        'routes',
        'router',
        'entity_types',
        'entity_field_info',
        'entity_bundles',
        'local_task',
        'library_info',
        'token_info',
        'node_values',
        'block_content_values',
        'media_values',
        'file_values',
        'paragraph_values',
        'taxonomy_term_values',
        'user_values',
        'crop_values',
        'menu_link_content_values',
        'breakpoints',
        'config:eu_cookie_compliance.settings',
        'config:imagemagick.settings',
        'theme_registry',
        'config:configurable_language_list',
      ], Settings::get('cache_preload_tags', []));
      $this->cacheTagsChecksum->getCurrentChecksum($default_preload_cache_tags);
    }
  }

  /**
   * Registers the methods in this class that should be listeners.
   *
   * @return array
   *   An array of event listener definitions.
   */
  public static function getSubscribedEvents(): array {
    $events[KernelEvents::REQUEST][] = ['onRequest', 500];
    return $events;
  }

}
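The subscriber also needs to be registered as a service. A sketch of the corresponding YOURMODULE.services.yml; the service name is arbitrary, and cache_tags.invalidator.checksum is the core checksum service ID (backends like Redis swap in their own implementation under the same ID):

```yaml
services:
  YOURMODULE.cache_tag_preload_subscriber:
    class: Drupal\YOURMODULE\EventSubscriber\CacheTagPreloadSubscriber
    arguments: ['@cache_tags.invalidator.checksum']
    tags:
      - { name: event_subscriber }
```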
Other improvements and related issues
Several of these changes and improvements were not visible at all in the existing automated performance test assertions, not visible enough, or even showed up negatively. That is because those assertions do not distinguish between fast and slow cache bins and count all cache tag API calls. A core issue was opened to group the operations by cache bin for more specific assertions and to count the actually executed queries for cache tag invalidation lookups.
Another performance issue was discovered in the new single directory component system, which also loads the full module list from the default cache bin to sort components by module weight. An issue had already been opened because this was discovered to use excessive memory as part of Drupal CMS performance testing; I proposed a change that moves that work to the discovery phase instead.
In our testing, we noticed that anonymous page cache hits included lookups in the config cache. This turned out to be because the auto_unban module was loading its configuration in its subclassed BanIpManager (we use perimeter + ban + auto_unban to automatically block bots for increasing amounts of time). General recommendation: avoid loading config, entity storages and other possibly expensive things in your __construct() methods.
Already in the language caching issue and several others, I noticed that some performance tests should have failed on the automated test runs on GitLab CI but didn't. Others noticed similar problems in test-only jobs that are meant to fail, and those supposed-to-be-failing tests recently caused repeated test failures on the primary 11.x development branch, which is very disruptive. We then discovered that a large chunk of the JavaScript tests were silently running into Selenium issues and were skipped instead of failing. This resulted in promoting the existing issue to a critical testing infrastructure problem and combined efforts from various contributors.
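A sketch of the recommended pattern; the class, config name and service wiring are hypothetical. Inject the config factory, but only load the configuration when it is first needed, not in the constructor:

```php
<?php

namespace Drupal\mymodule;

use Drupal\Core\Config\ConfigFactoryInterface;
use Drupal\Core\Config\ImmutableConfig;

class BanManager {

  /**
   * Loaded lazily; NULL until first used.
   */
  protected ?ImmutableConfig $config = NULL;

  public function __construct(protected ConfigFactoryInterface $configFactory) {
    // Do NOT call $configFactory->get() here: the service may be
    // instantiated on requests that never use this configuration,
    // such as page cache hits.
  }

  protected function getConfig(): ImmutableConfig {
    // Load and statically keep the config on first access only.
    return $this->config ??= $this->configFactory->get('mymodule.settings');
  }

  public function banDuration(): int {
    return (int) $this->getConfig()->get('ban_duration');
  }

}
```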
Completely separate but related to bootstrap performance, I've been working on performance improvements in the redirect project, which will be described in a separate blog post.
Conclusion
Most of these issues on their own don't have a huge impact. Redis operations are typically very fast (although less so if Redis runs on a different server), but they add up. In our testing, we've seen 16+ fewer Redis operations on Dynamic Page Cache hits.
This adds up quickly, especially considering that Redis is now used on 50'000 Drupal sites, many of which likely handle millions of requests, and several of the improvements benefit all Drupal sites. Performance improvements such as these make Drupal faster and let it consume less energy.