Sector(s)

Project Team

Team members

Salsa Digital, amazee.io, section.io and Victoria’s Department of Premier and Cabinet (DPC) worked together to manage huge traffic spikes on Victoria’s Single Digital Presence (SDP) platform. During and directly after daily COVID-19 briefings, traffic would increase by around 4000%! We set up scaling and caching strategies to absorb these increases.

Describe the project (goals, requirements and outcome)

SDP’s challenge

During his daily pandemic briefings, the Victorian Premier directed citizens to government websites for more information. Request volumes on the SDP platform increased by around 4000%, to 400,000 requests per minute in the space of 30 seconds.

SDP’s transformation — refined architecture for improved performance

Salsa Digital, amazee.io, section.io and SDP worked together to meet these large traffic spikes. We tackled the issue in two stages: pre-incident planning to prepare the platform, and post-incident reviews to refine our processes and keep improving.

Pre-incident planning

We made the following refinements to the technology architecture of the SDP platform:

  • Auto-scalable workloads
    • Origin clusters define horizontal pod autoscalers that are configured to add more computing resources automatically based on CPU utilisation of the web workload.
    • Edge workloads are deployed to Section’s edge Kubernetes service, which allows for automated scaling events to be triggered and more caching nodes to be added as traffic increases.
  • Caching strategy
    • We worked to define an effective caching strategy for the web properties. The strategy involved using cache tags generated by Drupal and surfacing them via the decoupled frontends. This allowed Drupal to issue invalidation requests for all connected devices and enabled content changes during high traffic events without service interruption.
    • This strategy allowed us to achieve a 99% cache offload rate, which ensured that the origin remained operational during traffic peaks.
    • We also set up a regular load testing process.
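
The cache tag approach described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the SDP or Section implementation: the `TagCache` class, the `origin` callback and the example URLs are hypothetical, and the tag strings simply follow Drupal's `node:<id>` naming style. It shows how invalidating one tag drops only the pages that carry it, and how an offload rate is derived from hits and misses.

```python
from collections import defaultdict


class TagCache:
    """Hypothetical in-memory stand-in for an edge cache that understands cache tags."""

    def __init__(self):
        self._entries = {}                     # url -> cached body
        self._urls_by_tag = defaultdict(set)   # tag -> urls whose response carried that tag
        self.hits = 0
        self.misses = 0

    def get(self, url, fetch_from_origin):
        """Serve from cache when possible; otherwise fetch, store and index by tag."""
        if url in self._entries:
            self.hits += 1
            return self._entries[url]
        self.misses += 1
        body, tags = fetch_from_origin(url)
        self._entries[url] = body
        for tag in tags:
            self._urls_by_tag[tag].add(url)
        return body

    def invalidate(self, tag):
        """Drop every cached URL carrying the tag and return how many were dropped."""
        dropped = 0
        for url in self._urls_by_tag.pop(tag, set()):
            if self._entries.pop(url, None) is not None:
                dropped += 1
        return dropped

    def offload_rate(self):
        """Fraction of requests answered without touching the origin."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def origin(url):
    """Hypothetical origin: returns a body plus the cache tags for that page."""
    tags = ["node:1"] if url == "/coronavirus" else ["node:2"]
    return f"body for {url}", tags


if __name__ == "__main__":
    cache = TagCache()
    for _ in range(99):
        cache.get("/coronavirus", origin)
    cache.get("/about", origin)
    print(f"offload rate: {cache.offload_rate():.2%}")
    # An editor updates node 1: only pages tagged node:1 are dropped.
    print("dropped:", cache.invalidate("node:1"))
```

At a 99% offload rate, a peak of 400,000 requests per minute would leave roughly 4,000 requests per minute for the origin to serve, which is why targeted invalidation matters: a content change refetches only the affected pages instead of flushing the whole cache.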

Post-incident strategy

After each event we held a blameless retrospective and post-mortem to analyse how the team and systems handled the traffic, and to identify improvements we could make before the next event.

Some of the key activities included:

  • Log monitoring and alerting processes
  • High traffic event response team: due to the nature of how the events were triggered, we could form a team to review and monitor platform services during events and to:
    • Preemptively scale up origin resources
    • Preemptively scale up edge resources
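
To see why pre-emptive scaling complements CPU-based autoscaling, the following sketch applies the replica calculation that the Kubernetes documentation gives for the horizontal pod autoscaler: the desired count is the current count multiplied by the ratio of observed to target utilisation, rounded up. The function name, the replica counts and the CPU figures are illustrative assumptions rather than SDP's configuration, and the real controller also applies a tolerance band and stabilisation windows that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas=1, max_replicas=100):
    """Simplified HPA rule: ceil(current * observed / target), clamped to [min, max]."""
    raw = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Reactive scaling: 4 pods at 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
    # Pre-emptive scaling: raising the minimum before a briefing keeps extra
    # pods running even while CPU is still low.
    print(desired_replicas(4, 30, 60, min_replicas=10))
```

Because the autoscaler only reacts after utilisation has risen, a spike that arrives within 30 seconds can outrun it. Raising the minimum replica count shortly before a known event provides the headroom up front.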

The outcomes — an even more resilient platform

  • Platform that can handle rapid ramp ups of traffic
  • New incident response strategy, defined using these events as inputs, that involves:
    • Preemptively engaging team members where possible
    • Preemptively scaling workloads (both origin and edge) prior to the event to give more compute capacity and headroom
  • Tailored caching solution that empowers the decoupled architecture of SDP
  • Process improvements and the introduction of burst capacity to monitor and be on stand-by during high traffic events
Why Drupal was chosen

Drupal is the CMS of choice for the Victorian Government in Australia. In 2018, Victoria’s Department of Premier and Cabinet and Salsa Digital built a headless Drupal instance as part of a new open source, whole-of-government platform, Single Digital Presence. Drupal was chosen as the CMS and is being used to drive consolidation and an open source community within the Victorian Government. Single Digital Presence consists of three products:

  1. Bay — an open-source platform based on Lagoon
  2. Tide — a Drupal CMS distribution
  3. Ripple — coupled with Nuxt, the frontend presentation layer

Technical Specifications

Drupal version: