I found this post on the internet in 2022 and was quickly amazed by the author’s good abstractions of several popular system design patterns. They do pretty much help of identifying if an implementation most likely belongs to a particular pattern, allowing us to perceive a brief understanding of how it works and then delve into the details following a top-down approach. This post is old but gold, and others are also worth reading.
Looking back after 2.5 years since my previous post on scalable system design techniques, I’ve observed an emergence of a set of commonly used design patterns. Here is my attempt to capture and share them.
In this model, there is a dispatcher that determines which worker instance will handle the request based on different policies. The application should best be “stateless” so any worker instance can handle the request.
This pattern is deployed in almost every medium to large website setup.
In this model, the dispatcher multicast the request to all workers of the pool. Each worker will compute a local result and send it back to the dispatcher, who will consolidate them into a single response and then send it back to the client.
This pattern is used in Search engines like Yahoo, and Google to handle users’ keyword search requests… etc.
In this model, the dispatcher will first lookup if the request has been made before and try to find the previous result to return, in order to save the actual execution.
This pattern is commonly used in large enterprise application. Memcached is a very commonly deployed cache server.
This model is also known as “Blackboard”; all workers monitor information from the shared space and contribute partial knowledge back to the blackboard. The information is continuously enriched until a solution is reached.
This pattern is used in JavaSpace and also commercial product GigaSpace.
This model is also known as “Data Flow Programming”; all workers are connected by pipes where data is flow across.
This pattern is a very common EAI pattern.
The model is targeting batch jobs where disk I/O is the major bottleneck. It uses a distributed file system so that disk I/O can be done in parallel.
This pattern is used in many of Google’s internal applications, as well as implemented in open source Hadoop parallel processing framework. I also find this pattern can be used in many many application design scenarios.
This model is based on lock-step execution across all workers, coordinated by a master. Each worker repeats the following steps until the exit condition is reached when there are no more active workers.
This pattern has been used in Google’s Pregel graph processing model as well as the Apache Hama project.
This model is based on an intelligent scheduler/orchestrator to schedule ready-to-run tasks (based on a dependency graph) across a cluster of dumb workers.
This pattern is used in Microsoft’s Dryad project
Although I tried to cover the whole set of commonly used design pattern for building large scale system, I am sure I have missed some other important ones. Please drop me a comment and feedback.
Also, there is a whole set of scalability patterns around data tier that I haven’t covered here. This include some very basic patterns underlying NOSQL. And it worths to take a deep look at some leading implementations.