Scaling New Heights: A Deep Dive into Scalability in Software Architecture


In New York City, there was an amusement park that drew thousands of visitors every weekend and on public holidays, with its roller coaster as the key attraction. The park was thriving. But as influencers promoted it, the number of visitors surged dramatically, and problems emerged: the roller coaster, the highlight of the park, could no longer accommodate the growing crowds, and visitors grew frustrated with the lengthy waits in line.

The owner of the amusement park has two options:

  1. Enhance the current roller coaster to accommodate a larger number of riders.
  2. Construct additional roller coasters to distribute the capacity.

This mirrors the challenge that software architects encounter whenever they are tasked with scaling their applications.

  • Upgrading the attraction—vertical scaling (scaling up)
  • Introducing more attractions—horizontal scaling (scaling out)

In this post, we will examine scalability and look into the two main strategies that can assist you in making informed scaling choices—not solely from a technical standpoint, but also considering cost, time, and effort.

Scalability is the ability of a system to handle a growing amount of work by adding resources. In software systems, scalability is a way of designing applications that can handle an increasing number of users, requests, or data. Just as an amusement park needs extra rides and staff during peak times, a scalable system also adds resources to handle higher loads.

According to Wikipedia:

Scalability is the property of a system that handles a growing amount of work. One definition for software systems specifies that this may be done by adding resources to the system.

Elasticity or Flexibility: The ability to rapidly increase (or decrease) capacity in response to fluctuating demand. Consider it similar to bringing on temporary staff during peak times.

Resource Allocation: Effectively distributing computational resources (CPUs, memory, storage) to prevent any component from becoming a limiting factor.

Financial Efficiency: Smartly scaling so that you incur costs for additional resources only when necessary, steering clear of unnecessary expenses during quieter times.

What if a surge in visitors to the amusement park occurred because of marketing campaigns and YouTube influencers?

There are two methods to handle the increased crowd: 

  1. Enhancing the roller coaster to a mega coaster by increasing its capacity—vertical scaling
  2. Introducing more rides—horizontal scaling

Each option comes with its own set of benefits and challenges. Let’s explore further. 

The owner of the amusement park considered the first option: upgrading the existing roller coaster to a higher-capacity attraction. This approach represents vertical scaling, where performance is improved by adding resources to an existing system (or ride).

By increasing the ride's capacity, each cycle can carry more passengers, which shortens the wait lines. In a similar vein, upgrading a server's CPU, memory, network, or disk improves its ability to handle rising traffic.

According to Wikipedia:

Scaling vertically (up/down) means adding resources to (or removing resources from) a single node, typically involving the addition of CPUs, memory or storage to a single computer.

Advantages

  • Ease of Implementation:
    Vertical scaling requires minimal adjustments to the software architecture. Existing servers, nodes, or databases are upgraded with faster processors, additional memory, or better storage.
  • Quick performance enhancement with slight modifications:
    Increasing ride capacity was a straightforward fix with a quick payoff. Likewise, adding hardware such as CPU, memory, or storage can be done swiftly, giving a server a boost to handle higher traffic loads.

Challenges

  • Downtime:
    To upgrade the ride, the roller coaster must stop all operations. In the same way, upgrading hardware in a digital system requires downtime (the period when the server is offline for the upgrade).
  • Physical Limitations:
    Any physical enhancement of an existing machine (or ride) has a maximum capacity. Furthermore, high-end components tend to be costly.
  • Single Point of Failure:
    If a part malfunctions, the entire roller coaster must halt for repairs or replacement. Similarly, if a component fails after the upgrade, the whole system is impacted.

The owner weighed the benefits and challenges of upgrading the roller coaster (vertical scaling). One of his friends, Mr. Kohli, encouraged him to consider a different perspective: horizontal scaling. In this method, rather than improving the current ride, the owner introduces additional rides to share the load. In digital environments, we add more machines to handle the workload.

According to Wikipedia:

Scaling horizontally (out/in) means adding or removing nodes, such as adding a new computer to a distributed software application.

Therefore, when there are more visitors, you set up several ticket counters and add multiple roller coaster rides to evenly distribute the crowd.

Advantages

  • Limitless Growth Opportunities:
    In horizontal scaling, the number of servers (or rides) can be adjusted according to demand. 
  • Robustness and Fault Resilience:
    With multiple rides available, if one roller coaster faces an issue, other operating rides can handle the demand. Likewise, in the digital environment, the failure of a single server (or node) won’t disrupt the entire system, as other components continue to function, reducing the overall impact on user experience.  

Challenges

  • Management Complexity: 
    A growing number of rides requires coordination to ensure the safety, proper monitoring, and maintenance of each ride. The increase in the number of servers (or nodes) during horizontal scaling adds to the complexity of digital systems. Efficient management of inter-server communication, load balancing, and data consistency across all servers (or nodes) is required.   
  • Network Latency: 
    Adding more rides also lengthens the walk for visitors heading to less crowded rides. Similarly, adding more servers introduces extra network hops, which can increase latency.

Scaling isn’t only a technical decision—it also needs business intervention. Every strategy involves costs, time, and effort. So, understanding the budget, the need to scale, and operational complexity is a must before choosing any strategy or deciding if a mix of both is required.

⚙️ Lower operational complexity:
The load is not growing exponentially, and you know that for the next few years a one-time upgrade cost will save you operational costs (monitoring and maintenance) while meeting your business needs.

💪 Quick performance boost:
When visitor numbers are predictable. In a digital system: when traffic growth is predictable and does not fluctuate.

💰 High upfront cost:
Upgrading the roller coaster is expensive, and you have to plan years ahead. Similarly, adding resources to a server requires a significant upfront cost for the upgrade.

⏳ Downtime:
Upgrading servers (or rides) requires downtime.

👷 Upper Limit:
Any single server (or node/ride) has an upper limit; beyond it, you have to replace the server (or ride) entirely.

🛑 Single Point of Failure:
An upgrade can improve performance immediately, like an energy drink gives you an instant boost. But a failure in the upgraded server (or node/ride) can take the whole system down, which might result in significant downtime.

💡 Example in Tech:

  • Upgrading a database server from 16GB RAM to 256GB RAM.
  • Switching from a traditional HDD to high-speed SSD storage.
  • Using a larger EC2 instance in AWS instead of multiple smaller ones.
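To make the payoff of a bigger machine concrete, here is a toy sketch (an M/M/1 queueing approximation; the request rates are made-up numbers, not from any real system): raising one server's capacity sharply cuts the average wait, just as a bigger coaster shortens the line.

```python
def avg_wait_seconds(arrival_rate: float, service_rate: float) -> float:
    """M/M/1 queue approximation: the average time a request spends in
    the system grows sharply as arrivals approach the server's capacity."""
    if arrival_rate >= service_rate:
        raise ValueError("server overloaded: the queue grows without bound")
    return 1.0 / (service_rate - arrival_rate)

# 90 requests/s hitting a server that can handle 100 requests/s:
before = avg_wait_seconds(90, 100)   # 0.1 s average time in the system
# After a vertical upgrade to 200 requests/s of capacity:
after = avg_wait_seconds(90, 200)    # roughly 0.009 s
```

The same formula also shows vertical scaling's limit: once arrivals exceed even the biggest machine's capacity, no single-server upgrade can help.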

📈 Scales indefinitely:
With horizontal scaling, you can add as many rides (servers or nodes) as demand requires and remove them when not in use. This lets you handle unpredictable, fluctuating visitors (traffic).

No Downtime: Horizontal scaling avoids downtime because rides (servers or nodes) are added and removed as needed. You can also set a minimum number of rides (servers or nodes) to keep running at all times.

🚦 No single point of failure: Even if a ride (server or node) is faulty, the others keep running, which prevents the system from failing completely.

💰 Cost-effective: You can rent as many rides (servers, nodes, or infrastructure) as you need and return them when not in use. In the digital world, most cloud providers offer “pay as you go” pricing, which means you only pay for what you use, for as long as you use it, with no upfront cost involved.

💰 Higher operational costs: Each ride (or server) needs maintenance, monitoring, and resource allocation, which increases operational costs.

Data Synchronization: Synchronization between rides (servers or nodes) must be handled carefully. In an amusement park, visitors need to be directed to the working rides; similarly, in the digital world, traffic must be orchestrated carefully, which requires proper synchronization.

💡 Example in Tech:

  • Adding more application servers behind a load balancer.
  • Using database sharding to split data across multiple servers.
  • Deploying services across multiple availability zones in AWS.
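The first of those examples can be sketched in a few lines: a minimal round-robin load balancer that hands each request to the next server in the pool (the server names here are hypothetical, and a real balancer would forward the request over the network).

```python
from itertools import cycle

class RoundRobinBalancer:
    """Send each incoming request to the next server in the pool,
    spreading load evenly across all rides (servers)."""
    def __init__(self, servers):
        self._pool = cycle(servers)

    def route(self, request):
        server = next(self._pool)
        return server  # a real balancer would forward `request` here

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
routed = [lb.route(f"req-{i}") for i in range(6)]
# Requests cycle evenly: app-1, app-2, app-3, app-1, app-2, app-3
```

Production balancers such as Nginx or HAProxy add weighting, sticky sessions, and health checks on top of this basic rotation.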
| Factor | Vertical Scaling (Upgrading One Ride) | Horizontal Scaling (Adding More Rides) |
| --- | --- | --- |
| 💰 Upfront Cost | High upfront cost for a powerful machine | Pay-as-you-go; cost scales with usage |
| 💰 Operational Cost | Lower, due to a single server or node (ride) | Higher, due to multiple servers or nodes (rides) |
| ⏳ Downtime | Downtime required for upgrades | No downtime; servers or nodes (rides) can be added on the fly |
| 👷 Effort | Easy to manage (single system) but has a limit | More complex; requires load balancing and coordination |
| 🛑 Limitations | There's a maximum limit to upgrades | Needs strong orchestration to avoid inefficiencies |
| 🚦 Single Point of Failure | Yes, because of a single server or node (ride) | No; there are multiple servers or nodes (rides) |

Now, what if we combine both approaches? That’s what most modern amusement parks (tech systems) do.

📌 Start with Vertical Scaling: Begin by upgrading the critical systems to handle the current growing load, just as the park owner upgraded the roller coaster, which for a time took care of the growing crowd.
In the technology world, a critical service can be scaled vertically with faster databases, better CPUs, and more memory.

📌 Introduce Horizontal Scaling: Once demand keeps increasing and the crowd (traffic) becomes unpredictable, horizontal scaling—adding more servers or nodes (rides)—makes sense to manage exponential visitor (traffic) growth, and the extra servers (rides) can be returned when traffic subsides.

📌 Smart Load Balancing: In the amusement park, multiple ticket booths and queue-managing staff direct visitors to different rides depending on the load at each ride. Similarly, in the digital world, load balancers efficiently redirect traffic to different servers (or nodes).

📌 Auto-Scaling: The owner puts the rides on auto-scaling—opening more capacity when there are many visitors and closing some when visitor numbers are low. In the same way, modern cloud services like AWS, GCP, and Azure automatically scale servers up and down depending on traffic.

Achieving scalability is not a one-day job; it requires the thoughtful planning that comes from system design.
Here are some advanced strategies and techniques for building scalable systems.

🏕️ Each Ride Operates Independently
In the amusement park, all the roller coaster rides are managed independently. Every ride has its dedicated team (working as independent microservices).

🚨 Now, what if a ride’s wheel breaks down?
With independent teams, the other roller coaster rides keep running smoothly. This is microservices in action—each ride (service) runs on its own, preventing a single failure from taking down the entire park.

🛠️ Technical Insights

  • Isolation for Flexibility:
    By breaking a monolithic application (single ride) into microservices (multiple independent rides), each service (ride) is isolated and can function independently. Each microservice (ride) can be developed, deployed, and scaled independently.
  • Service Communication:
    Every ride has its own walkie-talkie for quick messages to the others. Similarly, in a microservices system, REST or gRPC enables fast, efficient communication between services.
    Sometimes an announcement needs to reach everyone, such as at closing time, so the park uses a loudspeaker to address all ride staff and visitors. Similarly, in microservices, messaging systems (like Apache Kafka or RabbitMQ) broadcast messages to all services at once.
  • Resilience:
    What happens if a ride stops working in the middle of the day? No worries: the operators immediately stop the ride, investigate, and retry; if nothing works, they keep the ride closed until it's fixed and switch visitors to another ride. Similarly, when failures happen in microservices, strategies like retries, timeouts, and circuit breakers are implemented using tools like service meshes (e.g., Istio) and API gateways.
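The stop-investigate-retry behaviour above can be sketched as a bare-bones circuit breaker. This is an illustrative toy, not a real library's API; the failure threshold is made up, and production systems would use a battle-tested implementation (for example, via a service mesh).

```python
class CircuitBreaker:
    """After `max_failures` consecutive errors, the circuit 'opens' and
    calls fail fast instead of hammering the broken service."""
    def __init__(self, max_failures=2):
        self.max_failures = max_failures
        self.failures = 0

    def call(self, func, *args, **kwargs):
        if self.failures >= self.max_failures:
            raise RuntimeError("circuit open: service temporarily disabled")
        try:
            result = func(*args, **kwargs)
            self.failures = 0  # a success resets the failure counter
            return result
        except Exception:
            self.failures += 1
            raise

breaker = CircuitBreaker(max_failures=2)

def flaky_service():
    raise IOError("service down")

for _ in range(2):          # two consecutive failures open the circuit
    try:
        breaker.call(flaky_service)
    except IOError:
        pass
# The next call now fails fast with "circuit open" instead of retrying.
```

Real breakers also add a "half-open" state that periodically lets one probe request through to detect recovery.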

⚖️ Trade-Offs

  • Complexity:
    Managing ride-to-ride communication requires robust management. In the microservices world, efficient monitoring and orchestration of interservice communication and data consistency are necessary.
  • Operational Overhead:
    A growing and shrinking number of rides increases management overhead. In microservices, orchestration platforms like Kubernetes become essential for scaling servers (or nodes) up and down, which adds extra complexity to the system.

🎟️ Multiple Ticket Gates for Faster Entry
We saw how amusement parks created multiple ticket counters to avoid overwhelming a single ticket gate during rush hours.

🚨 What if there was only one ticket booth?
If the amusement park had created only one ticket counter, managing the crowd would have become impossible, and visitor frustration would have been sky-high. Adding multiple ticket gates made guest entry quicker and easier. Similarly, in the digital world, load balancers distribute traffic across multiple servers or nodes.

🛠️ Technical Insights

  • Even Distribution:
    Load balancers (like Nginx and HAProxy) evenly distribute incoming traffic, keeping any single server or node from becoming a bottleneck—the same way multiple entry gates keep a single window from being overwhelmed.
  • Health Checks:
    Load balancers (and orchestrators like Kubernetes) also monitor server or node health regularly and send traffic only to healthy servers or nodes. This helps the system recover from failures. It's like ticket entry: if one scanner stops working, the crowd can still enter through other gates, giving staff time to repair or replace the scanner or open a temporary gate.
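The health-check filtering described above can be sketched in a few lines (the server names and the status map are made up; a real balancer would probe an HTTP health endpoint on each server instead of reading a dictionary).

```python
def healthy_targets(servers, is_healthy):
    """Keep only the servers that pass their health check, the way a
    balancer skips a ticket gate whose scanner is broken."""
    return [s for s in servers if is_healthy(s)]

# Hypothetical fleet: app-2's health probe is currently failing.
status = {"app-1": True, "app-2": False, "app-3": True}
targets = healthy_targets(["app-1", "app-2", "app-3"], status.get)
# Traffic goes only to app-1 and app-3 while app-2 recovers.
```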

⚖️ Trade-Offs

  • Single Point of Failure:
    If the load balancer itself is not made redundant (for example, with an active-passive pair), it becomes a bottleneck and a single point of failure.
  • Latency:
    By adding a layer between incoming traffic and the server or node, we introduce slight latency.  

🎢 Skipping the Long Lines with a Fast-Pass (membership)

As the crowd grows, the amusement park launches a premium membership program that gives members priority access to the roller coaster. Similarly, in the digital world, caching keeps frequently used data readily available. With tools like Redis, Memcached, and CDNs, websites and services fetch images and frequently used data very quickly. The cache is a temporary, fast layer of memory.

🚨 Without FastPass, everyone waits longer.

🛠️ Technical Insights

  • Memory Caching: Tools like Redis and Memcached store frequently requested data in memory so it loads faster and also reduces the load on databases.
  • Content Delivery Networks (CDNs): This tool is used to load static assets like images and even full-page HTML closer to the user, which reduces the website page load time drastically.
  • Cache Invalidation: Once the membership tenure is over, the pass has to be invalidated or renewed. Similarly, a cache must ensure the data it stores does not go stale. Caches use a TTL (time to live), after which the data is removed and must be reloaded from the source.
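The TTL idea can be sketched as a tiny in-memory cache; Redis and Memcached implement the same concept at scale. The key name and the TTL value here are purely illustrative.

```python
import time

class TTLCache:
    """Tiny in-memory cache with a per-entry time to live."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expiry = entry
        if time.monotonic() >= expiry:  # stale: evict and report a miss
            del self._store[key]
            return default
        return value

cache = TTLCache(ttl_seconds=0.05)
cache.set("homepage", "rendered-page")
fresh = cache.get("homepage")   # hit while the entry is fresh
time.sleep(0.06)
stale = cache.get("homepage")   # None: the TTL expired, reload from source
```

On a miss, the application reloads from the database and re-populates the cache, which is exactly the "renew the pass" step in the analogy.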

⚖️ Trade-Offs

  • Cache Consistency: During its 5th anniversary, the amusement park announced 3 free rides to all its existing members, but the cards are with the members. Here there is a change in the source system, and the number of rides must be updated on every member’s card. In the software world, cache consistency needs to be maintained when data is changed in the source system. This is done by invalidating the cache data or by updating the cache data.
  • Cache Memory Consumption: Memory usage increases with the increase in cache consumption. So, it’s crucial to plan and optimize the cache data. Too small a cache will not help improve performance, and too large a cache will give high stress to memory.

🛝Expanding Your Park with Multiple-Themed Zones

In an amusement park, there are many rides and activities besides the roller coasters. If, when designing the park, activities are placed randomly or crammed into one area, finding anything takes a long time, since visitors have no mental map of where to go for a particular ride or activity. This leads to chaos, delays, and ultimately frustration.

So, instead of putting everything in one place, the amusement park is divided into multiple themed zones. Rides and activities are grouped by theme, making it easier for visitors to remember where things are. This is what we call “sharding” in system design.

🛠️ Technical Insights

  • Sharding:
    Sharding is dividing a dataset into smaller chunks and storing them in different databases on different machines to scale out or horizontally scale.
  • Replication:
    As we saw, the main attraction was the roller coaster, so if a roller coaster is placed in each themed zone, this replication (duplication) saves visitors from long waiting queues. Similarly, keeping copies of a database across several servers (or nodes) ensures high availability, fault tolerance, and a performance boost.
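Shard routing can be sketched as a deterministic hash over the key (the key format and shard count are illustrative; real systems often use consistent hashing so that adding a shard does not remap every key).

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Deterministically map a key to a shard, like always directing a
    visitor to the themed zone that owns their ride."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Every lookup of the same user id lands on the same database shard.
shard = shard_for("user-42", 4)
```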

⚖️ Trade-Offs

  • Query overhead:
    If a visitor wants two rides in different themed zones, the staff must guide them to the correct zone, and the visitor must collect tickets from both zones. Similarly, in the digital world, data is stored across multiple shards, so a service must route each query to the correct shard. Sometimes a result requires data from multiple shards, which also introduces latency.
  • Operational Overhead:
    As there are multiple theme zones, to maintain different themes, more staff is required, and more uniforms and decorations are also required on a per-theme basis. Similarly, in the digital world, managing multiple databases and servers is complex and needs more administration.
  • Increases Infrastructure Cost:
    With multiple themed zones, additional staff, uniforms, and decorations must be purchased, which raises costs. In the digital world too, sharding involves multiple servers and databases, which increases cost, so optimization is essential.

🧑‍🤝‍🧑 Hiring Extra Staff for Peak Days
During weekends, public holidays, and school summer holidays, the amusement park draws huge crowds, and more staff are needed on those peak days. Hiring staff from an agency only for the peak days reduces staffing costs.

🚨 Why pay for extra staff year-round when you only need them occasionally?

Auto-scaling:
Increase or decrease resources (staff) based on demand.

As per Wikipedia:

Autoscaling, also spelled auto scaling or auto-scaling, and sometimes also called automatic scaling, is a method used in cloud computing that dynamically adjusts the amount of computational resources in a server farm - typically measured by the number of active servers - automatically based on the load on the farm. 

🛠️ Technical Insights

  • Cloud Auto-Scaling:
    Just as a staffing agency adds or removes staff automatically based on the crowd, cloud platforms (like AWS, Azure, and Google Cloud) support auto-scaling to automatically add or remove resources based on real-time demand.
  • Monitoring and Metrics:
    The decision of when to increase or decrease staff can only be made with proper monitoring of visitor entries. In the digital world, tools like Datadog and Dynatrace report CPU usage, memory consumption, failure rates, and more, while platforms like Kubernetes track CPU and memory and automatically scale systems up or down.
  • Elastic Load Balancing:
    Along with hiring a staffing agency, multiple entry gates will help seamlessly distribute the crowd. Similarly, combining auto-scaling with a load balancer seamlessly distributes traffic as new resources come online.
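The scale-up/scale-down decision can be sketched as a proportional rule in the spirit of Kubernetes' Horizontal Pod Autoscaler: grow or shrink the fleet by observed load divided by target load, clamped to configured bounds. The target utilization, minimum, and maximum below are illustrative numbers, not defaults of any platform.

```python
import math

def desired_replicas(current, cpu_utilization,
                     target=0.5, min_replicas=2, max_replicas=20):
    """Proportional scaling rule: scale the fleet by observed / target
    load, clamped to configured bounds (all numbers illustrative)."""
    desired = math.ceil(current * cpu_utilization / target)
    return max(min_replicas, min(max_replicas, desired))

scale_out = desired_replicas(4, 0.90)  # heavy load: grow the fleet to 8
scale_in = desired_replicas(4, 0.10)   # light load: floor of 2 applies
```

The clamp to `min_replicas` is the "minimum number of rides always open", and `max_replicas` caps runaway cost; real autoscalers also add cooldown periods to avoid oscillation.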

⚖️ Trade-Offs

  • Oscillation:
    If the staffing agency does not plan properly, sudden increases and decreases in staff may lead to chaos. In the same way, poorly tuned auto-scaling rapidly scales up and down, which may introduce instability.
  • Cost Management:
    If staffing is not planned properly, it increases the park's costs. In the same way, if auto-scaling is not tuned properly, it may lead to exponential growth in infrastructure cost.

The strategies above help run an amusement park as smoothly as a highly scalable software system. They can complement each other or create chaos, so they require careful system design and continuous monitoring.

Designing a scalable system is not just adding servers or resources. It’s about giving careful thought to the trade-offs as well.

  • Complexity vs. Performance: Adding microservices, load balancers, and caching introduces complexity, but done well, the performance gains outweigh it.
  • Cost vs. Elasticity: Auto-scaling can change the game, but it requires a lot of monitoring and configuration tuning to ensure that the performance is not compromised due to cost savings.
  • Consistency vs. Availability: In distributed systems, strict consistency degrades performance, so most systems adopt eventual consistency, which slightly delays data synchronization but improves throughput.

Building scalable systems requires system design: planning, strategizing, and continuous monitoring. Scalable systems also need continuous innovation and evolving strategies as business needs change.

Thank you for deep diving into scalability. If you'd like to learn more about system design, I invite you to explore my other posts at Techobaby – System Design. Till then, keep learning and building robust systems.

Feel free to share your thoughts and experiences in the comments below, and let’s keep the conversation going on how we can all build better, more scalable systems!
