Building Tech Like Lego: Why System Design Matters in Today’s Digital Landscape

It was a Sunday morning when Prisha, a 10-year-old, and her friend got her Lego bricks out of the cupboard.

“Let’s build a big city with these Lego blocks!” Prisha said excitedly.

The two girls started building houses, roads, and even a train track, and both were happy and proud. But as the Lego city grew and the buildings got taller, problems began to emerge.

  • The buildings started collapsing.
  • They ran out of Lego bricks before finishing the tallest tower.
  • The roads were too narrow, so the Lego cars her friend had made could not get through all of them.
  • Frustration peaked when both friends’ Lego cars got stuck on a bridge.

Prisha shouted, “Father, I need help!” In came Mr. Kohli, Prisha’s father and a solution architect with more than 15 years of experience in the software industry.

Mr. Kohli smiled and said, “Girls, you need to know system design.”

Mr. Kohli explained to the girls, “System design means thinking through and planning everything before you start building. You think about how your system will look, what its components are, how you will safeguard it, how you will keep the whole system from falling apart if something goes wrong, and how it can grow in the future.”

Prisha and her friend were puzzled. “But we are just kids. Why can’t we build whatever we want, however we want?”

“You can, my dear kids,” replied Mr. Kohli, “but if you do not follow some important system design principles, your creations will fall apart just like your Lego city did.” He also told the girls that systems like YouTube, Amazon, and Netflix are built on important design principles too.

Prisha and her friend now understood the importance of system design. Both girls wanted to learn more and build a better Lego city, so Mr. Kohli introduced them to some important system design concepts.

Say you have started building your Lego city with a few houses and roads in the neighbourhood. Now imagine some of your friends join you with their own Lego blocks, and you all want to expand the city with more buildings and roads without destroying what you have already built. That is what scalability is.

According to Wikipedia:

Scalability is the property of a system to handle a growing amount of work. One definition for software systems specifies that this may be done by adding resources to the system.

So, scalability in your Lego city is the ability to grow your Lego city smoothly by adding extra Lego blocks.

In technology, the scalability of a system allows you to incorporate more users, data, and additional features without revamping the whole system.

In summary, scalability in system design is like ensuring your Lego city is ready for growth—flexible, upgradeable, and efficiently built to handle more challenges as they come. This friendly approach helps us understand that a well-planned system is never static; it’s always evolving, just like your favourite Lego creation.

Here are some of the key ways to achieve scalability:

  • Vertical Scaling: This is like adding floors or rooms to an existing building in your Lego city. In the digital world, vertical scaling means upgrading the hardware of a single machine, such as adding CPU, memory, or storage.
  • Horizontal Scaling: When more people move into your Lego city, you build more houses in the neighbourhood without touching the existing buildings. In technology, this means adding more servers or nodes to share the work.
  • Load Balancing: Imagine there is a fair in your Lego city, and you run a very big lemonade stall there. You open several billing counters so that customers are spread evenly across them. In a digital system, load balancing distributes traffic across multiple servers or nodes so that no single server crashes from overload.
  • Caching: This is like keeping your most-used pieces in a separate pile near you so you don’t have to search for them among all the Lego blocks. In the digital world, caching stores frequently accessed data in fast memory so it can be fetched quickly instead of being looked up in a larger, slower database every time.
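To make the billing counters and the pile of favourite pieces a bit more concrete, here is a minimal Python sketch. It assumes the simplest load-balancing rule (round robin, where requests are handed to servers in a fixed rotation) and uses Python’s built-in `functools.lru_cache` as the cache. The names `RoundRobinBalancer`, `get_product`, and the server names are made up for illustration; real load balancers and caches are far more sophisticated.

```python
from functools import lru_cache
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def pick(self):
        # Each call returns the next server, wrapping around at the end.
        return next(self._rotation)


@lru_cache(maxsize=128)
def get_product(product_id):
    """Stand-in for a slow database lookup (hypothetical).

    lru_cache keeps recent results in memory, so repeated calls with
    the same product_id skip the function body entirely.
    """
    return f"Lego set {product_id}"


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
    for request_id in range(5):
        print(f"request {request_id} -> {balancer.pick()}")

    get_product(42)  # first call: runs the "slow" lookup (a cache miss)
    get_product(42)  # second call: answered from the cache (a cache hit)
    print(get_product.cache_info())
```

Round robin is only one possible rule; it works well when all servers are equally powerful and requests cost roughly the same.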

These strategies all work together to ensure your system grows efficiently and remains robust under pressure. Stay tuned for my upcoming blogs, where we dive even deeper into each of these techniques and explore more ways to build scalable systems.

Imagine a bridge in your Lego city collapses. What happens if you can no longer get to the other side of the city? Fault tolerance takes care of this. It says that your system should not fall apart even if some of its parts stop working.

According to Wikipedia:

Fault tolerance is the ability of a system to maintain proper operation despite failures or faults in one or more of its components.

Here are some key ways we achieve fault tolerance:

  • Heartbeat and Failure Detection: This is like patrolling your Lego city to check that everything is working and that no loose pieces are about to cause trouble.
    In software systems, we use “heartbeat” signals to detect failures. A heartbeat is a small message sent at regular intervals to show that a component is still running; if the heartbeats stop arriving, the component is presumed to have failed.
  • Checkpoint and Rollback Recovery: If you are building a tower in your Lego city and something goes wrong with it, you don’t need to rebuild the whole city from scratch. You either fix the tower or remove it, returning the city to the way it was before you started the tower.
    In digital systems, a checkpoint is a saved copy of the system state taken while the system was working correctly. If a failure occurs, that saved state can be restored quickly without rebuilding the whole system.
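Here is a small Python sketch of the checkpoint-and-rollback idea, using the tower story. The `CheckpointedCity` class and its method names are invented for this example, and it keeps the checkpoint in memory only; real systems usually write checkpoints to durable storage.

```python
import copy


class CheckpointedCity:
    """A toy 'system' whose state can be saved and restored."""

    def __init__(self):
        self.state = {"buildings": []}
        # Start with a checkpoint of the empty city.
        self._checkpoint = copy.deepcopy(self.state)

    def save_checkpoint(self):
        # Deep copy so later changes to the state cannot alter the checkpoint.
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        # Restore the last known-good state.
        self.state = copy.deepcopy(self._checkpoint)

    def build(self, name, stable=True):
        self.state["buildings"].append(name)
        if not stable:
            # Simulate a failure that leaves the state half-finished.
            raise RuntimeError(f"{name} collapsed")


if __name__ == "__main__":
    city = CheckpointedCity()
    city.build("house")
    city.save_checkpoint()          # the city is in a good state: remember it

    try:
        city.build("tower", stable=False)
    except RuntimeError as error:
        print("failure:", error)
        city.rollback()             # go back to the city before the tower

    print(city.state)               # {'buildings': ['house']}
```

Only the broken tower is undone; the house built before the checkpoint survives.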

These mechanisms make systems more resilient and ensure that failures don’t lead to complete breakdowns. Stay tuned for my next blog, where I’ll dive deeper into each of these techniques and show how fault tolerance keeps modern technology running smoothly!

In your Lego city, imagine a power station that supplies electricity to the whole city. One day, due to a glitch, it stops working. What happens? Your city goes dark. Now what if you had a second power station on standby? You simply switch it on and supply electricity from there. That is what high availability is.

According to Wikipedia:

High availability (HA) is a characteristic of a system that aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.

Here are some key ways we achieve high availability:

  • Redundancy
    As we saw above, having two power stations saved the Lego city from a complete blackout. Redundancy means keeping duplicate copies of a critical structure, such as the power station. In digital systems, critical components or data are duplicated across multiple systems or locations so that a copy can take over in case of a failure or disaster.
  • Load Balancing
    Picture a popular Lego theme park with several ticket booths. Visitors are directed to the booth with the shortest line, preventing any single booth from becoming overwhelmed. Similarly, load balancing in system design distributes incoming network traffic across multiple servers. This ensures no single server bears too much load, optimizing resource utilization and enhancing system responsiveness.
  • Failover Clustering
    Consider a Lego hospital with multiple ambulances on standby. If one ambulance is out of service, another is ready to respond immediately. Failover clustering involves grouping multiple servers so that if one fails, another can take over its workload without disrupting services. This setup ensures continuous operation, even during component failures.
  • Distributed Data Storage
    Imagine your Lego city has warehouses in different districts. If one warehouse faces an issue, others can supply the necessary goods, ensuring the city’s operations continue smoothly. In system design, storing copies of data across multiple nodes or locations protects against localized failures, such as natural disasters or regional power outages, enhancing overall system resilience.
  • Regular System Maintenance and Updates
    Think of your Lego city’s roads and bridges requiring regular inspections and repairs to prevent major breakdowns. Similarly, conducting routine system maintenance and applying timely updates helps identify and fix vulnerabilities, ensuring the system remains robust and reducing the likelihood of unexpected failures.
  • Geographic Distribution
    Imagine your Lego city has multiple fire stations strategically placed in different neighbourhoods. If a fire breaks out in one area, the nearest station can respond promptly, minimizing damage. In system design, geographic distribution involves placing data centers in various locations. This strategy ensures that if one data center experiences an issue, others can handle the load, maintaining service continuity and reducing latency for users in different regions.
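The standby power station can be sketched in Python as a tiny failover rule driven by heartbeats. Everything here is an assumption made for illustration: the `HeartbeatMonitor` class, the `choose_active` function, the node names, and the 3-second timeout. The demo uses a fake clock so it runs instantly instead of actually waiting.

```python
import time


class HeartbeatMonitor:
    """Remembers when each node last reported that it is alive."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._last_seen = {}

    def beat(self, node):
        # Called whenever a heartbeat message arrives from a node.
        self._last_seen[node] = self._clock()

    def is_alive(self, node):
        # A node is presumed dead if it never reported or has been
        # silent for longer than the timeout.
        last = self._last_seen.get(node)
        return last is not None and self._clock() - last <= self.timeout_seconds


def choose_active(monitor, nodes_in_priority_order):
    """Return the first node that is still alive, or None if all are down."""
    for node in nodes_in_priority_order:
        if monitor.is_alive(node):
            return node
    return None


if __name__ == "__main__":
    now = [0.0]  # fake clock, advanced by hand in this demo
    monitor = HeartbeatMonitor(timeout_seconds=3, clock=lambda: now[0])
    nodes = ["primary-station", "standby-station"]

    monitor.beat("primary-station")
    monitor.beat("standby-station")
    print(choose_active(monitor, nodes))  # primary-station

    now[0] = 2.0
    monitor.beat("standby-station")       # only the standby keeps reporting
    now[0] = 4.0                          # primary has been silent for 4 seconds
    print(choose_active(monitor, nodes))  # standby-station
```

The timeout is a trade-off: a short one detects failures quickly but may declare a slow node dead by mistake.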

By integrating these strategies, you can design your system to be robust and reliable. Stay tuned for my next blog, where I’ll dive deeper into each of these techniques and show how high availability keeps systems running in any condition!

Imagine Prisha and her friend opened a Lego Burger 🍔 shop in their Lego City.

Both the friends want to keep it open 24/7 (this is high availability!).

But what if the kitchen manager and the cashier can’t communicate because the intercom is not working? The cashier does not know when a burger is ready, or whether a type of burger can still be made or is out of stock. (This is called a network partition (P).)

Now the choice is yours:

  1. Do you keep selling burgers even though the cashier has no idea which burgers are available? (This is choosing availability (A).)
  2. Or do you stop taking orders until the cashier can verify what is available, which delays every order because the cashier has to go to the kitchen to check? (This is choosing consistency (C).)

This is exactly what the CAP theorem is about! When a network partition happens, it forces us to decide whether to keep things running no matter what or pause until everything is perfectly in sync.
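The cashier’s choice can be written down as a few lines of Python. This is only a toy model of the burger shop, not a real distributed system: the `Cashier` class, the `"AP"` and `"CP"` mode labels, and the use of `None` to mean “the intercom is down” are all assumptions made for this sketch.

```python
class Cashier:
    """Takes orders in one of two modes during a broken intercom.

    "AP": keep selling using the last stock list the cashier remembers.
    "CP": refuse orders until the kitchen can confirm the stock.
    """

    def __init__(self, mode, last_known_stock):
        if mode not in ("AP", "CP"):
            raise ValueError("mode must be 'AP' or 'CP'")
        self.mode = mode
        self.last_known_stock = dict(last_known_stock)

    def take_order(self, burger, kitchen_stock):
        # kitchen_stock is the kitchen's current stock, or None when the
        # intercom is down (the network partition).
        if kitchen_stock is not None:
            self.last_known_stock = dict(kitchen_stock)
            in_stock = self.last_known_stock.get(burger, 0) > 0
            return "accepted" if in_stock else "rejected: out of stock"

        if self.mode == "CP":
            # Consistency first: never answer from possibly stale data.
            return "refused: cannot verify stock"

        # Availability first: answer from memory, even if it may be outdated.
        in_stock = self.last_known_stock.get(burger, 0) > 0
        return "accepted (stock may be stale)" if in_stock else "rejected: out of stock"


if __name__ == "__main__":
    remembered = {"veggie": 2, "cheese": 0}
    for mode in ("AP", "CP"):
        cashier = Cashier(mode, remembered)
        print(mode, "->", cashier.take_order("veggie", kitchen_stock=None))
```

In `"AP"` mode a customer might be sold a burger that has just run out; in `"CP"` mode no one is misled, but the shop turns customers away while the intercom is broken.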

Sounds interesting, doesn’t it? We’ll dive deeper into CAP Theorem in our upcoming blogs! Stay tuned! 🚀

Now that they understood the importance of system design, Prisha and her friend asked, “Where can we see these design principles in the real world?” Mr. Kohli replied, “Big companies like Netflix, Google, and Amazon use these system design ideas.” Let’s look at some examples.

Netflix and YouTube almost never go down, even though millions of users watch them at the same time.

💡 How do they achieve this?

  • Load Balancing: These services spread traffic across multiple servers.
  • Redundancy: If a server fails, another one takes over.
  • Failover Clustering: If an entire region goes down, traffic is shifted to another data center.

During the “Black Friday Sale” 🏷️ on Amazon, millions of people shop at once. You add a product to the cart and proceed to payment. What if a server crashes just before you pay? Does Amazon stop working? No! Your cart is still there, and you can complete the purchase without even noticing any issue.

💡 How do they achieve this?

  • Redundancy & Replication: Amazon stores its data across multiple servers at different locations.
  • Heartbeat & Failure Detection: Amazon continuously checks the health of its systems. When a failure occurs, traffic is redirected to backup servers.
  • Checkpoint & Rollback Recovery: If a server crashes during checkout, the system rolls back to the last saved state, so your order is not lost.

After understanding system design principles, Prisha and her friend built a beautiful and fully functional Lego city with stable buildings, efficient roads, and backup power stations.

Similarly, in the real world, software is designed to ensure that applications and services don’t crash when traffic increases. If the system is critical, high availability principles are used.

Finally, Mr. Kohli smiled and said, “Whether you’re building with Legos or code, thinking ahead and designing for scalability, reliability, and efficiency makes all the difference.”

Want to learn more about scalable architectures, fault tolerance, and system reliability? Stay tuned for more fun, story-driven explanations of system design concepts!
