All Articles

Harvard CS75 - Web Scalability

Today I watched a lecture on web scalability from David Malan that was really interesting. Here are some notes and takeaways from it. The slides for this lecture can be found here.

Vertical scaling

  • more resources/throwing more money at machines
  • adding RAM and/or CPU to the same server and machine Note: there’s a very finite limit to the amount of resources we can add to any one machine

Horizontal scaling

  • rather than using fancy machines, use multiple less fancy machines to less of their ultimate capacity
  • this means that an HTTP request gets distributed amongst multiple servers

    • one way to do this is to use the load balancer as the as the receiver of the request client -> request -> load balancer IP address -> load balancer distributes requests

Load balancer routing

How does the load balancer route the request to the correct, or most efficient server?


  1. Keep track of which server is the busiest, and send the request to the least busy server - but this requires that each server have the same content

    • downside: this would mean a lot of space redundancy: content * n servers
  2. Have dedicated servers that each have their own content

    • downside: certain servers could be responsible for more popular content, and thus receive many more requests than the others
  3. “Round Robin” style load balancing: have the load balancer return the IP address of the first machine, then the second, etc. and have the order wrap around

    • downside: you can have power users that distort the traffic but still be used n^th of the time


“Sticky Sessions”: when data is preserved over time even if the user leaves and comes back

We have to be mindful of sessions: when a user is logged on with a session cookie to one server, they need to be sent to the same server when they visit the site the next time.


  1. We can abstract out session data to a separate server connected to all other servers

    • downside: this introduces a single point of failure - what happens if our session server goes down?
  2. We could store the IP address of the server in the user’s cookie

    • downside: doesn’t preserve privacy of server IP’s, not resilient to change
  3. Use the load balancer to set a value in the cookie that hashes to a private server IP

    • this seems like the best solution!


Some sites (like Craigslist) use cached html files that they dynamically populate with information from a SQL database.

memcache: a key-value storage mechanism for SQL id’s in a FIFO fashion. Works by LRU (least recently used), where oldest entries are thrown out when the cche reaches capacity.

Database replication

Sometimes we want to make automatic copies of important information. The most common pattern is having a primary-replica relationship, where the primary database has read and write privileges, but the replicas only read from the primary.

Here’s an example I found on MongoDB’s website: primary


Malan uses the example of Facebook in the early days where each university had its own server - i.e. and

This creates the problem that when you want to push out new features, you have to push them to each individual university’s server, and that it’s difficult for people to share information between universities.

It also creates a load balancing problem if one university has much more traffic/students than others

Enter partitioning: instead of each university having their own server, you could partition the servers by a more evenly distributed piece of information: last name.

Here’s a slide that illustrates this: image of database partitioning