Scaling: Comparing Apples to Oranges

Disclaimer:

  1. I'm not a scaling expert
  2. I'm not even a programmer
  3. I'm neither a Digg nor a Stack Overflow user
  4. I have no problems with Joel Spolsky

With those points out of the way, I wanted to write a post about a statement Joel Spolsky made about scaling, comparing Stack Overflow to Digg. It goes without saying (based on the above disclaimers) that I don't understand the intricacies of either Digg's or Stack Overflow's code base and server setup. However, I have compiled some of the discussion on this topic.

 

Digg: 200MM page views, 500 servers. Stack Overflow: 60MM page views, 5 servers. What am I missing?
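
As a rough sketch, the arithmetic implied by that statement can be written out in a few lines of Python. The figures are taken as quoted; the time period they cover isn't stated, and the dictionary layout and the views_per_server helper are made up purely for illustration.

```python
# Figures exactly as quoted above; the period they cover is not given.
sites = {
    "Digg": {"page_views": 200_000_000, "servers": 500},
    "Stack Overflow": {"page_views": 60_000_000, "servers": 5},
}


def views_per_server(page_views, servers):
    """Naive load measure: page views divided evenly across servers."""
    return page_views / servers


per_server = {name: views_per_server(**s) for name, s in sites.items()}
for name, value in per_server.items():
    print(f"{name}: {value:,.0f} page views per server")

# How many times more page views each Stack Overflow server handles.
ratio = per_server["Stack Overflow"] / per_server["Digg"]
print(f"Ratio (Stack Overflow / Digg): {ratio:.0f}x")
```

On paper that is a 30x gap per server, which is what makes the statement sound so damning. The number treats every page view as the same amount of work, and that assumption is exactly what the comments quoted below push back on.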

As I'm sure many of you know, Digg is a social news site where community members submit links for other members to vote and comment on. It also offers dynamic widgets that can be found on thousands of sites across the web.

Stack Overflow is a Q&A-style site that allows its community (of programmers) to ask and answer questions on various programming topics and to vote the submitted questions and answers up or down.

So what is being said about this comparison? First off, I'll direct you to a former Digg engineer's comment on this very subject. His post mostly covers what servers Digg has running (or had during his time there).

 

Interestingly enough, there is a decent conversation thread on reddit about this as well, where people far brighter and better informed on scaling than I am discuss the differences. I thought I would highlight some of the more on-topic comments.

 

We're not comparing apples to apples here. Digg has a different feature set and a completely different social setup. Stack Overflow doesn't promote "discussions" along the same line as digg does, nor does it have nearly as many simultaneous DB requests. "Page views" doesn't mean much (consider all the "outside" AJAX requests too.)

I'm pretty sure if you look at the read/write DB ratio, Digg will be much, much heavier on the write side than SO.

The hot pages Digg wouldn't be able to cache efficiently, as discussions are ongoing and updates are expected (community participation is encouraged, and caching responses for even a few minutes hampers participation). Comments are frequent, small, and semi-threaded (at least they were).

Digg will have a large amount of activity on all new stories as they rise up and eventually they fall off. Caching old stories won't help because no one visits them, the bulk of their 200MM visits are on the newest most popular links.

Stack Overflow will have articles hit by google, so visits are spread over new and old questions. Since old questions become static (already answered so no need for new responses) they can be cached completely, and since a good portion of those 20MM visits are spread out over old posts thanks to google those static page caches will have a major impact.

Plus near-realtime response is not expected on Stack Overflow (You walk in expecting to wait a few hours for an answer) and discussion is not the goal (no threaded responses, you respond directly to the question). Comments are infrequent and large and a longer cache won't have as much of a pronounced impact on visitor perception as it would on Digg (again, as a visitor you aren't expecting responses frequently or in large numbers).

Edit: Digg's server architecture could probably be slimmed down, but Stack Overflow vs. Digg is apples and oranges when you take into account usage.
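
To make the caching argument in that comment a little more concrete, here is a minimal sketch in Python. It assumes a toy page cache where any write (a new comment or vote) throws away the cached copy of the page. The simulate and workload helpers and all of the request counts are hypothetical, picked only to contrast an archive-style page with a hot discussion thread; none of it reflects either site's real traffic or code.

```python
def simulate(requests):
    """Replay ("read" | "write", page_id) events against a page cache.

    A write invalidates the cached copy of that page, so the next read
    is a miss that has to rebuild it. Returns the hit rate over reads.
    """
    cache = set()
    hits = reads = 0
    for kind, page in requests:
        if kind == "write":
            cache.discard(page)   # cached copy is now out of date
        else:
            reads += 1
            if page in cache:
                hits += 1         # served straight from cache
            else:
                cache.add(page)   # miss: regenerate and store the page
    return hits / reads if reads else 0.0


def workload(reads_per_write, cycles, page="page-1"):
    """Build a repeating pattern: one write, then a run of reads."""
    events = []
    for _ in range(cycles):
        events.append(("write", page))
        events.extend([("read", page)] * reads_per_write)
    return events


# Hypothetical old question: rarely changes, read many times.
archive_like = simulate(workload(reads_per_write=1000, cycles=5))
# Hypothetical hot story: a new comment or vote every couple of reads.
hot_thread_like = simulate(workload(reads_per_write=2, cycles=2500))

print(f"archive-like hit rate:    {archive_like:.3f}")
print(f"hot-thread-like hit rate: {hot_thread_like:.3f}")
```

Both workloads make the same number of reads (5,000), but the page that changes constantly gets far less help from the cache, which is the point the commenter is making about Digg's newest stories.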

 

The proportion of write to read traffic Digg is built to receive in comparison to SO.

Reads are relatively easy to scale out: cache and load balance/proxy. (Add more SQL slaves, etc)

Writes are much much harder to scale out. You start dealing with flooding, stale caches (which in turn makes cache regeneration a scaling issue), issues that (a) a large number of SQL slaves only exacerbate due to replication latency and (b) proxy caches do absolutely nothing to address.

Twitter took your stance, "oh it's easy just cache the pages, yadda" and realized their error when their servers were being taken down by floods of write requests. Twitter had to rapidly expand their server pool not to accommodate more viewers, but so that their systems could effectively handle bursts of write activity.

For similar reasons, Digg's architecture is going to necessarily be much more complicated than SO's as their activity is heavy on the writes and those writes come in waves and bursts (major events will cause a burst in Digg's traffic while SO will remain relatively consistent in comparison).
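
The point about SQL slaves and replication latency in that comment can also be sketched with a toy model in Python. It assumes a single value written on a primary at a fixed interval, one replica that sees each write a fixed number of ticks later, and one read from the replica on every tick. The stale_read_fraction function and every number in it are hypothetical and only meant to show the shape of the problem.

```python
def stale_read_fraction(write_interval, lag, ticks=10_000):
    """Fraction of replica reads that return out-of-date data.

    Toy model: a counter on the primary is incremented every
    `write_interval` ticks, and the replica sees each write `lag`
    ticks later. One read is served from the replica on every tick.
    """
    stale = 0
    for t in range(ticks):
        # Number of writes applied on the primary by time t.
        primary_value = t // write_interval
        # Number of writes that have reached the replica by time t.
        replica_value = max(t - lag, 0) // write_interval
        if replica_value != primary_value:
            stale += 1
    return stale / ticks


# Same replication lag in both cases; only the write rate differs.
read_heavy = stale_read_fraction(write_interval=1000, lag=5)
write_heavy = stale_read_fraction(write_interval=2, lag=5)

print(f"read-heavy site:  {read_heavy:.2%} of replica reads are stale")
print(f"write-heavy site: {write_heavy:.2%} of replica reads are stale")
```

With identical lag, the replica is almost always current when writes are rare and almost never current when writes are constant. Adding more replicas doesn't change that, which fits the comment's claim that slaves help with reads but do nothing for write-heavy traffic.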

..nearly all traffic is focused on new entries and semi-realtime discussion is/was a cornerstone of the site, making caching much more difficult. Stack Overflow gets a lot of traffic to old questions from Google and can do a full static cache of those pages. Digg has to maintain frequent changes to pages that get slammed all at once for a few hours/days then promptly ignored by traffic for other, newer pages.

 

So what are they all talking about? Essentially, the points I have pulled out are that one cannot compare the scaling needs of Digg to those of Stack Overflow, because the two sites have different request loads and differ in how much of each site's pages can be served from cache. Digg also differs from Stack Overflow in that it has dynamic widgets embedded on thousands of sites, some with extremely high traffic, whereas Stack Overflow doesn't have to deal with the load from such widgets.

To be fair, I've only pulled out the comments that agree that Stack Overflow and Digg aren't comparable in their scaling needs. If you think these viewpoints are wrong, or you disagree, I encourage you to read and join in on the reddit discussion about this topic.

Scaling is a hard and unsolved problem. A few dozen engineers are working very hard on a problem that isn't exactly the same for each of them. I put together this blog post because it's so easy for a person who isn't a tech person to hear statements like this and form blanket opinions based on someone else's opinion.

While I'm not a fan of Digg myself, I do find the problems of scaling an absolutely fascinating topic, one that isn't widely understood and is often misunderstood. In social media I so frequently see people repeat engineering criticisms they've heard as their own opinions, without understanding those criticisms or the opposite side.

Help me kick cancer’s ass!

The time is almost upon us! Starting this Saturday (October 16th) at 8am Pacific time, I will be participating in a 24-hour gaming marathon to raise money for the Children's Hospital & Research Center in Oakland, California. Below, you can click on the clip to be taken to my donation page. 100% of your donations go directly to the charity and are tax-deductible. Please consider sparing even a single dollar or two.

 

You can also join in on the fun by tuning in to my Justin.tv channel, which will be providing a live stream of the gaming awesomeness with myself and team WNOHGB! You can either watch the stream from tia-marie.com or directly on Justin.tv (links provided below).  

 

http://www.tia-marie.com/live.html - Video/Chat Embed 

http://www.justin.tv/tia_marie - Justin.tv Direct link.

I'll also be doing some giveaways during the stream, so don't forget to pop by the channel and see what you can possibly win!

 

Yeow

^ Click me ^

Die Antwoord’s $O$ Hits Stores Today!

If you read my previous post, you've gotten the full blow by blow on one of Die Antwoord's songs, "Evil Boy." Today I was thoroughly pleased to discover that their entire album can now be purchased for digital download on Amazon and iTunes!

If you liked "Evil Boy," you should certainly pick up a copy of this album. I already have, and it's downloading as I write this.

I'd love to hear what you think of the album so leave me a comment or shoot me a tweet to @tia_marie.

 

The Social Commentary behind Die Antwoord’s “Evil Boy”

It’s no secret that I’m particularly fond of foreign music. I find myself drawn to genres of music from other regions of the world that I would not enjoy if they were produced in the States.

One song in particular that drew my attention is “Evil Boy” by the South African Zef group Die Antwoord (Afrikaans: “The Answer”). The song is written in English, Afrikaans and Xhosa and has a sound similar to American hip hop.

Admittedly, what drew me to the song originally was the total WTF/NSFW nature of the video associated with it. Dark, grungy and incredibly phallic symbolism is used throughout the entire video. Watching the video for the first time, with no lyrics or context, is very confusing, which is what led me to dig deeper.

The lyrics contain pieces of social commentary about adolescent circumcision within Xhosa tribes, a shout-out to a 2002 AIDS awareness campaign that encouraged condom use across Africa, and even a wordplay joke on the armed wing of the African National Congress.

Before saying much more, I’ll let you watch the video for yourself. Pay special attention to the period from 1m24s to 2m10s. This section is sung in Xhosa by Wanga:

 

On first pass, you’re probably very confused and upset that I didn’t mention NSFW about a dozen times. If you read the English lyrics in the section of the video I mentioned, they may at first glance sound like anti-homosexual commentary. However, as the band has explained, these lyrics refer to Wanga’s personal experience with his tribe’s ritual circumcision:

So, the story behind this video and song (or part of the story — there’s so much going on!) is that Wanga felt that he was being coerced into a form of ritual circumcision by his community. It’s sort of taken for granted within his ethnic group that you must do this, so much so that if you are a young man and you do not participate, you are ostracized, as the band explained to me.

The thinking, and this is communicated very directly to the young men, is that if you don’t participate, you’re gay. You’re effeminate. You’re not a real man. You never mature from being a boy to being a man.

He struggled with all of this in real life: with what it meant for his personal and cultural identity. And he came to a point where he was like, you know what? Fuck you all. The fact that I won’t consent to having my penis sliced with an unsterilized knife, out in the bush, and risk infection or worse– that doesn’t mean “I’m gay,” as you say. I reject this tradition. If that’s what being a man is, fuck it, I don’t want to be a man. I’ll be an “evil boy for life,” even if it means I am ostracized from my community.

You might have chosen different lyrics, but dude, it’s not our story or our culture or our world experience at all.

It’s his.

The explanation goes on to describe the environment of ritual circumcision:

We’re not talking about the same thing as what happens in Western countries, with babies in a sterile hospital environment… we’re talking about boys in their late teens going off into the bush with an unsterilized knife and a blanket, no anesthesia, etc. The ritual apparently results in some number of casualties, infections, and permanent (unintended) injury to the teen males who go through the tradition, and some of the kids who are now more urbanized, with access to other ways of thinking, want to opt out. That’s what Wanga’s lyrics are about.

This ritual circumcision is performed in unsanitary conditions, often with the same knives being reused, which has caused the spread of STDs and HIV. Post-initiation deaths are extremely high, and there is great social stigma associated with the ritual. Much attention (but still not enough) is paid to the practice of female genital mutilation, yet very little has been turned to the practice of adolescent ritual circumcision.

 