CSE 141 Turn-in assignment number 2

New computers for Google

Due Monday, Feb 11 in class

Important notice: This is an individual homework. You should not help classmates with their calculations, nor ask for such help from anyone other than the instructor or TA's. (Exception - it's OK to ask a classmate what the questions mean if it isn't clear. Still, it would be good to check with us anyway.)


NOTE: This homework isn't particularly long or difficult. It's just intended to give you a bit more practice on calculation-type questions. Please work carefully and write up your answer clearly.

Suppose you are in charge of buying new computers for Google. (Google is a very good web-based search engine - check it out at www.google.com). Suppose (and I am completely making this up) that 90% of the queries to Google are "simple queries" and the rest are "advanced". Each simple query requires bringing in 1 MByte of data from disk, each advanced ones require getting 10 MBytes from disk. Google's systems are multiprogrammed, so the disk assesses for one query can completely overlap (i.e. can be pipelined with) the computation for a different query. You will buy enough computers so that whenever a query comes in it can be immediately assigned to a computer for processing.

Your experiments show that it takes 1 second of CPU time for each simple query and 3 seconds for each advanced query. The machines can get data from disk at a rate of 2 MB/sec.

  1. What is the average amount of CPU time required per query?
  2. What is the latency of a simple query (taking into account both the CPU and I/O time)? Of an advanced query?
  3. Suppose you were to devote a machine to processing only simple queries. What would its throughput be?
  4. Suppose you devote a machine to processing only advanced queries. What would its throughput be? (Be careful!)
  5. Suppose you divide the machines into two groups: one group will only process simple queries and the other will only process advanced queries. How many computers do you need to buy for each group in order to be able to process 10 million queries per hour, assuming (as always) that 90% of the queries are simple and 10% are advanced?
  6. If instead, you only have one group of computers, and each computer gets an average mix of simple and advanced queries to process, how many computers will you need to buy?

In case you are interested, and I am not making this up, Google currently has about 20,000 (500 MHz) Pentium II processors installed across four sites in the US. They receive over 100 million queries daily. They have indexed over 3 billion web pages, where the typical page is 10 KBytes of data.

And in case you're interested in playing with Google's data, check out their programming contest.

Answers to some frequently asked questions (well, they were asked at least once):