Important notice: This is an individual homework. You should not help classmates with their calculations, nor ask for such help from anyone other than the instructor or TA's. (Exception - it's OK to ask a classmate what the questions mean if it isn't clear. Still, it would be good to check with us anyway.)
NOTE: This homework isn't particularly long or difficult. It's just intended to give you a bit more practice on calculation-type questions. Please work carefully and write up your answer clearly.
Suppose you are in charge of buying new computers for Google. (Google is a very good web-based search engine - check it out at www.google.com). Suppose (and I am completely making this up) that 90% of the queries to Google are "simple queries" and the rest are "advanced". Each simple query requires bringing in 1 MByte of data from disk, each advanced ones require getting 10 MBytes from disk. Google's systems are multiprogrammed, so the disk assesses for one query can completely overlap (i.e. can be pipelined with) the computation for a different query. You will buy enough computers so that whenever a query comes in it can be immediately assigned to a computer for processing.
Your experiments show that it takes 1 second of CPU time for each simple query and 3 seconds for each advanced query. The machines can get data from disk at a rate of 2 MB/sec.
In case you are interested, and I am not making this up, Google currently has about 20,000 (500 MHz) Pentium II processors installed across four sites in the US. They receive over 100 million queries daily. They have indexed over 3 billion web pages, where the typical page is 10 KBytes of data.
And in case you're interested in playing with Google's data, check out their programming contest.
Answers to some frequently asked questions (well, they were asked at least once):
A: No. Assume that each query requires different data. Incidentally, most of the computation is done after the data is read, not before. But that doesn't affect the problem.
A: Yes. When processing billions of queries, the start-up and clean-up time wasted is insignificant anyway.
A: You can do your calculations assuming that they always come in as nine simple queries then one advanced. Of course, that won't actually happen, but since there are thousands of queries coming in each second, the webserver (these are additional computers that interface with the network) can (almost) always ensure that they never send two advanced queries in a row to the same computer for processing. Incidentally, Google actually has about 100 computers that are dedicated webservers, in addition to the 20,000 query processors.
A: Yes.
A: You are correct - I'll get rid of the confusing term.
A: No. Whenever one particular query requests data from disk, the processor suspends the processing of that query and works on a different one.