Google sent me a "what to know in on-site interviews" email. Here it is.

After two phone interviews Google asked me to visit London and have a whole day of chatting about technology and solving intricate coding puzzles. Just to see how good I am on a scale of 1 to Google.

An animation of the quicksort algorithm sortin...

They aldo sent me an email with advice. It can be summed up as "You should know everything. If it's to do with computers, you should know it. Here are 5 books and 4 fancy algorithms you should read. You must also be intimately familiar with all these basic-ish algorithms. This is your two week notice. Good luck. Oh and take a look at these videos too!"

Technical interviews have always been fun for me, I just love talking shop with anyone who would listen and how often do you get the chance to talk shop with some of the [provenly] best engineers out there?

Somehow, instead of being super solid advice, the email did nothing but cement my idea that it's impossible to study up on all this and I should just hope I've got what it takes anyway.

Here's the email, copied verbatim.

Guide to getting an engineering job with Google

Software engineers work in small, rapid teams, each handling the full development cycle without need for hierarchy; product design, system architecture, coding, testing & product launching at global scale. They don't tend to make prototypes; but rather work in realtime, with live feedback from real people across the globe. Engineers don't always get it right first time, but iterate fast in an agile, unit test environment.

Google takes an academic approach to the interviewing process. This means that we’re interested in your thought process, your approach to problem solving as well as your coding abilities.

Technical Interviews

You can expect questions that evaluate your skills in three primary areas:

  1. Coding
  1. Algorithms
  1. System design

Interview Hints

Videos - Interviewing at Google - An inside look at Google - Google Youtube Channel - Underneath the covers at Google - Google i/O 2011

Recommended books

Background knowledge: Background knowledge: The Google infrastructure Google has the world’s most formidable large-scale computing and storage infrastructure. Clusters of Linux-based machines with custom job and storage management systems allows us to build applications that access petabytes of data or process millions of requests a day. We can build such systems because, amongst other things, we have full control over the software stack. Four of the key elements include:

I asked an experienced Google engineer how/why Google uses Big O notation and they gave me this information: Google-scale problems fall into two categories: those that span across machines, across very large amounts of data, and those that supply a large number of users with very quick access to data, such as search.

When we design a solution for either sort of problem, we are deal with large numbers of items, for example a large number of web pages to search, or customer reports to aggregate up. This means that any algorithm that is not at least as good as linear (O(n)) in time is likely to be too slow. User-facing products such as search require even better: that means aiming for constant time operation (O(1)), no matter how large our index, we need to show results in a fixed amount of time. Sometimes that means we scale the system up to cope with the number of Web pages we have to search. Any example of where your understanding of performance profiling would be applicable is now that we have a constant-time solution, we need to minimise the overhead of just one user performing a search; it's no good if one user's query causes 100MB to be allocated on 1000 computers.

Another example would be when we index the Web. This means taking billions of Web pages, computing metrics on each of them to extract the search terms that might trigger them, and then putting them in a searchable index. Adding each page means we might need to compare it with every other page to find the best result for a query, but in fact we can perform the index in linear time with respect to the whole internet by performing the entire computation in parallel. What engineers have done is distilled the problem to an 'embarrassingly parallel' one where as much computation as possible is done on each single page before the more expensive step of comparing it with other pages. This reduces the overall time to index the Web to the extent where we can keep much of the internet searchable even if the page is only 15 minutes old. Here, we have used algorithms and Big O analysis to both decrease latency and increase throughput; performance profiling plays a part but but only once a truly scalable solution has been found.