When I first started at Cornell, I took the Advanced Systems class taught by Fred Schneider. To this day, I think that was the best introductory graduate course to Computer Science Systems research. The course covered a broad spectrum of systems research topics including: specification, operating systems, virtual machines, distributed systems, and much more. I still refer to the course’s reading list every now and then and I highly recommend it to anyone interested in Computer Systems.
I stumbled upon a nice CS academic blog by Henry Robinson at Cloudera. Unfortunately the blog is on temporary hiatus, but it does have many interesting articles. Here, mostly for my future reference, are some that I found particularly interesting:
- A Brief Tour of FLP Impossibility
- Consistency and availability in Amazon’s Dynamo
- Consensus Protocols: Two-Phase Commit
- Consensus Protocols: Three-Phase Commit
- Consensus with lossy links: Establishing a TCP connection
- Consensus Protocols: Paxos
- Consensus Protocols: A Paxos Implementation
- Barbara Liskov’s Turing Award, and Byzantine Fault Tolerance
- CAP confusion: Problems with Partition Tolerance
- The Theorem That Will Not Go Away
Anyway, I hope Henry goes back to blogging.
Today Google accused Microsoft Bing of stealing their search results. The Bing team of course denies this. To prove their accusation, Google set up a sting operation in which they hardwired their search engine to return unrelated links for gibberish search queries that normally return no results on either Bing or Google. Then, according to their blog post: [emphasis added by me]
We gave 20 of our engineers laptops with a fresh install of Microsoft Windows running Internet Explorer 8 with Bing Toolbar installed. As part of the install process, we opted in to the “Suggested Sites” feature of IE8, and we accepted the default options for the Bing Toolbar.
We asked these engineers to enter the synthetic queries into the search box on the Google home page, and click on the results, i.e., the results we inserted. We were surprised that within a couple weeks of starting this experiment, our inserted results started appearing in Bing.
The Bing team says they rely on thousands of inputs to compute search results, one of them being the clickstream obtained from users who choose to share their web usage experience. To me this sounds legitimate. IE 8 has a feature that detects if you’ve entered a search query into a search engine; I assume so because the browser search box will be filled with the search query even if you entered it into the search box on the web page. So if a user chooses to send web usage feedback, then I would assume it is ok to return information such as “when the user searched for XYZ, he first went to this site, then clicked on that section and that section,” without, of course, revealing any personal information.
Learning relevant links, and re-ranking search results, based on what users click on is already done behind the scenes by *all* search engines. Google determines the order of search results partly based on which links users tended to click on; a search result that is posted near the bottom of the list but receives many clicks from users will surely make its way higher up the list. Using clickstreams is just like that, but instead of learning from people who click on your search results, you learn from people who opt in to send you usage feedback. So why is Google surprised that when they opt in to send Bing browsing feedback data, it is used to improve search results?
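To make the idea concrete, here is a toy sketch of click-based re-ranking. It is not how Google or Bing actually rank anything; the scoring formula, the `weight` parameter, the `rerank` function, and the example URLs are all made up for illustration. Each result gets a base score from its original position, plus a bonus proportional to its share of the observed clicks.

```python
def rerank(results, clicks, weight=1.0):
    """Reorder results by blending original position with click share.

    results: list of URLs in their original ranked order (best first).
    clicks:  dict mapping URL -> number of observed clicks (hypothetical data).
    weight:  how strongly clicks count relative to original position (assumed).
    """
    n = len(results)
    total = sum(clicks.get(url, 0) for url in results)

    def score(item):
        position, url = item
        base = (n - position) / n  # 1.0 for the top result, 1/n for the last
        share = clicks.get(url, 0) / total if total else 0.0
        return base + weight * share

    # sorted() is stable, so ties keep their original relative order.
    ranked = sorted(enumerate(results), key=score, reverse=True)
    return [url for _, url in ranked]


results = ["a.example", "b.example", "c.example", "d.example"]
clicks = {"a.example": 5, "b.example": 3, "c.example": 2, "d.example": 90}
print(rerank(results, clicks))
```

In this made-up data the last result receives 90 of the 100 clicks, so it climbs to the top of the list. The same function works whether the click counts come from a search engine's own result pages or from opt-in browser feedback, which is the point of the comparison above.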
What really disappoints me is that I believe Google’s post is nothing but a PR stunt. Techies would know that the fuss is about nothing, but the mainstream media and average consumers would read their post, take it at face value, and brand Bing as a “cheap imitation” as Google claims.
I really thought Google was above this sort of cheap PR stunt, and that’s why I am disappointed.
What constitutes stealing search results?
Since Google brought up the subject, it is worth pondering: what would actually constitute stealing search results? If a search engine actively uses Google’s search results and displays them as its own, then that is stealing. If IE and the Bing Toolbar sent the search results that Google displayed back to Microsoft, then that is also stealing. Clearly, Bing wasn’t doing any of that.