Yesterday, Tim Converse wrote a very interesting article discussing the challenges of telling quality content aggregation apart from spam. This can be a major challenge for search engines: what baseline separates a resource like Google News from your average feed scraper?
Google News provides excellent, high-quality results and has stringent requirements for news providers. Average feed scrapers scrape, well, whatever they can find. But can an algorithm tell the difference?
Tim notes, in particular, the interesting recursive nature of searching aggregators. Since many aggregators scrape their results from search engines, a search can end up returning pages that are themselves made of search results, sometimes several layers deep.
As an in-between case ask yourself this: if you’re doing a websearch (on Google, Yahoo!, MSN, …) do you want any of the results to be … search-result pages themselves (from Google, Yahoo!, MSN)? That is, if you search for “snorklewacker” on MSN web search, and you click on result #4, do you want to find yourself looking at a websearch results page for “snorklewacker” on Yahoo! Search, which in turn has (as result #3) the Google search results page for “snorklewacker”?
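To make the scenario concrete, here is a minimal sketch of how a results list might be checked for pages that are themselves search-result pages. The host names, paths, and query parameters in the rule table are illustrative assumptions, not an authoritative description of how any engine builds its URLs, and `is_search_results_page` and `drop_search_pages` are hypothetical helpers. A rule like this only catches the obvious case Tim describes. It says nothing about the harder question of aggregator quality.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical rule table: host -> (results path prefix, query parameter).
# These patterns are assumptions for illustration only.
SEARCH_PAGE_RULES = {
    "google.com": ("/search", "q"),
    "search.yahoo.com": ("/search", "p"),
    "search.msn.com": ("/results.aspx", "q"),
}


def is_search_results_page(url: str) -> bool:
    """Return True if the URL looks like a results page of a known engine."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    query = parse_qs(parsed.query)
    for domain, (path_prefix, param) in SEARCH_PAGE_RULES.items():
        # Match the domain itself or any subdomain of it.
        host_matches = host == domain or host.endswith("." + domain)
        if host_matches and parsed.path.startswith(path_prefix) and param in query:
            return True
    return False


def drop_search_pages(urls):
    """Keep only the URLs that do not look like search-result pages."""
    return [u for u in urls if not is_search_results_page(u)]
```

With these rules, a URL such as `https://www.google.com/search?q=snorklewacker` is flagged, while `https://news.google.com/` is not, since it has no results path or query parameter.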
Altogether, it’s an interesting question, though one without any really conclusive answers.