A patent granted to Google today explores Web spam and the manipulation of documents and links on the Web. It describes how the rankings of pages may be influenced if they are identified as “manipulative.”
The identification of manipulative documents, how they might be grouped together, and how they could be treated by the search engine is described in some detail. That treatment might include removal of pages from the search index, reductions in rankings for pages, and possibly a change in how quality scores (PageRank) are calculated for links from manipulative pages.
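To make the last of those treatments concrete, here is a minimal sketch of how links from flagged pages could be discounted in a PageRank-style calculation. It is an illustration under assumptions, not the method claimed in the patent: the function name `pagerank_with_discount`, the `flagged` set, and the `link_weight` factor are all hypothetical, and the score withheld from flagged links is simply spread evenly across all pages.

```python
def pagerank_with_discount(links, flagged=frozenset(), link_weight=0.0,
                           damping=0.85, iterations=50):
    """Power-iteration PageRank where links from flagged pages pass less score.

    links: dict mapping each page to the list of pages it links to.
    flagged: pages assumed to have been identified as manipulative (hypothetical input).
    link_weight: fraction of normal score a flagged page's links pass on
                 (0.0 = links ignored, 1.0 = ordinary PageRank).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        withheld = 0.0  # score not passed along links this round

        for source in pages:
            targets = links.get(source, [])
            share = damping * rank[source]
            if not targets:
                # Dangling page: nothing to pass along, so redistribute it all.
                withheld += share
                continue
            weight = link_weight if source in flagged else 1.0
            passed = share * weight
            for target in targets:
                new_rank[target] += passed / len(targets)
            withheld += share - passed

        # Spread withheld score evenly so the ranks still sum to 1.
        for page in pages:
            new_rank[page] += withheld / n
        rank = new_rank

    return rank


# Toy graph: two pages that exist only to link to "target".
toy_links = {
    "a": ["b"],
    "b": ["a"],
    "spam1": ["target"],
    "spam2": ["target"],
    "target": ["a"],
}
baseline = pagerank_with_discount(toy_links)
discounted = pagerank_with_discount(toy_links, flagged={"spam1", "spam2"})
```

In this toy graph, "target" ends up with a lower score once the two pages linking to it are flagged, which is the general effect the summary above describes. How a real search engine would choose the flagged set or the discount is not something this sketch attempts to model.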
The patent was filed almost four years ago, on December 10, 2003, and wasn’t granted until today.
A good number of papers and patent applications on Web spam have been published since then, and have explored more detailed approaches, but this patent is interesting in that it captures some aspects of how Google may have been detecting and fighting Web spam over the past few years (and may still be).