Link Spam Detection Based on Mass Estimation
Quality information is expensive to produce, while spam is profitable and scales easily, so most pages on the web are spam. No matter what you do, if you run a quality website, some spammy websites are going to link to you, steal your content, or both.
The best way to fight this off is not to spend lots of time worrying about spammy links, but to spend that time building some trustworthy links that offset the effects of the spammy ones.
Algorithms like the one in the spam mass estimation research are going to be based on relative size: how much of a page's authority comes from spam compared with how much comes from trusted sources. Since quality links typically have more PageRank (or authority, by whatever measure the search engines choose to use) than most spam links, you can probably get away with having 40 or 50 spammy links for every real, quality link.
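To make the relative size idea concrete, here is a minimal Python sketch. It is not the formula from the research paper; it only assumes that every link passes along some amount of authority and asks what fraction of the total comes from spam. The function name and the numbers (one quality link worth 50 units, each spam link worth 1 unit) are hypothetical and chosen purely for illustration.

```python
def relative_spam_share(good_contributions, spam_contributions):
    """Return the fraction of a page's link authority that comes from spam.

    Both arguments are lists of per-link authority values. The values and
    the idea of summing them are illustrative assumptions, not the method
    from the paper.
    """
    good = sum(good_contributions)
    spam = sum(spam_contributions)
    total = good + spam
    if total == 0:
        return 0.0  # a page with no links has no spam share
    return spam / total


# Hypothetical profile: 1 quality link worth 50 units, 40 spam links worth 1 each.
share = relative_spam_share([50.0], [1.0] * 40)
print(f"Share of authority from spam: {share:.2f}")
```

Under these made-up weights, the single quality link still accounts for more than half of the page's authority, which is the intuition behind offsetting spammy links with a few trusted ones.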
Another interesting bit mentioned in the research paper is that the web generally follows power laws. The quote below might be as clear as mud, so I will clarify it right after.
A number of recent publications propose link spam detection methods. For instance, Fetterly et al. [Fetterly et al., 2004] analyze the indegree and outdegree distributions of web pages. Most web pages have in- and outdegrees that follow a power-law distribution. Occasionally, however, search engines encounter substantially more pages with the exact same in- or outdegrees than what is predicted by the distribution formula. The authors find that the vast majority of such outliers are spam pages.
Indegrees and outdegrees above refer to link profiles, specifically to the number of inbound links and outbound links a page has. Most spam generator software and bad spam techniques leave obvious mathematical footprints.
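Here is a rough Python sketch of what spotting that kind of footprint could look like. It is my own simplification, not the method from Fetterly et al.: it counts how many pages share each indegree, fits a straight line to those counts in log-log space (a crude stand-in for "the distribution formula"), and flags any indegree whose page count is far above the fitted prediction. The function names and the ratio threshold of 5 are assumptions made up for this example.

```python
import math
from collections import Counter


def fit_power_law(degree_counts):
    """Least-squares fit of log(count) = intercept + slope * log(degree)."""
    if len(degree_counts) < 2:
        raise ValueError("need at least two distinct degrees to fit a line")
    xs = [math.log(degree) for degree in degree_counts]
    ys = [math.log(count) for count in degree_counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def degree_outliers(degrees, ratio=5.0):
    """Return degrees shared by far more pages than the fitted line predicts.

    `degrees` holds one indegree (or outdegree) per page. `ratio` is a
    hypothetical threshold: observed count must exceed ratio * predicted.
    """
    # Zero degrees are skipped because log(0) is undefined.
    counts = Counter(d for d in degrees if d > 0)
    slope, intercept = fit_power_law(counts)
    flagged = []
    for degree, observed in counts.items():
        predicted = math.exp(intercept + slope * math.log(degree))
        if observed > ratio * predicted:
            flagged.append(degree)
    return sorted(flagged)
```

If you feed it a list of indegrees where thousands of generated pages all have exactly the same number of inbound links, that degree sticks out from the fitted line and gets flagged. A real search engine would use a far more careful fit than this, but the footprint is the same idea.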
If you are using widely hyped and marketed spam site generator software, most of it is likely going to be quickly discounted by link analysis algorithms since many other people will be creating thousands of similar sites with similar link profiles and similar footprints.
Originally posted 2008-05-28 03:59:12.
