Spamdexing - Wikipedia
From Wikipedia, the free encyclopedia
Deliberate manipulation of search engine indexes
Spamdexing (also known as search engine spam, search engine poisoning, black-hat search engine optimization, search spam, or web spam) is the deliberate manipulation of search engine indexes. It involves a number of methods, such as link building and repeating related or unrelated phrases, to manipulate the relevance or prominence of resources indexed in a manner inconsistent with the purpose of the indexing system.

Spamdexing could be considered a part of search engine optimization, although many SEO methods improve the quality and appearance of website content and serve content that is useful to many users.
Overview

Search engines use a variety of algorithms to determine relevancy ranking. Some of these include determining whether the search term appears in the body text or URL of a web page. Many search engines check for instances of spamdexing and will remove suspect pages from their indexes. Search-engine operators can also quickly block the results listing from entire websites that use spamdexing, perhaps in response to user complaints of false matches.
The rise of spamdexing in the mid-1990s made the leading search engines of the time less useful. Using unethical methods to make websites rank higher in search engine results than they otherwise would is commonly referred to in the SEO (search engine optimization) industry as "black-hat" SEO. These methods focus on breaking the search-engine promotion rules and guidelines. In addition, the perpetrators run the risk of their websites being severely penalized by the Google Panda and Google Penguin search-results ranking algorithms.
Common spamdexing techniques can be classified into two broad classes: content spam (or term spam) and link spam.
History

The earliest known reference to the term spamdexing is by Eric Convey in his article "Porn sneaks way back on Web", The Boston Herald, May 22, 1996, where he said:

The problem arises when site operators load their Web pages with hundreds of extraneous terms so search engines will list them among legitimate addresses. The process is called "spamdexing," a combination of spamming—the Internet term for sending users unsolicited information—and "indexing."
Keyword stuffing had been used in the past to obtain top search engine rankings and visibility for particular phrases. This method is outdated and adds no value to rankings today. In particular, Google no longer gives good rankings to pages employing this technique.
Hiding text from the visitor is done in many different ways. Text colored to blend with the background, CSS z-index positioning to place text underneath an image (and therefore out of view of the visitor), and CSS absolute positioning to have the text positioned far from the page center are all common techniques. By 2005, many invisible text techniques were easily detected by major search engines.[citation needed]
"Noscript" tags are another way to place hidden content within a page.[10][11] While they are a valid optimization method for displaying an alternative representation of scripted content, they may be abused, since search engines may index content that is invisible to most visitors.
In the past, keyword stuffing was considered to be either a white-hat or a black-hat tactic, depending on the context of the technique and the opinion of the person judging it. While a great deal of keyword stuffing was employed to aid spamdexing, which is of little benefit to the user, keyword stuffing in certain circumstances was not intended to skew results in a deceptive manner. Whether the term carries a pejorative or neutral connotation depends on whether the practice is used to pollute the results with pages of little relevance, or to direct traffic to a relevant page that would otherwise have been de-emphasized due to the search engine's inability to interpret and understand related ideas. This is no longer the case. Search engines now employ themed, related-keyword techniques to interpret the intent of the content on a page.[12][13]
Content spam

These techniques involve altering the logical view that a search engine has over the page's contents. They all aim at variants of the vector space model for information retrieval on text collections.
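To see why repeating terms pays off against a naive vector space model, consider a toy relevance scorer that ranks documents by the cosine similarity of raw term-frequency vectors. The documents and query below are invented for illustration; production engines use far more robust weighting and spam signals.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(query, document):
    """Score a document against a query using raw term-frequency vectors."""
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    norm = sqrt(sum(v * v for v in q.values())) * sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

honest = "cheap flights to tokyo with flexible dates"
stuffed = "cheap flights cheap flights cheap flights tokyo tokyo tokyo"
query = "cheap flights tokyo"

# Under naive term-frequency scoring, the stuffed page wins.
print(cosine_similarity(query, stuffed) > cosine_similarity(query, honest))  # True
```

The stuffed document's vector points in exactly the query's direction (cosine 1.0), which is precisely the weakness content spam exploits.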
Keyword stuffing

Keyword stuffing is a search engine optimization (SEO) technique in which keywords are loaded into a web page's meta tags, visible content, or backlink anchor text in an attempt to gain an unfair rank advantage in search engines. Keyword stuffing may lead to a website being temporarily or permanently banned or penalized on major search engines.[14]
The repetition of words in meta tags may explain why many search engines no longer use these tags. Search engines now focus more on content that is unique, comprehensive, relevant, and helpful, which raises overall quality and makes keyword stuffing useless, but the tactic is still practiced by many webmasters.[15][16][17]
Many major search engines have implemented algorithms that recognize keyword stuffing and reduce or eliminate any unfair search advantage that the tactic may have been intended to gain; they will often also penalize, demote, or remove from their indexes websites that implement keyword stuffing.[18][19]
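A crude version of such recognition is a keyword-density check. The sketch below is illustrative only; the flagging threshold is invented for demonstration, not any engine's documented cutoff.

```python
from collections import Counter

def keyword_density(text, keyword):
    """Return the fraction of all words in `text` that equal `keyword`."""
    words = text.lower().split()
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)

# A hypothetical stuffed page: half of its words are the same keyword.
page = "buy widgets buy widgets best widgets cheap widgets widgets online"
density = keyword_density(page, "widgets")
print(density)  # 0.5

# Flag anything far above a few percent; the 5% cutoff is illustrative.
print("suspicious" if density > 0.05 else "ok")  # suspicious
```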
Changes and algorithms specifically intended to penalize or ban sites using keyword stuffing include the Google Florida update (November 2003), Google Panda (February 2011),[20] Google Hummingbird (August 2013),[21] and Bing's September 2014 update.[22]
Headlines on online news sites are increasingly packed with just the search-friendly keywords that identify the story. Traditional reporters and editors frown on the practice, but it is effective in optimizing news stories for search.[23]
Hidden or invisible text

Unrelated hidden text is disguised by making it the same color as the background, using a tiny font size, or hiding it within HTML code such as "no frame" sections, alt attributes, zero-sized DIVs, and "no script" sections. People manually screening red-flagged websites for a search-engine company might temporarily or permanently block an entire website for having invisible text on some of its pages. However, hidden text is not always spamdexing: it can also be used to enhance accessibility.[24]
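A first-pass detector for some of these tricks can scan inline styles for properties that make content invisible. This is a simplified sketch: real crawlers also resolve external stylesheets, computed colors, and z-index/positioning tricks, none of which plain regexes can see.

```python
import re

# Style properties whose presence often indicates hidden text.
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*0",
    re.IGNORECASE,
)

def has_hidden_inline_text(html):
    """Flag pages containing elements whose inline style hides their contents."""
    for style in re.findall(r'style\s*=\s*"([^"]*)"', html, re.IGNORECASE):
        if HIDDEN_STYLE.search(style):
            return True
    return False

snippet = '<div style="display:none">cheap pills cheap pills cheap pills</div>'
print(has_hidden_inline_text(snippet))  # True
print(has_hidden_inline_text('<p style="color:#333">visible text</p>'))  # False
```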
Meta-tag stuffing

This involves repeating keywords in the meta tags and using meta keywords that are unrelated to the site's content. This tactic has been ineffective since September 2009, when Google declared that it does not use the keywords meta tag in its online search ranking.[25]
Doorway pages

"Gateway" or doorway pages are low-quality web pages created with very little content, which are instead stuffed with very similar keywords and phrases. They are designed to rank highly within the search results but serve no purpose to visitors looking for information. A doorway page will generally have "click here to enter" on the page; autoforwarding can also be used for this purpose. In 2006, Google ousted vehicle manufacturer BMW for using "doorway pages" on the company's German site, BMW.de.[26] Google has announced that it will penalize doorway tactics.[27]
Scraper sites

Scraper sites are created using various programs designed to "scrape" search-engine results pages or other sources of content and create "content" for a website.[citation needed] The specific presentation of content on these sites is unique, but is merely an amalgamation of content taken from other sources, often without permission. Such websites are generally full of advertising (such as pay-per-click ads), or they redirect the user to other sites. It is even feasible for scraper sites to outrank original websites for their own information and organization names.
Article spinning

Article spinning involves rewriting existing articles, as opposed to merely scraping content from other sites, to avoid penalties imposed by search engines for duplicate content. This process is undertaken by hired writers[citation needed] or automated using a thesaurus database or an artificial neural network.
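The automated variant can be sketched as a simple synonym-substitution pass. The synonym table below is invented for illustration; real spinners draw on large thesaurus databases or paraphrasing models, and the output is usually just as awkward as this one.

```python
import random

# A toy synonym table; real spinners use far larger thesaurus databases.
SYNONYMS = {
    "cheap": ["inexpensive", "affordable", "low-cost"],
    "buy": ["purchase", "order", "get"],
    "great": ["excellent", "fantastic", "superb"],
}

def spin(text, rng=random):
    """Replace each word found in the synonym table with a random synonym."""
    words = []
    for word in text.split():
        choices = SYNONYMS.get(word.lower())
        words.append(rng.choice(choices) if choices else word)
    return " ".join(words)

# Seeded RNG so repeated runs produce the same "spun" variant.
print(spin("buy cheap widgets at great prices", random.Random(0)))
```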
Machine translation

Similarly to article spinning, some sites use machine translation to render their content in several languages, with no human editing, resulting in unintelligible texts that nonetheless continue to be indexed by search engines, thereby attracting traffic.
Link spam

Link spam is defined as links between pages that are present for reasons other than merit.[28] Link spam takes advantage of link-based ranking algorithms, which give websites higher rankings the more other highly ranked websites link to them. These techniques also aim at influencing other link-based ranking techniques such as the HITS algorithm.[citation needed]
Link farms

Main article: Link farm

Link farms are tightly knit networks of websites that link to one another for the sole purpose of exploiting the search engine ranking algorithms. These are also known facetiously as mutual admiration societies.[29] Use of link farms has greatly decreased since the launch of Google's first Panda update in February 2011, which introduced significant improvements in its spam-detection algorithm.
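The exploit can be demonstrated with a toy PageRank implementation: adding a small farm of mutually linking pages that all point at a target measurably inflates the target's score. The damping factor and iteration count are the usual textbook values, not any engine's production settings.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:  # dangling page: spread its rank evenly
                for p in pages:
                    new[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new[target] += share
        rank = new
    return rank

# A tiny web: "target" competes with "rival"; a three-site farm links to
# "target" (and to each other) purely to inflate its rank.
web = {
    "rival":  ["target"],
    "target": ["rival"],
    "farm1":  ["farm2", "target"],
    "farm2":  ["farm3", "target"],
    "farm3":  ["farm1", "target"],
}
ranks = pagerank(web)
print(ranks["target"] > ranks["rival"])  # True: the farm inflates "target"
```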
Private blog networks

Private blog networks (PBNs) are groups of authoritative websites used as a source of contextual links that point to the owner's main website to achieve higher search engine ranking. Owners of PBN websites use expired domains or auction domains that have backlinks from high-authority websites. Google has targeted and penalized PBN users on several occasions with massive deindexing campaigns since 2014.[30]
Hidden links

Putting hyperlinks where visitors will not see them is used to increase link popularity. Highlighted link text can help rank a webpage higher for matching that phrase.
Sybil attack

A Sybil attack is the forging of multiple identities for malicious intent, named after the famous dissociative identity disorder patient and the book about her that shares her name, "Sybil".[31][32] A spammer may create multiple web sites at different domain names that all link to each other, such as fake blogs (known as spam blogs).
Spam blogs

Main article: Spam blog

Spam blogs are blogs created solely for commercial promotion and the passage of link authority to target sites. Often these "splogs" are designed in a misleading manner that gives the effect of a legitimate website, but upon close inspection they will often be found to have been written using spinning software, or to be very poorly written with barely readable content. They are similar in nature to link farms.[33][34]
Guest blog spam

Guest blog spam is the process of placing guest blogs on websites for the sole purpose of gaining a link to another website or websites. Unfortunately, these are often confused with legitimate forms of guest blogging done with motives other than placing links. This technique was made famous by Matt Cutts, who publicly declared "war" against this form of link spam.[35]
Buying expired domains

See also: Domaining

Some link spammers utilize expired-domain crawler software or monitor DNS records for domains that will expire soon, then buy them when they expire and replace the pages with links to their own pages. However, it is possible but not confirmed that Google resets the link data on expired domains.[citation needed] To maintain all previous Google ranking data for the domain, it is advisable that a buyer grab the domain before it is "dropped".

Some of these techniques may be applied for creating a Google bomb, that is, cooperating with other users to boost the ranking of a particular page for a particular query.
Using world-writable pages

Main article: Forum spam

Web sites that can be edited by users can be used by spamdexers to insert links to spam sites if the appropriate anti-spam measures are not taken. Automated spambots can rapidly make the user-editable portion of a site unusable. Programmers have developed a variety of automated spam prevention techniques to block or at least slow down spambots.[citation needed]
Spam in blogs

Main article: Spam in blogs

Spam in blogs is the placing or solicitation of links randomly on other sites, placing a desired keyword into the hyperlinked text of the inbound link. Guest books, forums, blogs, and any site that accepts visitors' comments are particular targets and are often victims of drive-by spamming, where automated software creates nonsense posts with links that are usually irrelevant and unwanted.
Comment spam

Comment spam is a form of link spam that has arisen in web pages that allow dynamic user editing, such as wikis, blogs, and guestbooks. It can be problematic because agents can be written that automatically and randomly select a user-edited web page, such as a Wikipedia article, and add spamming links.[36]
Wiki spam

Wiki spam is when a spammer uses the open editability of wiki systems to place links from the wiki site to the spam site.
Referrer log spamming

Referrer spam takes place when a spam perpetrator or facilitator accesses a web page (the referee) by following a link from another web page (the referrer), so that the referee is given the address of the referrer by the person's web browser.[citation needed] Some websites have a referrer log which shows which pages link to that site. By having a robot randomly access many sites enough times, with a message or specific address given as the referrer, that message or Internet address then appears in the referrer logs of those sites. Since some web search engines base the importance of sites on the number of different sites linking to them, referrer-log spam may increase the search engine rankings of the spammer's sites. Also, site administrators who notice the referrer log entries in their logs may follow the link back to the spammer's referrer page.[citation needed]
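Mechanically, the spam is nothing more than an HTTP request with a forged Referer header. The sketch below builds such a request with Python's standard library but deliberately never sends it; both URLs are placeholders.

```python
import urllib.request

# Build (but do not send) a request whose Referer header is forged to
# point at the spammer's page. A referrer-spam bot issues thousands of
# these so its URL piles up in victims' public referrer logs.
req = urllib.request.Request(
    "http://victim-with-public-referrer-log.example/",
    headers={"Referer": "http://spammers-page.example/"},
)
print(req.get_header("Referer"))  # http://spammers-page.example/
```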
Countermeasures

Because of the large amount of spam posted to user-editable webpages, Google proposed a "nofollow" tag that could be embedded with links.[37] A link-based search engine, such as Google's PageRank system, will not use a link to increase the score of the linked website if the link carries a nofollow tag.[38] This ensures that spamming links to user-editable websites will not raise the sites' ranking with search engines. Nofollow is used by several websites, such as WordPress,[39] Blogger, and Wikipedia.[40]
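On the publisher side, applying nofollow amounts to rewriting anchor tags before rendering untrusted content. The regex-based sketch below is a simplification; production CMSs use proper HTML parsing rather than regexes, which break on unusual markup.

```python
import re

def add_nofollow(html):
    """Add rel="nofollow" to every <a> tag that doesn't already declare rel."""
    def rewrite(match):
        tag = match.group(0)
        if re.search(r'\brel\s*=', tag, re.IGNORECASE):
            return tag  # leave tags that already declare a rel value alone
        return tag[:-1] + ' rel="nofollow">'
    return re.sub(r"<a\b[^>]*>", rewrite, html, flags=re.IGNORECASE)

comment = 'Nice post! <a href="http://spam.example/">cheap pills</a>'
print(add_nofollow(comment))
# Nice post! <a href="http://spam.example/" rel="nofollow">cheap pills</a>
```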
Other types

Mirror websites

A mirror site is the hosting of multiple websites with conceptually similar content but using different URLs. Some search engines give a higher rank to results where the keyword searched for appears in the URL.
URL redirection

URL redirection is the taking of the user to another page without their intervention, e.g., using META refresh tags, Flash, JavaScript, Java, or server-side redirects. However, a 301 redirect, or permanent redirect, is not considered malicious behavior.
Cloaking

Cloaking refers to any of several means to serve a page to the search-engine spider that is different from that seen by human users. It can be an attempt to mislead search engines regarding the content on a particular web site. Cloaking, however, can also be used to ethically increase the accessibility of a site to users with disabilities, or to provide human users with content that search engines aren't able to process or parse. It is also used to deliver content based on a user's location; Google itself uses IP delivery, a form of cloaking, to deliver results. Another form of cloaking is code swapping, i.e., optimizing a page for top ranking and then swapping another page in its place once a top ranking is achieved. Google refers to these types of redirects as sneaky redirects.[41]
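User-agent cloaking reduces to a server-side branch on the request's User-Agent header. A minimal sketch (the crawler token list is illustrative; real cloakers also match crawler IP ranges, which is the IP delivery variant):

```python
# Tokens commonly found in crawler User-Agent strings (illustrative list).
CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp")

def serve_page(user_agent):
    """Return different content depending on whether the visitor looks like a crawler."""
    if any(token in user_agent.lower() for token in CRAWLER_TOKENS):
        return "<html>keyword-rich page tuned for the spider</html>"
    return "<html>what human visitors actually see</html>"

print(serve_page("Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(serve_page("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"))
```

Detection works the same way in reverse: anti-spam teams fetch a page twice, once with a crawler User-Agent and once with a browser one, and compare the responses.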
Countermeasures
Page omission by search engine

Spamdexed pages are sometimes eliminated from search results by the search engine.[42]
Page omission by user

Users can employ search operators for filtering. In Google, a keyword preceded by "-" (minus) will omit pages that contain the keyword in their text or in their URL from the search results. For example, the search "-unwantedterm" will eliminate pages that contain the word "unwantedterm" and pages whose URL contains "unwantedterm".
Users could also use the Google Chrome extension "Personal Blocklist (by Google)", launched by Google in 2011 as part of its countermeasures against content farming.[43] Via the extension, users could block a specific page, or set of pages, from appearing in their search results. As of 2021, the original extension appears to have been removed, although similarly functioning extensions may be used.
Possible solutions to overcome search-redirection poisoning that redirects users to illegal internet pharmacies include notifying the operators of vulnerable legitimate domains. Further, manual evaluation of SERPs, previously published link-based and content-based algorithms, as well as tailor-made automatic detection and classification engines, can be used as benchmarks in the effective identification of pharma scam campaigns.[44]
See also

Adversarial information retrieval – Information retrieval strategies in datasets
Cloaking – Search engine optimization technique
Content farm – Organization that creates web content optimised for views
Doorway page – Misleading web page
Hidden text – Invisible text on a computer display
Search engine indexing – Method for data management; overview of search engine indexing technology
Link farm – Group of websites that link to each other
TrustRank – Link analysis algorithm
Web scraping – Method of extracting data from websites
Microsoft SmartScreen – Microsoft Windows anti-malware system
Microsoft Defender Antivirus – Anti-malware software
Scraper site – Website which copies content from others
Trademark stuffing
White fonting – Hiding text in a document
References

1. SearchEngineLand, Danny Sullivan's video explanation of Search Engine Spam, October 2008. Archived 2008-12-17 at the Wayback Machine.
2. "Google Search Central". 2023-02-23. Retrieved 2023-05-16.
3. "Word Spy – spamdexing" (definition), March 2003, webpage: WordSpy-spamdexing. Archived 2014-07-18 at the Wayback Machine.
4. Gyöngyi, Zoltán; Garcia-Molina, Hector (2005). "Web spam taxonomy" (PDF). Proceedings of the First International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), 2005, in The 14th International World Wide Web Conference (WWW 2005), May 10–14, 2005, Nippon Convention Center (Makuhari Messe), Chiba, Japan. New York, NY: ACM Press. ISBN 1-59593-046-9. Archived (PDF) from the original on 2020-02-15. Retrieved 2007-10-05.
5. Zuze, Herbert; Weideman, Melius (2013-04-12). "Keyword stuffing and the big three search engines". Online Information Review. 37 (2): 268–286. doi:10.1108/OIR-11-2011-0193. ISSN 1468-4527.
6. Ntoulas, Alexandros; Manasse, Mark; Najork, Marc; Fetterly, Dennis (2006). "Detecting Spam Web Pages through Content Analysis". The 15th International World Wide Web Conference (WWW 2006), May 23–26, 2006, Edinburgh, Scotland. New York, NY: ACM Press. ISBN 1-59593-323-9.
7. Egele, Manuel; Kolbitsch, Clemens (August 22, 2009). "Removing web spam links from search engine results". Journal in Computer Virology: 51–62. doi:10.1007/s11416-009-0132-6. Retrieved August 6, 2025.
8. "SEO basics: what is black hat SEO?". IONOS Digitalguide. 23 May 2017. Retrieved 2022-08-22.
9. Smarty, Ann (2008-12-17). "What Is BlackHat SEO? 5 Definitions". Search Engine Journal. Archived from the original on 2012-06-21. Retrieved 2012-07-05.
10. Whalen, Jill (July 12, 2007). "Keyword Stuffing Is Gross And Disgusting!". Search Engine Land. Retrieved August 6, 2025.
11. "18 Scripts". W3C. Retrieved August 6, 2025.