Cloaking

### Cloaking: An SEO Technique to Avoid

Google imposes severe penalties on certain SEO techniques, especially those that disrupt the market. To be considered White Hat, all disruptive SEO methods must comply with Google’s policies and guidelines. Despite their effectiveness, some techniques are deemed unacceptable by Google. One such technique is cloaking.

Cloaking involves presenting different content to users and bots (Google search crawlers). This method manipulates page rankings by showing keyword-rich content to Google bots while delivering user-centric content to visitors. It is a form of spamdexing, combining spamming and indexing.

According to Wikipedia:

“Cloaking is an SEO technique where the content presented to the search engine spider is different from what is shown to the user’s browser. This is done by delivering content based on the IP addresses or the User-Agent HTTP header of the user requesting the page. When a user is identified as a search engine spider, a server-side script delivers a different version of the web page, containing content not present on the visible page, or present but not searchable. The purpose of cloaking is sometimes to deceive search engines so they display the page when it would not otherwise be displayed (black hat SEO).”

Cloaking is often described as the most severe form of web spamming. Najork defined web spam as “a host of techniques to subvert the ranking algorithms of web search engines and cause them to rank search results higher than they would otherwise.”

### Types of Cloaking

#### User-Agent Cloaking

A user agent is the software, typically a browser or a crawler, that requests web content on a user’s behalf and identifies itself through the User-Agent header of each HTTP request. A server-side cloaking script inspects this header to differentiate between requesters: if the request comes from a Google bot, a keyword-rich page is served to inflate the page’s rank; if it comes from a regular visitor, a different page is served.
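To make the mechanism concrete, the sketch below shows the server-side pattern such scripts follow, using Python’s standard-library HTTP server. The port, handler name, and response bodies are placeholders, and deploying anything like this would violate Google’s spam policies:

```python
# Illustration only: the server-side pattern user-agent cloaking
# scripts follow. Deploying this would violate Google's spam policies.
from http.server import BaseHTTPRequestHandler, HTTPServer

class CloakingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        user_agent = self.headers.get("User-Agent", "")
        if "Googlebot" in user_agent:
            # Keyword-stuffed page shown only to the crawler.
            body = b"<html><body>keyword keyword keyword</body></html>"
        else:
            # Ordinary page shown to human visitors.
            body = b"<html><body>Regular visitor content</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8000), CloakingHandler).serve_forever()
```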

#### IP Cloaking

Different pages are served to users based on their IP address. If the address falls within a range associated with Google’s search crawlers, a keyword-rich page is served to improve rankings; any other address receives a different page.
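A minimal sketch of the IP test such scripts rely on is shown below. The network range is one that has been publicly observed for Googlebot and is used here only as a placeholder; these ranges change over time, which is precisely why Google recommends the DNS verification described in the next section:

```python
# Sketch of an IP-based crawler check. The range below is a commonly
# observed Googlebot range, used only as a placeholder: real ranges
# change, so hard-coding them is unreliable.
import ipaddress

CRAWLER_NETWORKS = [ipaddress.ip_network("66.249.64.0/19")]

def is_crawler_ip(client_ip: str) -> bool:
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in CRAWLER_NETWORKS)

# A cloaking server would branch on this result:
# serve_keyword_page() if is_crawler_ip(ip) else serve_visitor_page()
```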

### Verifying Googlebot Requests

According to Google Search Central:

“You can verify if a web crawler accessing your server is really Googlebot (or another Google user agent). This is useful if you’re concerned that spammers or other troublemakers are accessing your site while claiming to be Googlebot. Google doesn’t post a public list of IP addresses for website owners to whitelist because these IP address ranges can change, causing problems for any website owners who have hard-coded them. Therefore, you must run a DNS lookup as described next.

To verify Googlebot as the caller:

1. Run a reverse DNS lookup on the accessing IP address from your logs using the host command.

2. Verify that the domain name is either googlebot.com or google.com.

3. Run a forward DNS lookup on the domain name retrieved in step 1 using the host command on the retrieved domain name. Verify that it’s the same as the original accessing IP address from your logs.”
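These three steps translate directly into code. The sketch below uses Python’s socket module in place of the host command and assumes nothing beyond the standard library:

```python
import socket

def verify_googlebot(ip: str) -> bool:
    """Reverse-then-forward DNS check following the three steps above."""
    try:
        # Step 1: reverse DNS lookup on the accessing IP address.
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    # Step 2: the domain must be googlebot.com or google.com.
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        # Step 3: forward DNS lookup on the retrieved domain name.
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    # The forward lookup must return the original accessing IP.
    return ip in addresses

# Example input: 66.249.66.1, the address used in Google's own
# documentation, reverse-resolves to crawl-66-249-66-1.googlebot.com.
```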

### Advantages of IP Cloaking

IP cloaking can also help keep competitors at bay when they research your website for prices or other unique selling points (USPs). The process is simple: identify the IP addresses your competitor uses to track you and serve them a different page.

### Other Types of Cloaking

#### JavaScript Cloaking

Users whose browsers have JavaScript enabled are served one page, while users without JavaScript are served a different page.

#### Hidden Texts or Links

Another cloaking method does not serve different pages to users and bots; instead, it hides the text itself. Text, images, or videos are made invisible to visitors while remaining on the page, where Google bots can still read them.

### Google’s Stance on Hidden Text and Links

Google Search Central states that hiding text or links in order to manipulate search rankings violates Google’s guidelines. Techniques include:

– Using white text on a white background

– Placing text behind an image

– Using CSS to position text off-screen

– Setting the font size to 0

– Hiding a link by linking only one small character (e.g., a hyphen in a paragraph)

In the past, some website owners benefited from hiding text because Google ranked pages with a simple text-matching algorithm; that is no longer the case. Google now evaluates sites for hidden text or links by looking for anything that is not easily viewable by visitors.

### Accessibility and Non-Deceptive Hidden Text

Not all hidden text is deceptive. If your site relies on technologies such as JavaScript, images, or Flash files, providing descriptive text for these items can improve accessibility. Test your site’s accessibility by turning off JavaScript, Flash, and images in your browser, or by using a text-only browser such as Lynx.

#### Tips to Make Your Site Accessible

– Images: Use the alt attribute to provide descriptive text explaining the image.

– JavaScript: Place the same content in a `<noscript>` tag, and ensure it matches the JavaScript content.

– Videos: Include descriptive text about the video in HTML and consider providing transcripts.

### Detecting Cloaking

Najork suggested a technique to identify cloaked web servers: fetch an object from a page or site once with a crawler-style request, then fetch a second snapshot with an ordinary browser-style request. If the two objects do not match, the server is considered cloaked. This method needs only a minimal number of copies to reach a decision, but it sometimes falsely identifies dynamic or frequently updated pages and sites as cloaked.
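A sketch of this comparison is below. The User-Agent strings are illustrative, and as noted above, a mismatch is only a signal: dynamic pages can differ between any two fetches, so this should not be read as proof of cloaking.

```python
# Najork-style check: fetch the same URL with a crawler User-Agent and
# a browser User-Agent, then compare the two bodies.
import urllib.request

CRAWLER_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

def fetch(url: str, user_agent: str) -> bytes:
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def looks_cloaked(url: str) -> bool:
    # A mismatch is a signal, not proof: dynamic pages also differ.
    return fetch(url, CRAWLER_UA) != fetch(url, BROWSER_UA)
```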

### Detection Using HITS and HOTs

In this method, three copies of the URL are collected. Common lexical terms are removed, and each remaining word is recorded once, regardless of frequency. The set of terms in the crawler copies (TCC) is then compared with the set of terms in the browser copies (TBC). If TBC exceeds TCC and the difference is above a threshold, the page is marked as cloaked. Many other detection methods exist, but none can reproduce with certainty the rules, algorithms, or methods Google itself uses for identification.
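The heuristic can be sketched as a set comparison. The tokenizer and the threshold value below are assumptions for illustration; the TCC and TBC sets are modelled as unions of the unique terms found in each copy:

```python
# Term-difference heuristic sketched from the description above.
# The tokenizer and threshold are illustrative assumptions.
import re

def unique_terms(text: str) -> set[str]:
    # Record each word once, regardless of frequency.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def is_cloaked(crawler_copies: list[str], browser_copies: list[str],
               threshold: int = 50) -> bool:
    tcc = set().union(*(unique_terms(c) for c in crawler_copies))
    tbc = set().union(*(unique_terms(b) for b in browser_copies))
    # Following the heuristic above: flag the page when the browser
    # copies have more unique terms than the crawler copies and the
    # term difference exceeds the threshold.
    return len(tbc) > len(tcc) and len(tbc - tcc) > threshold
```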

### Dynamic Serving vs. Cloaking

Dynamic serving is not considered cloaking. With dynamic serving, the server returns different HTML and CSS on the same URL depending on the requesting device (for example, a mobile layout versus a desktop layout), but users and crawlers receive equivalent content, so there is no deception.
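For contrast with the cloaking sketches above, here is a minimal sketch of dynamic serving, again using Python’s standard-library HTTP server. The device check is deliberately simplistic; the key points are that both branches carry the same content and that the response announces the variation with a Vary: User-Agent header, which Google’s mobile documentation recommends for dynamic serving:

```python
# Sketch of dynamic serving: same URL, device-appropriate markup,
# equivalent content for everyone.
from http.server import BaseHTTPRequestHandler, HTTPServer

class DynamicServingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        mobile = "Mobile" in ua  # simplistic device detection, for illustration
        body = (b"<html><body>Mobile layout, same content</body></html>"
                if mobile else
                b"<html><body>Desktop layout, same content</body></html>")
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        # Tell caches and crawlers that the response varies by user agent.
        self.send_header("Vary", "User-Agent")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8000), DynamicServingHandler).serve_forever()
```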