NETSEER, INC.

NETSEER CRAWLER

If you've reached this page, then you've likely been crawled by NetSeer's spider. Thank you very much for letting us crawl your website. However, if our crawl was unwanted, we apologize for the intrusion. Please continue reading for more information on how to limit our spider's access to your website.

About NetSeer

NetSeer, Inc. is a Santa Clara-based Internet startup backed by blue-chip Northern and Southern California venture capitalists. We have developed technologies that will redefine the $20B online advertising industry. After several years of stealth development, the company is poised to launch its service. Additional information regarding our privacy policy can be found at http://www.netseer.com/privacypolicy.html

NetSeer's Spider

NetSeer's spider crawls the web by starting from a few well-known entry points and recursively following links. By indexing the visited pages, we build up a body of knowledge that helps us to provide better targeted advertising. We follow all W3 guidelines regarding web crawling. This means that you can prevent our spider from indexing pages that you wish to remain private, or from following links within your website. Instructions below will assist you in configuring your website to prevent spider access. If, for any reason, you believe that our spider is not following W3 guidelines, please contact us at (408)992-5255.
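
The traversal described above, starting from a few entry points and following links outward, can be illustrated with a minimal sketch. This is not NetSeer's implementation: the crawl function, the get_links callable, the max_pages limit, and the toy link graph are all hypothetical names chosen for illustration, and the sketch uses a queue rather than literal recursion so that each page is visited only once.

```python
from collections import deque


def crawl(entry_points, get_links, max_pages=100):
    """Visit pages breadth-first, starting from the given entry points.

    get_links is a hypothetical callable that maps a URL to the list of
    URLs it links to. A real spider would fetch and parse the page here.
    """
    queue = deque(dict.fromkeys(entry_points))  # de-duplicated, order kept
    seen = set(queue)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return visited


# Toy in-memory link graph standing in for the real web (illustrative only).
LINKS = {
    "http://a.example/": ["http://a.example/1", "http://b.example/"],
    "http://a.example/1": ["http://a.example/"],
    "http://b.example/": ["http://a.example/1", "http://b.example/x"],
    "http://b.example/x": [],
}

order = crawl(["http://a.example/"], lambda url: LINKS.get(url, []))
print(order)
```

A well-behaved spider would additionally consult each site's exclusion rules before visiting a page; that check is left out here to keep the sketch short.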

How You Can Help

By allowing NetSeer's spider and other non-malicious spiders to crawl your site, you are already helping to advance the state of the art in Internet services. You are also supporting competition and allowing new ideas to be explored in the world's largest electronic playground. This type of innovation and cooperation is what led to the creation of the Internet and the World Wide Web. We sincerely hope that you will work with us to help create the Internet of tomorrow!

Customizing Spider Access to Your Website

We will not crawl anything you would like to remain private. By using the Standard for Robot Exclusion (SRE) you can let NetSeer's spider, as well as other spiders, know not to crawl your site. There are two techniques that can be used to customize or otherwise limit spider access to your website: a robots.txt file and in-page META instructions.

robots.txt: You can place a file named "robots.txt" at the top level of your website, e.g. http://www.mywebsite.com/robots.txt . This file tells crawlers which directories can or cannot be crawled. It is important to note that this filename is case-sensitive. We have configured our spider to be a bit more forgiving, but typically a spider will only respect this file if it is correctly named and formatted.

Website administrators use the robots.txt file to direct the behavior of web-crawling robots. NetSeer's crawler always retrieves a copy of this file before it begins crawling.
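
Because the file must sit at the top level of the site, its address can be derived from any page URL on the same host. The following is a small sketch of that derivation using Python's standard library; the function name robots_txt_url and the example page URL are assumptions made for illustration, not part of NetSeer's software.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the top-level robots.txt address for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.mywebsite.com/products/list.html?page=2"))
# prints http://www.mywebsite.com/robots.txt
```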

To exclude NetSeer's crawler, the robots.txt file should look like this:

User-agent: netseer
Disallow: /

To exclude just one directory (and its subdirectories), say, the /images/ directory, the file should look like this:

User-agent: netseer
Disallow: /images/

Visit http://www.robotstxt.org/wc/faq.html for more details on how to instruct robots when they visit your site.
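
Before publishing a robots.txt file, it can be useful to check that its rules have the intended effect. The sketch below does this with Python's standard urllib.robotparser module, which is used here only as a stand-in: it is an assumption that NetSeer's spider interprets the rules the same way. The helper name is_allowed, the "otherbot" user agent, and the sample URLs are illustrative.

```python
from urllib.robotparser import RobotFileParser

# The two example files shown above, as strings.
BLOCK_EVERYTHING = """\
User-agent: netseer
Disallow: /
"""

BLOCK_IMAGES = """\
User-agent: netseer
Disallow: /images/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


site = "http://www.mywebsite.com"
print(is_allowed(BLOCK_EVERYTHING, "netseer", site + "/index.html"))   # False
print(is_allowed(BLOCK_IMAGES, "netseer", site + "/images/logo.png"))  # False
print(is_allowed(BLOCK_IMAGES, "netseer", site + "/index.html"))       # True
print(is_allowed(BLOCK_IMAGES, "otherbot", site + "/images/logo.png")) # True
```

The last line shows that a rule addressed to "netseer" does not restrict other spiders; a separate record (or a "User-agent: *" record) is needed for those.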

Using a META tag: If you cannot create a robots.txt file, you can also limit spider behavior through the use of META tags that direct visiting spiders.

Like any META tag, it should be placed in the HEAD section of an HTML page. You should put it in every page on your site, as a robot can encounter a deep link to any page on your site.

The "NAME" attribute must be "ROBOTS".

Valid values for the "CONTENT" attribute are: "INDEX", "NOINDEX", "FOLLOW", "NOFOLLOW". Multiple comma-separated values are allowed, but only some combinations make sense. If there is no robots <META> tag, the default is "INDEX,FOLLOW".

<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW"> (Don't index this page, but follow links)
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW"> (Index this page, but don't follow links)
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> (Don't index, don't follow links)
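
To make the rules above concrete, here is a minimal sketch of how a spider might read the robots META tag from a page, using Python's standard html.parser module. It is an illustration under stated assumptions, not NetSeer's parser: the class name RobotsMetaParser and the helper robots_directives are hypothetical, and the choice to let a later value override an earlier one is an assumption of this sketch.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect INDEX/FOLLOW directives from <META NAME="ROBOTS"> tags."""

    def __init__(self):
        super().__init__()
        # With no robots META tag, the default is "INDEX,FOLLOW".
        self.index = True
        self.follow = True

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() != "robots":
            return
        for token in attributes.get("content", "").split(","):
            token = token.strip().upper()
            if token == "INDEX":
                self.index = True
            elif token == "NOINDEX":
                self.index = False
            elif token == "FOLLOW":
                self.follow = True
            elif token == "NOFOLLOW":
                self.follow = False


def robots_directives(html):
    """Return (index, follow) booleans for an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.index, parser.follow


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW"></head></html>'
print(robots_directives(page))  # (False, True)
```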

For more information, please do not hesitate to contact us at (408)992-5255.