ext_170863 ([identity profile] strawberryfrog.livejournal.com) wrote in [personal profile] freckles_and_doubt 2012-03-31 11:43 pm (UTC)

robots.txt is usually a very simple text file that lists the parts of a site that search engines shouldn't go into. Google et al *should* honour it, mostly because it lists routes that would waste both parties' time and effort (e.g. the site's internal search results pages and "I don't have a page for that" pages). Lack of this could at one stage cause the googlebot to get lost in an endless Borgesian library of generated pages.
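For illustration only, here is a rough sketch of what such a file might look like and how a well-behaved crawler would read it, using Python's standard `urllib.robotparser`. The paths `/search` and `/not-found` and the `example.com` host are made up for the example, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep crawlers out of generated pages only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /not-found
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="Googlebot"):
    """Return True if a compliant crawler may fetch the given path."""
    return parser.can_fetch(agent, "https://example.com" + path)


print(allowed("/search?q=cats"))      # generated search results page
print(allowed("/2012/03/some-post"))  # ordinary content page
```

Anything not matched by a `Disallow` line stays crawlable, so a file like this only fences off the routes it names.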

It shouldn't be tucked away anywhere: if present, it lives at the root of the site. It isn't a way to hack a site in itself; it's just a text file.

Possibly the hits that you see are googlebots requesting robots.txt and finding nothing. On the last internet-facing site that I worked on, I noticed these requests quite soon, so I made a simple robots.txt. But then I had the luxury of erecting a "keep out" sign that didn't list any particular routes, just disallowed everything. You *probably* don't want that if search is how people find you. You could put down a simple one that allows everything to be indexed.
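As a minimal sketch of the two options just described, the "keep out" file and the "allow everything" file differ by a single character. The snippet below checks both with Python's standard `urllib.robotparser`; the `example.com` URL and the `can_crawl` helper are placeholders invented for the example.

```python
from urllib.robotparser import RobotFileParser

# "Keep out" sign: every compliant crawler is barred from the whole site.
DISALLOW_ALL = """\
User-agent: *
Disallow: /
"""

# An empty Disallow value means nothing is off limits.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def can_crawl(robots_txt, url, agent="Googlebot"):
    """Parse the given robots.txt text and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


url = "https://example.com/some/page"  # placeholder URL
print(can_crawl(DISALLOW_ALL, url))  # False
print(can_crawl(ALLOW_ALL, url))     # True
```

Either file would be saved as robots.txt at the root of the site. Crawlers that ignore the convention are not stopped by it; it is a request, not access control.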

I'm not convinced that it gives much away in terms of things to look for - the fact that you're running WordPress gives much more away IMHO. It may be a red herring.
