
White House website hanky panky

This topic is archived.
 
Syrinx Donating Member (1000+ posts) Tue Oct-28-03 03:29 PM
Original message
White House website hanky panky
A robots.txt file is a file that tells spiders, like those used by search engines, what material they are not allowed to download. In other words, it is a file that restricts what material may be indexed by search engines.

The White House robots.txt is over 1,500 lines long.

Below is just a short sample of that file. The whole thing can be seen at http://www.whitehouse.gov/robots.txt .

Disallow: /911/911day/iraq
Disallow: /911/911day/text
Disallow: /911/heroes/iraq
Disallow: /911/heroes/text
Disallow: /911/iraq
Disallow: /911/patriotism/iraq
Disallow: /911/patriotism/text
Disallow: /911/patriotism2/iraq
Disallow: /911/patriotism2/text
Disallow: /911/progress/iraq
Disallow: /911/progress/text
Disallow: /911/remembrance/iraq
Disallow: /911/remembrance/text
Disallow: /911/response/iraq
Disallow: /911/response/text
Disallow: /911/sept112002/iraq
Disallow: /911/sept112002/text
Disallow: /911/text
Disallow: /afac/index.htm/text
Disallow: /afac/iraq
Disallow: /afac/text
Disallow: /agencycontact/iraq
Disallow: /agencycontact/text
Disallow: /appointments/iraq
Disallow: /appointments/text
Disallow: /ask/20030515/iraq
Disallow: /ask/20030515/text
Disallow: /ask/20030520/iraq
Disallow: /ask/20030520/text
Disallow: /ask/20030625/iraq
Disallow: /ask/20030625/text
Disallow: /ask/20030701/iraq
Disallow: /ask/20030701/text
Disallow: /ask/images/iraq
Disallow: /ask/images/text
Disallow: /ask/iraq
Disallow: /ask/print/iraq
Disallow: /ask/print/text
Disallow: /ask/text
Disallow: /ask/video/iraq
Disallow: /ask/video/text
Disallow: /cea/iraq
Disallow: /cea/text
Disallow: /ceq/iraq
Disallow: /ceq/text
Disallow: /climatechangefactsheet/iraq
Disallow: /climatechangefactsheet/text
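
For anyone who wants to see how a well-behaved spider would read rules like these, here is a minimal sketch using Python's standard urllib.robotparser module. It assumes a "User-agent: *" line sits above the rules (the sample above shows only the Disallow lines), and the spider name ExampleBot and the test paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A few rules copied from the sample above. The "User-agent: *" line is an
# assumption: the excerpt shows only the Disallow lines.
SAMPLE_RULES = """\
User-agent: *
Disallow: /911/iraq
Disallow: /ask/text
Disallow: /cea/iraq
"""


def is_allowed(rules_text: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if a spider that honors the rules may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise the rules above.
    for path in ("/911/iraq", "/911/index.html", "/ask/text/page.html"):
        url = "http://www.whitehouse.gov" + path
        verdict = "allowed" if is_allowed(SAMPLE_RULES, url) else "disallowed"
        print(path, "->", verdict)
```

Each Disallow value is compared against the start of the requested path, so /ask/text/page.html is refused along with /ask/text itself, while /911/index.html is not affected by any of the three rules.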
Loonman Donating Member (1000+ posts) Tue Oct-28-03 03:31 PM
Response to Original message
1. 9/11, Iraq, Climate Change
Sounds about right.
 
trotsky Donating Member (1000+ posts) Tue Oct-28-03 03:33 PM
Response to Original message
2. Would the robots.txt file also exempt content...
...from those "archive" websites, that save historical website content?
 
Syrinx Donating Member (1000+ posts) Tue Oct-28-03 03:35 PM
Response to Reply #2
4. Yes, I think so.
 
Name removed Donating Member (0 posts) Tue Oct-28-03 03:33 PM
Response to Original message
3. Deleted message
Message removed by moderator.
 
Ficus Donating Member (1000+ posts) Tue Oct-28-03 03:44 PM
Response to Original message
5. my favorites
Disallow: /president/september11/iraq
Disallow: /stateoftheunion/2002/text


muhahahahahaha

:dem: :dem:
 
StClone Donating Member (1000+ posts) Tue Oct-28-03 03:54 PM
Response to Original message
6. Bush isn't responsible for this
Edited on Tue Oct-28-03 03:57 PM by StClone
This is scary.

Bush has an awareness of the protective wall of secrecy built by his autonomous minions. The secrecy has been built to keep us Americans from the truth because it is so horrible to him.
 
BlueEyedSon Donating Member (1000+ posts) Tue Oct-28-03 03:55 PM
Response to Original message
7. *kick*
nt
 
DBoon Donating Member (1000+ posts) Tue Oct-28-03 04:12 PM
Response to Original message
8. In Slashdot
 
Syrinx Donating Member (1000+ posts) Tue Oct-28-03 04:17 PM
Response to Reply #8
9. thanks
I go to Slashdot most days, but somehow missed this one.
 
BurtWorm Donating Member (1000+ posts) Tue Oct-28-03 04:22 PM
Response to Original message
10. Time to Google some dates
Disallow: /ask/20030515/iraq
Disallow: /ask/20030515/text
Disallow: /ask/20030520/iraq
Disallow: /ask/20030520/text
Disallow: /ask/20030625/iraq
Disallow: /ask/20030625/text
Disallow: /ask/20030701/iraq
Disallow: /ask/20030701/text
 
Dude_CalmDown Donating Member (1000+ posts) Tue Oct-28-03 04:42 PM
Response to Original message
11. epa.gov - 864 lines
Disallow: /oppbppd1/biopesticides/factsheets/
Disallow: /oppecumm/
Disallow: /CumulativeExposure/
Disallow: /Cumulativeexposure/
Disallow: /cumulativeexposure/
Disallow: /oppefed1/ecorisk/old/
Disallow: /oppeoee1/globalwarming/climate/images/
Disallow: /oppeoee1/globalwarming/greenhouse/greenhouse1/images/
Disallow: /oppeoee1/globalwarming/greenhouse/greenhouse10/images/
Disallow: /oppeoee1/globalwarming/greenhouse/greenhouse11/images/
Disallow: /oppeoee1/globalwarming/greenhouse/greenhouse12/images/
Disallow: /oppeoee1/globalwarming/greenhouse/greenhouse14/images/
Disallow: /oppeoee1/globalwarming/greenhouse/greenhouse15/images/
Disallow: /oppeoee1/globalwarming/greenhouse/greenhouse2/images/
Disallow: /oppeoee1/globalwarming/greenhouse/greenhouse3/images/
Disallow: /oppeoee1/globalwarming/greenhouse/greenhouse4/images/
Disallow: /oppeoee1/globalwarming/greenhouse/greenhouse5/images/
Disallow: /oppeoee1/globalwarming/greenhouse/greenhouse6/images/
Disallow: /oppeoee1/globalwarming/greenhouse/greenhouse7/images/
Disallow: /oppeoee1/globalwarming/greenhouse/greenhouse8/images/
Disallow: /oppeoee1/globalwarming/greenhouse/greenhouse9/images/
Disallow: /oppeoee1/globalwarming/greenhouse/images/
Disallow: /oppeoee1/globalwarming/images/
Disallow: /oppeoee1/globalwarming/impacts/images/
Disallow: /oppeoee1/globalwarming/kids/images/
Disallow: /oppeoee1/globalwarming/resources/
Disallow: /oppeoee1/globalwarming/transfer/

More - http://www.epa.gov/robots.txt
 
benburch Donating Member (1000+ posts) Tue Oct-28-03 04:47 PM
Response to Original message
12. robots.txt is a SUGGESTION only...
So, that content is probably indexed somewhere by somebody... But most of the top name search engines WILL respect robots.txt.

-Ben
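
To illustrate the "suggestion only" point: the check happens inside the spider, not on the server. The sketch below is a hypothetical spider's decision function, not any real search engine's code; the honor_robots flag, the ExampleBot name, and the single rule are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# One rule taken from the thread, preceded by an assumed "User-agent: *" line.
RULES = ["User-agent: *", "Disallow: /ask/iraq"]


def should_fetch(url, rules, agent="ExampleBot", honor_robots=True):
    """Decide whether a hypothetical spider will request the URL.

    The robots.txt check runs inside the spider. Nothing on the server
    enforces it, so a spider that skips the check simply proceeds.
    """
    if not honor_robots:
        return True  # an impolite spider never consults the rules
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://www.whitehouse.gov/ask/iraq"
    print("polite spider fetches:", should_fetch(url, RULES))
    print("impolite spider fetches:", should_fetch(url, RULES, honor_robots=False))
```

A spider that honors the file stays away from /ask/iraq; one that ignores it requests the page like any browser would, which is why the content may still be indexed somewhere.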
 
electricmonk Donating Member (1000+ posts) Tue Oct-28-03 04:54 PM
Response to Original message
13. What I don't get
Is why a government website that we pay for is blocking any search engine or spider from indexing or archiving. I can understand commercial websites wanting to block them, since it can use up a lot of bandwidth if you have a bunch of pages and a robot comes through and caches everything. It seems to me, though, that for a government website to do it would be some sort of FOI violation.
 
Selwynn Donating Member (1000+ posts) Tue Oct-28-03 04:59 PM
Response to Original message
14. DU has their own disallow list - what are they hiding? (WH list legit?)
This is an email from my best friend, which he sent in response to my sending him this link.

Before you flame him, I'd like to point out he is a die-hard Democrat, but he's also a man of science and looks at things skeptically:

---
Well, a couple of things about this.

1. There are a lot of reasons to use the REP (Robot Exclusion Protocol). Hiding information is one, but there are several legitimate reasons to use it as well. One is keeping robots from cataloging or persisting obsolete or even removed URLs. Another big one is site management. A large website like whitehouse.gov wants to be able to manage itself and manage how users access portions of the web site. If you go to the main whitehouse.gov site, at the top right is a text box you can use to search. Put in "iraq" and hit Search. You'll get back 1600-some results, and if you scrolled through there you would probably find most of these "disallowed" url-bits. Why? The White House, which believe me is a closely monitored web site, probably doesn't want users hopping in and out on random information they find from external search engines.

I don't immediately get suspicious when websites use filtering for search robots. Not a problem. What is interesting about this particular robots.txt file is that everything that is being filtered uses the same format. It is odd, no doubt about that. Do you have any context around this find? Did somebody have any more information about the format of whitehouse.gov URLs and why these would be filtered specifically?


2. I could be wrong, but I am not aware of any federal imperative to make all information that is technically "public information" available to the public via the Web. So you said, because the information is public they can't just yank it from the web site. But why not? Is there an imperative in place that requires the Feds to keep any and all "once-made-public" information always public? And via the web? If not, then I suspect that there are reasons

3. Check out the covert action on the following website. They have a whole "archive" directory that is inaccessible to search engines. What kind of secrets could it contain?
http://www.democraticunderground.com/robots.txt

And lastly, you know I'm just giving you shit. :-)
 
Dude_CalmDown Donating Member (1000+ posts) Tue Oct-28-03 05:20 PM
Response to Reply #14
15. I agree with your friend
If they really wanted something hidden it would be. Having a "robots" list is not at all uncommon. I really only find it interesting if there is one seriously extensive list. A few key words could filter out a lot of unnecessary traffic so why go to such trouble to make such a detailed list - especially since we pay for that site.
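
One possible explanation for the length, offered as a guess and not as the White House's actual reasoning: in the original robots.txt convention a Disallow value is matched only as a plain prefix of the path, with no keyword or wildcard matching, so blocking every path that ends in "iraq" or "text" would take one line per directory. The sketch below tallies the last path segment of each rule to show how uniform the list is; the function name last_segments is made up, and the sample is a handful of lines copied from the thread.

```python
from collections import Counter

# A handful of rules copied from the White House sample quoted in the thread.
SAMPLE = """\
Disallow: /911/iraq
Disallow: /911/text
Disallow: /afac/iraq
Disallow: /afac/text
Disallow: /cea/iraq
Disallow: /cea/text
"""


def last_segments(robots_text):
    """Count the final path segment of every Disallow rule."""
    counts = Counter()
    for line in robots_text.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() != "disallow":
            continue  # skip User-agent lines, comments, and blanks
        path = value.strip().rstrip("/")
        if path:
            counts[path.rsplit("/", 1)[-1]] += 1
    return counts


if __name__ == "__main__":
    for segment, count in last_segments(SAMPLE).most_common():
        print(segment, count)
```

Run against the full file, a tally like this would show whether nearly every rule ends in one of a few words, which is the "same format" pattern the email points out.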
 

Important Notices: By participating on this discussion board, visitors agree to abide by the rules outlined on our Rules page. Messages posted on the Democratic Underground Discussion Forums are the opinions of the individuals who post them, and do not necessarily represent the opinions of Democratic Underground, LLC.


© 2001 - 2011 Democratic Underground, LLC