Web robots are programs that make automatic requests to servers.
For example, search engines use robots (which are sometimes known as Web crawlers)
to retrieve pages for inclusion in their search database. You can provide
a robots.txt file to identify URLs that robots are not allowed to visit.
On visiting a Web site, a robot should make a request for the document
robots.txt, using the URL
http://www.example.com/robots.txt
where
www.example.com is the host name for the
site. If you have host names that can be accessed using more than one port
number, robots should request the robots.txt file for each combination of
host name and port number. The policies listed in the file can apply to all
robots, or name specific robots. Disallow statements are used to name URLs
that the robots should not visit. Note that even if you provide a robots.txt
file, robots that do not comply with the robots exclusion standard might
still access and index your Web pages.
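For example, a robots.txt file along the following lines tells all robots not to visit two sets of URLs, and tells a robot that identifies itself as ExampleBot not to visit any URL on the site. The path names and the robot name shown here are only placeholders:

   User-agent: *
   Disallow: /private/
   Disallow: /cgi-bin/

   User-agent: ExampleBot
   Disallow: /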
If a Web browser requests a robots.txt
file and you do not provide one, CICS sends an error response to the browser
as follows:
- If you are using the CICS-supplied default analyzer DFHWBAAX, a 404 (Not
Found) response is returned. No CICS message is issued in this situation.
- If you are using the sample analyzer DFHWBADX, or a similar analyzer which
is only able to interpret the URL format that was required before CICS TS
Version 3, the analyzer is likely to misinterpret the path robots.txt as
an incorrectly specified converter program name. In this case, message DFHWB0723
is issued, and a 400 (Bad Request) response is returned to the browser. To
avoid this situation, you can either modify the analyzer program to recognize
the robots.txt request and provide a more suitable error response, or provide
a robots.txt file using a URIMAP definition (which means that
the sample analyzer program is bypassed for these requests); a sketch of such
a definition follows this list.
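As a sketch of the URIMAP option, a definition along the following lines could serve a static robots.txt file held as a file in zFS. The resource name ROBOTS, the group name WEBGRP, the host name, and the file path are only illustrative, and the exact attributes available depend on your CICS TS release:

   CEDA DEFINE URIMAP(ROBOTS) GROUP(WEBGRP)
        USAGE(SERVER) SCHEME(HTTP)
        HOST(www.example.com) PATH(/robots.txt)
        MEDIATYPE(text/plain)
        HFSFILE(/u/webfiles/robots.txt)

Specifying HOST(*) instead of a specific host name would make the same file apply to all of your host names; alternatively, you can create one URIMAP definition for each host.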
To provide a robots.txt file for all or some of your host names: