Enterprise Information Portal APIs

com.ibm.gcs.component.config
Class CrawlPattern

java.lang.Object
  |
  +--com.ibm.gcs.component.config.CrawlPattern

public class CrawlPattern
extends java.lang.Object

This part of a Group Config represents a pattern of URLs that should be crawled. It has five sections: a recursion-depth, an array of URL seeds, an array of content-type include patterns, an array of exclude patterns, and an array of include patterns. It is constructed from the crawl-pattern element in the config file.
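The five sections can be read back through the accessors listed in the Method Summary. The following is a minimal sketch, not the crawler's own code: `CrawlPatternView` and `CrawlPatternSummary` are hypothetical stand-ins introduced here so that the example compiles without the EIP libraries, and the array element types are simplified to `Object`. The source does not say what the getters return when a list is absent, so the sketch checks the `has...PatternList()` methods first and also tolerates `null`.

```java
// Hypothetical stand-in that mirrors the documented accessor names of
// CrawlPattern. The real class is com.ibm.gcs.component.config.CrawlPattern;
// element types (URLSeed, URLExcIncPattern, URLNamePattern) are simplified
// to Object here because those classes are not reproduced in this sketch.
interface CrawlPatternView {
    int getRecursionDepth();
    Object[] getURLSeeds();
    boolean hasContentTypePatternList();
    Object[] getContentTypePatterns();
    boolean hasExcludePatternList();
    Object[] getExcludePatterns();
    boolean hasIncludePatternList();
    Object[] getIncludePatterns();
}

// Hypothetical helper that reports the size of each of the five sections.
final class CrawlPatternSummary {

    // Null-safe length: an assumption, since the documentation does not
    // state whether a getter may return null.
    private static int length(Object[] items) {
        return items == null ? 0 : items.length;
    }

    static String describe(CrawlPatternView p) {
        // Call each getter only when the matching has...PatternList() is true.
        int contentTypes = p.hasContentTypePatternList()
                ? length(p.getContentTypePatterns()) : 0;
        int excludes = p.hasExcludePatternList()
                ? length(p.getExcludePatterns()) : 0;
        int includes = p.hasIncludePatternList()
                ? length(p.getIncludePatterns()) : 0;

        return "depth=" + p.getRecursionDepth()
                + " seeds=" + length(p.getURLSeeds())
                + " contentTypes=" + contentTypes
                + " excludes=" + excludes
                + " includes=" + includes;
    }
}
```

With the real class, the same guard-then-get pattern would apply directly to a `CrawlPattern` instance obtained from its `Group`.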

See Also:
URLSeed, URLExcIncPattern, Group, Config

Method Summary
 void addIncludePattern(java.lang.String pattern)
          adds an include pattern to this crawl pattern
 void addSeed(java.lang.String url)
          adds a seed URL to this crawl pattern
 java.net.PasswordAuthentication getAuthentication(java.net.URL u)
          returns authentication for the URL if the URL has a path at or deeper than the depth of the last slash in the path field of a seed URL and if the seed URL has authentication information specified.
 URLNamePattern[] getContentTypePatterns()
          returns an array of contentType patterns
 URLExcIncPattern[] getExcludePatterns()
          returns an array of exclude patterns
 Group getGroup()
          returns the Group that owns this URL pattern
 URLExcIncPattern[] getIncludePatterns()
          returns an array of include patterns
 int getIndex()
          returns the index of this pattern in the group
 int getRecursionDepth()
          returns how deep the crawler should recursively follow links
 URLSeed[] getURLSeeds()
          returns an array of URL seeds
 boolean hasContentTypePatternList()
          returns true if this URL pattern has a contentType pattern list
 boolean hasExcludePatternList()
          returns true if this URL pattern has an exclude pattern list
 boolean hasIncludePatternList()
          returns true if this URL pattern has an include pattern list
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

getIndex

public int getIndex()
Returns:
the index of the pattern in the group

addSeed

public void addSeed(java.lang.String url)
             throws ConfigException
adds a seed URL to this crawl pattern

addIncludePattern

public void addIncludePattern(java.lang.String pattern)
                       throws ConfigException
adds an include pattern to this crawl pattern

getGroup

public Group getGroup()
returns the Group that owns this URL pattern

getRecursionDepth

public int getRecursionDepth()
returns how deep the crawler should recursively follow links

getURLSeeds

public URLSeed[] getURLSeeds()
returns an array of URL seeds

getContentTypePatterns

public URLNamePattern[] getContentTypePatterns()
returns an array of contentType patterns

hasContentTypePatternList

public boolean hasContentTypePatternList()
returns true if this URL pattern has a contentType pattern list

getExcludePatterns

public URLExcIncPattern[] getExcludePatterns()
returns an array of exclude patterns

hasExcludePatternList

public boolean hasExcludePatternList()
returns true if this URL pattern has an exclude pattern list

getIncludePatterns

public URLExcIncPattern[] getIncludePatterns()
returns an array of include patterns

hasIncludePatternList

public boolean hasIncludePatternList()
returns true if this URL pattern has an include pattern list

getAuthentication

public java.net.PasswordAuthentication getAuthentication(java.net.URL u)
returns the authentication for the URL if the URL's path is at or deeper than the depth of the last slash in the path of a seed URL and that seed URL specifies authentication information. Otherwise returns null.
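
The matching rule can be illustrated with a short sketch. This is one possible reading of the description, not the crawler's implementation: it assumes "at or deeper than the last slash" means the URL's path starts with the seed path truncated after its last slash, and it additionally assumes the hosts must match, which the description does not state. `SeedAuthSketch` and its methods are hypothetical names.

```java
import java.net.MalformedURLException;
import java.net.PasswordAuthentication;
import java.net.URI;
import java.net.URL;

// Hypothetical illustration of the rule described for getAuthentication.
final class SeedAuthSketch {

    // Parses a URL string, wrapping the checked exception for brevity.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Returns the path up to and including its last slash,
    // or "/" when the path contains no slash at all.
    static String directoryOf(URL u) {
        String path = u.getPath();
        int lastSlash = path.lastIndexOf('/');
        return lastSlash < 0 ? "/" : path.substring(0, lastSlash + 1);
    }

    // Assumption: a URL is covered when it is on the same host as the seed
    // and its path starts with the seed's directory prefix.
    static boolean covers(URL seed, URL candidate) {
        String candidatePath = candidate.getPath().isEmpty() ? "/" : candidate.getPath();
        return seed.getHost().equalsIgnoreCase(candidate.getHost())
                && candidatePath.startsWith(directoryOf(seed));
    }

    // Returns the seed's credentials when the seed has them and covers the
    // URL; otherwise returns null, as the method description specifies.
    static PasswordAuthentication authFor(URL seed, PasswordAuthentication seedAuth, URL u) {
        if (seedAuth == null) {
            return null;
        }
        return covers(seed, u) ? seedAuth : null;
    }
}
```

Under these assumptions, a seed of `http://example.com/docs/index.html` covers `http://example.com/docs/api/a.html` but not `http://example.com/other/a.html`.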

EIP Web Crawler APIs

(c) Copyright International Business Machines Corporation 1996, 2002. IBM Corp. All rights reserved.