Enterprise Information Portal APIs

com.ibm.gcs.db.component
Class DB2URLContainer

java.lang.Object
  |
  +--com.ibm.gcs.urlpool.URLContainer
        |
        +--com.ibm.gcs.urlpool.MemoryURLContainer
              |
              +--com.ibm.gcs.db.component.DB2URLContainer
All Implemented Interfaces:
java.lang.Cloneable, java.io.Serializable

public class DB2URLContainer
extends com.ibm.gcs.urlpool.MemoryURLContainer

A DB2URLContainer object provides access to the crawl information for a URL, which is stored in DB2 tables. Each DB2URLContainer is composed of:

  1. a DB2URLRow object, which accesses the table URLCRAWLTABLE, and
  2. a DB2AnnotationsList object, which accesses the table LINKS_TABLE.
Except for setState(), the public methods provided by this class perform read-only operations on the DB2 tables. All the set methods store their modifications in memory. These modifications are written to the database only when other package members explicitly save the DB2URLRow or DB2AnnotationsList objects.

Note: In addition to the get and set methods specified by the interface, this class provides corresponding get and set methods that take a Transaction object, which is used to execute any necessary SQL inserts, updates, and queries. Each method that does not take a Transaction object creates a new Transaction object, calls the corresponding method with that object, and commits the transaction. Any class that already uses a Transaction object should pass it to DB2URLContainer's methods explicitly to avoid transaction deadlock.
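
The delegation pattern described in the note can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the Transaction and Container classes are hypothetical stand-ins, the SQL text is invented, and the depth value 3 is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: every type here is a hypothetical stand-in, not the real com.ibm.gcs API.
class TransactionOverloadSketch {

    // Hypothetical stand-in for Transaction; it only records what was asked of it.
    static final class Transaction {
        final List<String> executed = new ArrayList<>();
        boolean committed = false;

        void execute(String sql) { executed.add(sql); }
        void commit() { committed = true; }
    }

    // Hypothetical container with a paired getter, in the style of getDepth() / getDepth(Transaction).
    static final class Container {
        private Integer depth;        // null until loaded from the (pretend) database
        int transactionsCreated = 0;  // instrumentation for this sketch only

        // Convenience overload: creates a Transaction, delegates, then commits.
        int getDepth() {
            Transaction t = new Transaction();
            transactionsCreated++;
            int result = getDepth(t);
            t.commit();
            return result;
        }

        // Transaction-aware overload: any query runs on the caller's Transaction,
        // and committing is left to the caller.
        int getDepth(Transaction t) {
            if (depth == null) {
                t.execute("SELECT DEPTH FROM URLCRAWLTABLE WHERE URL = ?"); // invented SQL
                depth = 3;                                                   // placeholder value
            }
            return depth;
        }
    }

    public static void main(String[] args) {
        // A caller that already holds a Transaction passes it through,
        // so the container does not open a second one.
        Transaction t = new Transaction();
        Container c = new Container();
        int depth = c.getDepth(t);
        t.commit();
        System.out.println(depth + " " + c.transactionsCreated);
    }
}
```

In this sketch, a caller that holds its own Transaction and still calls the no-argument overload would cause a second Transaction to be created, which is the situation the note warns about.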

See Also:
Serialized Form

Constructor Summary
DB2URLContainer()
          Default constructor called by URLContainerFactory.
DB2URLContainer(java.lang.String urlString)
           
DB2URLContainer(java.lang.String urlString, com.ibm.gcs.urlpool.Annotation ann)
           
DB2URLContainer(java.lang.String urlString, CrawlPattern urlPTree)
           
DB2URLContainer(java.lang.String urlString, CrawlPattern urlPTree, com.ibm.gcs.urlpool.Annotation ann)
           
DB2URLContainer(java.lang.String urlString, CrawlPattern urlPTree, java.lang.Boolean isSeed)
          Constructs a container for a URL with a crawl pattern and a seed flag.
DB2URLContainer(java.lang.String urlString, com.ibm.gcs.urlpool.URLContainer parentUrlC)
           
DB2URLContainer(java.lang.String urlString, com.ibm.gcs.urlpool.URLContainer parentUrlC, com.ibm.gcs.urlpool.Annotation ann)
           
DB2URLContainer(java.lang.String urlString, com.ibm.gcs.urlpool.URLContainer parentUrlC, CrawlPattern urlPTree)
           
DB2URLContainer(java.lang.String urlString, com.ibm.gcs.urlpool.URLContainer parentUrlC, CrawlPattern urlPTree, com.ibm.gcs.urlpool.Annotation ann)
           
 
Method Summary
 void addAnnotation(com.ibm.gcs.urlpool.Annotation ann)
          Add an annotation for this urlC.
 void addAnnotation(com.ibm.gcs.urlpool.Annotation ann, com.ibm.gcs.urlpool.URLContainer urlC)
          Add an annotation for this urlC.
 void addAnnotations(com.ibm.gcs.urlpool.Annotation[] anns)
          Add a set of annotations for this urlC.
 java.util.Enumeration getAnnotationEnums()
          Gets the list of all annotations for this urlC as an enumeration object.
 com.ibm.gcs.urlpool.Annotation[] getAnnotations()
          Gets the list of all annotations for this urlC.
 com.ibm.gcs.urlpool.Annotation[] getAnnotations(Transaction t)
          Gets the list of all annotations for this urlC as an array.
 CrawlPattern getCrawlPattern()
          Gets the associated URL pattern tree that says how to traverse and summarize this URL.
 CrawlPattern getCrawlPattern(Transaction t)
          Gets the associated URL pattern tree that says how to traverse and summarize this URL.
 int getDepth()
          Get the recursion depth of the URL.
 int getDepth(Transaction t)
          Get the recursion depth of the URL, using the transaction object for any database access.
 com.ibm.gcs.resourcepool.ResourceCollection getDSC()
          Returns the ResourceCollection that this URLContainer represents.
 boolean getHide(Transaction t)
          Get the hide flag for the URLContainer.
 int getPriority(Transaction t)
          Get the priority of this container for a Priority Crawl.
 com.ibm.gcs.urlpool.URLState getState()
          Get the state of this url container.
 com.ibm.gcs.urlpool.URLState getState(Transaction t)
          Get the state of this url container.
 java.lang.String getURLString()
          Return the url string.
 boolean hasAnnotations()
          Returns true if the urlC has annotations, false otherwise.
 boolean isSeed()
          Returns true if it is a seed url
 void setCrawlPattern(CrawlPattern crawlPattern)
          Sets the associated URL pattern tree that says how to traverse and summarize this URL.
 void setCrawlPattern(CrawlPattern crawlPattern, PreparedTransaction t)
          Sets the associated URL crawl pattern that says how to traverse and summarize this URL.
 void setDepth(int depth)
          Set the recursion depth of the URL.
 void setHide(boolean hidden)
          Set the hide flag for the URLContainer.
 void setPriority(int p)
          Set the priority of this container for a Priority Crawl.
 void setState(com.ibm.gcs.urlpool.URLState state)
          Set the state of this url container.
 void setState(com.ibm.gcs.urlpool.URLState state, Transaction t)
          Set the state of this url container.
 void setURLString(java.lang.String urlString)
          Set the URLString for the container.
 
Methods inherited from class com.ibm.gcs.urlpool.URLContainer
getUniqueName, setUniqueName
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DB2URLContainer

public DB2URLContainer()
Default constructor called by URLContainerFactory.

DB2URLContainer

public DB2URLContainer(java.lang.String urlString)
Parameters:
urlString - The url string identifying this object.

DB2URLContainer

public DB2URLContainer(java.lang.String urlString,
                       com.ibm.gcs.urlpool.Annotation ann)
Parameters:
urlString - The url string identifying this object.
ann - An annotation about this urlC.

DB2URLContainer

public DB2URLContainer(java.lang.String urlString,
                       com.ibm.gcs.urlpool.URLContainer parentUrlC)
Parameters:
urlString - The url string identifying this object.
parentUrlC - A url container pointing to this one.

DB2URLContainer

public DB2URLContainer(java.lang.String urlString,
                       com.ibm.gcs.urlpool.URLContainer parentUrlC,
                       com.ibm.gcs.urlpool.Annotation ann)
Parameters:
urlString - The url string identifying this object.
parentUrlC - A url container pointing to this one.
ann - The annotation by the parentUrlC about this urlC.

DB2URLContainer

public DB2URLContainer(java.lang.String urlString,
                       CrawlPattern urlPTree)
Parameters:
urlString - The url string identifying this object.
urlPTree - The url pattern tree.

DB2URLContainer

public DB2URLContainer(java.lang.String urlString,
                       com.ibm.gcs.urlpool.URLContainer parentUrlC,
                       CrawlPattern urlPTree)
Parameters:
urlString - The url string identifying this object.
urlPTree - The url pattern tree.
parentUrlC - A url container pointing to this one.

DB2URLContainer

public DB2URLContainer(java.lang.String urlString,
                       com.ibm.gcs.urlpool.URLContainer parentUrlC,
                       CrawlPattern urlPTree,
                       com.ibm.gcs.urlpool.Annotation ann)
Parameters:
urlString - The url string identifying this object.
urlPTree - The url pattern tree.
parentUrlC - A url container pointing to this one.
ann - The annotation by the parentUrlC about this urlC.

DB2URLContainer

public DB2URLContainer(java.lang.String urlString,
                       CrawlPattern urlPTree,
                       com.ibm.gcs.urlpool.Annotation ann)
Parameters:
urlString - The url string identifying this object.
urlPTree - The url pattern tree.
ann - An annotation about this urlC.

DB2URLContainer

public DB2URLContainer(java.lang.String urlString,
                       CrawlPattern urlPTree,
                       java.lang.Boolean isSeed)
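Constructs a DB2URLContainer for the given URL string and crawl pattern, with an explicit seed flag.
Parameters:
urlString - a URL string
urlPTree - a configuration structure that says how to traverse and summarize this URL
isSeed - whether this URL is a seed url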
See Also:
CrawlPattern
Method Detail

setURLString

public void setURLString(java.lang.String urlString)
Set the URLString for the container. The string can only be set once. To be used with default constructor. The effect of calling default constructor followed by this method is identical to calling DB2URLContainer(urlString).

Throws DB2ComponentException if the urlString is longer than 250 characters.

Overrides:
setURLString in class com.ibm.gcs.urlpool.URLContainer
Parameters:
urlString - The url string identifying this object.
See Also:
DB2URLContainer(String urlString)
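
The set-once rule and the 250-character limit described above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the classes are hypothetical stand-ins, DB2ComponentException is modeled as unchecked only because the signature above declares no throws clause, and the IllegalStateException on a second call is an assumption (the description says only that the string can be set once).

```java
// Sketch only: hypothetical stand-ins, not the real com.ibm.gcs classes.
class SetOnceUrlSketch {

    static final int MAX_URL_LENGTH = 250; // limit taken from the description above

    // Assumption: modeled as unchecked because setURLString declares no throws clause.
    static final class DB2ComponentException extends RuntimeException {
        DB2ComponentException(String message) { super(message); }
    }

    static final class Container {
        private String urlString; // null until set

        Container() { }

        // Equivalent to the default constructor followed by setURLString.
        Container(String urlString) { setURLString(urlString); }

        void setURLString(String urlString) {
            if (this.urlString != null) {
                // Assumption: how a second call is rejected is not documented.
                throw new IllegalStateException("URL string is already set");
            }
            if (urlString.length() > MAX_URL_LENGTH) {
                throw new DB2ComponentException(
                        "URL string is longer than " + MAX_URL_LENGTH + " characters");
            }
            this.urlString = urlString;
        }

        String getURLString() { return urlString; }
    }

    public static void main(String[] args) {
        Container viaDefault = new Container();
        viaDefault.setURLString("http://www.example.com/");
        Container viaString = new Container("http://www.example.com/");
        // Both construction paths end in the same state.
        System.out.println(viaDefault.getURLString().equals(viaString.getURLString()));
    }
}
```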

getURLString

public java.lang.String getURLString()
Return the url string.
Overrides:
getURLString in class com.ibm.gcs.urlpool.URLContainer
Returns:
String the url string identifying this object

setCrawlPattern

public void setCrawlPattern(CrawlPattern crawlPattern)
                     throws DB2ComponentException
Sets the associated URL pattern tree that says how to traverse and summarize this URL. This URL pattern tree represents the crawl space for the URL.
Overrides:
setCrawlPattern in class com.ibm.gcs.urlpool.URLContainer
Parameters:
crawlPattern - the URLPatternTree
Throws:
DB2ComponentException - on TransactionException

getCrawlPattern

public CrawlPattern getCrawlPattern()
                             throws DB2ComponentException
Gets the associated URL pattern tree that says how to traverse and summarize this URL. This URL pattern tree represents the crawl space for the URL.
Overrides:
getCrawlPattern in class com.ibm.gcs.urlpool.URLContainer
Returns:
CrawlPattern the URLPatternTree object.
Throws:
DB2ComponentException - on TransactionException

setCrawlPattern

public void setCrawlPattern(CrawlPattern crawlPattern,
                            PreparedTransaction t)
                     throws DB2ComponentException
Sets the associated URL crawl pattern that says how to traverse and summarize this URL. If data has to be loaded from/written to the database, executes the SQL query using the transaction object.
Parameters:
crawlPattern - the CrawlPattern for this URL
t - The Transaction object for DB2 access.
Throws:
DB2ComponentException - on TransactionException

getCrawlPattern

public CrawlPattern getCrawlPattern(Transaction t)
                             throws DB2ComponentException
Gets the associated URL pattern tree that says how to traverse and summarize this URL. If data has to be loaded from/written to the database, executes the SQL query using the transaction object.
Parameters:
t - The Transaction object for DB2 access.
Returns:
CrawlPattern the URLPatternTree object.
Throws:
DB2ComponentException - on TransactionException

setDepth

public void setDepth(int depth)
Set the recursion depth of the URL. The depth is the number of hops from the seed at which the URL was found.
Overrides:
setDepth in class com.ibm.gcs.urlpool.URLContainer
Parameters:
depth - recursion depth

getDepth

public int getDepth()
             throws DB2ComponentException
Get the recursion depth of the URL. The depth is the number of hops from the seed at which the URL was found.
Overrides:
getDepth in class com.ibm.gcs.urlpool.URLContainer
Returns:
int recursion depth
Throws:
DB2ComponentException - on TransactionException. Catches DB2ComponentException from DB2ConfigTable and DB2URLRow.

getDepth

public int getDepth(Transaction t)
             throws DB2ComponentException
Get the recursion depth of the URL. If data has to be loaded from/written to the database, executes the SQL query using the transaction object.
Parameters:
t - The Transaction object for DB2 access.
Returns:
int recursion depth
Throws:
DB2ComponentException - on TransactionException. Catches DB2ComponentException from DB2ConfigTable and DB2URLRow.
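
The notion of depth as a number of hops from the seed can be illustrated with a breadth-first walk over a small link graph. This is a sketch under stated assumptions, not the crawler's algorithm: the class and method names are hypothetical, the seed is assumed to have depth 0, and each URL is assumed to keep the depth at which it is first reached.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: hypothetical helper, not part of the com.ibm.gcs API.
class CrawlDepthSketch {

    // Breadth-first walk: a URL's depth is its parent's depth plus one hop.
    static Map<String, Integer> depthsFromSeed(String seed, Map<String, List<String>> links) {
        Map<String, Integer> depth = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(seed, 0); // assumption: the seed itself is at depth 0
        queue.add(seed);
        while (!queue.isEmpty()) {
            String url = queue.remove();
            for (String child : links.getOrDefault(url, Collections.<String>emptyList())) {
                if (!depth.containsKey(child)) { // keep the depth of the first discovery
                    depth.put(child, depth.get(url) + 1);
                    queue.add(child);
                }
            }
        }
        return depth;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("seed", Arrays.asList("a", "b"));
        links.put("a", Arrays.asList("c", "seed"));
        links.put("b", Arrays.asList("c"));
        System.out.println(depthsFromSeed("seed", links));
    }
}
```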

setState

public void setState(com.ibm.gcs.urlpool.URLState state)
Set the state of this url container. The url may be virgin, toBeCrawled, toBeSummarized, summarized, or failed. If the state is a final state, the state information is written directly to the database and the final state info is written into the history.
Overrides:
setState in class com.ibm.gcs.urlpool.URLContainer
Parameters:
state - The new state of this urlC.
Throws:
DB2ComponentException - on TransactionException

setState

public void setState(com.ibm.gcs.urlpool.URLState state,
                     Transaction t)
Set the state of this url container. The url may be virgin, toBeCrawled, toBeSummarized, summarized, or failed. If the state is a final state, the state information is written directly to the database and the final state info is written into the history.
Parameters:
state - The new state of this urlC.
t - The Transaction object for DB2 access.
Throws:
DB2ComponentException - on TransactionException
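
The write-through behavior for final states can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the types are hypothetical stand-ins, the history list stands in for the database, the initial state is assumed to be virgin, and treating only summarized and failed as final states is an assumption (the documentation does not say which states are final).

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Sketch only: hypothetical stand-ins, not the real com.ibm.gcs classes.
class FinalStateWriteThroughSketch {

    // The five states named in the description above.
    enum URLState { VIRGIN, TO_BE_CRAWLED, TO_BE_SUMMARIZED, SUMMARIZED, FAILED }

    // Assumption: only these two states are final.
    static final Set<URLState> FINAL_STATES = EnumSet.of(URLState.SUMMARIZED, URLState.FAILED);

    static final class Container {
        private URLState state = URLState.VIRGIN;         // assumption: initial state
        final List<URLState> history = new ArrayList<>(); // stands in for the database history

        void setState(URLState newState) {
            state = newState;              // every state is kept in memory
            if (FINAL_STATES.contains(newState)) {
                history.add(newState);     // final states are also written through immediately
            }
        }

        URLState getState() { return state; }
    }

    public static void main(String[] args) {
        Container c = new Container();
        c.setState(URLState.TO_BE_CRAWLED); // in memory only
        c.setState(URLState.SUMMARIZED);    // in memory and written through
        System.out.println(c.getState() + " " + c.history);
    }
}
```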

getState

public com.ibm.gcs.urlpool.URLState getState()
                                      throws DB2ComponentException
Get the state of this url container. The url may be virgin, toBeCrawled, toBeSummarized, summarized, or failed.
Overrides:
getState in class com.ibm.gcs.urlpool.URLContainer
Returns:
URLState the state of this urlC.
Throws:
DB2ComponentException - on TransactionException

getState

public com.ibm.gcs.urlpool.URLState getState(Transaction t)
                                      throws DB2ComponentException
Get the state of this url container. The url may be virgin, toBeCrawled, toBeSummarized, summarized, or failed. If data has to be loaded from/written to the database, executes the SQL query using the transaction object.
Parameters:
t - The Transaction object for DB2 access.
Returns:
URLState the state of this urlC.
Throws:
DB2ComponentException - on TransactionException

getHide

public boolean getHide(Transaction t)
                throws TransactionException
Get the hide flag for the URLContainer: true for hidden, false for visible.
Parameters:
t - The transaction object to use for db access.
Throws:
TransactionException - on failed SQL update.

setHide

public void setHide(boolean hidden)
Set the hide flag for the URLContainer: true to hide it from the URLCollection classes, false to make it visible to them.
Parameters:
hidden - The new value of the hide flag.
Throws:
TransactionException - on failed SQL update.

getPriority

public int getPriority(Transaction t)
                throws TransactionException
Get the priority of this container for a Priority Crawl.
Parameters:
t - The transaction object to use for db access.
Throws:
TransactionException - on failed SQL update.

setPriority

public void setPriority(int p)
Set the priority of this container for a Priority Crawl.
Parameters:
p - The priority of this container.
Throws:
TransactionException - on failed SQL update.

isSeed

public boolean isSeed()
Returns true if it is a seed url
Returns:
boolean True if the URL is a seed, false otherwise.

getDSC

public com.ibm.gcs.resourcepool.ResourceCollection getDSC()
                                                   throws URLCrawlException
Returns the ResourceCollection that this URLContainer represents. Connects to the URL and expects the URLContentHandler to return its content as a ResourceCollection object.
Overrides:
getDSC in class com.ibm.gcs.urlpool.MemoryURLContainer
Returns:
the associated ResourceCollection object
See Also:
com.ibm.almaden.gcs.gcsurl.*

getAnnotations

public com.ibm.gcs.urlpool.Annotation[] getAnnotations()
Gets the list of all annotations for this urlC. Returns an empty array if there are no annotations.
Overrides:
getAnnotations in class com.ibm.gcs.urlpool.URLContainer
Returns:
Annotation[] an array of Annotation objects.

getAnnotations

public com.ibm.gcs.urlpool.Annotation[] getAnnotations(Transaction t)
                                                throws TransactionException
Gets the list of all annotations for this urlC as an array. Returns an empty array if there are no annotations.
Returns:
Annotation[] an array of Annotation objects.
Throws:
TransactionException - on failed database access.

getAnnotationEnums

public java.util.Enumeration getAnnotationEnums()
Gets the list of all annotations for this urlC as an enumeration object.
Overrides:
getAnnotationEnums in class com.ibm.gcs.urlpool.URLContainer
Returns:
Enumeration an enumeration of Annotation objects.
Throws:
DB2ComponentException - on TransactionException

addAnnotation

public void addAnnotation(com.ibm.gcs.urlpool.Annotation ann)
Add an annotation for this urlC.
Overrides:
addAnnotation in class com.ibm.gcs.urlpool.URLContainer
Parameters:
ann - The annotation for this urlC.
Throws:
DB2ComponentException - on TransactionException

addAnnotation

public void addAnnotation(com.ibm.gcs.urlpool.Annotation ann,
                          com.ibm.gcs.urlpool.URLContainer urlC)
                   throws DB2ComponentException
Add an annotation for this urlC, made by the given annotator.
Parameters:
ann - The annotation for this urlC.
urlC - The annotator/parent url container.
Throws:
DB2ComponentException -  

addAnnotations

public void addAnnotations(com.ibm.gcs.urlpool.Annotation[] anns)
Add a set of annotations for this urlC.
Overrides:
addAnnotations in class com.ibm.gcs.urlpool.URLContainer
Parameters:
anns - An array of annotation objects for this urlC.
Throws:
DB2ComponentException - on TransactionException

hasAnnotations

public boolean hasAnnotations()
Returns true if the urlC has annotations, false otherwise.
Overrides:
hasAnnotations in class com.ibm.gcs.urlpool.URLContainer
Throws:
DB2ComponentException -  
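
The contract of the annotation methods above (an empty array rather than null, plus an Enumeration view) can be sketched as follows. This is an in-memory illustration under stated assumptions, not the actual implementation, which stores annotations through a DB2AnnotationsList: the Annotation and Container classes here are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Sketch only: hypothetical in-memory stand-ins, not the real com.ibm.gcs classes.
class AnnotationListSketch {

    // Hypothetical annotation carrying nothing but a text.
    static final class Annotation {
        final String text;
        Annotation(String text) { this.text = text; }
    }

    static final class Container {
        private final List<Annotation> annotations = new ArrayList<>();

        void addAnnotation(Annotation ann) { annotations.add(ann); }

        void addAnnotations(Annotation[] anns) {
            for (Annotation ann : anns) {
                addAnnotation(ann);
            }
        }

        // Never returns null: an empty array means "no annotations".
        Annotation[] getAnnotations() { return annotations.toArray(new Annotation[0]); }

        Enumeration<Annotation> getAnnotationEnums() { return Collections.enumeration(annotations); }

        boolean hasAnnotations() { return !annotations.isEmpty(); }
    }

    public static void main(String[] args) {
        Container c = new Container();
        // Callers can iterate without a null check, even when nothing was added.
        for (Annotation ann : c.getAnnotations()) {
            System.out.println(ann.text);
        }
        c.addAnnotations(new Annotation[] { new Annotation("link text"), new Annotation("title") });
        Enumeration<Annotation> e = c.getAnnotationEnums();
        while (e.hasMoreElements()) {
            System.out.println(e.nextElement().text);
        }
    }
}
```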

EIP Web Crawler APIs

(c) Copyright International Business Machines Corporation 1996, 2002. IBM Corp. All rights reserved.