Enterprise Information Portal APIs
java.lang.Object
  |
  +--com.ibm.gcs.db.component.DB2URLCollection
This URL collection class enqueues URLs based on depth. By default, it does not save any annotation information. For maximum performance, make sure that there is an index on urlpoolstable(state_id, hide, time).
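As a rough illustration of the index recommendation above, the following sketch builds a CREATE INDEX statement for the three named columns. The table and column names come from the description; the class name, the helper method, and the index name `urlpool_idx` are hypothetical, and the exact DDL accepted by a given DB2 release should be checked against its documentation.

```java
// Hypothetical helper, not part of the GCS API: builds the DDL for the recommended index.
final class UrlPoolIndexSketch {
    // Table and column names are taken from the class description.
    static final String TABLE = "urlpoolstable";
    static final String[] COLUMNS = {"state_id", "hide", "time"};

    // The index name is supplied by the caller; any name used here is an assumption.
    static String createIndexSql(String indexName) {
        return "CREATE INDEX " + indexName + " ON " + TABLE
                + " (" + String.join(", ", COLUMNS) + ")";
    }

    public static void main(String[] args) {
        // Prints the statement so it can be reviewed before being run against the database.
        System.out.println(createIndexSql("urlpool_idx"));
    }
}
```

The column order mirrors the description: the two filter columns (state_id, hide) come first, followed by time.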
Fields inherited from interface com.ibm.gcs.urlpool.URLCollection
copyright
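Constructor Summary
DB2URLCollection() | Default constructor.
DB2URLCollection(java.util.Hashtable args) | Constructor that takes its arguments as a Hashtable of name/value pairs. For test purposes only.
DB2URLCollection(URLPoolConfig.Pair[] args) | Constructor that takes its arguments as an array of name/value pairs.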
Method Summary
void | cleanup() | Performs cleanup operations on the collection, such as writing caches.
com.ibm.gcs.urlpool.URLContainer | get() | Returns the next URL to be crawled.
com.ibm.gcs.urlpool.URLContainer | get(com.ibm.gcs.util.jdp.UnaryPredicate predicate) | Gets the next URL from the collection that satisfies a particular predicate based on the hashing scheme used.
boolean | isEmpty() | Returns true if there are no more URLs in the database pool of URLs to be crawled, false otherwise.
int | mySize() | Returns the number of URLs currently in this collection's cache.
void | put(com.ibm.gcs.urlpool.URLContainer urlC) | Adds a URLContainer object to the database pool of URLs to be crawled, unless the URL has already been visited or is already waiting to be crawled.
void | put(com.ibm.gcs.urlpool.URLContainer[] urlCArray) | Adds an array of URLContainer objects to the database pool of URLs.
int | size() | Returns the total number of visible URLs in the database pool of URLs to be crawled.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail

public DB2URLCollection()

public DB2URLCollection(URLPoolConfig.Pair[] args)
Parameters:
args - The arguments to use for this collection. These arguments should be name/value pairs that specify values for cachesize, driver, dbname, user, and password. Further argument names: keepannotations, isdistributed.

public DB2URLCollection(java.util.Hashtable args)
Parameters:
args - The arguments to use for this collection. These arguments should be name/value pairs that specify values for cachesize, driver, dbname, user, and password. Further argument names: keepannotations, isdistributed.
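A minimal sketch of how the argument table for the Hashtable constructor might be assembled. Only the seven key names come from the parameter description; every value, the use of String values, and the class and method names are placeholders assumed for illustration. The constructor call itself is left as a comment because the GCS classes are not available here.

```java
import java.util.Hashtable;

// Hypothetical helper, not part of the GCS API.
final class CollectionArgsSketch {
    // Builds name/value pairs using the key names listed in the parameter description.
    // All values below are placeholders; the expected value types are an assumption.
    static Hashtable<String, String> buildArgs() {
        Hashtable<String, String> args = new Hashtable<>();
        args.put("cachesize", "100");
        args.put("driver", "com.example.PlaceholderDriver"); // substitute the real JDBC driver class
        args.put("dbname", "exampledb");
        args.put("user", "exampleuser");
        args.put("password", "examplepassword");
        args.put("keepannotations", "false");
        args.put("isdistributed", "false");
        return args;
    }

    public static void main(String[] unused) {
        Hashtable<String, String> args = buildArgs();
        System.out.println(args.size() + " arguments prepared");
        // With the GCS classes on the classpath, the table would presumably be passed as:
        // new DB2URLCollection(args);
    }
}
```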
This constructor uses a Hashtable to pass parameters, so it can be used from the outside. It is for test purposes only and is NOT included in the official version of the GCS.

Method Detail
public int size()
SELECT COUNT(*) FROM urlpoolstable WHERE urlpoolstable.STATE_ID=1 AND urlpoolstable.HIDE=0
Specified by:
size in interface com.ibm.gcs.urlpool.URLCollection
Throws:
DB2ComponentException - SQL error caused query to fail.
See Also:
DB2Queue

public int mySize()

public boolean isEmpty()
Specified by:
isEmpty in interface com.ibm.gcs.urlpool.URLCollection
Throws:
java.lang.RuntimeException - SQL error caused query to fail.

public com.ibm.gcs.urlpool.URLContainer get()
This method first checks the cache. If the cache is empty, it loads the next set of URLs into the cache through DB2Queue. DB2Queue executes the following SQL query:
SELECT * FROM urlpoolstable WHERE urlpoolstable.STATE_ID=1 AND urlpoolstable.HIDE=0 ORDER BY time
The method then returns the next URL from the cache as a DB2URLContainer.
Specified by:
get in interface com.ibm.gcs.urlpool.URLCollection
See Also:
URLContainer
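The cache-then-refill behaviour described for get() can be sketched with an in-memory stand-in. This is not the GCS implementation: the class name, the use of plain strings instead of URLContainer objects, the cache-size limit on each refill, and returning null when nothing is left are all assumptions made for the sketch.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical in-memory model of a cached URL pool; not part of the GCS API.
final class CachedPoolSketch {
    // Stand-in for the database rows waiting to be crawled, already ordered by time.
    private final Deque<String> backingStore = new ArrayDeque<>();
    // Local cache that get() serves from.
    private final Deque<String> cache = new ArrayDeque<>();
    private final int cacheSize;
    int refills = 0; // counts simulated queries, for illustration only

    CachedPoolSketch(int cacheSize, List<String> urls) {
        this.cacheSize = cacheSize;
        backingStore.addAll(urls);
    }

    // Stand-in for the DB2Queue query: moves up to cacheSize entries into the cache.
    private void refill() {
        refills++;
        while (cache.size() < cacheSize && !backingStore.isEmpty()) {
            cache.addLast(backingStore.pollFirst());
        }
    }

    // Checks the cache first and refills it only when it is empty.
    // Returns null when neither the cache nor the backing store has entries (a sketch choice).
    String get() {
        if (cache.isEmpty()) {
            refill();
        }
        return cache.pollFirst();
    }

    // Number of entries currently held in the cache.
    int mySize() {
        return cache.size();
    }

    public static void main(String[] args) {
        CachedPoolSketch pool = new CachedPoolSketch(2, Arrays.asList("a", "b", "c"));
        System.out.println(pool.get() + " cached=" + pool.mySize());
        System.out.println(pool.get() + " cached=" + pool.mySize());
        System.out.println(pool.get() + " refills=" + pool.refills);
    }
}
```

In this model a query is issued only when the cache runs dry, so three reads with a cache size of two trigger two refills rather than three.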

public com.ibm.gcs.urlpool.URLContainer get(com.ibm.gcs.util.jdp.UnaryPredicate predicate)
Specified by:
get in interface com.ibm.gcs.urlpool.URLCollection
Parameters:
predicate - a unary predicate object
See Also:
URLContainer, UnaryPredicate

public void put(com.ibm.gcs.urlpool.URLContainer[] urlCArray) throws DB2ComponentException
Calls put(URLContainer, Transaction) to determine whether each URL must be updated. If the method returns true, saves the changes for that URL to the database. It is more efficient to put URLs into the pool as a group than singly.
Specified by:
put in interface com.ibm.gcs.urlpool.URLCollection
Parameters:
urlCArray - An array of URLs to add to the database pool

public void cleanup()
Description copied from interface: com.ibm.gcs.urlpool.URLCollection
Specified by:
cleanup in interface com.ibm.gcs.urlpool.URLCollection
See Also:
DB2URLCollection

public void put(com.ibm.gcs.urlpool.URLContainer urlC)
Calls put(URLContainer, Transaction) to determine whether the URL must be updated in the table. If the method returns true, saves the changes to the database.
Specified by:
put in interface com.ibm.gcs.urlpool.URLCollection
Parameters:
urlC - The URL to add to the database pool
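The two put variants can be modelled in memory to show the "skip URLs already visited or waiting" rule and one plausible reason why group insertion is cheaper. Everything here is a sketch under assumptions: the class name, the string-based URLs, the stage helper standing in for put(URLContainer, Transaction), and the idea that a batch needs only one commit are not confirmed by this page.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory model of the put semantics; not part of the GCS API.
final class DedupPutSketch {
    // URLs that were already visited or are waiting to be crawled.
    private final Set<String> known = new HashSet<>();
    // URLs waiting to be crawled, in insertion order.
    private final Deque<String> waiting = new ArrayDeque<>();
    int commits = 0; // counts simulated database commits, for illustration only

    // Stand-in for the put(URLContainer, Transaction) check:
    // returns true only if the URL is new and therefore must be saved.
    private boolean stage(String url) {
        if (!known.add(url)) {
            return false;
        }
        waiting.addLast(url);
        return true;
    }

    // Single put: one simulated commit per new URL.
    void put(String url) {
        if (stage(url)) {
            commits++;
        }
    }

    // Group put: stages every URL, then performs at most one simulated commit (an assumption).
    void put(String[] urls) {
        boolean changed = false;
        for (String url : urls) {
            changed |= stage(url); // non-short-circuit, so every URL is staged
        }
        if (changed) {
            commits++;
        }
    }

    // Number of URLs waiting to be crawled.
    int size() {
        return waiting.size();
    }

    public static void main(String[] args) {
        DedupPutSketch pool = new DedupPutSketch();
        pool.put("a");
        pool.put("a"); // duplicate, ignored
        pool.put(new String[] {"a", "b", "c"});
        System.out.println("size=" + pool.size() + " commits=" + pool.commits);
    }
}
```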
EIP Web Crawler APIs