Enterprise Information Portal APIs
java.lang.Object
  |
  +--com.ibm.gcs.component.Component
        |
        +--com.ibm.gcs.gatherer.Gatherer
This is the main Component of GCS.
It starts, monitors, and stops the Crawler and Summarizer.
It is called from an external ComponentRunner, such as the executable GCS class.
The Gatherer has four states:
The Gatherer is constructed and configured by some external class that implements ComponentRunner (e.g., GCS).
It is passed a Config file name (or an actual Config), which it uses to construct and configure the Crawler and Summarizer components.
The Gatherer is also responsible for configuring the loggers, network properties, text and graph status monitors, and the Temp FilePool.
Next, the Gatherer is started by some external class (e.g., GCS).
All it does here is create and run a new "Gatherer" Thread, which automatically brings it to its next state.
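The start step described above can be sketched as follows. This is only an illustration under stated assumptions: StartSketch is a hypothetical stand-in, not the real Gatherer, and the guard against a second start() call is an assumption rather than documented behaviour. The only detail taken from the text is that start() creates and runs a thread named "Gatherer".

```java
// Hypothetical stand-in for the Gatherer; only the thread name comes from the text.
class StartSketch implements Runnable {
    private Thread thread;
    private volatile boolean ran = false;

    // start() does nothing except create and launch the "Gatherer" thread.
    public synchronized void start() {
        if (thread != null) {
            return; // assumed behaviour: ignore a second start()
        }
        thread = new Thread(this, "Gatherer");
        thread.start();
    }

    // Body of the "Gatherer" thread. The real class would start the Crawler
    // and Summarizer here and then loop, monitoring them.
    @Override
    public void run() {
        ran = true;
    }

    // Helpers for the sketch only (not part of the documented API).
    String threadName() {
        return thread == null ? null : thread.getName();
    }

    boolean hasRun() {
        return ran;
    }

    void awaitFinish() {
        try {
            if (thread != null) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Keeping start() this small means the caller returns immediately while the work proceeds on the new thread.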
The Gatherer thread starts the Crawler and Summarizer, which actually do the work.
All the Gatherer thread does now is check (and report) the crawl status, and stop when everything is done.
The status may be output as text to System.out (using the "gcs.status.monitor" logger), or as a graph to the MonitorGraphComponent.
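As a rough illustration of the text status report, the sketch below builds one status line from the counters that the Gatherer exposes as getters (getURLPoolSize, getNumURLsCrawled, getMaxNumURLsToCrawl, getResourcePoolSize, getNumResourcesSummarized). The logger name comes from the text; the use of java.util.logging, the StatusSketch class, and the line format are all assumptions, since the page does not say which logging framework or layout GCS uses.

```java
// Hypothetical helper; the real status format is not documented on this page.
class StatusSketch {
    // Logger name taken from the text. java.util.logging is an assumption,
    // and its default handler writes to System.err unless configured otherwise.
    static final java.util.logging.Logger STATUS =
            java.util.logging.Logger.getLogger("gcs.status.monitor");

    // Builds one human-readable status line from the five counters.
    static String formatStatus(int urlPoolSize, int urlsCrawled, int maxUrls,
                               int resourcePoolSize, int resourcesSummarized) {
        return String.format(
                "crawled %d/%d (queued %d), summarized %d (queued %d)",
                urlsCrawled, maxUrls, urlPoolSize,
                resourcesSummarized, resourcePoolSize);
    }

    // Sends the formatted line to the status logger.
    static void report(int urlPoolSize, int urlsCrawled, int maxUrls,
                       int resourcePoolSize, int resourcesSummarized) {
        STATUS.info(formatStatus(urlPoolSize, urlsCrawled, maxUrls,
                resourcePoolSize, resourcesSummarized));
    }
}
```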
There are two cases where the Gatherer is stopped:
(1) it was told to stop by some external class, or
(2) there is nothing left to crawl or summarize (the URLPool is empty, the ResourcePool is empty, all of the Crawler worker threads are waiting for new URLContainer URLs, and all of the Summarizer threads are waiting for new Resources).
When the Gatherer is stopped, it stops the Crawler and Summarizer, and cleans up the temp file pool.
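Case (2) above amounts to a four-part check, which might be sketched like this. The getter names match the Method Summary, but the Counters interface, the IdleCheckSketch class, and the reading of "all threads are waiting" as "zero working threads" are assumptions made for illustration.

```java
// Hypothetical sketch of the "nothing left to do" test.
class IdleCheckSketch {
    // Minimal view of the counters; names follow the Gatherer getters.
    interface Counters {
        int getURLPoolSize();      // URLs waiting to be crawled
        int getResourcePoolSize(); // resources waiting to be summarized
        int getNumCrawlers();      // working crawler threads
        int getNumSummarizers();   // working summarizer threads
    }

    // True only when both pools are empty and no worker thread is busy.
    static boolean nothingLeft(Counters c) {
        return c.getURLPoolSize() == 0
                && c.getResourcePoolSize() == 0
                && c.getNumCrawlers() == 0
                && c.getNumSummarizers() == 0;
    }

    // Convenience factory for fixed counter values (sketch only).
    static Counters of(final int urls, final int resources,
                       final int crawlers, final int summarizers) {
        return new Counters() {
            public int getURLPoolSize() { return urls; }
            public int getResourcePoolSize() { return resources; }
            public int getNumCrawlers() { return crawlers; }
            public int getNumSummarizers() { return summarizers; }
        };
    }
}
```

All four conditions are needed: a busy crawler can still add URLs or resources to an empty pool. The four reads here are not atomic, so a real implementation would have to account for counters changing between them.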
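If you are writing a program that calls the Gatherer and want to be notified when it is done, you can wait on the gatherer.isDone object (the public java.lang.Boolean field; the isDone() method returns a primitive boolean, which cannot be waited on).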
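The notification pattern mentioned here is the standard Java monitor idiom, sketched below. DoneSignalSketch is a hypothetical stand-in: it uses a plain Object as the monitor instead of a java.lang.Boolean, and it assumes the finishing thread calls notifyAll() on that monitor, which this page does not confirm for the real Gatherer.

```java
// Hypothetical stand-in showing wait/notify on a "done" monitor object.
class DoneSignalSketch {
    // Monitor object playing the role of the public isDone field.
    final Object isDone = new Object();
    private boolean done = false; // guarded by the isDone monitor

    // Mirrors the isDone() method: reports whether work has finished.
    boolean isDone() {
        synchronized (isDone) {
            return done;
        }
    }

    // Called by the worker when finished (assumed behaviour).
    void markDone() {
        synchronized (isDone) {
            done = true;
            isDone.notifyAll();
        }
    }

    // Caller side: block until done or until the timeout expires.
    // Returns true if completion was observed in time.
    boolean awaitDone(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        synchronized (isDone) {
            while (!done) { // loop guards against spurious wakeups
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                try {
                    isDone.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }
}
```

A caller would start the work and then call awaitDone with a generous timeout. Re-checking the flag in a loop matters because Object.wait may return without a matching notify.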
See Also: Crawler, Summarizer
Field Summary
static boolean | crawlerStatus
java.lang.Boolean | isDone
static boolean | summarizerStatus
static boolean | threadStatus
Fields inherited from interface com.ibm.gcs.component.Schedulable
copyright
Constructor Summary
Gatherer(ComponentRunner componentRunner, java.lang.String[] args, Config config) | (constructor)
Gatherer(ComponentRunner componentRunner, java.lang.String[] args, java.lang.String configFileName) | (constructor)
Method Summary
void | crawlerUpdate(boolean threadWorking) | update the gatherer that a crawler thread is working or waiting
Crawler | getCrawler() | returns the crawler sub-component
int | getMaxNumURLsToCrawl() | get the maximum number of URLs to crawl
int | getNumCrawlers() | get the number of working crawler threads
int | getNumResourcesSummarized() | get the number of URLs that have been summarized
int | getNumSummarizers() | get the number of working summarizer threads
int | getNumURLsCrawled() | get the number of URLs that have been crawled
int | getResourcePoolSize() | get the number of URLs waiting to be summarized
com.ibm.gcs.summarizer.Summarizer | getSummarizer() | returns the summarizer sub-component
int | getURLPoolSize() | get the number of URLs waiting to be crawled
boolean | isDone() | returns true if the Gatherer is all done
void | kill(java.lang.String reason) | kills a crawl
void | run() | start the crawler and summarizer threads; loop monitoring
void | setCrawlerThreadGroup(GCSThreadGroup crawlerThreadGroup) | sets the crawler thread group (called by crawler start method)
void | setResourcePool(com.ibm.gcs.resourcepool.ResourcePool resourcePool) | sets the Resource Pool (called by the summarizer constructor)
void | setSummarizerThreadGroup(GCSThreadGroup summarizerThreadGroup) | sets the summarizer thread group (called by summarizer start method)
void | setURLPool(com.ibm.gcs.urlpool.URLPool urlPool) | sets the URL Pool (called by the crawler constructor)
void | start() | start gatherer
void | stop() | stop a crawl
void | summarizerUpdate(boolean threadWorking) | update the gatherer that a summarizer thread is working or waiting
static void | threadStatusUpdate(char newStatus)
static void | threadStatusUpdate(char newStatus, char newStatusExt)
Methods inherited from class com.ibm.gcs.component.Component
getArgv, getConfig, getName, getTempFilePool, getVersion
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
public java.lang.Boolean isDone
public static boolean crawlerStatus
public static boolean summarizerStatus
public static boolean threadStatus
Constructor Detail
public Gatherer(ComponentRunner componentRunner, java.lang.String[] args, Config config) throws ConfigException
Parameters:
componentRunner - the ComponentRunner that is creating this gatherer
args[] - command line arguments (unused?)
config - the configuration to use
public Gatherer(ComponentRunner componentRunner, java.lang.String[] args, java.lang.String configFileName) throws ConfigException
Parameters:
componentRunner - the ComponentRunner that is creating this gatherer
args[] - command line arguments (unused?)
configFileName - the name of the configuration file to use
Method Detail
public void setURLPool(com.ibm.gcs.urlpool.URLPool urlPool)
public void setResourcePool(com.ibm.gcs.resourcepool.ResourcePool resourcePool)
public void setCrawlerThreadGroup(GCSThreadGroup crawlerThreadGroup)
public void setSummarizerThreadGroup(GCSThreadGroup summarizerThreadGroup)
public void start()
public void run()
public void stop()
public void kill(java.lang.String reason)
public void crawlerUpdate(boolean threadWorking)
threadWorking - whether the current thread is crawling or waiting for a URL
public static void threadStatusUpdate(char newStatus)
public static void threadStatusUpdate(char newStatus, char newStatusExt)
public void summarizerUpdate(boolean threadWorking)
threadWorking - whether the current thread is summarizing or waiting for a resource
public int getNumCrawlers()
public int getNumSummarizers()
public int getURLPoolSize()
public int getNumURLsCrawled()
public int getMaxNumURLsToCrawl()
public int getResourcePoolSize()
public int getNumResourcesSummarized()
public Crawler getCrawler()
public com.ibm.gcs.summarizer.Summarizer getSummarizer()
public boolean isDone()
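The pairing of crawlerUpdate(boolean) with getNumCrawlers() suggests a simple working-thread counter. The sketch below shows one way such a counter could behave. It is an assumption about the mechanism, not the documented implementation: WorkerCountSketch is hypothetical, and it presumes each worker calls the update method with true when it picks up a URL and with false when it goes back to waiting.

```java
// Hypothetical counter of working crawler threads.
class WorkerCountSketch {
    private int workingCrawlers = 0; // guarded by this

    // A worker reports that it started working (true) or began waiting (false).
    synchronized void crawlerUpdate(boolean threadWorking) {
        if (threadWorking) {
            workingCrawlers++;
        } else if (workingCrawlers > 0) {
            workingCrawlers--; // never drop below zero on a stray "waiting" report
        }
    }

    // Number of crawler threads currently working.
    synchronized int getNumCrawlers() {
        return workingCrawlers;
    }
}
```

summarizerUpdate(boolean) and getNumSummarizers() could follow the same pattern with a second counter.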
EIP Web Crawler APIs