IBM Information Integrator for Content V8.2 APIs |
java.lang.Object
  |
  +--com.ibm.mm.beans.infomining.CMBInfoMiningBean
        |
        +--com.ibm.mm.beans.infomining.CMBConnectedMiningBean
              |
              +--com.ibm.mm.beans.infomining.CMBWebCrawlerService
CMBWebCrawlerService - Create CMBItems from crawled documents.
This bean monitors the webspace, that is, the crawled files created by the Web Crawler. These files are moved to an "archive" directory, which is created in the webspace directory. CMBItems are created from those files; the items can then be categorized (with the CMBCategorizationService), summarized (CMBSummarizationService), and/or imported into the database (CMBCatalogService). Each crawled file must begin with a metadata "Headline" containing its URL, last modified date, etc. To make the crawler write such a headline, use the SAVE_HEADLINES option in the STORE section of the web crawler IMY.INI file. See the web crawler documentation for directory and file details.
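The archive step described above can be pictured with plain java.nio.file calls. This is a sketch only, not the bean's implementation: the names WebspaceArchiveSketch, moveToArchive, and runDemo are hypothetical, a temporary directory stands in for a real webspace, and the code assumes Java 8 or later, which is newer than the API documented here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Illustration only: mimics the documented "archive" directory, not the bean itself. */
class WebspaceArchiveSketch {

    /**
     * Moves every regular file found directly in webspaceDir into
     * webspaceDir/archive and returns the new locations.
     */
    static List<Path> moveToArchive(Path webspaceDir) {
        try {
            // The archive directory is created inside the webspace directory.
            Path archive = Files.createDirectories(webspaceDir.resolve("archive"));

            // Collect first, then move, so the directory is not changed while it is listed.
            List<Path> crawled = new ArrayList<>();
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(webspaceDir)) {
                for (Path entry : entries) {
                    if (Files.isRegularFile(entry)) {
                        crawled.add(entry);
                    }
                }
            }

            List<Path> moved = new ArrayList<>();
            for (Path file : crawled) {
                Path target = archive.resolve(file.getFileName());
                moved.add(Files.move(file, target, StandardCopyOption.REPLACE_EXISTING));
            }
            return moved;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a throwaway webspace with one file and archives it. */
    static List<Path> runDemo() {
        try {
            Path webspace = Files.createTempDirectory("webspace");
            Files.write(webspace.resolve("page1.txt"),
                    "crawled text".getBytes(StandardCharsets.UTF_8));
            return moveToArchive(webspace);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (Path p : runDemo()) {
            System.out.println("archived: " + p);
        }
    }
}
```

The sketch does not parse the "Headline" block, because its exact layout is defined in the web crawler documentation rather than on this page.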
Constructor Summary
CMBWebCrawlerService()
    Default constructor.
Method Summary
void addCMBResultListener(com.ibm.mm.beans.CMBResultListener l)
    Adds the specified result listener to receive events from this bean.
java.lang.String getFilterEncoding()
    Gets the filter encoding.
int getPageSize()
    Gets the number of CMBItems within a single CMBTextAnalysisRequestEvent.
int getPollCycles()
    Gets the overall number of times to poll.
int getPollMinutes()
    Gets the minutes to wait before beginning the next poll.
java.lang.String getRootDirectory()
    Gets the root directory where the crawler stores the crawled documents.
java.lang.String getWebSpace()
    Gets the webspace that is monitored by the web crawler.
boolean isArchiveEnabled()
    Returns whether imported files are kept in the archive.
void removeCMBResultListener(com.ibm.mm.beans.CMBResultListener l)
    Removes the specified result listener so that it no longer receives events from this bean.
void setArchiveEnabled(boolean keepInArchive)
    Sets whether imported files are kept in the archive.
void setFilterEncoding(java.lang.String filterEncoding)
    Sets the filter encoding.
void setPageSize(int pageSize)
    Sets the number of CMBItems to be carried by a single CMBTextAnalysisRequestEvent.
void setPollCycles(int newPollCycles)
    Sets the overall number of times to poll.
void setPollMinutes(int newPollMinutes)
    Sets the minutes to wait before beginning the next poll.
void setRootDirectory(java.lang.String rootDir)
    Sets the root directory where the crawler stores the crawled documents.
void setWebSpace(java.lang.String webSpace)
    Sets the webspace that is monitored by the web crawler.
void start()
    Starts the polling of the webspace.
Methods inherited from class com.ibm.mm.beans.infomining.CMBConnectedMiningBean
    getConnection, isConnected, onCMBConnectionReply, setConnection, validateConnection
Methods inherited from class com.ibm.mm.beans.infomining.CMBInfoMiningBean
    addCMBExceptionListener, addCMBTraceListener, isTraceEnabled, removeCMBExceptionListener, removeCMBTraceListener, setTraceEnabled
Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
public CMBWebCrawlerService()
    Default constructor.
Method Detail
public void start() throws com.ibm.mm.beans.CMBNoConnectionException
    Starts the polling of the webspace.
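public void addCMBResultListener(com.ibm.mm.beans.CMBResultListener l)
    Adds the specified result listener to receive events from this bean.
    Parameters:
        l - the result listener
public void removeCMBResultListener(com.ibm.mm.beans.CMBResultListener l)
    Removes the specified result listener so that it no longer receives events from this bean.
    Parameters:
        l - the result listener
public void setFilterEncoding(java.lang.String filterEncoding)
    Sets the filter encoding.
    Parameters:
        filterEncoding - the filter encoding
    See Also:
        getFilterEncoding()
public java.lang.String getFilterEncoding()
    Gets the filter encoding.
    See Also:
        setFilterEncoding(String)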
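public void setPollCycles(int newPollCycles)
    Sets the overall number of times to poll.
    Parameters:
        newPollCycles - overall number of times to poll
    See Also:
        getPollCycles()
public int getPollCycles()
    Gets the overall number of times to poll.
    See Also:
        setPollCycles(int)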
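public void setPollMinutes(int newPollMinutes)
    Sets the minutes to wait before beginning the next poll.
    Parameters:
        newPollMinutes - minutes to wait before beginning the next poll
    See Also:
        getPollMinutes()
public int getPollMinutes()
    Gets the minutes to wait before beginning the next poll.
    See Also:
        setPollMinutes(int)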
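How pollCycles and pollMinutes might interact can be sketched as a bounded loop. This is an illustration under stated assumptions, not the bean's actual scheduling: it assumes the wait happens between polls and is skipped after the last one, and the names PollLoopSketch and runPolls, as well as the poll and waiter callbacks, are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;
import java.util.function.LongConsumer;

/** Illustration only: a bounded polling loop driven by two settings. */
class PollLoopSketch {

    /**
     * Runs poll once per cycle and asks waiter to pause between cycles.
     *
     * @param pollCycles  overall number of times to poll
     * @param pollMinutes minutes to wait before beginning the next poll
     * @param poll        stand-in for one scan of the webspace; receives the cycle number
     * @param waiter      stand-in for the pause; receives the wait in milliseconds
     * @return the number of polls that were run
     */
    static int runPolls(int pollCycles, int pollMinutes, IntConsumer poll, LongConsumer waiter) {
        if (pollCycles < 0 || pollMinutes < 0) {
            throw new IllegalArgumentException("pollCycles and pollMinutes must not be negative");
        }
        long waitMillis = TimeUnit.MINUTES.toMillis(pollMinutes);
        int completed = 0;
        for (int cycle = 1; cycle <= pollCycles; cycle++) {
            poll.accept(cycle);
            completed++;
            // Assumption: no wait is needed once the final poll has finished.
            if (cycle < pollCycles) {
                waiter.accept(waitMillis);
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        // The waiter only prints here; a real caller could sleep for the given time instead.
        int polls = runPolls(3, 5,
                cycle -> System.out.println("poll " + cycle),
                millis -> System.out.println("wait " + millis + " ms"));
        System.out.println(polls + " polls completed");
    }
}
```

Passing the wait as a callback keeps the loop testable without real delays.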
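public void setRootDirectory(java.lang.String rootDir)
    Sets the root directory where the crawler stores the crawled documents.
    Parameters:
        rootDir - root directory where the crawler stores the crawled documents
    See Also:
        getRootDirectory()
public java.lang.String getRootDirectory()
    Gets the root directory where the crawler stores the crawled documents.
    See Also:
        setRootDirectory(String)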
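public void setArchiveEnabled(boolean keepInArchive)
    Sets whether imported files are kept in the archive.
    See Also:
        isArchiveEnabled()
public boolean isArchiveEnabled()
    Returns whether imported files are kept in the archive.
    See Also:
        setArchiveEnabled(boolean)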
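public void setPageSize(int pageSize)
    Sets the number of CMBItems to be carried by a single CMBTextAnalysisRequestEvent.
    Parameters:
        pageSize - number of CMBItems to be carried by a single CMBTextAnalysisRequestEvent
    See Also:
        getPageSize()
public int getPageSize()
    Gets the number of CMBItems within a single CMBTextAnalysisRequestEvent.
    See Also:
        setPageSize(int)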
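The effect of pageSize can be pictured as splitting the created items into batches, one batch per event. The sketch below is an illustration only: it uses plain strings instead of CMBItems, the names PageSizeSketch and paginate are hypothetical, and it assumes the last batch may hold fewer than pageSize items, which this page does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration only: batches a list into pages of at most pageSize elements. */
class PageSizeSketch {

    static <T> List<List<T>> paginate(List<T> items, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<List<T>> pages = new ArrayList<>();
        for (int from = 0; from < items.size(); from += pageSize) {
            int to = Math.min(from + pageSize, items.size());
            // Copy the sublist so each page is independent of the source list.
            pages.add(new ArrayList<>(items.subList(from, to)));
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("doc1", "doc2", "doc3", "doc4", "doc5");
        List<List<String>> pages = paginate(docs, 2);
        // prints: 3 pages: [[doc1, doc2], [doc3, doc4], [doc5]]
        System.out.println(pages.size() + " pages: " + pages);
    }
}
```

Under this assumption, n items yield ceil(n / pageSize) events, so a smaller page size means more, lighter events.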
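public void setWebSpace(java.lang.String webSpace)
    Sets the webspace that is monitored by the web crawler.
    Parameters:
        webSpace - name of the webspace to be monitored by the web crawler
    See Also:
        getWebSpace()
public java.lang.String getWebSpace()
    Gets the webspace that is monitored by the web crawler.
    See Also:
        setWebSpace(String)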