Enterprise Information Portal APIs

com.ibm.gcs.db.component
Class UrlCrawlTableDef

java.lang.Object
  |
  +--com.ibm.gcs.db.component.UrlCrawlTableDef

public class UrlCrawlTableDef
extends java.lang.Object

This class provides constants for the column names of URLCRAWLTABLE, the table that contains the URLs and their crawl information. It also provides a method that constructs the SQL string to create URLCRAWLTABLE, as follows:

 
  create table URLCRAWLTABLE
  ( 
    URL varchar(250) primary key not null,
    DEPTH int not null default 0,
    CRAWL_PATTERN_ID varchar(100),
    STATE varchar(40) default 'VIRGIN',
    TIME timestamp not null default current timestamp,
    VISIT_TIME timestamp,
    LAST_MODIFIED timestamp,
    EXCEPTION varchar(1000),
    HIDE smallint not null default 0,
    CRAWL_FREQ date,
    STATE_ID smallint not null default 0,
    PRIORITY smallint not null default 0
  )
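
As a minimal sketch of how the column-name constants might be used, the following builds a parameterised SELECT from them. UrlCrawlColumns is a hypothetical stand-in that mirrors a few of the constant names and values documented on this page, so the example compiles without the EIP library; selectUrlsByState is likewise an illustrative helper, not part of this API.

```java
/** Hypothetical stand-in mirroring a few UrlCrawlTableDef constants. */
final class UrlCrawlColumns {
    static final String TABLE = "URLCRAWLTABLE";
    static final String URL_KEY = "URL";
    static final String DEPTH = "DEPTH";
    static final String STATE = "STATE";

    private UrlCrawlColumns() {}
}

final class UrlCrawlQuerySketch {
    /** Builds a SELECT with one '?' placeholder for the state value. */
    static String selectUrlsByState() {
        return "SELECT " + UrlCrawlColumns.URL_KEY + ", " + UrlCrawlColumns.DEPTH
             + " FROM " + UrlCrawlColumns.TABLE
             + " WHERE " + UrlCrawlColumns.STATE + " = ?";
    }

    public static void main(String[] args) {
        System.out.println(selectUrlsByState());
    }
}
```

Referring to columns through constants rather than string literals keeps queries in step with the table definition if a column is ever renamed.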


Field Summary
static java.lang.String CRAWL_PATTERN_ID
          The key for the CrawlPattern:  CRAWL_PATTERN_ID.
static java.lang.String DEPTH
          The recursion depth of this url in the crawl:  DEPTH.
static java.lang.String EXCEPTION
          Exception message for failed states:  EXCEPTION.
static java.lang.String FREQ
          How often this URL should be crawled:  CRAWL_FREQ.
static java.lang.String HIDE
          Flag to be used to specify whether the url is visible in the collection:  HIDE.
static java.lang.String LAST_MODIFIED
          The time the source was last modified, specifically as returned by the http response:   LAST_MODIFIED.
static java.lang.String PRIORITY
          The crawl priority:  PRIORITY.
static java.lang.String STATE
          The current state:  STATE.
static java.lang.String STATE_ID
          The ID of the state this URL is in:  STATE_ID.
static java.lang.String TABLE
          The name of the table:   URLCRAWLTABLE.
static java.lang.String TIME
          The time of the current state:  TIME.
static java.lang.String URL_KEY
          The name of the url:   URL.
static java.lang.String VISIT_TIME
          The time crawled:  VISIT_TIME.
 
Constructor Summary
UrlCrawlTableDef()
           
 
Method Summary
static void createTable(Transaction t)
          Given a Transaction object, executes the create statements (which create the table and associated indexes on the table).
static void dropTable(Transaction t)
          Given a Transaction object, executes the drop statement.
static java.util.Enumeration getCreateIndexSQL()
          Returns the SQL CREATE statements to create the necessary indexes on the URLCRAWLTABLE.
static java.lang.String getCreateSQL()
          Returns the SQL CREATE statement to create the URLCRAWLTABLE.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

TABLE

public static final java.lang.String TABLE
The name of the table:   URLCRAWLTABLE.

URL_KEY

public static final java.lang.String URL_KEY
The name of the url:   URL.
SQL DEF: VARCHAR(250) NOT NULL PRIMARY KEY

DEPTH

public static final java.lang.String DEPTH
The recursion depth of this url in the crawl:  DEPTH.
SQL DEF: INT NOT NULL DEFAULT 0

CRAWL_PATTERN_ID

public static final java.lang.String CRAWL_PATTERN_ID
The key for the CrawlPattern:  CRAWL_PATTERN_ID.
SQL DEF: VARCHAR(40)

TIME

public static final java.lang.String TIME
The time of the current state:  TIME.
SQL DEF: TIMESTAMP NOT NULL DEFAULT CURRENT TIMESTAMP

STATE

public static final java.lang.String STATE
The current state:  STATE.
SQL DEF: VARCHAR(16) NOT NULL

VISIT_TIME

public static final java.lang.String VISIT_TIME
The time crawled:  VISIT_TIME.
SQL DEF: DATE

LAST_MODIFIED

public static final java.lang.String LAST_MODIFIED
The time the source was last modified, specifically as returned by the http response:   LAST_MODIFIED.
SQL DEF: DATE

EXCEPTION

public static final java.lang.String EXCEPTION
Exception message for failed states:  EXCEPTION.
SQL DEF: VARCHAR(4000)

HIDE

public static final java.lang.String HIDE
Flag to be used to specify whether the url is visible in the collection:  HIDE.
SQL DEF: SMALLINT NOT NULL DEFAULT 0

FREQ

public static final java.lang.String FREQ
How often this URL should be crawled:  CRAWL_FREQ.
SQL DEF: DATE

STATE_ID

public static final java.lang.String STATE_ID
The ID of the state this URL is in:  STATE_ID.
SQL DEF: SMALLINT NOT NULL DEFAULT 0
See Also:
for conversions.

PRIORITY

public static final java.lang.String PRIORITY
The crawl priority:  PRIORITY.
SQL DEF: SMALLINT NOT NULL DEFAULT 0

Constructor Detail

UrlCrawlTableDef

public UrlCrawlTableDef()

Method Detail

getCreateSQL

public static java.lang.String getCreateSQL()
Returns the SQL CREATE statement to create the URLCRAWLTABLE. The SQL CREATE statement is as follows:
 
  create table URLCRAWLTABLE( 
    URL varchar(250) primary key not null,
    DEPTH int not null default 0,
    CRAWL_PATTERN_ID varchar(100), 
    STATE varchar(40) default 'VIRGIN',
    TIME timestamp not null default current timestamp,
    VISIT_TIME timestamp,
    LAST_MODIFIED timestamp,
    EXCEPTION varchar(1000),
    HIDE smallint not null default 0,
    CRAWL_FREQ date,
    STATE_ID smallint not null default 0,
    PRIORITY smallint not null default 0
  )
Note: As a result of column dependencies, the treetable must be created before the URLCRAWLTABLE.
Returns:
The SQL string to create the URLCRAWLTABLE.
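
The following is a rough sketch of one way a CREATE TABLE string of this shape could be assembled from an ordered map of column names to definitions. It is not the library's implementation: CreateSqlSketch, buildCreateSql, and sampleColumns are hypothetical names, and only three of the documented columns are included to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CreateSqlSketch {
    /** Joins "name definition" pairs into a single CREATE TABLE statement. */
    static String buildCreateSql(String table, Map<String, String> columns) {
        StringJoiner body = new StringJoiner(", ", "create table " + table + " (", ")");
        for (Map.Entry<String, String> column : columns.entrySet()) {
            body.add(column.getKey() + " " + column.getValue());
        }
        return body.toString();
    }

    /** A subset of the documented columns; LinkedHashMap preserves insertion order. */
    static Map<String, String> sampleColumns() {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("URL", "varchar(250) primary key not null");
        columns.put("DEPTH", "int not null default 0");
        columns.put("PRIORITY", "smallint not null default 0");
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(buildCreateSql("URLCRAWLTABLE", sampleColumns()));
    }
}
```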

getCreateIndexSQL

public static java.util.Enumeration getCreateIndexSQL()
Returns the SQL CREATE statements to create the necessary indexes on the URLCRAWLTABLE. The enumeration contains the following CREATE INDEX statements:
Returns:
An enumeration of SQL strings, each containing a separate CREATE INDEX statement.
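
Because the method returns a java.util.Enumeration, callers walk it with hasMoreElements and nextElement. The sketch below copies such an enumeration into a List of strings. IndexSqlSketch and drain are hypothetical names, and the two index statements in main are placeholders invented for illustration; the actual statements returned by getCreateIndexSQL are not listed on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

final class IndexSqlSketch {
    /** Copies every element of the enumeration into a list of strings. */
    static List<String> drain(Enumeration<?> statements) {
        List<String> result = new ArrayList<>();
        while (statements.hasMoreElements()) {
            result.add(String.valueOf(statements.nextElement()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder statements, NOT the real index definitions.
        Enumeration<String> placeholder = Collections.enumeration(Arrays.asList(
            "create index IDX_EXAMPLE_STATE on URLCRAWLTABLE (STATE)",
            "create index IDX_EXAMPLE_DEPTH on URLCRAWLTABLE (DEPTH)"));
        for (String sql : drain(placeholder)) {
            System.out.println(sql);
        }
    }
}
```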

createTable

public static void createTable(Transaction t)
                        throws TransactionException
Given a Transaction object, executes the create statements (which create the table and associated indexes on the table). The treetable must be created before this table, so an exception is thrown if that prerequisite table does not exist.
Parameters:
t - The transaction object through which to execute the create statements.
Throws:
TransactionException - on failed SQL update or if ConfigTable does not already exist.
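
The ordering described above (table first, then each index) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: SqlExecutor is a hypothetical stand-in for whatever the Transaction object uses to run an SQL update, and createTableAndIndexes is an invented helper name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

final class CreateTableSketch {
    /** Hypothetical stand-in for the update capability of a Transaction. */
    interface SqlExecutor {
        void executeUpdate(String sql);
    }

    /** Runs the CREATE TABLE statement first, then each CREATE INDEX statement in order. */
    static void createTableAndIndexes(SqlExecutor executor,
                                      String createSql,
                                      Enumeration<String> indexSql) {
        executor.executeUpdate(createSql);
        while (indexSql.hasMoreElements()) {
            executor.executeUpdate(indexSql.nextElement());
        }
    }

    public static void main(String[] args) {
        // Record the statements instead of sending them to a database.
        List<String> log = new ArrayList<>();
        createTableAndIndexes(log::add,
            "create table URLCRAWLTABLE (...)",
            Collections.enumeration(Arrays.asList("create index ... (placeholder)")));
        log.forEach(System.out::println);
    }
}
```

Passing the executor in as an interface makes the ordering easy to check without a live database connection.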

dropTable

public static void dropTable(Transaction t)
                      throws TransactionException
Given a Transaction object, executes the drop statement.
Parameters:
t - The transaction object through which to execute the drop statements.
Throws:
TransactionException - on failed SQL update.

EIP Web Crawler APIs

(c) Copyright International Business Machines Corporation 1996, 2002. IBM Corp. All rights reserved.