IBM Information Integrator for Content V8.2 APIs

com.ibm.mm.sdk.common.infomining
Class DKIKFDocumentFilter

java.lang.Object
  |
  +--com.ibm.mm.sdk.common.infomining.DKIKFDocumentFilter

public class DKIKFDocumentFilter
extends java.lang.Object

A document filter can be used to get the textual content of a document.


Field Summary
static java.lang.String ANSI
          7-bit ANSI
static java.lang.String ANSI8
          8-bit ANSI
static java.lang.String ASCII
          7-bit ASCII
static java.lang.String ASCII8
          8-bit ASCII
static java.lang.String CHINESEBIG5
          Plain text file uses Chinese Big 5 character set (DBCS).
static java.lang.String CHINESEGB
          Plain text file uses Chinese GB character set (DBCS).
static java.lang.String DEFAULT
          Default encoding, uses automatic codepage detection and the system codepage as fallback.
static java.lang.String HANGEUL
          Plain text file uses Korean Hangul character set (DBCS).
static java.lang.String HTML_CHINESEBIG5
          HTML file encoded in the Chinese Big 5 character set.
static java.lang.String HTML_CHINESEEUC
          HTML file encoded in the Chinese EUC character set.
static java.lang.String HTML_CHINESEGB
          HTML file encoded in the Chinese GB character set.
static java.lang.String HTML_JAPANESEEUC
          HTML file encoded in the Japanese EUC character set.
static java.lang.String HTML_JAPANESESJIS
          HTML file encoded in the Japanese ShiftJIS character set.
static java.lang.String HTML_KOREANHANGUL
          HTML file encoded in the Korean Hangul character set.
static java.lang.String JAPANESE_EUC
          Plain text file uses Japanese EUC character set (DBCS).
static java.lang.String SHIFTJIS
          Plain text file uses Japanese ShiftJIS character set (DBCS).
static java.lang.String UNICODE
          UCS-2 encoded files
 
Constructor Summary
DKIKFDocumentFilter(DKIKFService ikfService)
          Creates a new filter object.
 
Method Summary
 java.util.Map getContent(byte[] documentBytes)
          Returns a map containing the textual content of the specified document.
 java.util.Map getContent(java.io.InputStream in)
          Returns a map containing the textual content of the document that is read from the specified stream.
 java.lang.String getFilterEncoding()
          Returns the currently set encoding.
 DKIKFService getService()
          Returns the service object used by this filter.
 DKIKFTextDocument getTextDocument(byte[] documentBytes)
          Returns a text document containing all textual parts of the specified document.
 DKIKFTextDocument getTextDocument(java.io.InputStream in)
          Returns a text document containing all textual parts of the document that is read from the specified stream.
 void setFilterEncoding(java.lang.String encoding)
          Sets the character encoding for data retrieval.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SHIFTJIS

public static java.lang.String SHIFTJIS
Plain text file uses Japanese ShiftJIS character set (DBCS).

JAPANESE_EUC

public static java.lang.String JAPANESE_EUC
Plain text file uses Japanese EUC character set (DBCS).

CHINESEGB

public static java.lang.String CHINESEGB
Plain text file uses Chinese GB character set (DBCS).

CHINESEBIG5

public static java.lang.String CHINESEBIG5
Plain text file uses Chinese Big 5 character set (DBCS).

HANGEUL

public static java.lang.String HANGEUL
Plain text file uses Korean Hangul character set (DBCS).

HTML_JAPANESESJIS

public static java.lang.String HTML_JAPANESESJIS
HTML file encoded in the Japanese ShiftJIS character set.

HTML_JAPANESEEUC

public static java.lang.String HTML_JAPANESEEUC
HTML file encoded in the Japanese EUC character set.

HTML_CHINESEBIG5

public static java.lang.String HTML_CHINESEBIG5
HTML file encoded in the Chinese Big 5 character set.

HTML_CHINESEGB

public static java.lang.String HTML_CHINESEGB
HTML file encoded in the Chinese GB character set.

HTML_CHINESEEUC

public static java.lang.String HTML_CHINESEEUC
HTML file encoded in the Chinese EUC character set.

HTML_KOREANHANGUL

public static java.lang.String HTML_KOREANHANGUL
HTML file encoded in the Korean Hangul character set.

ANSI

public static java.lang.String ANSI
7-bit ANSI

ANSI8

public static java.lang.String ANSI8
8-bit ANSI

ASCII

public static java.lang.String ASCII
7-bit ASCII

ASCII8

public static java.lang.String ASCII8
8-bit ASCII

UNICODE

public static java.lang.String UNICODE
UCS-2 encoded files

DEFAULT

public static java.lang.String DEFAULT
Default encoding: uses automatic codepage detection, with the system codepage as fallback. This is the default filter encoding.

Constructor Detail

DKIKFDocumentFilter

public DKIKFDocumentFilter(DKIKFService ikfService)
Creates a new filter object.
Parameters:
ikfService - the service object to be used by the filter.

Method Detail

setFilterEncoding

public void setFilterEncoding(java.lang.String encoding)
Sets the character encoding for data retrieval. The selected codepage is used to convert the document's textual data into Unicode. Setting the codepage is necessary for double-byte documents whose code page differs from the system code page. If no codepage is set, the DEFAULT encoding applies: automatic codepage detection, with the system codepage as fallback.
See the public fields of this class for the available encodings.
Parameters:
encoding - codepage to be used by the filter.
See Also:
getFilterEncoding()
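
The following sketch illustrates why the codepage matters. It does not use the filter API; it relies only on the standard java.nio.charset classes, and UTF-16BE stands in for a double-byte document code page (an assumption made so that the example runs on any Java runtime). The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CodepageMismatchDemo {

    /** Decodes raw document bytes into a Unicode string using the given code page. */
    static String decode(byte[] documentBytes, Charset codepage) {
        return new String(documentBytes, codepage);
    }

    public static void main(String[] args) {
        // Three Japanese characters stored in a two-byte encoding.
        String original = "\u65E5\u672C\u8A9E";
        byte[] documentBytes = original.getBytes(StandardCharsets.UTF_16BE);

        // Decoding with the document's own code page restores the text.
        String matching = decode(documentBytes, StandardCharsets.UTF_16BE);

        // Decoding with a single-byte code page turns every byte into its own character.
        String mismatched = decode(documentBytes, StandardCharsets.ISO_8859_1);

        System.out.println("matching code page restores text: " + matching.equals(original));
        System.out.println("mismatched code page restores text: " + mismatched.equals(original));
    }
}
```

With the filter itself, the corresponding step would be a call such as setFilterEncoding(DKIKFDocumentFilter.SHIFTJIS) before the content of a ShiftJIS document is retrieved.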

getFilterEncoding

public java.lang.String getFilterEncoding()
Returns the currently set encoding.
Returns:
the encoding
See Also:
setFilterEncoding(java.lang.String)

getContent

public java.util.Map getContent(byte[] documentBytes)
                         throws java.io.IOException,
                                DKIKFDocumentFilterException
Returns a map containing the textual content of the specified document.
Parameters:
documentBytes - the complete document as a byte array
Returns:
a map containing the textual content of the document
Throws:
java.io.IOException - if an I/O error occurs during document processing.
DKIKFDocumentFilterException - if the filter encounters a problem during document processing.
DKIKFAuthorizationException - if the user or group does not have the privilege IKFRunAnalysisFunc
See Also:
getContent(InputStream), getTextDocument(InputStream), getTextDocument(byte[])
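
Because documentBytes must hold the complete document, a caller typically loads the whole file before invoking the filter. A minimal sketch of that preparation step, using only java.nio.file; the class name, method name, and temporary file are hypothetical and not part of this API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class DocumentLoader {

    /** Reads the whole file into memory so it can be passed on as documentBytes. */
    static byte[] loadCompleteDocument(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for a real document.
        Path sample = Files.createTempFile("sample", ".txt");
        Files.write(sample, new byte[] {72, 101, 108, 108, 111});

        byte[] documentBytes = loadCompleteDocument(sample);
        System.out.println("bytes loaded: " + documentBytes.length);

        Files.delete(sample);
    }
}
```

For very large documents, the InputStream variant of getContent avoids holding a second copy of the file in the caller's memory.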

getContent

public java.util.Map getContent(java.io.InputStream in)
                         throws java.io.IOException,
                                DKIKFDocumentFilterException
Returns a map containing the textual content of the document that is read from the specified stream. The input stream is not closed.
Parameters:
in - the input stream to read the document
Returns:
a map containing the textual content of the document
Throws:
java.io.IOException - if an I/O error occurs during document processing.
DKIKFDocumentFilterException - if the filter encounters a problem during document processing.
DKIKFAuthorizationException - if the user or group does not have the privilege IKFRunAnalysisFunc
See Also:
getContent(byte[]), getTextDocument(InputStream), getTextDocument(byte[])
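
Since the input stream is not closed by the method, closing it remains the caller's responsibility. One way to handle this is a try-with-resources block around the call, sketched below. The StreamCall interface is a hypothetical stand-in for a call such as getContent(in), introduced only so that the example runs without the filter library.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

class CallerClosesStream {

    /** Hypothetical stand-in for a call that reads a document from a stream. */
    interface StreamCall<T> {
        T apply(InputStream in) throws Exception;
    }

    /** Runs the call, then closes the stream even if the call throws. */
    static <T> T callAndClose(InputStream source, StreamCall<T> call) throws Exception {
        try (InputStream in = source) {
            return call.apply(in);
        }
    }

    public static void main(String[] args) throws Exception {
        InputStream source = new ByteArrayInputStream(new byte[] {72, 105});

        // The lambda only counts bytes; a real caller would hand the stream to the filter here.
        Integer count = callAndClose(source, in -> {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        });

        System.out.println("bytes read before close: " + count);
    }
}
```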

getTextDocument

public DKIKFTextDocument getTextDocument(byte[] documentBytes)
                                  throws java.io.IOException,
                                         DKIKFDocumentFilterException
Returns a text document containing all textual parts of the specified document.
Parameters:
documentBytes - the complete document as a byte array
Returns:
a text document containing all textual parts of the document
Throws:
java.io.IOException - if an I/O error occurs during document processing.
DKIKFDocumentFilterException - if the filter encounters a problem during document processing.
DKIKFAuthorizationException - if the user or group does not have the privilege IKFRunAnalysisFunc
See Also:
getContent(byte[]), getContent(InputStream), getTextDocument(InputStream)

getTextDocument

public DKIKFTextDocument getTextDocument(java.io.InputStream in)
                                  throws java.io.IOException,
                                         DKIKFDocumentFilterException
Returns a text document containing all textual parts of the document that is read from the specified stream. The input stream is not closed.
Parameters:
in - the input stream to read the document
Returns:
a text document containing all textual parts of the document
Throws:
java.io.IOException - if an I/O error occurs during document processing.
DKIKFDocumentFilterException - if the filter encounters a problem during document processing.
DKIKFAuthorizationException - if the user or group does not have the privilege IKFRunAnalysisFunc
See Also:
getContent(byte[]), getContent(InputStream), getTextDocument(byte[])

getService

public DKIKFService getService()
Returns the service object used by this filter.
Returns:
the service object used by this filter

© Copyright International Business Machines Corporation 1996, 2003 IBM Corp. All rights reserved.