Enterprise Information Portal APIs

com.ibm.gcs.netutil
Class URLProcessor

java.lang.Object
  |
  +--com.ibm.gcs.netutil.URLProcessor

public class URLProcessor
extends java.lang.Object

URLProcessor is a helper class for parsing URL Strings. It provides static methods for determining whether a URL is relative, normalizing a URL and making it absolute, extracting URL segments, and so on.


Constructor Summary
URLProcessor()
           
 
Method Summary
static java.lang.String changePathIntoWindowsFile(java.lang.String path)
          makes the path into a filename (escapes '\' and '/') and replaces illegal Windows filesystem characters with escape sequences; this makes the path a valid filename in Windows, and the URL should still be resolved correctly on the web.
static java.lang.String cleanPathForWindows(java.lang.String path)
          replaces illegal Windows filesystem characters with escape sequences; this makes the path a valid path in Windows, and the URL should still be resolved correctly on the web.
static java.lang.String getDir(java.net.URL url)
          (static) gets the dir from a URL (basically everything up to and ending with the last "/").
static java.lang.String getFileExtension(java.lang.String urlString)
          (static) gets the file extension from a URL String
static java.lang.String getFileExtension(java.net.URL url)
          (static) gets the file extension from a URL
static java.lang.String getFilename(java.net.URL url)
          (static) gets the filename from a URL (not including the dir).
static boolean isRelative(java.lang.String url)
          (static) checks whether a URL is relative.
static void main(java.lang.String[] args)
           
static java.util.Collection makeAbsolute(java.net.URL baseURL, java.util.Collection relURLStrings)
          (static) makes a Collection of URLs absolute according to some base URL, with redundant URLs removed.
static java.net.URL makeAbsolute(java.net.URL baseURL, java.lang.String relURLString)
          (static) makes a URL absolute according to some base URL
static java.lang.String removeSingleDots(java.lang.String path)
          replaces "/./" in a path String with "/"
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

URLProcessor

public URLProcessor()
Method Detail

isRelative

public static boolean isRelative(java.lang.String url)
(static) checks whether a URL is relative.
Parameters:
url - the URL String to check
Returns:
true if the URL is relative
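
This page does not state how relativity is decided. The following is a minimal sketch, assuming that a URL String counts as relative when it does not start with an RFC 3986 scheme followed by a colon. The class RelativeUrlSketch and its pattern are hypothetical and are not part of URLProcessor.

```java
import java.util.regex.Pattern;

final class RelativeUrlSketch {
    // A scheme is a letter followed by letters, digits, '+', '-' or '.', then ':'.
    private static final Pattern SCHEME = Pattern.compile("^[A-Za-z][A-Za-z0-9+.-]*:");

    // Hypothetical stand-in for URLProcessor.isRelative; the real rule is an assumption.
    static boolean isRelative(String url) {
        return !SCHEME.matcher(url.trim()).find();
    }

    public static void main(String[] args) {
        System.out.println(isRelative("images/logo.gif"));
        System.out.println(isRelative("http://www.foobar.com/"));
    }
}
```

Under this assumption a scheme-less reference such as //www.foobar.com/foo is also reported as relative, and a Windows-style path such as C:\foo is reported as absolute because "C:" looks like a scheme.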

makeAbsolute

public static java.net.URL makeAbsolute(java.net.URL baseURL,
                                        java.lang.String relURLString)
(static) makes a URL absolute according to some base URL
Parameters:
baseURL - the URL base
relURLString - a relative (or absolute) URL string
Returns:
the absolute URL. The URL is also normalized by doing the following:
  1. puts the protocol and host in lowercase, e.g., hTTp://wWw.FooBar.COM/InDeX.html --> http://www.foobar.com/InDeX.html
  2. ensures that there is a minimal path, e.g., http://www.foobar.com --> http://www.foobar.com/
  3. replaces backward slashes with forward slashes in the path, e.g., http://www.foobar.com/\/\foo\bar --> http://www.foobar.com////foo/bar
  4. removes multiple slashes from the path, e.g., http://www.foobar.com///foo//bar//// --> http://www.foobar.com/foo/bar/
  5. removes single-dots from the path, e.g., http://www.foobar.com/foo/./bar.html --> http://www.foobar.com/foo/bar.html
  6. escapes spaces in the path, e.g., http://www.foobar.com/foo bar/ --> http://www.foobar.com/foo%20bar/
  7. resolves double-dots in the path, e.g., http://www.foobar.com/foo/../index.html --> http://www.foobar.com/index.html
  8. removes any redundant port 80 from an http URL, e.g., http://www.foobar.com:80/foobar/ --> http://www.foobar.com/foobar/
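
The eight normalization steps above can be sketched with java.net.URI. This is an illustration under stated assumptions, not the URLProcessor implementation: the class UrlNormalizeSketch is hypothetical, it takes and returns Strings instead of java.net.URL to avoid checked exceptions, it assumes an http-style URL with a host, and it applies the backslash and space replacements to the whole reference rather than to the path only.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;

final class UrlNormalizeSketch {

    // Hypothetical stand-in for URLProcessor.makeAbsolute(URL, String).
    // User info and fragment are dropped in this sketch.
    static String makeAbsolute(String baseUrl, String relUrl) {
        // Steps 3 and 6, applied to the whole string for brevity.
        String cleaned = relUrl.trim().replace('\\', '/').replace(" ", "%20");
        // Resolves relative references; absolute references are returned as given.
        URI resolved = URI.create(baseUrl).resolve(cleaned);

        // Step 1: lowercase protocol and host.
        String scheme = resolved.getScheme().toLowerCase(Locale.ROOT);
        String host = resolved.getHost().toLowerCase(Locale.ROOT);

        // Step 8: drop a redundant port 80 on http URLs.
        int port = resolved.getPort();
        if (scheme.equals("http") && port == 80) {
            port = -1;
        }

        // Step 4: collapse runs of slashes in the path.
        String path = resolved.getRawPath().replaceAll("/{2,}", "/");
        // Step 2: minimal path; steps 5 and 7: single and double dots.
        path = path.isEmpty() ? "/" : removeDotSegments(path);

        StringBuilder out = new StringBuilder(scheme).append("://").append(host);
        if (port != -1) {
            out.append(':').append(port);
        }
        out.append(path);
        if (resolved.getRawQuery() != null) {
            out.append('?').append(resolved.getRawQuery());
        }
        return out.toString();
    }

    // Walks the segments of a path that starts with '/', dropping "." and resolving "..".
    private static String removeDotSegments(String path) {
        Deque<String> kept = new ArrayDeque<>();
        String[] segments = path.split("/", -1);
        for (int i = 1; i < segments.length; i++) { // segments[0] is the empty text before the leading slash
            String segment = segments[i];
            boolean last = (i == segments.length - 1);
            if (segment.equals(".")) {
                if (last) kept.addLast("");         // keep the trailing slash
            } else if (segment.equals("..")) {
                kept.pollLast();                    // no-op when already at the root
                if (last) kept.addLast("");
            } else {
                kept.addLast(segment);
            }
        }
        return "/" + String.join("/", kept);
    }

    public static void main(String[] args) {
        System.out.println(makeAbsolute("http://www.foobar.com/foo/", "../index.html"));
    }
}
```

Because all steps run together here, an input such as http://www.foobar.com/\/\foo\bar comes out with its slashes already collapsed, whereas the example for step 3 above shows the result of that step alone.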

makeAbsolute

public static java.util.Collection makeAbsolute(java.net.URL baseURL,
                                                java.util.Collection relURLStrings)
(static) makes a Collection of URLs absolute according to some base URL, with redundant URLs removed. Note that this returns a Collection, not a Set, for performance reasons.
Parameters:
baseURL - the URL base
relURLStrings - a Collection of relative (or absolute) URL Strings
Returns:
a Collection of absolute URLs (with redundant URLs removed). Each URL is also normalized as described for makeAbsolute(java.net.URL, java.lang.String).
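
One way to obtain a Collection with redundant URLs removed is to collect the resolved URLs in an insertion-ordered set and copy the result into a list. The sketch below assumes that approach; how URLProcessor removes duplicates internally is not documented here. MakeAbsoluteAllSketch is a hypothetical class, and java.net.URI resolution stands in for the single-URL makeAbsolute.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

final class MakeAbsoluteAllSketch {

    // Hypothetical stand-in for URLProcessor.makeAbsolute(URL, Collection).
    static Collection<String> makeAbsolute(String baseUrl, Collection<String> relUrls) {
        URI base = URI.create(baseUrl);
        Set<String> seen = new LinkedHashSet<>(); // keeps first-seen order, drops repeats
        for (String rel : relUrls) {
            // URI.resolve plus normalize is only a placeholder for the full normalization.
            seen.add(base.resolve(rel).normalize().toString());
        }
        return new ArrayList<>(seen);
    }
}
```

Duplicates are detected after resolution, so a.html and ./a.html against the same base count as one URL.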

getFileExtension

public static java.lang.String getFileExtension(java.lang.String urlString)
(static) gets the file extension from a URL String

getDir

public static java.lang.String getDir(java.net.URL url)
(static) gets the dir from a URL (basically everything up to and ending with the last "/"). The returned dir will begin and end with slashes.

getFilename

public static java.lang.String getFilename(java.net.URL url)
(static) gets the filename from a URL (not including the dir). If the URL filename has no extension, it is considered part of the dir, so null is returned.

getFileExtension

public static java.lang.String getFileExtension(java.net.URL url)
(static) gets the file extension from a URL
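
The relationship between getDir, getFilename and getFileExtension can be sketched as follows. The sketch works on the path component of a URL (for example the value of java.net.URL.getPath()) and rests on assumptions that this page does not confirm: a last segment without a '.' is folded into the dir, the extension is returned without the leading dot, and null is returned when there is no extension. UrlPartsSketch is a hypothetical class.

```java
final class UrlPartsSketch {

    // Last path segment if it contains a '.', otherwise null
    // (an extension-less segment is treated as part of the dir).
    static String getFilename(String path) {
        String last = path.substring(path.lastIndexOf('/') + 1);
        return last.contains(".") ? last : null;
    }

    // Everything except the filename, forced to begin and end with a slash.
    static String getDir(String path) {
        String filename = getFilename(path);
        String dir = (filename == null)
                ? path
                : path.substring(0, path.length() - filename.length());
        if (!dir.startsWith("/")) dir = "/" + dir;
        if (!dir.endsWith("/")) dir = dir + "/";
        return dir;
    }

    // Text after the last '.' of the filename (assumption: no leading dot), or null.
    static String getFileExtension(String path) {
        String filename = getFilename(path);
        return (filename == null) ? null : filename.substring(filename.lastIndexOf('.') + 1);
    }

    public static void main(String[] args) {
        String path = "/foo/bar/index.html";
        System.out.println(getDir(path) + " | " + getFilename(path) + " | " + getFileExtension(path));
    }
}
```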

removeSingleDots

public static java.lang.String removeSingleDots(java.lang.String path)
replaces "/./" in a path String with "/"
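
A minimal sketch of this replacement is shown below. SingleDotsSketch is a hypothetical class, and the loop is an assumption: a single String.replace pass leaves one "/./" behind in an input such as "/a/././b", because the matches overlap, so the sketch repeats until none remain. Whether URLProcessor does the same is not stated here.

```java
final class SingleDotsSketch {

    // Hypothetical stand-in for URLProcessor.removeSingleDots.
    static String removeSingleDots(String path) {
        String result = path;
        // Repeat because consecutive "/./" sequences share a slash.
        while (result.contains("/./")) {
            result = result.replace("/./", "/");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(removeSingleDots("/foo/./bar.html"));
    }
}
```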

cleanPathForWindows

public static java.lang.String cleanPathForWindows(java.lang.String path)
replaces illegal Windows filesystem characters with escape sequences; this makes the path a valid path in Windows, and the URL should still be resolved correctly on the web.

changePathIntoWindowsFile

public static java.lang.String changePathIntoWindowsFile(java.lang.String path)
makes the path into a filename (escapes '\' and '/') and replaces illegal Windows filesystem characters with escape sequences; this makes the path a valid filename in Windows, and the URL should still be resolved correctly on the web.

main

public static void main(java.lang.String[] args)

EIP Web Crawler APIs

(c) Copyright International Business Machines Corporation 1996, 2002. IBM Corp. All rights reserved.