org.egothor.html
Class HTMLParser

java.lang.Object
  extended by org.egothor.html.HTMLParser

public class HTMLParser
extends java.lang.Object

This class is part of the Egothor Project.

Author:
Leo Galambos

Constructor Summary
HTMLParser(boolean clinks)
          Constructor for the HTMLParser object, optionally collecting links.
HTMLParser(boolean clinks, boolean ilinks)
          Constructor for the HTMLParser object, optionally collecting links and img-src links.
 
Method Summary
 java.net.URI getBaseURL()
          Return the base URL.
 java.util.ArrayList<Anchor> getImageLinks()
          Return an ArrayList of the image (img-src) links obtained by the Handler.
 java.util.ArrayList<Anchor> getLinks()
          Return an ArrayList of the links obtained by the Handler.
 java.util.HashMap<java.lang.String,java.lang.String> getMeta()
          Return a HashMap containing the metadata obtained by the Handler.
 CharStream getReader(java.io.Reader i, java.lang.String baseURL, EventEncoder encoder)
          Return a CharStream that reads from the given Reader for the document at the given base URL.
 java.lang.String getSummary()
          Return the summary.
 java.lang.String getTitle()
          Return the title.
static void main(java.lang.String[] args)
          Entry point to the HTMLParser application.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HTMLParser

public HTMLParser(boolean clinks)
Constructor for the HTMLParser object. The HTML files are parsed using the parser of the superclass.

Parameters:
clinks - if true, the object collects links from the document
See Also:
getLinks()

HTMLParser

public HTMLParser(boolean clinks,
                  boolean ilinks)
Constructor for the HTMLParser object. The HTML files are parsed using the parser of the superclass.

Parameters:
clinks - if true, the object collects links from the document
ilinks - if true, the object collects img-src links from the document
Method Detail

getBaseURL

public java.net.URI getBaseURL()
Return the base URL, as obtained from the Handler.

Returns:
The baseURL value
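
Because getBaseURL() returns a java.net.URI, one plausible use is resolving relative link targets against it. The sketch below is only an illustration under stated assumptions: it uses plain strings in place of Anchor objects (the Anchor API is not documented here), and the class name BaseUriSketch, the method resolveAll, and the example URLs are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of org.egothor.html.
final class BaseUriSketch {

    /**
     * Resolves each href against the base URI using java.net.URI.resolve.
     * Hrefs that are already absolute come back unchanged.
     */
    static List<URI> resolveAll(URI base, List<String> hrefs) {
        List<URI> resolved = new ArrayList<>();
        for (String href : hrefs) {
            resolved.add(base.resolve(href));
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Assumption: this value stands in for what parser.getBaseURL() might return.
        URI base = URI.create("http://example.com/docs/index.html");
        List<String> hrefs = Arrays.asList(
                "../img/logo.png", "page2.html", "http://other.example/x");
        for (URI uri : resolveAll(base, hrefs)) {
            System.out.println(uri);
        }
    }
}
```

URI.resolve throws IllegalArgumentException for strings that are not valid URI references, so real link text taken from HTML may need to be cleaned up or guarded before it is resolved.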

getLinks

public java.util.ArrayList<Anchor> getLinks()
Return an ArrayList of the links obtained by the Handler.

Returns:
an ArrayList of Anchor objects

getImageLinks

public java.util.ArrayList<Anchor> getImageLinks()
Return an ArrayList of the image (img-src) links obtained by the Handler.

Returns:
an ArrayList of Anchor objects

getMeta

public java.util.HashMap<java.lang.String,java.lang.String> getMeta()
Return a HashMap containing the metadata obtained by the Handler.

Returns:
a HashMap of metadata names to values
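
The documentation does not say which keys the metadata map contains or how their case is normalized. As a hedged sketch, a caller could look up an entry defensively, ignoring case and falling back to a default. The class MetaLookupSketch, the method metaOrDefault, and the key names "Description" and "keywords" are hypothetical, and the map built in main only stands in for the result of getMeta().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of org.egothor.html.
final class MetaLookupSketch {

    /**
     * Returns the value whose key matches name ignoring case,
     * or fallback when no such key exists or its value is null.
     */
    static String metaOrDefault(Map<String, String> meta, String name, String fallback) {
        for (Map.Entry<String, String> entry : meta.entrySet()) {
            String key = entry.getKey(); // HashMap permits a null key
            if (key != null && key.equalsIgnoreCase(name)) {
                return entry.getValue() != null ? entry.getValue() : fallback;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Assumption: stand-in for the HashMap returned by parser.getMeta().
        HashMap<String, String> meta = new HashMap<>();
        meta.put("Description", "A test page");
        System.out.println(metaOrDefault(meta, "description", "(none)"));
        System.out.println(metaOrDefault(meta, "keywords", "(none)"));
    }
}
```

The linear scan keeps the sketch independent of how keys are stored. If the keys turn out to be normalized consistently, a direct getOrDefault call on the map would be simpler.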

getReader

public CharStream getReader(java.io.Reader i,
                            java.lang.String baseURL,
                            EventEncoder encoder)
                     throws java.io.IOException
Return a CharStream that reads from the given Reader for the document at the given base URL.

Parameters:
i - the Reader supplying the HTML input
baseURL - the base URL of the document being read
encoder - the EventEncoder to use
Returns:
a CharStream
Throws:
java.io.IOException - if an I/O error occurs

getSummary

public java.lang.String getSummary()
Return the summary, as obtained from the Handler.

Returns:
the summary

getTitle

public java.lang.String getTitle()
Return the title, as obtained from the Handler.

Returns:
the title

main

public static void main(java.lang.String[] args)
Entry point to the HTMLParser application. This program requires one argument: the path to an HTML file to parse.

Parameters:
args - the command-line arguments; the single required argument is the path to the HTML file to parse