HTMLParser

org.egothor.html
Class HTMLParser

java.lang.Object
  org.egothor.html.HTMLParser

public class HTMLParser
extends java.lang.Object

This class is part of the Egothor Project.

Author:: Leo Galambos

Constructor Summary
`HTMLParser(boolean clinks)` Constructor for the HTMLParser object.
`HTMLParser(boolean clinks, boolean ilinks)` Constructor for the HTMLParser object.

Method Summary
`java.net.URI`	`getBaseURL()` Return the base URL.
`java.util.ArrayList<Anchor>`	`getImageLinks()` Return the list of image (img-src) links obtained by the Handler.
`java.util.ArrayList<Anchor>`	`getLinks()` Return the list of links obtained by the Handler.
`java.util.HashMap<java.lang.String,java.lang.String>`	`getMeta()` Return a map containing the metadata obtained by the Handler.
`CharStream`	`getReader(java.io.Reader i, java.lang.String baseURL, EventEncoder encoder)` Return a CharStream that will use the given input reader and read from the given URL.
`java.lang.String`	`getSummary()` Return the summary.
`java.lang.String`	`getTitle()` Return the title.
`static void`	`main(java.lang.String[] args)` Entry point to the HTMLParser application.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

HTMLParser

public HTMLParser(boolean clinks)

Constructor for the HTMLParser object. The HTML files are parsed using the parser of the superclass.

Parameters:: clinks - if set to true the object will collect links from the document
See Also:: getLinks()

HTMLParser

public HTMLParser(boolean clinks,
                  boolean ilinks)

Constructor for the HTMLParser object. The HTML files are parsed using the parser of the superclass.

Parameters:: clinks - if set to true the object will collect links from the document; ilinks - if set to true the object will collect img-src links from the document

Method Detail

getBaseURL

public java.net.URI getBaseURL()

Return the base URL, as obtained via the Handler.

Returns:: The baseURL value
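Because the base URL is returned as a `java.net.URI`, relative link targets can be resolved against it with the standard `URI.resolve` method. The following is a minimal sketch of that step using only the JDK; the class name `BaseUriSketch`, the helper `resolveAll`, and the example URLs are hypothetical and not part of the Egothor API. How `Anchor` exposes its target is not documented here, so the sketch works on plain strings.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of org.egothor.html.
final class BaseUriSketch {

    /** Resolves each (possibly relative) href against the given base URI. */
    static List<URI> resolveAll(URI base, List<String> hrefs) {
        List<URI> resolved = new ArrayList<>();
        for (String href : hrefs) {
            // URI.resolve handles relative paths, "../" segments and absolute paths.
            resolved.add(base.resolve(href).normalize());
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Assumed stand-in for the value returned by getBaseURL().
        URI base = URI.create("http://example.org/docs/index.html");
        List<String> hrefs = Arrays.asList("guide.html", "../img/logo.png", "/about");
        for (URI u : resolveAll(base, hrefs)) {
            System.out.println(u);
        }
    }
}
```

`URI.resolve(String)` throws `IllegalArgumentException` for strings that are not valid URIs, so callers working with real-world HTML would likely want to catch that per link.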

getLinks

public java.util.ArrayList<Anchor> getLinks()

Return the list of links obtained by the Handler.

Returns:: an ArrayList of Anchor objects

getImageLinks

public java.util.ArrayList<Anchor> getImageLinks()

Return the list of image (img-src) links obtained by the Handler.

Returns:: an ArrayList of Anchor objects

getMeta

public java.util.HashMap<java.lang.String,java.lang.String> getMeta()

Return a map containing the metadata obtained by the Handler.

Returns:: a HashMap of String keys to String values
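Since the metadata comes back as a plain `HashMap<String,String>`, a missing entry simply yields `null`. A small sketch of a null-safe lookup follows; the class `MetaLookupSketch`, the helper `metaOrDefault`, and the key `"description"` are assumptions for illustration. The page does not state which keys the Handler stores or whether it normalizes their case.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of org.egothor.html.
final class MetaLookupSketch {

    /** Returns the trimmed value for the given name, or the fallback if absent or blank. */
    static String metaOrDefault(Map<String, String> meta, String name, String fallback) {
        String value = meta.get(name);
        if (value == null || value.trim().isEmpty()) {
            return fallback;
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // Assumed stand-in for the map returned by getMeta().
        HashMap<String, String> meta = new HashMap<>();
        meta.put("description", "  An example page  ");
        System.out.println(metaOrDefault(meta, "description", "(none)"));
        System.out.println(metaOrDefault(meta, "keywords", "(none)"));
    }
}
```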

getReader

public CharStream getReader(java.io.Reader i,
                            java.lang.String baseURL,
                            EventEncoder encoder)
                     throws java.io.IOException

Return a CharStream that will use the given input reader and read from the given URL.

Parameters:: i - the reader supplying the input; baseURL - where to read from; encoder - the EventEncoder to use
Returns:: a CharStream
Throws:: java.io.IOException - if an I/O error occurs

getSummary

public java.lang.String getSummary()

Return the summary, as obtained via the Handler.

Returns:: the summary

getTitle

public java.lang.String getTitle()

Return the title, as obtained via the Handler.

Returns:: the title

main

public static void main(java.lang.String[] args)

Entry point to the HTMLParser application. This program requires one argument: the path to an HTML file to parse.

Parameters:: args - the path to the file to parse
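The single-argument contract described above, combined with the `java.io.Reader` that `getReader` expects, suggests a simple preparation step: validate the argument count, then open the file as a reader. The sketch below shows one way to do that with the JDK alone. The class `OpenHtmlFileSketch`, the helper `open`, and the choice of UTF-8 are assumptions; the page does not say how `main` itself opens the file or which encoding it uses.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of org.egothor.html.
final class OpenHtmlFileSketch {

    /** Expects exactly one argument, the path to an HTML file, and opens it as a Reader. */
    static Reader open(String[] args) throws IOException {
        if (args.length != 1) {
            throw new IllegalArgumentException("usage: <path-to-html-file>");
        }
        Path path = Paths.get(args[0]);
        // UTF-8 is an assumption; real documents may declare another encoding.
        return Files.newBufferedReader(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        try (Reader reader = open(args)) {
            // The reader could now be handed to a parser; here we only confirm it opened.
            System.out.println("opened " + args[0]);
        }
    }
}
```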
