Version: 1.04
Tue Jun 4 13:28:39 PDT 2002

This is a Ruby library for building trees representing HTML structure.

See the file INSTALL for installation instructions.

Copyright (C) 2002, Ned Konz
License: Ruby's

See http://www.bike-nomad.com/ruby/ for the most recent version.

=============
PREREQUISITES
=============

-------------
html-parser
-------------

This library requires the html-parser library, which can be obtained from:
  http://www.jin.gr.jp/~nahi/Ruby/html-parser/html-parser-19990912p2.tar.gz

NOTE: You have to change lines 198 and 199 in sgml-parser.rb to read:

      elsif (attrvalue[0] == ?' && attrvalue[-1] == ?') or
          (attrvalue[0] == ?" && attrvalue[-1] == ?")

This requires that you edit line 199.

-------------
Test::Unit
-------------

The tests require the Test::Unit library, which can be obtained from:
  http://testunit.talbott.ws/

If you don't have Test::Unit, the installer will not run the tests as part of
the install.

-------------
REXML
-------------

The XPath support (html/xpath.rb) and the HTMLTree::XMLParser
(html/xmltree.rb) both require REXML version 2.3.4 to work correctly.
Earlier versions will fail the tests.

You can get REXML from:
  http://www.germane-software.com/software/rexml/

===========
CHANGES
===========

------------------
Changes from 1.03:
------------------

* Added HTMLTree::XMLParser, which makes a REXML document from the given HTML.
* Renamed HTMLTree::Element::print_on() to write()
* HTMLTree::Element::dump() now accepts a string or an IO
* HTMLTree::Element::write() now accepts a string or an IO

------------------
Changes from 1.02:
------------------

* added XPath and XML conversion (needs REXML)
* Wrapped all code in namespaces.
  The following class names have changed:

  -- in html/element.rb
    HTMLDocument         => HTMLTree::Document
    HTMLElement          => HTMLTree::Element
    HTMLData             => HTMLTree::Data
    HTMLComment          => HTMLTree::Comment
    HTMLSpecial          => HTMLTree::Special

  -- in html/tags.rb
    HTMLTag              => HTML::Tag
    HTMLBlockTag         => HTML::BlockTag
    HTMLInlineTag        => HTML::InlineTag
    HTMLBlockOrInlineTag => HTML::BlockOrInlineTag
    HTMLEmptyTag         => HTML::EmptyTag

  -- in html/tree.rb
    HTMLTreeParser       => HTMLTree::Parser

  -- in html/stparser.rb
    StackingParser       => HTML::StackingParser

* added HTMLTree::Element.root()

------------------
Changes from 1.01:
------------------

* documented change to sgml-parser.
* added bin/ebaySearch.rb example

------------------
Changes from 1.0:
------------------

* attributes now maintain their order. Though this probably isn't strictly
  necessary under HTML, it may make it easier to compare document versions.

* the generated tree now has a top-level node for the document itself, so the
  DTD can be stored. THIS WILL REQUIRE CODE CHANGES if you have code that
  assumes that the root node is always <html>. To find the <html> node, you
  can use the new methods HTMLTreeParser#html() or HTMLDocument#html_node():

    html = parser.html()

  Or, querying the tree:

    html = parser.tree.html_node()

* comments are stored in the tree

* added HTMLElement#print_on() to print a (sub)tree to an IO stream
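
Code written against 1.02 or earlier can be updated to the namespaced class
names listed under "Changes from 1.02" more or less mechanically. The
following is only a minimal sketch and is not part of this library:
RENAMED_CLASSES, OLD_NAME_PATTERN and migrate_class_names are hypothetical
names chosen for illustration, and the pattern assumes a Ruby new enough to
support lookbehind in regular expressions (1.9 or later). It rewrites plain
text, so it will also touch matching names inside strings and comments;
review the result by hand.

```ruby
# Old class names mapped to their namespaced replacements, copied from the
# "Changes from 1.02" list. (Hypothetical helper, not shipped with the library.)
RENAMED_CLASSES = {
  "HTMLDocument"         => "HTMLTree::Document",
  "HTMLElement"          => "HTMLTree::Element",
  "HTMLData"             => "HTMLTree::Data",
  "HTMLComment"          => "HTMLTree::Comment",
  "HTMLSpecial"          => "HTMLTree::Special",
  "HTMLTag"              => "HTML::Tag",
  "HTMLBlockTag"         => "HTML::BlockTag",
  "HTMLInlineTag"        => "HTML::InlineTag",
  "HTMLBlockOrInlineTag" => "HTML::BlockOrInlineTag",
  "HTMLEmptyTag"         => "HTML::EmptyTag",
  "HTMLTreeParser"       => "HTMLTree::Parser",
  "StackingParser"       => "HTML::StackingParser"
}.freeze

# Match an old name only as a whole identifier. The lookbehind skips names
# that are already qualified (preceded by "::") or are part of a longer word,
# so running the migration twice does not double the namespace.
OLD_NAME_PATTERN =
  /(?<![\w:])(?:#{Regexp.union(RENAMED_CLASSES.keys).source})\b/

# Return a copy of +source+ with every old class name replaced.
def migrate_class_names(source)
  source.gsub(OLD_NAME_PATTERN) { |old_name| RENAMED_CLASSES.fetch(old_name) }
end
```

For example, migrate_class_names(File.read("old_script.rb")) would return the
updated text of a (hypothetical) old_script.rb without modifying the file.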