Posted to j-dev@xerces.apache.org by Andy Clark <an...@apache.org> on 2002/05/07 13:46:27 UTC

[Announce] NekoHTML 0.5 Available

I just finished adding a bunch of somewhat minor but useful 
features to NekoHTML. So this message is to announce the 
availability of the new version of the NekoHTML parser for 
Xerces2 and the XNI framework.

New in this release:

  * fixed some bugs in location reporting;
  * added feature to report character boundaries of events via 
    the associated augmentations object; 
  * added feature to disable tag balancing; and 
  * added features to notify handlers of the start and end of 
    character references and built-in XML and HTML entity references.

You can pick up the new release at the following URL:

  http://www.apache.org/~andyc/

Probably the most notable change is that you can now retrieve
the character boundary information for each event. This gives
you the beginning and ending line and column numbers of each
piece of markup and text. In addition, each attribute within a
start element tag *also* passes along its boundary information.

While implementing this feature, I found and fixed two bugs in
the location reporting mechanism of the 0.4 release. Previously,
when a <META http-equiv=...> tag was encountered and the
character reader was changed, the location information was not
reset when the beginning of the file was re-parsed. In addition,
newlines appearing between attributes threw off the line
counting. Both bugs have now been fixed.
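
To make the two location fixes concrete, here is a minimal Java
sketch of the kind of line/column bookkeeping involved. It is NOT
NekoHTML's actual reader; the LocationTracker class and its methods
are hypothetical. It treats "\r\n" as a single line break, counts a
newline inside a tag (e.g. between attributes) like any other, and
offers a reset() for when the document is re-read after a META tag
changes the character encoding.

```java
// Hypothetical sketch, not NekoHTML's implementation: tracks 1-based
// line and column numbers as characters are consumed, so a parser can
// record where each event begins and ends.
final class LocationTracker {
    private int line = 1;
    private int column = 1;
    private boolean lastWasCR = false;

    /** Updates the position for one consumed character. */
    void consume(char c) {
        if (c == '\n') {
            // In "\r\n" the '\r' already advanced the line; don't count it twice.
            if (!lastWasCR) {
                line++;
            }
            column = 1;
            lastWasCR = false;
        } else if (c == '\r') {
            line++;
            column = 1;
            lastWasCR = true;
        } else {
            column++;
            lastWasCR = false;
        }
    }

    /** Consumes a run of characters, e.g. the text of one start tag. */
    void consume(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            consume(text.charAt(i));
        }
    }

    /** Returns to the start of the input, e.g. when the file is re-parsed
     *  with a new character reader after a META http-equiv tag. */
    void reset() {
        line = 1;
        column = 1;
        lastWasCR = false;
    }

    int line() { return line; }
    int column() { return column; }
}
```

Consuming "<a\n  href=\"x\">" leaves the tracker at line 2,
column 12, because the newline between the element name and its
attribute is counted like any other line break.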

Besides this big change, I added the ability to turn OFF tag
balancing. This was added strictly for the performance benefit
when the application is only interested in the occurrence of
elements, attributes, and/or content within the document,
regardless of its ill-formed structure. Tag balancing should
NOT be turned off when you want to access an HTML document as
XML, which is why it remains ON by default.

However, turning it off is really nice for a variety of apps
written with NekoHTML. For example, you could write a tool
that scours your local copy of a website, finding all of
the links within the HTML pages found there, and report
if any of the links are broken. (This would be a good way
to check your static site for broken links!) In this
situation, you don't need tag balancing -- you just need
to look for attributes like A/@href, IMG/@src, etc.
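
The link-checking idea can be sketched in Java roughly as follows.
This is not the author's tool: so that the sketch runs without extra
jars, the JDK's own lenient Swing HTML parser
(javax.swing.text.html.parser.ParserDelegator) stands in for
NekoHTML, and the LinkChecker class and its methods are hypothetical.
It only reacts to start tags and their attributes (A/@href,
IMG/@src), so ill-formed structure does not get in the way.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Sketch only: the JDK's Swing HTML parser stands in for NekoHTML here.
// LinkChecker and its methods are hypothetical names.
final class LinkChecker {
    /** Collects A/@href and IMG/@src values in document order. */
    static List<String> findLinks(Reader html) {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(tag, attrs);
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Empty elements such as IMG arrive here, not in handleStartTag.
                collect(tag, attrs);
            }

            private void collect(HTML.Tag tag, MutableAttributeSet attrs) {
                Object value = null;
                if (tag == HTML.Tag.A) {
                    value = attrs.getAttribute(HTML.Attribute.HREF);
                } else if (tag == HTML.Tag.IMG) {
                    value = attrs.getAttribute(HTML.Attribute.SRC);
                }
                if (value != null) {
                    links.add(value.toString());
                }
            }
        };
        try {
            new ParserDelegator().parse(html, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }

    /** Returns the relative links in {@code page} whose target file is missing. */
    static List<String> findBrokenLocalLinks(Path page) {
        List<String> broken = new ArrayList<>();
        try (Reader in = Files.newBufferedReader(page, StandardCharsets.ISO_8859_1)) {
            for (String link : findLinks(in)) {
                // Skip absolute URLs (http:, mailto:, ...), root-relative paths and in-page anchors.
                if (link.contains(":") || link.startsWith("/") || link.startsWith("#")) {
                    continue;
                }
                String path = link.split("[#?]", 2)[0]; // drop any fragment or query
                if (!Files.exists(page.resolveSibling(path).normalize())) {
                    broken.add(link);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return broken;
    }
}
```

Running findBrokenLocalLinks over every .html file under a local
copy of a site (for example via Files.walk) gives the static-site
check described above. Only relative links are checked; resolving
absolute or root-relative links needs information this sketch does
not have.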

I mention this specific example because I have ALREADY
written a tool using NekoHTML that does exactly this! At 
the moment I'm cleaning up the UI a bit. I want to be able 
to click on a broken link and have it open the file and 
position the cursor at the problem link -- hence the 
addition of the first big change I mentioned. :)

Once done (or as near done as I care to make it), I will
release the tool under the same Apache-style license as
the NekoHTML parser. Besides the usefulness of the tool,
it is also a good example of how to use the XNI framework
to write components that emit XML events which then can
be used by any other XML processing code, like Xalan. 

In my tool, I have an object that searches local 
directories recursively, emitting XML events that give
information about each file found there. This information
can then be processed directly by Xalan to transform it
to an HTML report. Or it can be passed to the GUI that
displays the information in a Swing JTree, highlighting
the trees, files, and links that are troublesome.
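
A component that "emits XML events" about a directory tree might
look something like the Java sketch below. The author's tool is
built on XNI; this sketch uses plain SAX from the JDK instead,
because a SAX ContentHandler is what JAXP transformers (Xalan among
them) accept. The DirectoryEventSource class and the "directory" and
"file" element names are hypothetical.

```java
import java.io.File;
import java.io.StringWriter;
import java.util.Arrays;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical sketch: walks a directory recursively and reports what it
// finds as SAX events, so any ContentHandler can consume them.
final class DirectoryEventSource {
    static void emit(File dir, ContentHandler out) throws SAXException {
        out.startDocument();
        emitDirectory(dir, out);
        out.endDocument();
    }

    private static void emitDirectory(File dir, ContentHandler out) throws SAXException {
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "name", "name", "CDATA", dir.getName());
        out.startElement("", "directory", "directory", attrs);
        File[] children = dir.listFiles();
        if (children != null) {
            Arrays.sort(children); // deterministic output order
            for (File child : children) {
                if (child.isDirectory()) {
                    emitDirectory(child, out);
                } else {
                    AttributesImpl fileAttrs = new AttributesImpl();
                    fileAttrs.addAttribute("", "name", "name", "CDATA", child.getName());
                    fileAttrs.addAttribute("", "size", "size", "CDATA", Long.toString(child.length()));
                    out.startElement("", "file", "file", fileAttrs);
                    out.endElement("", "file", "file");
                }
            }
        }
        out.endElement("", "directory", "directory");
    }

    /** Serializes the events as XML text via the JAXP identity transformer. */
    static String toXml(File dir) throws Exception {
        SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler(); // identity transform
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter text = new StringWriter();
        handler.setResult(new StreamResult(text));
        emit(dir, handler);
        return text.toString();
    }
}
```

Because the events go to a plain ContentHandler, the identity
handler could be replaced by factory.newTransformerHandler(templates),
with templates compiled from an XSLT stylesheet, to produce an HTML
report directly, or by a handler that builds the nodes of a Swing
JTree.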

I have other ideas for HTML/XML tools that I will
continue to develop and release to the community. I'm
writing them for my own personal use but you may also
find them useful.

Enjoy!

-- 
Andy Clark * andyc@apache.org
