Posted to j-users@xerces.apache.org by Takumi Fujiwara <tr...@yahoo.com> on 2002/08/20 22:05:35 UTC
Entity Reference in NekoHTML parser
I am using the NekoHTML parser. I think it translates
entity references during parsing. Is it possible to
turn OFF that feature (e.g. if it sees &nbsp; in the
text node, leave it as &nbsp;)?
Thanks for any help.
---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org
Re: Entity Reference in NekoHTML parser
Posted by Andy Clark <an...@apache.org>.
Takumi Fujiwara wrote:
> Thanks for your response. But I don't want it to convert, I don't
> want it to create a DOM node either. I just want it to leave it as it
> is.
You can use a filter for this purpose. Here is an
example:
import org.cyberneko.html.filters.DefaultFilter;
import org.apache.xerces.util.XMLStringBuffer;
import org.apache.xerces.xni.Augmentations;
import org.apache.xerces.xni.XMLLocator;
import org.apache.xerces.xni.XMLResourceIdentifier;
import org.apache.xerces.xni.XMLString;
import org.apache.xerces.xni.XNIException;

public class Entity2Text extends DefaultFilter {

    // True while the scanner is inside a built-in entity reference.
    boolean inEntityRef;
    XMLStringBuffer buffer = new XMLStringBuffer();

    public void startDocument(XMLLocator locator,
                              String encoding,
                              Augmentations augs)
        throws XNIException {
        super.startDocument(locator, encoding, augs);
        inEntityRef = false;
    }

    // Drop the expanded character that is reported inside an entity reference.
    public void characters(XMLString text,
                           Augmentations augs)
        throws XNIException {
        if (!inEntityRef) {
            super.characters(text, augs);
        }
    }

    // Pass the reference itself along as literal text, e.g. "&nbsp;".
    public void startGeneralEntity(String name,
                                   XMLResourceIdentifier id,
                                   String encoding,
                                   Augmentations augs)
        throws XNIException {
        inEntityRef = true;
        buffer.clear();
        buffer.append('&');
        buffer.append(name);
        buffer.append(';');
        super.characters(buffer, augs);
    }

    public void endGeneralEntity(String name,
                                 Augmentations augs)
        throws XNIException {
        inEntityRef = false;
    }
}
Then you just turn on the notification of entity refs
and append the filter to the parsing pipeline. Like so:
    XMLParserConfiguration config = new HTMLConfiguration();
    config.setFeature(
        "http://cyberneko.org/html/features/scanner/notify-builtin-refs",
        true);
    XMLDocumentFilter[] filters = { new Entity2Text() };
    config.setProperty(
        "http://cyberneko.org/html/properties/filters",
        filters);
The "setFeature" and "setProperty" methods can also
be called on the parser classes.
Does this work for you?
P.S. I wrote the code by heart so I may have some
mistakes. But this should get you started.
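
For readers who want to check the filter logic without NekoHTML or Xerces on the classpath, here is a minimal sketch of the same idea using only the JDK. The TextEvents interface and the class names are hypothetical stand-ins invented for this illustration; they are not the XNI interfaces. The sketch assumes the scanner reports a start-entity event, then the expanded character, then an end-entity event, and that the goal is to emit the reference as literal "&name;" text.

```java
class EntityToTextDemo {

    // Hypothetical, simplified stand-in for an XNI-style handler (not the real API).
    interface TextEvents {
        void characters(String text);
        void startGeneralEntity(String name);
        void endGeneralEntity(String name);
    }

    // End of the pipeline: collects whatever text reaches it.
    static class TextCollector implements TextEvents {
        final StringBuilder out = new StringBuilder();
        public void characters(String text) { out.append(text); }
        public void startGeneralEntity(String name) { }
        public void endGeneralEntity(String name) { }
    }

    // Same state machine as the filter: suppress the expanded text inside an
    // entity reference and forward the literal reference text instead.
    static class EntityToTextModel implements TextEvents {
        private final TextEvents next;
        private boolean inEntityRef;

        EntityToTextModel(TextEvents next) { this.next = next; }

        public void characters(String text) {
            if (!inEntityRef) {
                next.characters(text);
            }
        }

        public void startGeneralEntity(String name) {
            inEntityRef = true;
            next.characters("&" + name + ";");
        }

        public void endGeneralEntity(String name) {
            inEntityRef = false;
        }
    }

    // Replays the events assumed for the input "a&nbsp;b".
    static String run() {
        TextCollector sink = new TextCollector();
        TextEvents filter = new EntityToTextModel(sink);
        filter.characters("a");
        filter.startGeneralEntity("nbsp");
        filter.characters("\u00A0"); // expanded character, dropped by the filter
        filter.endGeneralEntity("nbsp");
        filter.characters("b");
        return sink.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints a&nbsp;b
    }
}
```

The flag is reset in the end-entity event, so ordinary text after the reference passes through unchanged.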
--
Andy Clark * andyc@apache.org
Re: Entity Reference in NekoHTML parser
Posted by Takumi Fujiwara <tr...@yahoo.com>.
Andy,
Thanks for your response. But I don't want it to
convert, I don't want it to create a DOM node either.
I just want it to leave it as it is.
For example, the source is this:
<A> This is some text with & </a>
I want the parser to create an HTMLAnchor element and
then a Text node with node value " This is some
text with & ".
Is this possible?
Thank you.
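
One consequence of the tree described above, sketched with the JDK's built-in DOM and identity Transformer rather than NekoHTML: a Text node whose value contains a literal reference is escaped again when the tree is serialized. The sketch assumes the reference in question is "&nbsp;" (this copy of the message shows only "&"), and the class and method names are made up for the illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class LiteralEntityTextDemo {

    // Builds <A> with one Text child whose value holds the literal characters "&nbsp;".
    static Element buildAnchor() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element a = doc.createElement("A");
            a.appendChild(doc.createTextNode(" This is some text with &nbsp; "));
            doc.appendChild(a);
            return a;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the element with the JDK identity transformer.
    static String serialize(Element e) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter w = new StringWriter();
            t.transform(new DOMSource(e), new StreamResult(w));
            return w.toString();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Element a = buildAnchor();
        System.out.println(a.getFirstChild().getNodeValue()); // value keeps "&nbsp;" as typed
        System.out.println(serialize(a));                     // "&" is written as "&amp;"
    }
}
```

So if the text is later written back out, the serializer needs to be taken into account as well as the parser.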
--- Andy Clark <an...@apache.org> wrote:
> Takumi Fujiwara wrote:
> > I am using NekoHTML Parser, I think it translates
> > entity references during parsing, is it possible to
> > turn OFF that feature (e.g. if it sees &nbsp; in the
> > text node, leave it as &nbsp;)?
>
> Currently, it reports *all* character content which
> means converting to its character equivalent.
> However, you can set a feature so that the standard
> HTML entity boundaries are reported[1]. This would,
> for example, create a DOM entity ref node for "nbsp"
> and the others.
>
> Would this help your problem?
>
> [1] http://www.apache.org/~andyc/neko/doc/html/settings.html
>     "http://cyberneko.org/html/features/scanner/notify-builtin-refs"
>
> --
> Andy Clark * andyc@apache.org
>
Re: Entity Reference in NekoHTML parser
Posted by Andy Clark <an...@apache.org>.
Takumi Fujiwara wrote:
> I am using NekoHTML Parser, I think it translates
> entity references during parsing, is it possible to
> turn OFF that feature (e.g. if it sees &nbsp; in the
> text node, leave it as &nbsp;)?
Currently, it reports *all* character content which
means converting to its character equivalent.
However, you can set a feature so that the standard
HTML entity boundaries are reported[1]. This would,
for example, create a DOM entity ref node for "nbsp"
and the others.
Would this help your problem?
[1] http://www.apache.org/~andyc/neko/doc/html/settings.html
"http://cyberneko.org/html/features/scanner/notify-builtin-refs"
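
If entity reference nodes do end up in the DOM, the original reference text can be recovered while walking the tree. The following is a rough sketch using only the JDK's DOM, with the tree built by hand to stand in for parser output. The class and method names are hypothetical, and how a given parser version populates entity reference nodes (for example, whether they carry child nodes) is not something this sketch assumes.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class EntityRefWalkDemo {

    // Hand-built stand-in for parser output: <A> with Text "a", an entity
    // reference node named "nbsp", and Text "b".
    static Element buildSample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element a = doc.createElement("A");
            a.appendChild(doc.createTextNode("a"));
            a.appendChild(doc.createEntityReference("nbsp"));
            a.appendChild(doc.createTextNode("b"));
            return a;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Concatenates the direct children of parent, writing each entity
    // reference node back out as "&name;" and ignoring other node types.
    static String textWithRefs(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ENTITY_REFERENCE_NODE) {
                sb.append('&').append(n.getNodeName()).append(';');
            } else if (n.getNodeType() == Node.TEXT_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(textWithRefs(buildSample()));
    }
}
```

This only looks at direct children; nested elements would need a recursive walk.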
--
Andy Clark * andyc@apache.org