Posted to xindice-users@xml.apache.org by Martin Bischoff <ul...@rz.uni-karlsruhe.de> on 2004/03/04 16:16:15 UTC
Element content splitted up
Hi all.
I noticed the following while using Xindice 1.1b3 under Windows 2000
Professional and Java 1.4.2_01.
I added several documents to a collection. The content of each document is
like the following:
<Element type="Route"><Name>name_of_the_process_element</Name> ... </Element>
First I retrieve some documents with the following code:
String xpath = "//Element[TopLevel='true']";
ResourceSet resourceSet = null;
try {
    resourceSet = queryService.query(xpath);
    ResourceIterator resourceIterator = resourceSet.getIterator();
    pe = new ProcessElement[(int) resourceSet.getSize()];
    int i = 0;
    while (resourceIterator.hasMoreResources()) {
        XMLResource resource = (XMLResource) resourceIterator.nextResource();
        // Array slots start out null, so create the element before using it.
        pe[i] = new ProcessElement();
        // Note: getFirstChild() returns only the first Text node of <Name>.
        pe[i].setName(DOMParser.toDocument((String) resource.getContent())
                .getElementsByTagName("Name").item(0).getFirstChild().getNodeValue());
        ++i;
    }
} catch (XMLDBException e) {
    System.err.println("XML:DB Exception occurred " + e.errorCode + " " +
            e.getMessage());
    System.err.println(e);
} catch (XindiceException e) {
    System.err.println(e);
}
This all works fine.
When I afterwards execute the following:
ProcessElement child = new ProcessElement();
try {
    XMLResource resource = (XMLResource) processes.getResource(id);
    Document doc = DOMParser.toDocument((String) resource.getContent());
    child.setName(DOMParser.toDocument((String) resource.getContent())
            .getElementsByTagName("Name").item(0).getFirstChild().getNodeValue());
} catch (XMLDBException e) {
    System.err.println("XML:DB Exception occured " + e.errorCode + " " +
            e.getMessage());
    System.err.println(e);
} catch (XindiceException e) {
    System.err.println(e);
}
I notice that the name isn't complete. By debugging in Eclipse I found out
that the content of the element "Name" has been split up into two child
nodes of the type TextImpl. So, by using the getFirstChild() method I
only get one part of it.
I also noticed that the split takes place just after 64 characters of
the file.
Yes, I could concatenate all the child nodes of the type TextImpl,
but is there also another way?
Thanks
Martin
Re: Element content splitted up
Posted by Martin Bischoff <ul...@rz.uni-karlsruhe.de>.
>>AFAIU, this is the normal mode of operation for XML parsers and DOM / SAX.
>>They can always split text into multiple Text nodes or characters() events
>>as they see fit. I'm not sure there is a way around this; I think you
>>should collect all consecutive Text nodes and combine them.
>>Vadim
>
>There's a normalize() method in org.w3c.dom.Node that solves
>this problem, putting all Text nodes into one. Description:
Thanks all for your answers.
I switched to dom4j and found that its getText() method returns the whole content.
Martin
Re: Element content splitted up
Posted by Murray Altheim <m....@open.ac.uk>.
Vadim Gritsenko wrote:
> Martin Bischoff wrote:
>
>
>>By debugging in Eclipse I found out that the content of the element
>>"Name" has been split up into two child nodes of the type
>>TextImpl. So, by using the getFirstChild() method I only get one part
>>of it.
>
> AFAIU, this is the normal mode of operation for XML parsers and DOM / SAX.
> They can always split text into multiple Text nodes or characters() events
> as they see fit. I'm not sure there is a way around this; I think you
> should collect all consecutive Text nodes and combine them.
>
> Vadim
There's a normalize() method in org.w3c.dom.Node that solves
this problem, putting all Text nodes into one. Description:
Puts all Text nodes in the full depth of the sub-tree underneath this
Node, including attribute nodes, into a "normal" form where only
structure (e.g., elements, comments, processing instructions, CDATA
sections, and entity references) separates Text nodes, i.e., there are
neither adjacent Text nodes nor empty Text nodes. This can be used to
ensure that the DOM view of a document is the same as if it were saved
and re-loaded, and is useful when operations (such as XPointer
lookups) that depend on a particular document tree structure are to be
used. In cases where the document contains CDATASections, the normalize
operation alone may not be sufficient, since XPointers do not
differentiate between Text nodes and CDATASection nodes.
Murray
......................................................................
Murray Altheim http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK .
"Peoples' primary requirement is that some kind of coherence be
provided. Stories give people the feeling that there is meaning,
that there is ultimately an order lurking behind the incredible
confusion of appearances and phenomena that surrounds them. This
order is what people require more than anything else; yes, I
would almost say that the notion of order or story is connected
with the godhead. Stories are substitutes for God. Or maybe the
other way round." -- Wim Wenders
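A minimal sketch of the normalize() suggestion above, for readers who want to try it. It uses the DOM implementation bundled with the JDK rather than Xindice's own, so whether Xindice's TextImpl nodes behave the same way is an assumption. The class name NormalizeDemo and both helper methods are made up for illustration; the element is built by hand with two adjacent Text nodes to imitate the split.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizeDemo {

    /** Builds a <Name> element whose content is deliberately split into two Text nodes. */
    public static Element splitNameElement() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element name = doc.createElement("Name");
            name.appendChild(doc.createTextNode("name_of_the_"));
            name.appendChild(doc.createTextNode("process_element"));
            doc.appendChild(name);
            return name;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Merges adjacent Text nodes, then reads the (now single) first child. */
    public static String normalizedText(Element element) {
        element.normalize();
        return element.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        Element name = splitNameElement();
        // Before normalize(): only the first fragment is returned.
        System.out.println(name.getFirstChild().getNodeValue());
        // After normalize(): the fragments have been merged into one Text node.
        System.out.println(normalizedText(name));
    }
}
```

Calling normalize() once on the Document (or on the element of interest) right after parsing should leave the existing getFirstChild().getNodeValue() call unchanged, provided the element contains only text.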
Re: Element content splitted up
Posted by Vadim Gritsenko <va...@reverycodes.com>.
Martin Bischoff wrote:
> By debugging in Eclipse I found out that the content of the element
> "Name" has been split up into two child nodes of the type
> TextImpl. So, by using the getFirstChild() method I only get one part
> of it.
AFAIU, this is the normal mode of operation for XML parsers and DOM / SAX.
They can always split text into multiple Text nodes or characters() events
as they see fit. I'm not sure there is a way around this; I think you
should collect all consecutive Text nodes and combine them.
Vadim
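The "collect and combine" approach could look roughly like the sketch below. It is not code from the thread: the class name TextCollector and its methods are hypothetical, it runs against the JDK's bundled DOM rather than Xindice's, and it simply joins every Text or CDATA child of the element (which is the same as joining consecutive Text nodes when the element holds nothing but text). StringBuilder needs Java 5 or later; on Java 1.4 StringBuffer would be the equivalent.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TextCollector {

    /** Concatenates the values of all Text and CDATA children of the given node. */
    public static String collectText(Node element) {
        StringBuilder sb = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    /** Builds a <Name> element whose content is deliberately split into two Text nodes. */
    public static Element splitNameElement() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element name = doc.createElement("Name");
            name.appendChild(doc.createTextNode("name_of_the_"));
            name.appendChild(doc.createTextNode("process_element"));
            doc.appendChild(name);
            return name;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element name = splitNameElement();
        // getFirstChild() alone yields only the first fragment.
        System.out.println(name.getFirstChild().getNodeValue());
        // collectText() joins all fragments.
        System.out.println(collectText(name));
    }
}
```

In the code from the original post, this would mean replacing `.item(0).getFirstChild().getNodeValue()` with a call such as `collectText(doc.getElementsByTagName("Name").item(0))`.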