
Posted to xindice-users@xml.apache.org by Martin Bischoff <ul...@rz.uni-karlsruhe.de> on 2004/03/04 16:16:15 UTC

Element content splitted up

Hi all.

I noticed the following while using Xindice 1.1b3 under Windows 2000 
Professional and Java 1.4.2_01.

I added several documents to a collection. The content of each document is 
like the following:
<Element type="Route"><Name>name_of_the_process_element</Name> ... </Element>

First I retrieve some documents with the following code:

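import org.apache.xindice.util.XindiceException;
import org.apache.xindice.xml.dom.DOMParser;
import org.w3c.dom.Document;
import org.xmldb.api.base.ResourceIterator;
import org.xmldb.api.base.ResourceSet;
import org.xmldb.api.base.XMLDBException;
import org.xmldb.api.modules.XMLResource;

// Inside a method; queryService and pe are defined elsewhere.
String xpath = "//Element[TopLevel='true']";
ResourceSet resourceSet = null;
try {
	resourceSet = queryService.query(xpath);
	ResourceIterator resourceIterator = resourceSet.getIterator();
	pe = new ProcessElement[(int) resourceSet.getSize()];
	int i = 0;
	while (resourceIterator.hasMoreResources()) {
		XMLResource resource = (XMLResource) resourceIterator.nextResource();
		Document doc = DOMParser.toDocument((String) resource.getContent());
		// The array slots start out null; create each element before using it.
		pe[i] = new ProcessElement();
		pe[i].setName(doc.getElementsByTagName("Name").item(0).getFirstChild().getNodeValue());
		++i;
	}
} catch (XMLDBException e) {
	System.err.println("XML:DB Exception occurred " + e.errorCode + " " + e.getMessage());
	System.err.println(e);
} catch (XindiceException e) {
	System.err.println(e);
}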

This all works fine.



When I then execute the following code:

ProcessElement child = new ProcessElement();
try {
	// processes is the Collection and id the document key, both defined elsewhere.
	XMLResource resource = (XMLResource) processes.getResource(id);
	// Parse once and reuse the Document instead of parsing the content twice.
	Document doc = DOMParser.toDocument((String) resource.getContent());
	child.setName(doc.getElementsByTagName("Name").item(0).getFirstChild().getNodeValue());
} catch (XMLDBException e) {
	System.err.println("XML:DB Exception occurred " + e.errorCode + " " + e.getMessage());
	System.err.println(e);
} catch (XindiceException e) {
	System.err.println(e);
}

I notice that the name isn't complete. By debugging in Eclipse I found out 
that the content of the element "Name" has been split into two child 
nodes of type TextImpl. So, by using the getFirstChild() method I 
only get one part of it.

I also noticed that the split occurs right after the 64th character of 
the file.

I could concatenate all the TextImpl child nodes myself, but is there 
another way?

Thanks
   Martin
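The workaround Martin mentions, joining all the Text children of the element, can be sketched with the standard W3C DOM API bundled with the JDK. This is a minimal, non-Xindice-specific sketch: the class `TextCollector`, its `collectText` and `splitNameElement` helpers, and the artificial split of `name_of_the_process_element` into two Text nodes are hypothetical, chosen to mimic what the debugger showed. It uses `StringBuffer`, which was already available in Java 1.4.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TextCollector {

    /**
     * Concatenates the values of all direct Text and CDATASection children
     * of the element, so text the parser split into several nodes comes
     * back in one piece. Text inside nested child elements is ignored.
     */
    static String collectText(Element element) {
        StringBuffer sb = new StringBuffer();
        for (Node child = element.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    /** Demo only: builds <Name> from two separate, adjacent Text nodes. */
    static Element splitNameElement() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element name = doc.createElement("Name");
            doc.appendChild(name);
            name.appendChild(doc.createTextNode("name_of_the_pro"));
            name.appendChild(doc.createTextNode("cess_element"));
            return name;
        } catch (ParserConfigurationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Element name = splitNameElement();
        System.out.println(name.getFirstChild().getNodeValue()); // name_of_the_pro
        System.out.println(collectText(name));                   // name_of_the_process_element
    }
}
```

In the original code, `collectText((Element) doc.getElementsByTagName("Name").item(0))` would take the place of the `getFirstChild().getNodeValue()` chain.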

Re: Element content splitted up

Posted by Martin Bischoff <ul...@rz.uni-karlsruhe.de>.

>>AFAIU, this is normal mode of operation for XML parsers and DOM / SAX.
>>They can always split text into multiple TextNodes or character() events
>>as they want it. I'm not sure there is way around this; I think you
>>should collect all consecutive TextNodes and combine them.
>>Vadim
>
>There's a normalize() method in org.w3c.dom.Node that solves
>this problem, putting all Text nodes into one. Description:

Thanks all for your answers.

I switched to dom4j and found that its getText() method returns the whole content.

Martin

Re: Element content splitted up

Posted by Murray Altheim <m....@open.ac.uk>.

Vadim Gritsenko wrote:
> Martin Bischoff wrote:
> 
> 
>>By debugging in Eclipse I found out that the content of the element 
>>"Name" hast been splitted up into two child elements of the type 
>>TextImpl. So, by using the getFirstChild() method I only get one part 
>>of it.
>
> AFAIU, this is normal mode of operation for XML parsers and DOM / SAX.
> They can always split text into multiple TextNodes or character() events
> as they want it. I'm not sure there is way around this; I think you
> should collect all consecutive TextNodes and combine them.
> 
> Vadim

There's a normalize() method in org.w3c.dom.Node that solves
this problem, putting all Text nodes into one. Description:

   Puts all Text nodes in the full depth of the sub-tree underneath this
   Node, including attribute nodes, into a "normal" form where only
   structure (e.g., elements, comments, processing instructions, CDATA
   sections, and entity references) separates Text nodes, i.e., there are
   neither adjacent Text nodes nor empty Text nodes. This can be used to
   ensure that the DOM view of a document is the same as if it were saved
   and re-loaded, and is useful when operations (such as XPointer
   lookups) that depend on a particular document tree structure are to be
   used. In cases where the document contains CDATASections, the normalize
   operation alone may not be sufficient, since XPointers do not
   differentiate between Text nodes and CDATASection nodes.
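Murray's suggestion can be tried by normalizing the tree before calling getFirstChild(). Below is a minimal sketch using the JDK's built-in DOM rather than Xindice's; the class `NormalizeDemo`, the method `normalizedName`, and the artificial split of the sample name are hypothetical, chosen to reproduce the two TextImpl children from the question. In the original code the equivalent would be calling `doc.normalize()` on the result of `DOMParser.toDocument(...)` before the lookup, assuming Xindice's DOM implementation handles `normalize()` correctly.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizeDemo {

    /**
     * Builds <Name> with two adjacent Text children (as seen in the debugger),
     * then calls normalize() so they are merged into a single Text node.
     */
    static Element normalizedName() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element name = doc.createElement("Name");
            doc.appendChild(name);
            name.appendChild(doc.createTextNode("name_of_the_pro"));
            name.appendChild(doc.createTextNode("cess_element"));

            // Node.normalize() merges adjacent Text nodes throughout the subtree.
            doc.normalize();
            return name;
        } catch (ParserConfigurationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Element name = normalizedName();
        System.out.println(name.getChildNodes().getLength());    // 1
        System.out.println(name.getFirstChild().getNodeValue()); // name_of_the_process_element
    }
}
```

As the quoted description warns, normalize() does not merge Text nodes with CDATASection nodes, so content that mixes plain text and CDATA can still come back in several pieces.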

Murray

......................................................................
Murray Altheim                    http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK               .

   "People's primary requirement is that some kind of coherence be
    provided. Stories give people the feeling that there is meaning,
    that there is ultimately an order lurking behind the incredible
    confusion of appearances and phenomena that surrounds them. This
    order is what people require more than anything else; yes, I
    would almost say that the notion of order or story is connected
    with the godhead. Stories are substitutes for God. Or maybe the
    other way round." -- Wim Wenders

Re: Element content splitted up

Posted by Vadim Gritsenko <va...@reverycodes.com>.

Martin Bischoff wrote:

> By debugging in Eclipse I found out that the content of the element 
> "Name" hast been splitted up into two child elements of the type 
> TextImpl. So, by using the getFirstChild() method I only get one part 
> of it.

AFAIU, this is the normal mode of operation for XML parsers and DOM / SAX.
They are free to split text into multiple Text nodes or characters() events
as they see fit. I'm not sure there is a way around this; I think you
should collect all consecutive Text nodes and combine them.

Vadim
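Vadim's point applies to SAX as well: a parser may deliver one run of text through several `characters()` calls, so a handler should buffer the chunks until the element ends. The following is a minimal sketch using the SAX parser bundled with the JDK; the class `NameHandler` and its `parseName` helper are hypothetical names for illustration. Whether the parser actually splits this particular input (for example around the `&amp;` entity) depends on the implementation, which is exactly why the buffer is needed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NameHandler extends DefaultHandler {

    private final StringBuffer buffer = new StringBuffer();
    private boolean inName = false;
    private String name;

    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("Name".equals(qName)) {
            inName = true;
            buffer.setLength(0); // start a fresh buffer for this element
        }
    }

    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run of text; append every chunk.
        if (inName) {
            buffer.append(ch, start, length);
        }
    }

    public void endElement(String uri, String localName, String qName) {
        if ("Name".equals(qName)) {
            inName = false;
            name = buffer.toString(); // the complete text, however it was delivered
        }
    }

    public String getName() {
        return name;
    }

    /** Parses an XML string and returns the text of its (last) Name element. */
    static String parseName(String xml) {
        try {
            NameHandler handler = new NameHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getName();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Element type=\"Route\"><Name>name_of_the_process&amp;element</Name></Element>";
        System.out.println(parseName(xml)); // name_of_the_process&element
    }
}
```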