Posted to users@camel.apache.org by gheidorn <gr...@gmail.com> on 2013/01/23 04:41:06 UTC

Tokenize Producing XML That is Not Well-Formed

I have the following XML structure:


  *<c />*
  *<c />*


Element c is optional.  I'm using split tokenize on element b, which works
great when element c is present.  I get tokens in the form of:

*<c />*
*<c />*

The issue is that when c is not present, tokenize returns tokens in the form of:

* *

For some odd reason the closing tag for the root node is appearing in my
first token and this triggers an IOException caused by:

Caused by: javax.xml.bind.UnmarshalException
 - with linked exception:
[org.xml.sax.SAXParseException: The markup in the document following the
root element must be well-formed.]

For what it's worth, the original XML is passing an earlier XSD check with
no problems, so I know the XML that is getting split into tokens is
well-formed and validated.
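
The parser error itself can be reproduced outside Camel. Below is a minimal
sketch using only the JDK's built-in DOM parser. It assumes a hypothetical
root element named a (the real root name is not shown above), and the class
name is made up for illustration. A token that carries a stray closing tag
after a self-closing b is rejected, while a token with an explicit closing
tag parses.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Illustrative helper (not part of Camel): reports whether a single
// token is a well-formed XML document on its own.
class TokenWellFormedCheck {

    static boolean isWellFormed(String token) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(token)));
            return true;
        } catch (SAXException e) {
            // The parser throws a SAXParseException for malformed input,
            // e.g. markup that follows the root element.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this helper, isWellFormed("<b><c /><c /></b>") returns true, while
isWellFormed("<b /></a>") returns false because of the trailing </a>
(the hypothetical root's closing tag).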

Has anyone run into this issue before?



--
View this message in context: http://camel.465427.n5.nabble.com/Tokenize-Producing-XML-That-is-Not-Well-Formed-tp5726035.html
Sent from the Camel - Users mailing list archive at Nabble.com.

Re: Tokenize Producing XML That is Not Well-Formed

Posted by Henryk Konsek <he...@gmail.com>.
> If you agree, I'll submit a JIRA issue and can
> work on a patch.

Good catch, Greg :). I created the appropriate Jira issue [1]. We
would appreciate it if you contributed a patch for the bug you detected.

[1] https://issues.apache.org/jira/browse/CAMEL-6012

--
Henryk Konsek
http://henryk-konsek.blogspot.com

Re: Tokenize Producing XML That is Not Well-Formed

Posted by Christian Müller <ch...@gmail.com>.
+1
We love contributions ;-)
Have a look at http://camel.apache.org/contributing.html

Best,
Christian

On 23.01.2013 17:37, "gheidorn" <gr...@gmail.com> wrote:

> Christian, I created a JUnit that illustrates the issue (see attached).  I
> believe we should enhance the TokenXMLPairExpressionIterator to account for
> self-closing XML tokens.  If you agree, I'll submit a JIRA issue and can
> work on a patch.

Re: Tokenize Producing XML That is Not Well-Formed

Posted by gheidorn <gr...@gmail.com>.
Christian, I created a JUnit that illustrates the issue (see attached).  I
believe we should enhance the TokenXMLPairExpressionIterator to account for
self-closing XML tokens.  If you agree, I'll submit a JIRA issue and can
work on a patch.




Re: Tokenize Producing XML That is Not Well-Formed

Posted by Christian Müller <ch...@gmail.com>.
You may want to consider implementing your own splitter bean for this case.
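
A minimal sketch of what such a splitter bean could look like, using only
the JDK's DOM and Transformer APIs. The class name, method name, and the
choice to pass the element name in the constructor are assumptions made for
illustration, not something taken from this thread. Because the input is
parsed rather than scanned for tag pairs, a self-closing element comes out
as its own well-formed fragment.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical splitter bean: returns one XML string per matching element.
class ElementSplitterBean {

    private final String elementName;

    ElementSplitterBean(String elementName) {
        this.elementName = elementName;
    }

    public List<String> split(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            // Each part is a fragment, so leave out the <?xml ...?> declaration.
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            // Note: this matches descendants at any depth, including nested ones.
            NodeList nodes = doc.getElementsByTagName(elementName);
            List<String> parts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                StringWriter out = new StringWriter();
                serializer.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
                parts.add(out.toString());
            }
            return parts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not split XML", e);
        }
    }
}
```

Camel's splitter can take a bean method that returns a List as its split
expression, so a bean along these lines could stand in for the tokenize
step. The trade-off is that the whole document is loaded into memory,
which may not suit very large files.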

Best,
Christian

On 23.01.2013 08:05, "gheidorn" <gr...@gmail.com> wrote:

> Alright that wasn't quite it.  I continue to have problems with tokenizing
> elements missing optional children as originally stated.

Re: Tokenize Producing XML That is Not Well-Formed

Posted by gheidorn <gr...@gmail.com>.
Alright that wasn't quite it.  I continue to have problems with tokenizing
elements missing optional children as originally stated.




Re: Tokenize Producing XML That is Not Well-Formed

Posted by gheidorn <gr...@gmail.com>.
In classic fashion, I've answered my own question after a night of debugging.

For posterity: I was converting the String representation of my XML into a
W3C Document object and then splitting that Document using tokenize xml.
It is probably not intended to work like that!  When I left the original
XML as a String, tokenize xml worked just fine in my scenario.

That being said, tokenizing on a Document object seems "almost there" in
terms of functionality.  Only my edge case (missing optional child
elements) isn't working.




Re: Tokenize Producing XML That is Not Well-Formed

Posted by gheidorn <gr...@gmail.com>.
I tracked the code back to the TokenXMLPairExpressionIterator, which, as the
name indicates, looks for a start/end tag pair and doesn't check whether the
token is self-closing.  I'm going to open a JIRA issue to see if we can build
in support for self-closing XML tokens.  I'll see if I can submit a patch for
review today.
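
To make the idea concrete, here is a rough sketch of a tokenizer that
accepts both forms of an element. It is not Camel's code and not the patch
for the JIRA issue; the class name and the regex-based approach are
assumptions chosen to keep the example short. It tries the self-closing
form first and falls back to the start/end tag pair.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a regex tokenizer that treats <tag/> and
// <tag>...</tag> as tokens. It does not handle nested elements that
// share the same name, comments, or CDATA sections.
class SelfClosingAwareTokenizer {

    static List<String> tokenize(String xml, String tag) {
        String name = Pattern.quote(tag);
        // Optional attributes: whitespace followed by anything up to '>'.
        String attrs = "(?:\\s[^>]*)?";
        Pattern pattern = Pattern.compile(
                "<" + name + attrs + "/>"                              // self-closing form
                + "|<" + name + attrs + ">.*?</" + name + "\\s*>",     // start/end pair
                Pattern.DOTALL);

        List<String> tokens = new ArrayList<>();
        Matcher matcher = pattern.matcher(xml);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }
}
```

For the input <a><b><c/><c/></b><b/></a> (with a as a made-up root name) and
tag b, this returns two tokens: <b><c/><c/></b> and <b/>. Neither token
picks up the root's closing tag.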




Re: Tokenize Producing XML That is Not Well-Formed

Posted by gheidorn <gr...@gmail.com>.
I wrote a short JUnit test and found that if the element has an explicit
closing tag, tokenizeXML works correctly.  If the element is self-closing,
tokenizeXML fails.

I have attached the JUnit test and am currently walking the code to pinpoint
the class that implements tokenizeXML, so that I can patch it to accept
self-closing tags.

GenericTokenizeTest.java
<http://camel.465427.n5.nabble.com/file/n5726078/GenericTokenizeTest.java>  




Re: Tokenize Producing XML That is Not Well-Formed

Posted by gheidorn <gr...@gmail.com>.
I attached my camel.xml configuration, but here is the route pseudocode where
the issue lies:

route
  sftp
  doTry
    to validator:my.xsd
    split strategyRef=myAggregationStrategy
      tokenize token=ad xml=true
      log message=in.body

camel.xml <http://camel.465427.n5.nabble.com/file/n5726070/camel.xml>  




Re: Tokenize Producing XML That is Not Well-Formed

Posted by Henryk Konsek <he...@gmail.com>.
> I updated my XML to escape properly for viewing.  Thanks in advance!

I'm not getting the issue :). Could you send the routes you're using?

--
Henryk Konsek
http://henryk-konsek.blogspot.com

Re: Tokenize Producing XML That is Not Well-Formed

Posted by gheidorn <gr...@gmail.com>.
I updated my XML to escape properly for viewing.  Thanks in advance!


