Posted to issues@camel.apache.org by "Aki Yoshida (JIRA)" <ji...@apache.org> on 2014/05/27 18:46:03 UTC

[jira] [Commented] (CAMEL-7468) Make xmlTokenizer more xml-aware so that it can handle more flexible structures

    [ https://issues.apache.org/jira/browse/CAMEL-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009880#comment-14009880 ] 

Aki Yoshida commented on CAMEL-7468:
------------------------------------

I added a new version that uses the StAX parser to search for the target token and extract it directly from the underlying buffer.
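The following is a minimal sketch of this approach, not the actual patch. The StAX reader only locates a namespace-qualified element, and the token is then copied verbatim out of the original string using the reader's character offsets. It assumes that the whole input is held in a String and that the JDK's built-in StAX reader is in use, with getLocation() taken to report the offset just past the current tag. The names StaxTokenSketch and tokenize are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: StAX finds the element, the raw text is copied from the input string. */
final class StaxTokenSketch {

    static List<String> tokenize(String xml, String nsUri, String localName) {
        List<String> tokens = new ArrayList<>();
        int depth = 0;        // current element depth
        int tokenDepth = -1;  // depth of the token being extracted, -1 if none
        int tokenStart = -1;  // index of the token's '<' in xml
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        depth++;
                        String uri = reader.getNamespaceURI();
                        if (tokenDepth < 0 && localName.equals(reader.getLocalName())
                                && nsUri.equals(uri == null ? "" : uri)) {
                            // ASSUMPTION: the offset points just past the start tag (JDK reader).
                            // '<' cannot occur unescaped inside a start tag, so the nearest
                            // '<' before that offset opens this start tag.
                            int afterTag = reader.getLocation().getCharacterOffset();
                            tokenStart = xml.lastIndexOf('<', afterTag - 1);
                            tokenDepth = depth;
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        if (depth == tokenDepth) {
                            // ASSUMPTION: the offset points just past the end tag's '>'.
                            int afterTag = reader.getLocation().getCharacterOffset();
                            tokens.add(xml.substring(tokenStart, afterTag));
                            tokenDepth = -1;
                        }
                        depth--;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return tokens;
    }
}
```

Nested matches are not extracted separately (only the outermost one is). A reader that reports the beginning of a tag instead would need different offset arithmetic.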

Because XML tokenizing is inherently different from non-XML tokenizing, I created a separate language and expression for this new XML tokenizer.

I noticed a difference in the behavior of XMLStreamReader.getLocation() between Woodstox (com.ctc.wstx.sr.ValidatingStreamReader) and the JDK (com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl): Woodstox returns the location at the beginning of the token, whereas the JDK returns the location at the end of the token. For example, at START_ELEMENT, Woodstox returns the position of the "<" of the start tag, whereas the JDK returns the position of its ">".
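To see the difference on a given classpath, a small diagnostic like the following can help. LocationDump is a hypothetical helper, not part of Camel. It records the reader class and, for every element event, the reported character offset and the character found at that offset. Running it once with Woodstox on the classpath and once without should make the two conventions visible side by side. The exact numbers are deliberately not asserted here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical diagnostic: one line per element event with its reported offset. */
final class LocationDump {

    static List<String> dump(XMLInputFactory factory, String xml) {
        List<String> lines = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                lines.add("reader: " + reader.getClass().getName());
                while (reader.hasNext()) {
                    int event = reader.next();
                    boolean start = event == XMLStreamConstants.START_ELEMENT;
                    if (!start && event != XMLStreamConstants.END_ELEMENT) {
                        continue; // only element events are of interest here
                    }
                    int off = reader.getLocation().getCharacterOffset();
                    // Show the character at the offset; -1 or end of input has none.
                    String at = off >= 0 && off < xml.length() ? "'" + xml.charAt(off) + "'" : "<eof>";
                    lines.add((start ? "START_ELEMENT " : "END_ELEMENT ")
                            + reader.getLocalName() + " offset=" + off + " char=" + at);
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return lines;
    }
}
```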

I need to get this behavior clarified, and I'll probably need to add a mechanism that auto-detects which convention the StAX implementation in use follows.
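Here is one possible shape for such an auto-detect mechanism. It is a sketch that assumes parsing a tiny probe document once per XMLInputFactory is acceptable. The offset reported at the probe's START_ELEMENT is compared with the known positions of the probe's "<" and ">". OffsetProbe and its Convention values are hypothetical names. Both the index of ">" and the index just past it are accepted as "end", since which of the two a given reader uses is not pinned down here.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: classifies how a StAX implementation reports START_ELEMENT offsets. */
final class OffsetProbe {

    enum Convention { TOKEN_START, TOKEN_END, UNKNOWN }

    private static final String PROBE = "<probe>x</probe>";
    // In PROBE the start tag "<probe>" begins at index 0 and its '>' is at index 6.
    private static final int TAG_BEGIN = 0;
    private static final int TAG_CLOSE = PROBE.indexOf('>');

    static Convention detect(XMLInputFactory factory) {
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(PROBE));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        int off = reader.getLocation().getCharacterOffset();
                        if (off == TAG_BEGIN) {
                            return Convention.TOKEN_START; // e.g. Woodstox, as observed above
                        }
                        if (off == TAG_CLOSE || off == TAG_CLOSE + 1) {
                            return Convention.TOKEN_END;   // e.g. the JDK reader
                        }
                        return Convention.UNKNOWN;
                    }
                }
                return Convention.UNKNOWN; // no element found (cannot happen for PROBE)
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("probe document failed to parse", e);
        }
    }
}
```

The result would typically be computed once and cached alongside the factory. UNKNOWN leaves room for an implementation that follows neither convention, in which case a tokenizer could refuse to run rather than extract wrong text.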


> Make xmlTokenizer more xml-aware so that it can handle more flexible structures
> -------------------------------------------------------------------------------
>
>                 Key: CAMEL-7468
>                 URL: https://issues.apache.org/jira/browse/CAMEL-7468
>             Project: Camel
>          Issue Type: Improvement
>          Components: camel-core
>            Reporter: Aki Yoshida
>            Assignee: Aki Yoshida
>             Fix For: 2.14.0
>
>
> The existing xmlTokenizer can tokenize an XML document using the specified element tag name and produce a series of tokens that are either the child tokens with namespace declarations injected from their parent node, or the tokens wrapped in their ancestor elements.
> That implementation has several limitations:
> - a specific namespace cannot be specified.
> - a specific hierarchy cannot be specified.
> - the wrap mode assumes that every token has the same ancestor path.
> This patch will remove these limitations.
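The following hypothetical sketch, not the patch itself, shows how the first two limitations could be lifted. The element path is written with prefixes that the caller binds to namespace URIs, and it is matched against a stack of QNames maintained while streaming. Because QName equality compares only the namespace URI and the local part, the prefixes in the path need not match those used in the document. TokenPath, its "/a/p:b" path syntax, and countMatches are assumptions made for illustration. Unprefixed steps are taken to mean "no namespace".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical namespace-aware path such as "/r/p:x/p:y", with prefixes bound by the caller. */
final class TokenPath {
    private final List<QName> steps = new ArrayList<>();

    TokenPath(String path, Map<String, String> namespaces) {
        for (String step : path.split("/")) {
            if (step.isEmpty()) {
                continue; // the leading "/" yields an empty first segment
            }
            int colon = step.indexOf(':');
            String local = colon < 0 ? step : step.substring(colon + 1);
            String uri = ""; // unprefixed step: element in no namespace
            if (colon >= 0) {
                uri = namespaces.get(step.substring(0, colon));
                if (uri == null) {
                    throw new IllegalArgumentException("unbound prefix in step: " + step);
                }
            }
            steps.add(new QName(uri, local));
        }
    }

    /** Counts elements whose full root-to-self path equals this path. */
    int countMatches(String xml) {
        List<QName> stack = new ArrayList<>(); // ancestors-or-self of the current element
        int matches = 0;
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        String uri = reader.getNamespaceURI();
                        stack.add(new QName(uri == null ? "" : uri, reader.getLocalName()));
                        if (stack.equals(steps)) {
                            matches++;
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        stack.remove(stack.size() - 1);
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return matches;
    }
}
```

Since the full ancestor stack is available at every match, the ancestors of each token could also be recorded individually, which would avoid assuming one shared ancestor path in wrap mode.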


