
Posted to users@camel.apache.org by furchess123 <co...@hotmail.com> on 2015/11/02 15:57:50 UTC

Re: correct way to provide regex in TokenizerExpression?

Hi Claus,
thank you for responding. The problem we are currently seeing is that, if we
provide a regex to the tokenizer to detect token delimiters, the tokenizer
inserts that expression literal into the payload itself, replacing the
actual delimiters matched by the regex. I think you will agree that
modifying the original payload in any way other than splitting it into
chunks is not desirable behavior.
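To make the reported symptom concrete, here is a plain-Java sketch (no Camel dependency) that mimics it: the payload is split on a regex, and each group of tokens is then joined back together using the regex string itself as the separator. This is not Camel's actual tokenizer code; the class and method names are hypothetical, and the mixed CRLF/LF payload and group size of 2 are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RegexLiteralJoinDemo {

    // Hypothetical reproduction of the reported symptom (not Camel's source):
    // split on the regex, then join each group of tokens using the regex
    // *string* as though it were a literal separator.
    static List<String> groupJoiningWithLiteral(String input, String delimiterRegex, int group) {
        if (group < 1) {
            throw new IllegalArgumentException("group must be >= 1");
        }
        String[] tokens = input.split(delimiterRegex);
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < tokens.length; i += group) {
            int end = Math.min(i + group, tokens.length);
            // String.join inserts delimiterRegex verbatim, backslashes and all.
            chunks.add(String.join(delimiterRegex, Arrays.copyOfRange(tokens, i, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Assumed payload with mixed CRLF and LF line endings.
        String payload = "a\r\nb\nc\nd\r\ne";
        for (String chunk : groupJoiningWithLiteral(payload, "\\r?\\n", 2)) {
            System.out.println(chunk);
        }
        // Prints:
        // a\r?\nb
        // c\r?\nd
        // e
    }
}
```

Every real line break inside a group has been replaced by the six characters of the regex, which is exactly the payload modification described above.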

I think the most natural and logical fix would be to correct the existing
tokenizer functionality so that it:

1) Correctly identifies the individual tokens by matching the delimiters
against the provided regular expression (as it already does today);
2) Ensures that the resulting exchange message body (a group of N tokens)
retains the original token separators, rather than having them replaced by
the regex literal.
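A minimal sketch of the second point, again in plain Java rather than Camel's API: java.util.regex.Matcher finds each delimiter, and every chunk is cut directly out of the original payload, so the separators inside a group are exactly the ones that were matched. The helper name groupTokens and its parameters are hypothetical. How the delimiter between two groups should be handled is not specified in this thread; this sketch simply drops it, as a plain split would.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimiterPreservingGrouper {

    // Hypothetical helper (not a Camel API): split `input` on `delimiterRegex`
    // and group every `group` tokens, keeping the delimiters that were actually
    // matched between tokens inside a group.
    static List<String> groupTokens(String input, String delimiterRegex, int group) {
        if (group < 1) {
            throw new IllegalArgumentException("group must be >= 1");
        }
        List<String> chunks = new ArrayList<>();
        Matcher m = Pattern.compile(delimiterRegex).matcher(input);
        int chunkStart = 0;     // offset where the current chunk begins
        int tokensInChunk = 0;  // tokens closed by a delimiter so far
        while (m.find()) {
            tokensInChunk++;
            if (tokensInChunk == group) {
                // Emit the original text of the chunk, minus the delimiter that ends it.
                chunks.add(input.substring(chunkStart, m.start()));
                chunkStart = m.end();
                tokensInChunk = 0;
            }
        }
        if (chunkStart < input.length()) {
            chunks.add(input.substring(chunkStart)); // trailing, possibly partial, chunk
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Assumed payload with mixed CRLF and LF line endings.
        String payload = "a\r\nb\nc\nd\r\ne";
        for (String chunk : groupTokens(payload, "\\r?\\n", 2)) {
            // Escape CR/LF so the preserved separators are visible when printed.
            System.out.println(chunk.replace("\r", "\\r").replace("\n", "\\n"));
        }
        // Prints:
        // a\r\nb
        // c\nd
        // e
    }
}
```

Because each chunk is a substring of the original payload, nothing is inserted or substituted: the first group keeps its CRLF and the second keeps its bare LF.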

Also, for what it's worth, it might help to slightly change the
terminology in the API documentation. What is currently described as the
"token" argument (or "token expression") to the tokenize() method is
actually the "token /delimiter/ expression": the expression that matches
the delimiters separating the tokens in the payload. So, when a file is
split into lines or groups of lines, a token is obviously a line, not the
separator/delimiter. ;)

--
View this message in context: http://camel.465427.n5.nabble.com/correct-way-to-provide-regex-in-TokenizerExpression-tp5773192p5773322.html
Sent from the Camel - Users mailing list archive at Nabble.com.


Posted by SWI <Se...@aixigo.de>.

Hi,

I totally agree with furchess123's post, and I guess issue
https://issues.apache.org/jira/browse/CAMEL-9241 is related to this topic.
Having the regex literal as the delimiter in the grouped result seems broken.

As a workaround, we currently replace the regex literal after tokenizing,
but having to anticipate which delimiter actually matched seems like a bad idea.
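The following plain-Java sketch illustrates why this kind of workaround is fragile. The names are hypothetical and the fixed replacement separator (a bare line feed) is an assumption: once the regex literal has replaced the real separator, a post-processing step can only substitute one fixed separator, so a CRLF in the original payload comes back as a bare LF.

```java
public class LiteralReplacementWorkaround {

    // Hypothetical post-processing step in the spirit of the workaround above:
    // swap the regex literal in a grouped chunk for one fixed, assumed separator.
    // String.replace treats both arguments literally, so the regex is not evaluated.
    static String restoreSeparator(String chunk, String delimiterRegex, String assumedSeparator) {
        return chunk.replace(delimiterRegex, assumedSeparator);
    }

    public static void main(String[] args) {
        // Chunk as produced by the reported tokenizer behavior for an original
        // payload whose two tokens were separated by CRLF.
        String chunk = "a\\r?\\nb";
        String restored = restoreSeparator(chunk, "\\r?\\n", "\n");
        // The CR is lost: the workaround had to guess which separator matched.
        System.out.println(restored.equals("a\r\nb")); // prints false
        System.out.println(restored.equals("a\nb"));   // prints true
    }
}
```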

Is there a way to upvote this issue?

Regards,

SWI

--
View this message in context: http://camel.465427.n5.nabble.com/correct-way-to-provide-regex-in-TokenizerExpression-tp5773192p5801297.html