Posted to users@camel.apache.org by furchess123 <co...@hotmail.com> on 2015/10/28 20:58:01 UTC

split().tokenize() w/regex-type token arg and grouping DOESN'T GROUP split items

If a regular expression is used in the tokenize(...) method when splitting a
file payload by lines, the "group" argument is ignored and no lines are
grouped. For example, consider the following code, which is meant to split
the file into exchanges with 100 lines per exchange. The regular expression
(the first argument) ensures that any type of line break is recognized. The
second argument ("true") indicates that the token separator string is a
regular expression. The third argument indicates that the split lines must
be grouped into exchanges with 100 lines in each.

    split().tokenize("\n|\r\n|\r", true, 100)...

This does split the file into individual lines. However, the *grouping is
completely ignored*. Always. Tested in Camel versions 2.15.2 and 2.16.0.
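
For reference, the behaviour expected from the call above can be sketched in
plain Java, without Camel. This is only an illustration of "split on a regex
token, then group N lines per chunk"; it is not Camel's implementation. The
class and method names (GroupedLineSplit, splitAndGroup) are made up for the
example, and joining the grouped lines with "\n" is an assumption of the
sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Camel: shows the intended result of
// splitting with a regex token and grouping the resulting lines.
class GroupedLineSplit {

    // Splits text on the given regex and joins every `group` consecutive
    // lines into one chunk, separated by '\n' (an assumption of this sketch).
    static List<String> splitAndGroup(String text, String tokenRegex, int group) {
        if (group < 1) {
            throw new IllegalArgumentException("group must be >= 1");
        }
        String[] lines = text.split(tokenRegex);
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < lines.length; i += group) {
            int end = Math.min(i + group, lines.length);
            chunks.add(String.join("\n", Arrays.copyOfRange(lines, i, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Mixed Windows, Unix and old Mac line breaks in one payload.
        String body = "a\r\nb\nc\rd\ne";
        // Same token as in the route: "\r\n" is matched by the second
        // alternative, so it counts as a single line break.
        List<String> chunks = splitAndGroup(body, "\n|\r\n|\r", 2);
        System.out.println(chunks.size() + " chunks");
    }
}
```

With a group size of 2, the five input lines end up in three chunks; with
100, each chunk would carry up to 100 lines, which is what the route above
is expected to put into each exchange.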


Is this a known issue? A bug? Is there a way to write a splitter DSL that
configures the route to correctly split a file while being completely
agnostic of the origin of the file (i.e. without having to hard-code one
particular line separator sequence as a non-regex token)?



--
View this message in context: http://camel.465427.n5.nabble.com/split-tokenize-w-regex-type-token-arg-and-grouping-DOESN-T-GROUP-split-items-tp5773166.html
Sent from the Camel - Users mailing list archive at Nabble.com.

Re: split().tokenize() w/regex-type token arg and grouping DOESN'T GROUP split items

Posted by furchess123 <co...@hotmail.com>.
Thank you, Claus!




Re: split().tokenize() w/regex-type token arg and grouping DOESN'T GROUP split items

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

It was recently fixed by a PR with commit id
20259e405e3b349e7c860ae563e8e941446aaaba

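Until then, you can create the tokenizer expression manually and set all the
options yourself:

                // org.apache.camel.model.language.TokenizerExpression
                TokenizerExpression tz = new TokenizerExpression();
                tz.setGroup(1000);  // number of tokens per group
                tz.setToken(xxxx);  // xxxx is a placeholder for your token
                tz.setRegex(true);  // set this as well if the token is a regex

                from("direct:start")
                    .split(tz)
                        .to("mock:split")
                    .end()
                    .to("mock:result");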

On Wed, Oct 28, 2015 at 9:26 PM, furchess123 <co...@hotmail.com> wrote:
> Just checked the Camel source code:
>
>     /**
>      * Evaluates a token expression on the message body
>      *
>      * @param token the token
>      * @param regex whether the token is a regular expression or not
>      * @param group to group by the given number
>      * @return the builder to continue processing the DSL
>      */
>     public T tokenize(String token, boolean regex, int group) {
>         return delegate.tokenize(token, regex);
>     }
>
> *The  "group" parameter is never used!* Hello????
>
> Is that an oversight? Why is it not documented, or, for that matter, why is
> the documentation misleading and inaccurate? This seems like a major flaw.
> Is there really no way to configure a system-agnostic file splitter that
> would group lines? Or, perhaps, that works if I use XML configuration? But
> I'd hate that, I have no XML in my application, everything is Java
> configured.  Can anyone advise, please?
>



-- 
Claus Ibsen
-----------------
http://davsclaus.com @davsclaus
Camel in Action 2nd edition:
https://www.manning.com/books/camel-in-action-second-edition

Re: split().tokenize() w/regex-type token arg and grouping DOESN'T GROUP split items

Posted by furchess123 <co...@hotmail.com>.
Just checked the Camel source code:

    /**
     * Evaluates a token expression on the message body
     *
     * @param token the token
     * @param regex whether the token is a regular expression or not
     * @param group to group by the given number
     * @return the builder to continue processing the DSL
     */
    public T tokenize(String token, boolean regex, int group) {
        return delegate.tokenize(token, regex);
    }

*The  "group" parameter is never used!* Hello???? 

Is that an oversight? Why is it not documented, or, for that matter, why is
the documentation misleading and inaccurate? This seems like a major flaw.
Is there really no way to configure a system-agnostic file splitter that
would group lines? Or, perhaps, that works if I use XML configuration? But
I'd hate that, I have no XML in my application, everything is Java
configured.  Can anyone advise, please? 


