Posted to users@nifi.apache.org by Stéphane Maarek <st...@gmail.com> on 2016/07/01 01:04:43 UTC

RouteText questions (regex, grouping, performance)

Hi,

I have a question regarding RouteText. The processor works just fine for me
but maybe I'm missing a couple subtleties:

1) I have a regex to group data by (a pair of IDs), but what do I use the
grouping attribute for? I still get as many outputs as lines
2) My data is coming from a ListenUDP. If my batch size is 1, RouteText is
having a lot of trouble processing all the data. I would guess that it
compiles the regex every time it is executed, is that correct? When I increase
the batch size to 100, RouteText processes everything well. I was wondering
if there could be some sort of optimization in RouteText to keep the
regex compiled regardless of the state of the processor?


Thanks a lot!
Stephane

Re: RouteText questions (regex, grouping, performance)

Posted by Mark Payne <ma...@hotmail.com>.

Stephane,

Excellent. In this case it may require a bit of experimentation, but I would expect #1 to perform
better in most cases. #2 would have to read the data only once but would require a lot of CPU to evaluate that regex
(evaluating .* in a regex is super expensive). SplitText would have to read the data again to scan for new-line characters,
but if your system has a reasonable amount of RAM, chances are that the data will still be in your Operating System's
disk cache, so you will not actually end up reading the content from disk again for SplitText. So I think SplitText will yield better
performance for you.
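
To illustrate why both options produce the same output and differ only in cost, here is a minimal sketch in Python (not NiFi's Java code). The function name group_lines is hypothetical, and it assumes that lines are grouped by the values of the regex's capturing groups, that the regex must match the whole line, and that non-matching lines share one group:

```python
import re
from collections import defaultdict


def group_lines(text, grouping_regex):
    """Bucket lines by the capture-group values of grouping_regex.

    Hypothetical stand-in for a grouping regex; not NiFi's implementation.
    """
    pattern = re.compile(grouping_regex)
    groups = defaultdict(list)
    for line in text.splitlines():
        m = pattern.fullmatch(line)
        # Assumption: lines that do not match all land in the same bucket.
        key = m.groups() if m else None
        groups[key].append(line)
    return dict(groups)


text = "1,2,a\n1,2,b\n3,4,c"

# Grouping on a pair of IDs puts the first two lines in one bucket.
by_ids = group_lines(text, r"(\d+),(\d+),.*")

# Grouping on (.*) with unique lines gives one bucket per line,
# which is the same result as splitting on newlines.
by_line = group_lines(text, r"(.*)")
```

With unique lines, by_line has exactly one entry per line, so option #2 buys nothing over a plain newline split except the cost of running the regex on every line.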

Thanks
-Mark


> On Jul 6, 2016, at 9:17 PM, Stéphane Maarek <st...@gmail.com> wrote:
> 
> Hi Mark,
> 
> Thanks a lot for the insights. I'm using RouteText because I needed the ${line} attribute. I've separated my disks and added the logging you recommended. 
> Final question, and I guess it's a small optimization:
> Is it better to use
> 1) RouteText with an empty group field, followed by a SplitText processor, OR
> 2) RouteText with a group field of (.*), so that, as my lines are unique, they come out already split?
> 
> Thanks!
> Stephane

Re: RouteText questions (regex, grouping, performance)

Posted by Stéphane Maarek <st...@gmail.com>.

Hi Mark,

Thanks a lot for the insights. I'm using RouteText because I needed the
${line} attribute. I've separated my disks and added the logging you
recommended.
Final question, and I guess it's a small optimization:
Is it better to use
1) RouteText with an empty group field, followed by a SplitText processor, OR
2) RouteText with a group field of (.*), so that, as my lines are unique,
they come out already split?

Thanks!
Stephane

On Thu, Jul 7, 2016 at 1:31 AM Mark Payne <ma...@hotmail.com> wrote:

> Stephane,
>
> So the Processors that you mention there mostly would require that you
> split your data up into one-line chunks.
>
> When you indicate that the expression you would use is
> "${filename:contains('new'):and(filename:contains('2016'))}"
> that looks like you are routing only on the attributes, not on the content
> of the text itself. If this is the case, you should
> use RouteOnAttribute, as it will be much more efficient than RouteText. In
> general, though, that expression would be
> much more efficient than using a regex to match against .*new.*2016.*
>
> So I would certainly recommend using RouteOnAttribute and using the
> Expression Language to route based on attributes.
> You can also just add two different properties:
>
> containsNew = ${filename:contains('new')}
> is2016 = ${filename:contains('2016')}
>
> And then set the routing strategy to Route to 'match' if all match. This
> will help make the processor's configuration easier
> to understand if you look at it again in the future.
>
> Ingesting 1000 packets per second should not be a problem at all on a
> single node. Some things to consider:
>
> - Ideally, you would have a separate disk for your content repo, your
> flowfile repo, and your prov repo.
>
> - You may want to change the log level to WARN for processors (by adding
> to your conf/logback.xml <logger name="org.apache.nifi.processors"
> level="WARN" />)
>   This may or may not make a difference, depending on how resource
> constrained your disks are.
>
> - Making the change above to use RouteOnAttribute will certainly help
> alleviate pressure on both your CPU and your disk.
>
> - If you don't have enough disks to separate out each of your
> repositories, I would recommend at least putting the prov repo on its own disk.
>
> - If you do have enough disks, you can stripe the content repo and your
> prov repo across multiple disks to scale vertically, and you'll
>   see much better performance this way.
>
>
> Thanks
> -Mark

Re: RouteText questions (regex, grouping, performance)

Posted by Mark Payne <ma...@hotmail.com>.

Stephane,

So the Processors that you mention there mostly would require that you split your data up into one-line chunks.

When you indicate that the expression you would use is "${filename:contains('new'):and(filename:contains('2016'))}"
that looks like you are routing only on the attributes, not on the content of the text itself. If this is the case, you should
use RouteOnAttribute, as it will be much more efficient than RouteText. In general, though, that expression would be
much more efficient than using a regex to match against .*new.*2016.*

So I would certainly recommend using RouteOnAttribute and using the Expression Language to route based on attributes.
You can also just add two different properties:

containsNew = ${filename:contains('new')}
is2016 = ${filename:contains('2016')}

And then set the routing strategy to Route to 'match' if all match. This will help make the processor's configuration easier
to understand if you look at it again in the future.
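
As a rough illustration of the two-property approach, here is a minimal Python sketch (not NiFi code). The names RULES and all_match are hypothetical, and the substring checks stand in for the Expression Language contains() calls:

```python
import re

# Hypothetical stand-ins for the two properties above.
RULES = {
    "containsNew": lambda attrs: "new" in attrs["filename"],
    "is2016": lambda attrs: "2016" in attrs["filename"],
}


def all_match(attrs):
    """True only if every named rule is true for these attributes."""
    return all(rule(attrs) for rule in RULES.values())


# The regex from the question, for comparison.
REGEX = re.compile(r".*new.*2016.*")

in_order = {"filename": "new-data-2016.log"}
reversed_order = {"filename": "2016-new.log"}
```

Note that the two are not strictly equivalent: the regex requires 'new' to appear before '2016', while the two substring checks accept either order, so reversed_order passes all_match but does not match REGEX.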

Ingesting 1000 packets per second should not be a problem at all on a single node. Some things to consider:

- Ideally, you would have a separate disk for your content repo, your flowfile repo, and your prov repo.

- You may want to change the log level to WARN for processors (by adding to your conf/logback.xml <logger name="org.apache.nifi.processors" level="WARN" />)
  This may or may not make a difference, depending on how resource constrained your disks are.

- Making the change above to use RouteOnAttribute will certainly help alleviate pressure on both your CPU and your disk.

- If you don't have enough disks to separate out each of your repositories, I would recommend at least putting the prov repo on its own disk.

- If you do have enough disks, you can stripe the content repo and your prov repo across multiple disks to scale vertically, and you'll
  see much better performance this way.


Thanks
-Mark


> On Jul 3, 2016, at 8:27 PM, Stéphane Maarek <st...@gmail.com> wrote:
> 
> Hi Mark,
> 
> 1. I send Flowfile coming through a ListenUDP, with a batch of 100. So most of the time, the flowfiles are multiple lines long. Yet, after the route text, I get as many flowfiles as lines, regardless of the grouping parameter. Is that expected?
> 
> 2. I have opened a JIRA: https://issues.apache.org/jira/browse/NIFI-2169
> 
> I have a few questions:
> Regarding the fact that it's better to operate on text that has many lines, and assuming I manage to get RouteText to output many lines:
> a) Can ExtractText, ReplaceText, PutMongo, ConvertJSONtoSQL, PutSQL operate on each individual line within a flowfile? (that's basically all the components in my flow)
> b) Is the Satisfies Expression ${filename:contains('new'):and(filename:contains('2016'))} going to perform better than the regex .*new.*2016.* ?
> c) I have a lot of data coming in (1000 UDP packets a second), and yes, the provenance database has been cramming because we have 6 processors dealing with this flow before the data exits NiFi. Are there any optimizations I could apply out of the box?
> 
> Thanks,
> Stephane

Re: RouteText questions (regex, grouping, performance)

Posted by Stéphane Maarek <st...@gmail.com>.

Hi Mark,

1. I send Flowfile coming through a ListenUDP, with a batch of 100. So most
of the time, the flowfiles are multiple lines long. Yet, after the route
text, I get as many flowfiles as lines, regardless of the grouping
parameter. Is that expected?

2. I have opened a JIRA: https://issues.apache.org/jira/browse/NIFI-2169

I have a few questions:
Regarding the fact that it's better to operate on text that has many
lines, and assuming I manage to get RouteText to output many lines:
a) Can ExtractText, ReplaceText, PutMongo, ConvertJSONtoSQL, PutSQL
operate on each individual line within a flowfile? (that's basically all
the components in my flow)
b) Is the Satisfies Expression
${filename:contains('new'):and(filename:contains('2016'))} going to perform
better than the regex .*new.*2016.* ?
c) I have a lot of data coming in (1000 UDP packets a second), and yes, the
provenance database has been cramming because we have 6 processors dealing
with this flow before the data exits NiFi. Are there any optimizations I
could apply out of the box?

Thanks,
Stephane

On Fri, Jul 1, 2016 at 10:48 PM Mark Payne <ma...@hotmail.com> wrote:

> Hi Stephane,
>
> For #1, when you say that you get as many outputs as lines of text, are you
> sending in FlowFiles that are only
> one line of text each? The Processor does not aggregate multiple FlowFiles
> together, so if you are sending in
> 1-line FlowFiles, it can only route that FlowFile in 1-line outputs.
>
> Re #2: The regular expression is compiled every time. This is done,
> though, because the Regex allows the Expression
> Language to be used, so the Regex could actually be different for each
> FlowFile. That being said, it could certainly be
> improved by either (a) pre-compiling in the case that no Expression
> Language is used and/or (b) caching up to, say, 10
> regexes once they are compiled. Do you mind filing a JIRA to improve the
> efficiency of this processor?
>
> Also, when you say that the processor is having trouble keeping up with a
> batch size of 1, there are a few thoughts that
> come to mind:
>
> * How many concurrent tasks do you have assigned to the processor? Have
> you tried increasing it?
> * When processing text in NiFi it is generally going to be much more
> efficient to process a single FlowFile with many lines,
> instead of many small FlowFiles, due to the expense of the Data Provenance
> that has to be generated. There are some things
> that we can do to improve efficiency of the data provenance as well, but
> those improvements have generally been made
> 'high' priority rather than 'extremely high priority' :) so I would expect
> to see them coming out possibly toward the end of this year,
> after 1.0 and a few other major features come out.
> * Rather than using a Regular Expression, the "Satisfies Expression"
> Matching Strategy is likely to be more efficient in many cases
> if it is able to provide the routing logic that you need. It also tends to
> be easier to read than regular expressions, which is nice when
> you (or someone else) go back later to modify the flow.
>
> Please let me know if anything here doesn't make sense or if you have any
> more questions.
>
> Thanks!
> -Mark

Re: RouteText questions (regex, grouping, performance)

Posted by Mark Payne <ma...@hotmail.com>.

Hi Stephane,

For #1, when you say that you get as many outputs as lines of text, are you sending in FlowFiles that are only
one line of text each? The Processor does not aggregate multiple FlowFiles together, so if you are sending in
1-line FlowFiles, it can only route that FlowFile in 1-line outputs.

Re #2: The regular expression is compiled every time. This is done, though, because the Regex allows the Expression
Language to be used, so the Regex could actually be different for each FlowFile. That being said, it could certainly be
improved by either (a) pre-compiling in the case that no Expression Language is used and/or (b) caching up to, say, 10
regexes once they are compiled. Do you mind filing a JIRA to improve the efficiency of this processor?
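
A minimal sketch of ideas (a) and (b), written in Python rather than NiFi's Java and not taken from the processor's source. The names Router, compiled, evaluate and uses_expression_language are hypothetical, and evaluate() only substitutes ${name} references as a stand-in for real Expression Language evaluation:

```python
import re
from functools import lru_cache


def uses_expression_language(regex_property):
    """Assumption: an Expression Language reference looks like ${...}."""
    return "${" in regex_property


@lru_cache(maxsize=10)  # idea (b): keep up to 10 compiled patterns
def compiled(regex_text):
    return re.compile(regex_text)


def evaluate(regex_property, attributes):
    """Stand-in for Expression Language: replaces ${name} with the attribute value."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: attributes.get(m.group(1), ""),
        regex_property,
    )


class Router:
    def __init__(self, regex_property):
        self.regex_property = regex_property
        # idea (a): compile once up front when the property is a literal regex
        if uses_expression_language(regex_property):
            self.precompiled = None
        else:
            self.precompiled = re.compile(regex_property)

    def pattern_for(self, attributes):
        if self.precompiled is not None:
            return self.precompiled
        # The regex may differ per FlowFile, so look it up in the small cache.
        return compiled(evaluate(self.regex_property, attributes))
```

With a literal regex the pattern is built once per Router; with an expression, FlowFiles that evaluate to the same regex text reuse the cached pattern instead of recompiling it.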

Also, when you say that the processor is having trouble keeping up with a batch size of 1, there are a few thoughts that
come to mind:

* How many concurrent tasks do you have assigned to the processor? Have you tried increasing it?
* When processing text in NiFi it is generally going to be much more efficient to process a single FlowFile with many lines,
instead of many small FlowFiles, due to the expense of the Data Provenance that has to be generated. There are some things
that we can do to improve efficiency of the data provenance as well, but those improvements have generally been made
'high' priority rather than 'extremely high priority' :) so I would expect to see them coming out possibly toward the end of this year,
after 1.0 and a few other major features come out.
* Rather than using a Regular Expression, the "Satisfies Expression" Matching Strategy is likely to be more efficient in many cases
if it is able to provide the routing logic that you need. It also tends to be easier to read than regular expressions, which is nice when
you (or someone else) go back later to modify the flow.

Please let me know if anything here doesn't make sense or if you have any more questions.

Thanks!
-Mark

> On Jun 30, 2016, at 9:04 PM, Stéphane Maarek <st...@gmail.com> wrote:
> 
> Hi,
> 
> I have a question regarding RouteText. The processor works just fine for me but maybe I'm missing a couple subtleties:
> 
> 1) I have a regex to group data by (a pair of IDs), but what do I use the grouping attribute for? I still get as many outputs as lines
> 2) My data is coming from a ListenUDP. If my batch size is 1, RouteText is having a lot of trouble processing all the data. I would guess that it compiles the regex every time it is executed, is that correct? When I increase the batch size to 100, RouteText processes everything well. I was wondering if there could be some sort of optimization in RouteText to keep the regex compiled regardless of the state of the processor?
> 
> 
> Thanks a lot!
> Stephane