Posted to dev@nifi.apache.org by Phil H <gi...@gmail.com> on 2022/03/16 22:58:48 UTC

SplitContent doesn’t support regex?

Hi,

This seems like an odd omission - aside from performance (presumably?) is
there a reason why there isn’t a regex option for the byte sequence? I need
one but thought I’d ask before I built my own.

Thanks
Phil

Re: SplitContent doesn’t support regex?

Posted by Mark Payne <ma...@hotmail.com>.
Phil,

Yeah, that’s fine. We want to include the Jira number in the commit message, but you can include multiple by writing a message like:

NIFI-3470, NIFI-1517: Addressed Thing #1 and Thing #2

Thanks
-Mark
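
As a rough illustration of that message format, here is a minimal Java sketch that pulls the Jira keys out of the first line of a commit message. It is not part of NiFi or any project tooling; the class name CommitMessageSketch, the method jiraKeys, and the exact pattern (comma-separated NIFI-<number> keys, a colon, then a summary) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an official NiFi tool.
class CommitMessageSketch {

    // Assumed format: one or more comma-separated NIFI-<number> keys,
    // then a colon, whitespace, and a non-empty summary.
    private static final Pattern FORMAT =
            Pattern.compile("^(NIFI-\\d+(?:,\\s*NIFI-\\d+)*):\\s+(\\S.*)$");
    private static final Pattern KEY = Pattern.compile("NIFI-\\d+");

    // Returns the Jira keys found in the first line of a commit message,
    // or an empty list if the line does not follow the assumed format.
    static List<String> jiraKeys(String firstLine) {
        List<String> keys = new ArrayList<>();
        Matcher format = FORMAT.matcher(firstLine);
        if (!format.matches()) {
            return keys;
        }
        Matcher key = KEY.matcher(format.group(1));
        while (key.find()) {
            keys.add(key.group());
        }
        return keys;
    }

    public static void main(String[] args) {
        // prints [NIFI-3470, NIFI-1517]
        System.out.println(jiraKeys("NIFI-3470, NIFI-1517: Addressed Thing #1 and Thing #2"));
    }
}
```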


> On Apr 4, 2022, at 10:27 AM, Phil H <gi...@gmail.com> wrote:
> 
> Whilst I try and get NiFi to build, let's circle back to JIRA.  I found an
> open issue that matches my requirement (NIFI-1517); however, to implement my
> solution, I'd also fix NIFI-3470 along the way (reading a configurable amount
> of data to run the regex over, rather than going byte-by-byte).
> 
> So, what's the proper way to go about this from a JIRA perspective?  I
> assume my branch would be nifi-1517 as that's the feature I'm building, but
> it would also "solve" 3470?
> 
> TIA,
> Phil


Re: SplitContent doesn’t support regex?

Posted by Phil H <gi...@gmail.com>.
Whilst I try and get NiFi to build, let's circle back to JIRA.  I found an
open issue that matches my requirement (NIFI-1517); however, to implement my
solution, I'd also fix NIFI-3470 along the way (reading a configurable amount
of data to run the regex over, rather than going byte-by-byte).

So, what's the proper way to go about this from a JIRA perspective?  I
assume my branch would be nifi-1517 as that's the feature I'm building, but
it would also "solve" 3470?

TIA,
Phil
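
For illustration, the idea of running the regex over a configurable amount of data, rather than byte-by-byte, could be sketched roughly as below. This is not the SplitContent implementation or the eventual patch; the class RegexSplitSketch, the split and drain methods, and the chunkSize parameter are hypothetical names. The sketch decodes bytes as ISO-8859-1 so that each byte maps to exactly one char, and uses Matcher.hitEnd() to hold back a match that more input could still change.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not NiFi code.
class RegexSplitSketch {

    // Splits the stream on every non-empty match of the delimiter,
    // reading up to chunkSize bytes at a time. Delimiters are dropped.
    static List<byte[]> split(InputStream in, Pattern delimiter, int chunkSize) throws IOException {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> parts = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        byte[] chunk = new byte[chunkSize];
        int read;
        while ((read = in.read(chunk)) != -1) {
            // ISO-8859-1 maps each byte to one char, so offsets stay byte-accurate.
            pending.append(new String(chunk, 0, read, StandardCharsets.ISO_8859_1));
            drain(pending, delimiter, parts, false);
        }
        drain(pending, delimiter, parts, true);
        // Whatever follows the last delimiter is the final part
        // (empty if the stream ended with a delimiter).
        parts.add(pending.toString().getBytes(StandardCharsets.ISO_8859_1));
        return parts;
    }

    // Emits every complete segment currently in the buffer and removes it.
    private static void drain(StringBuilder pending, Pattern delimiter,
                              List<byte[]> parts, boolean endOfStream) {
        Matcher m = delimiter.matcher(pending);
        int start = 0;
        while (m.find()) {
            // The engine reached the end of the buffer: more data could
            // extend or change this match, so wait for the next chunk.
            if (!endOfStream && m.hitEnd()) {
                break;
            }
            if (m.end() == m.start()) {
                continue; // ignore empty matches
            }
            parts.add(pending.substring(start, m.start()).getBytes(StandardCharsets.ISO_8859_1));
            start = m.end();
        }
        pending.delete(0, start);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a\r\nb\n\n\nc".getBytes(StandardCharsets.ISO_8859_1);
        Pattern lineBreaks = Pattern.compile("[\\r\\n]+");
        for (byte[] part : split(new ByteArrayInputStream(data), lineBreaks, 2)) {
            System.out.println(new String(part, StandardCharsets.ISO_8859_1));
        }
    }
}
```

With the input "a\r\nb\n\n\nc", the delimiter [\r\n]+ and a chunk size of 2, this yields the three parts a, b and c. The sketch keeps everything since the last delimiter in memory, so it only makes sense when segments are reasonably small.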





Re: SplitContent doesn’t support regex?

Posted by Otto Fowler <ot...@gmail.com>.
Joe, I don’t know if we can make the case for a standalone processor for doing this on top of that. If so, I’d be willing to take a look.




From: Otto Fowler <ot...@gmail.com>
Reply: Otto Fowler <ot...@gmail.com>
Date: March 19, 2022 at 13:40:07
To: dev@nifi.apache.org <de...@nifi.apache.org>, Phil H <gi...@gmail.com>
Subject:  Re: SplitContent doesn’t support regex?  

In the Apache Metron Project (in the attic now) we used https://github.com/nishihatapalmer/byteseek to do pcap searches; maybe you can check that out.





Re: SplitContent doesn’t support regex?

Posted by Otto Fowler <ot...@gmail.com>.
In the Apache Metron Project (in the attic now) we used
https://github.com/nishihatapalmer/byteseek to do pcap searches; maybe you
can check that out.




From: Phil H <gi...@gmail.com>
Reply: dev@nifi.apache.org <de...@nifi.apache.org>
Date: March 16, 2022 at 20:04:58
To: dev@nifi.apache.org <de...@nifi.apache.org>
Subject:  Re: SplitContent doesn’t support regex?

I dunno about a good implementation…

I did a similar extension of GetTCP to allow for a regex EOM rather than a
single byte. It works, but I don’t feel like it was done in the spirit of
the existing processor!


Re: SplitContent doesn’t support regex?

Posted by Phil H <gi...@gmail.com>.
I dunno about a good implementation…

I did a similar extension of GetTCP to allow for a regex EOM rather than a
single byte. It works, but I don’t feel like it was done in the spirit of
the existing processor!

On Thu, 17 Mar 2022 at 09:12, Joe Witt <jo...@gmail.com> wrote:

> Phil
>
> I'd say if you have a good implementation in mind you should go for it.
> Sounds interesting.
>
> Thanks

Re: SplitContent doesn’t support regex?

Posted by Joe Witt <jo...@gmail.com>.
Phil

I'd say if you have a good implementation in mind you should go for it.
Sounds interesting.

Thanks

On Wed, Mar 16, 2022 at 3:59 PM Phil H <gi...@gmail.com> wrote:

> Hi,
>
> This seems like an odd omission - aside from performance (presumably?) is
> there a reason why there isn’t a regex option for the byte sequence? I need
> one but thought I’d ask before I built my own.
>
> Thanks
> Phil
>