Posted to users@nifi.apache.org by Vijay Chhipa <vc...@apple.com> on 2018/12/29 12:56:15 UTC

How to remove lines from a flow file that don't start with a certain prefix?

Hi All, 

I have an output from a processor that contains lines that are JSON structured, i.e. each line is a JSON by itself. 
In certain cases there are lines that are not JSON and I want to remove them. 

I tried the ReplaceText processor with the following pattern:

((.|\n)*?)(?={)

Contents of the input file:
foobar
sdkfskdl
ksdfjlsdj
{"key":"value"}
{"key2":"value2"}

Desired output file contents: 
{"key":"value"}
{"key2":"value2"}


On the https://www.regextester.com/ site, the above pattern gives me the following matched string:

<PastedGraphic-1.png>

When I put the above in the ReplaceText processor like this:

<PastedGraphic-2.png>

I get an error that this is not a valid Java regular expression:

<PastedGraphic-3.png>

Why is this not valid, and is there an online regex checker that I can validate with before putting it in NiFi?
Also should I be using a different processor than the ReplaceText for this purpose?


Thank you for your help and Happy New Year. 

Vijay
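
On the question of why the pattern is rejected: java.util.regex treats an unescaped { as the start of a repetition quantifier, so the lookahead (?={) fails to compile, while testers built on other regex flavors (JavaScript, for instance) may accept a bare brace. Below is a minimal sketch, assuming plain Java outside NiFi, that checks whether a pattern compiles. The class and method names are hypothetical and are not part of NiFi.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexCheck {
    // Returns true if java.util.regex accepts the pattern, false if it throws.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String original = "((.|\\n)*?)(?={)";   // pattern from the post
        String escaped = "((.|\\n)*?)(?=\\{)";  // same pattern with the brace escaped
        System.out.println(original + " compiles: " + compiles(original));
        System.out.println(escaped + " compiles: " + compiles(escaped));
    }
}
```

Escaping the brace only addresses the syntax error. Whether the resulting replacement produces the desired output is a separate question.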





Re: How to remove lines from a flow file that don't start with a certain prefix?

Posted by Andy LoPresto <al...@apache.org>.
Vijay,

I think the SplitText processor with a delimiter of “<newline character>” (press Ctrl + Enter or Shift + Enter depending on OS) should solve this for you. 

Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Dec 30, 2018, at 10:19 PM, Vijay Chhipa <vc...@apple.com> wrote:
> 
> Mark, 
> 
> 
> Thanks for the tip, I set up the RouteText processor like below and it worked beautifully. At first I thought that each matched line would become its own flow file, but that's not the case. All lines that matched become part of a single new flow file. 
> 
> <Screen Shot 2018-12-31 at 12.15.51 AM.png>
> 
> 
> 
> Vijay
> 
> 
> 


Re: How to remove lines from a flow file that don't start with a certain prefix?

Posted by Vijay Chhipa <vc...@apple.com>.
Mark, 


Thanks for the tip, I set up the RouteText processor like below and it worked beautifully. At first I thought that each matched line would become its own flow file, but that's not the case. All lines that matched become part of a single new flow file. 

<Screen Shot 2018-12-31 at 12.15.51 AM.png>

Vijay



> On Dec 29, 2018, at 8:18 AM, Mark Payne <ma...@hotmail.com> wrote:
> 
> Vijay,
> 
> I would recommend using RouteText for that use case. You can then use Expression Language against each line of text to do something like ${line:startsWith("{")}
> 
> It avoids the complexities of regex and is much more efficient. 
> 
> Thanks
> -Mark
> 
> Sent from my iPhone


Re: How to remove lines from a flow file that don't start with a certain prefix?

Posted by Mark Payne <ma...@hotmail.com>.
Vijay,

I would recommend using RouteText for that use case. You can then use Expression Language against each line of text to do something like ${line:startsWith("{")}

It avoids the complexities of regex and is much more efficient.

Thanks
-Mark

Sent from my iPhone
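
To make the per-line check above concrete, here is a minimal sketch in plain Java of keeping only the lines that start with a given prefix. It illustrates the idea behind the startsWith condition, not how RouteText is implemented. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PrefixFilter {
    // Keeps only the lines that begin with the given prefix, preserving order.
    static List<String> keepLinesStartingWith(String text, String prefix) {
        List<String> kept = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            if (line.startsWith(prefix)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample input taken from the original question.
        String input = "foobar\nsdkfskdl\nksdfjlsdj\n{\"key\":\"value\"}\n{\"key2\":\"value2\"}\n";
        // Join the kept lines back into a single block of text.
        System.out.println(String.join("\n", keepLinesStartingWith(input, "{")));
    }
}
```

A plain prefix test like this needs no regex at all, which is the point of the suggestion.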
