Posted to users@nifi.apache.org by Vijay Chhipa <vc...@apple.com> on 2018/12/29 12:56:15 UTC
How to remove lines from a flow file that don't start with a certain
prefix?
Hi All,
I have output from a processor in which each line is a self-contained JSON object.
In certain cases there are lines that are not JSON, and I want to remove them.
I tried the ReplaceText processor with the following pattern:
((.|\n)*?)(?={)
Contents of the input file:
foobar
sdkfskdl
ksdfjlsdj
{"key":"value"}
{"key2":"value2"}
Desired output file contents:
{"key":"value"}
{"key2":"value2"}
On the https://www.regextester.com/ site, the above pattern gives me the following matched string:
<PastedGraphic-1.png>
When I put the above pattern in the ReplaceText processor like this:
<PastedGraphic-2.png>
I get an error that this is not a valid Java regular expression:
<PastedGraphic-3.png>
Why is this not valid, and is there an online regex checker that I can validate against before putting the pattern in NiFi?
Also, should I be using a different processor than ReplaceText for this purpose?
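A likely cause, assuming the error was a pattern-syntax error: java.util.regex reads an unescaped { as the start of a repetition quantifier such as {2,3}, so (?={) is rejected, while the flavors used by some online testers are more lenient. The sketch below checks both forms in plain Java and also tries a line-anchored alternative. RegexCheck, compiles and dropNonJsonLines are hypothetical names used only for illustration; none of this is NiFi API.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexCheck {
    // Returns true if java.util.regex accepts the pattern, false otherwise.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Hypothetical helper: removes every line that does not start with '{'.
    static String dropNonJsonLines(String text) {
        // (?m) lets ^ match at each line start; (?!\{) rejects lines that
        // begin with a brace; \R? also consumes the trailing line break.
        return text.replaceAll("(?m)^(?!\\{).*\\R?", "");
    }

    public static void main(String[] args) {
        String original = "((.|\\n)*?)(?={)";   // pattern from the post
        String escaped  = "((.|\\n)*?)(?=\\{)"; // same pattern, brace escaped
        System.out.println(compiles(original));
        System.out.println(compiles(escaped));

        String input = "foobar\nsdkfskdl\nksdfjlsdj\n"
                     + "{\"key\":\"value\"}\n{\"key2\":\"value2\"}";
        System.out.println(dropNonJsonLines(input));
    }
}
```

Writing the brace as \{ is the portable choice, since it is accepted by Java and by the more lenient flavors alike. Checking a pattern with Pattern.compile in a small Java program is a dependable way to validate it before using it in a Java-based tool.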
Thank you for your help and Happy New Year.
Vijay
Re: How to remove lines from a flow file that don't start with a
certain prefix?
Posted by Andy LoPresto <al...@apache.org>.
Vijay,
I think the SplitText processor with a delimiter of “<newline character>” (press Ctrl + Enter or Shift + Enter depending on OS) should solve this for you.
Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
> On Dec 30, 2018, at 10:19 PM, Vijay Chhipa <vc...@apple.com> wrote:
>
> Mark,
>
> Thanks for the tip. I set up the RouteText processor like below and it worked beautifully. At first I thought that each matched line would become its own flowfile, but that's not the case: all lines that match become part of a single new flowfile.
>
> <Screen Shot 2018-12-31 at 12.15.51 AM.png>
>
> Vijay
Re: How to remove lines from a flow file that don't start with a
certain prefix?
Posted by Vijay Chhipa <vc...@apple.com>.
Mark,
Thanks for the tip. I set up the RouteText processor like below and it worked beautifully. At first I thought that each matched line would become its own flowfile, but that's not the case: all lines that match become part of a single new flowfile.
Vijay
> On Dec 29, 2018, at 8:18 AM, Mark Payne <ma...@hotmail.com> wrote:
>
> Vijay,
>
> I would recommend using RouteText for that use case. You can then use Expression Language against each line of text to do something like ${line:startsWith("{")}
>
> It avoids the complexities of regex and is much more efficient.
>
> Thanks
> -Mark
>
> Sent from my iPhone
Re: How to remove lines from a flow file that don't start with a
certain prefix?
Posted by Mark Payne <ma...@hotmail.com>.
Vijay,
I would recommend using RouteText for that use case. You can then use Expression Language against each line of text to do something like ${line:startsWith("{")}
It avoids the complexities of regex and is much more efficient.
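To illustrate the idea outside NiFi, here is a minimal plain-Java sketch of a prefix test applied per line, with all matching lines collected into a single output (which matches what was observed with RouteText elsewhere in this thread). LineRouter and keepLinesStartingWith are hypothetical names; this is an analogy, not how RouteText is implemented.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class LineRouter {
    // Keeps only the lines that start with the given prefix and joins them
    // back into one string, so all matching lines share a single output.
    static String keepLinesStartingWith(String text, String prefix) {
        return Arrays.stream(text.split("\\R"))      // split on any line break
                     .filter(line -> line.startsWith(prefix))
                     .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        String input = "foobar\nsdkfskdl\nksdfjlsdj\n"
                     + "{\"key\":\"value\"}\n{\"key2\":\"value2\"}";
        System.out.println(keepLinesStartingWith(input, "{"));
    }
}
```

A plain startsWith check needs no escaping and no backtracking, which is the simplicity argument for preferring it over a regular expression here.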
Thanks
-Mark
Sent from my iPhone