Posted to users@nifi.apache.org by Matthieu Ré <re...@gmail.com> on 2021/07/21 14:14:42 UTC

[NiFi-8760] Processors fail to process flowfiles with VolatileContentRepository

Hi all,

We are currently on NiFi 1.11.4 and are facing a blocking issue, caused by
the VolatileContentRepository, when trying to upgrade to NiFi 1.13.1+: some
processors we use (and probably others that we didn't try), such as
MergeRecord, QueryRecord and SplitJson, fail to process flowfiles (logs are
in the Jira ticket NIFI-8760 <https://issues.apache.org/jira/browse/NIFI-8760>).

Is anyone able to reproduce the issue, or is this a misconfiguration on our
side? The nifi.properties and flow.xml.gz we used are available in the
ticket. Unless I am missing something, the issue seems to come from this
commit
<https://github.com/apache/nifi/commit/528fce2407d092d4ced1a58fcc14d0bc6e660b89>,
since it first appeared in 1.13.1 and the same flow works fine with 1.13.0.

I am happy to contribute as much as I can if you confirm that this is not
due to a misconfiguration.

Thanks!
Matthieu

Re: [NiFi-8760] Processors fail to process flowfiles with VolatileContentRepository

Posted by Matthieu Ré <re...@gmail.com>.
Hi everyone,

A few weeks ago we discussed the VolatileContentRepository and the JIRA
NIFI-8760 <https://issues.apache.org/jira/browse/NIFI-8760>. This
implementation did not seem to be a priority, but my team (and probably
others) would still like to use it, so I worked on a fix. To recap, the bug
was introduced in 1.13.1, when ResourceClaim-based reads were introduced,
primarily for the FileSystemRepository. That makes perfect sense in that use
case, since it lets several ContentClaims be read through a single
InputStream, but it does not seem to work for the VolatileContentRepository.

I now have two simple fixes with equivalent performance (tested with
GenerateFlowFile, MergeRecord, SplitJson and QueryRecord):

   - The first follows the idea of the initial implementation
   <https://github.com/apache/nifi/blob/528fce2407d092d4ced1a58fcc14d0bc6e660b89/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/repository/VolatileContentRepository.java#L473>,
   which, for a given ResourceClaim, reads the corresponding ContentClaim at
   offset 0. This fails when the ContentClaim being looked up has a length,
   because ContentClaim implements an equals() that takes the length into
   account, while the constructor called by read(ResourceClaim) initializes
   the length to -1. So a fix could be to search the map for the
   ContentClaim matching the ResourceClaim and offset 0.
   Even though this implementation seems poor, since it does not take
   advantage of the Map's Comparable keys to look up a ContentClaim, its
   performance seems equivalent to that of the second fix.
   - The second is simply to treat the VolatileContentRepository as
   incompatible with read(ResourceClaim) and only allow read(ContentClaim),
   as is already the case for the EncryptedFileSystemRepository.
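To make the first fix concrete, here is a minimal Java sketch (Java 16+ for records). It does not use NiFi's real classes: ResourceClaim, ContentClaim and ContentBlock are hypothetical, simplified records, and findBlockAtOffsetZero is an illustrative helper name. Because a record's equals() compares every component, a stored key with offset 0 and length 5 does not match a probe key built with length -1, which mirrors the failure described above; the scan matches on ResourceClaim and offset 0 only.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class OffsetZeroLookupSketch {

    // Hypothetical stand-in for NiFi's ResourceClaim, identified by an id only.
    record ResourceClaim(String id) { }

    // Hypothetical stand-in for a ContentClaim. As a record, its equals() and
    // hashCode() include the length, like the behaviour described in the email.
    record ContentClaim(ResourceClaim resourceClaim, long offset, long length) { }

    // Hypothetical stand-in for the repository's in-memory ContentBlock.
    record ContentBlock(byte[] data) { }

    // Fix 1 as described: find the claim on the same ResourceClaim at offset 0,
    // ignoring its length. This is a linear scan over all stored claims.
    static ContentBlock findBlockAtOffsetZero(Map<ContentClaim, ContentBlock> blocks,
                                              ResourceClaim resourceClaim) {
        for (Map.Entry<ContentClaim, ContentBlock> entry : blocks.entrySet()) {
            ContentClaim claim = entry.getKey();
            if (claim.resourceClaim().equals(resourceClaim) && claim.offset() == 0L) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ResourceClaim rc = new ResourceClaim("claim-1");
        Map<ContentClaim, ContentBlock> blocks = new HashMap<>();
        blocks.put(new ContentClaim(rc, 0L, 5L),
                new ContentBlock("hello".getBytes(StandardCharsets.UTF_8)));

        // A probe key with length -1 is not equal to the stored key, so the
        // direct map lookup misses.
        ContentBlock direct = blocks.get(new ContentClaim(rc, 0L, -1L));
        System.out.println("direct lookup: " + (direct == null ? "miss" : "hit"));

        // Matching on ResourceClaim and offset 0 finds the block.
        ContentBlock scanned = findBlockAtOffsetZero(blocks, rc);
        System.out.println("offset-0 scan: " + (scanned == null ? "miss" : "hit"));
    }
}
```

Running main prints "direct lookup: miss" and then "offset-0 scan: hit". The scan is O(n) in the number of stored claims, which is the cost of not using the key ordering.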

Since the data in this implementation is stored as a
Map<ContentClaim, ContentBlock>, I lack the experience to answer this
question:

   - Does it make sense to use the ResourceClaim to read ContentBlock(s) in
   the VolatileContentRepository?
      - If yes, would there be a benefit in reading the ContentBlocks at all
      offsets matching the ResourceClaim, instead of only offset 0 as
      originally intended?
      - If not, the second fix is probably the right one.
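On the question of using the key ordering: if claims were kept in a sorted map ordered by resource, then offset, then length, all claims sharing a resource would be contiguous, so both the offset-0 lookup and a read of every offset could be range queries instead of a full scan. The sketch below assumes such an ordering. The ContentClaim record (keyed by a plain resourceId string), the ORDER comparator and both helpers are hypothetical; this is not a statement about how NiFi's ContentClaim actually implements compareTo.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class SortedClaimLookupSketch {

    // Hypothetical simplified claim: a resource id plus offset and length.
    record ContentClaim(String resourceId, long offset, long length) { }

    // Hypothetical ordering: resource id, then offset, then length, so all
    // claims on the same resource are contiguous and sorted by offset.
    static final Comparator<ContentClaim> ORDER = Comparator
            .comparing(ContentClaim::resourceId)
            .thenComparingLong(ContentClaim::offset)
            .thenComparingLong(ContentClaim::length);

    // Data of every claim on the resource, in offset order: O(log n + k).
    static List<String> blocksForResource(NavigableMap<ContentClaim, String> blocks, String resourceId) {
        ContentClaim from = new ContentClaim(resourceId, Long.MIN_VALUE, Long.MIN_VALUE);
        ContentClaim to = new ContentClaim(resourceId, Long.MAX_VALUE, Long.MAX_VALUE);
        return new ArrayList<>(blocks.subMap(from, true, to, true).values());
    }

    // Claim at offset 0 on the resource, whatever its length, or null: O(log n).
    static ContentClaim offsetZeroClaim(NavigableMap<ContentClaim, String> blocks, String resourceId) {
        ContentClaim candidate = blocks.ceilingKey(new ContentClaim(resourceId, 0L, Long.MIN_VALUE));
        if (candidate != null && candidate.resourceId().equals(resourceId) && candidate.offset() == 0L) {
            return candidate;
        }
        return null;
    }

    public static void main(String[] args) {
        NavigableMap<ContentClaim, String> blocks = new TreeMap<>(ORDER);
        blocks.put(new ContentClaim("r1", 5L, 3L), "def");
        blocks.put(new ContentClaim("r1", 0L, 5L), "abcde");
        blocks.put(new ContentClaim("r2", 0L, 2L), "zz");

        System.out.println(blocksForResource(blocks, "r1")); // [abcde, def]
        System.out.println(offsetZeroClaim(blocks, "r1"));
        System.out.println(offsetZeroClaim(blocks, "r3"));
    }
}
```

The offset-0 helper ignores the length because its probe key uses Long.MIN_VALUE for it, so the first key at or after the probe is the smallest-length claim at offset 0, if one exists.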

Please don't hesitate to correct me if I'm wrong or have misunderstood something.

Thank you and have a nice day,
Matthieu

On Thu, Aug 5, 2021 at 12:36, Matthieu Ré <re...@gmail.com> wrote:

> Thank you very much for your answers!
>
> That's a surprise! The VolatileContentRepository seemed to fit our needs
> perfectly: processing a large amount of data with few resources, and
> especially little I/O on mounted disks, for non-critical data where some
> data loss is acceptable.
>
> @Mark, I just tried your suggestion of mounting a tmpfs for the
> FileSystemRepository (on 1.11.4). However, for the same amount of data and
> the same amount of RAM, the VolatileContentRepository used a constant <5%
> of the space, while the FileSystemRepository's usage fluctuated wildly and
> it frequently ran out of space. (I should add that we don't keep any
> archive: nifi.content.repository.archive.enabled=false.) Am I missing a
> configuration setting that makes the FileSystemRepository consume so much
> space?
>
> Stateless NiFi sounds very interesting! I just had a look at pvillard's
> demo (https://github.com/pvillard31/nifi-stateless-demo) and the
> framework's README (
> https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-stateless).
> Do you have any other resources on it? I would like to understand better
> how it differs from the standard framework and how it could fit our use
> case.
>
> Have a nice day,
> Matthieu

Re: [NiFi-8760] Processors fail to process flowfiles with VolatileContentRepository

Posted by Matthieu Ré <re...@gmail.com>.
Thank you very much for your answers!

That's a surprise! The VolatileContentRepository seemed to fit our needs
perfectly: processing a large amount of data with few resources, and
especially little I/O on mounted disks, for non-critical data where some
data loss is acceptable.

@Mark, I just tried your suggestion of mounting a tmpfs for the
FileSystemRepository (on 1.11.4). However, for the same amount of data and
the same amount of RAM, the VolatileContentRepository used a constant <5% of
the space, while the FileSystemRepository's usage fluctuated wildly and it
frequently ran out of space. (I should add that we don't keep any archive:
nifi.content.repository.archive.enabled=false.) Am I missing a configuration
setting that makes the FileSystemRepository consume so much space?

Stateless NiFi sounds very interesting! I just had a look at pvillard's demo
(https://github.com/pvillard31/nifi-stateless-demo) and the framework's
README (
https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-stateless).
Do you have any other resources on it? I would like to understand better how
it differs from the standard framework and how it could fit our use case.

Have a nice day,
Matthieu


On Fri, Jul 23, 2021 at 17:20, Joe Witt <jo...@gmail.com> wrote:

> For any use case where we previously thought the VolatileContentRepo would
> be a good fit, we'd now say Stateless NiFi is a dramatically better
> approach.
>
> We need to document this better, but the capability is definitely there
> now.

Re: [NiFi-8760] Processors fail to process flowfiles with VolatileContentRepository

Posted by Joe Witt <jo...@gmail.com>.
For any use case where we previously thought the VolatileContentRepo would
be a good fit, we'd now say Stateless NiFi is a dramatically better
approach.

We need to document this better, but the capability is definitely there now.

On Fri, Jul 23, 2021 at 8:13 AM Mark Payne <ma...@hotmail.com> wrote:

> Matthieu,
>
> I would strongly recommend against using the VolatileContentRepository.
> You’re the first person I’ve heard of using it in a few years. Typically,
> the FileSystemRepository is sufficient. If you truly want to keep the
> content in RAM, I would recommend creating a RAM disk and pointing the
> FileSystemRepository at it.
>
> Thanks
> -Mark

Re: [NiFi-8760] Processors fail to process flowfiles with VolatileContentRepository

Posted by Mark Payne <ma...@hotmail.com>.
Matthieu,

I would strongly recommend against using the VolatileContentRepository. You’re the first person I’ve heard of using it in a few years. Typically, the FileSystemRepository is sufficient. If you truly want to keep the content in RAM, I would recommend creating a RAM disk and pointing the FileSystemRepository at it.

Thanks
-Mark


On Jul 21, 2021, at 10:31 AM, Matthieu Ré <re...@gmail.com> wrote:

Hi Chris, thank you for your quick response.

I tried the flow with 1.13.1, 1.13.2 and a 1.14.0 build from just before the first RC, and the problem was still there, so I am not sure it is related to the session-handling issue you pointed out, which was fixed in 1.13.2.


Re: [NiFi-8760] Processors fail to process flowfiles with VolatileContentRepository

Posted by Matthieu Ré <re...@gmail.com>.
Hi Chris, thank you for your quick response.

I tried the flow with 1.13.1, 1.13.2 and a 1.14.0 build from just before the
first RC, and the problem was still there, so I am not sure it is related to
the session-handling issue you pointed out, which was fixed in 1.13.2.

On Wed, Jul 21, 2021 at 16:22, Chris Sampson <ch...@naimuri.com> wrote:

> 1.13.1 was known to have problems with session handling; see the
> "lowlights" in the Release Notes for 1.13.1 [1].
>
> It is recommended to upgrade to version 1.13.2 (or the latest, 1.14.0).
> If you can't upgrade, then 1.13.0 would be better than 1.13.1.
>
>
> [1]
> https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.13.1
>
> ---
> *Chris Sampson*
> IT Consultant
> chris.sampson@naimuri.com
> <https://www.naimuri.com/>

-- 

Matthieu RÉ
Data Scientist - Machine Learning Engineer - Dassault Systèmes

ENSIIE, M2 AIC (Université Paris-Saclay)

Tel: 0631609755

Email: re.matthieu@gmail.com

Re: [NiFi-8760] Processors fail to process flowfiles with VolatileContentRepository

Posted by Chris Sampson <ch...@naimuri.com>.
1.13.1 was known to have problems with session handling; see the
"lowlights" in the Release Notes for 1.13.1 [1].

It is recommended to upgrade to version 1.13.2 (or the latest, 1.14.0). If
you can't upgrade, then 1.13.0 would be better than 1.13.1.


[1]
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.13.1

---
*Chris Sampson*
IT Consultant
chris.sampson@naimuri.com
<https://www.naimuri.com/>


On Wed, 21 Jul 2021 at 15:14, Matthieu Ré <re...@gmail.com> wrote:

> Hi all,
>
> We are currently on NiFi 1.11.4 and are facing a blocking issue, caused by
> the VolatileContentRepository, when trying to upgrade to NiFi 1.13.1+: some
> processors we use (and probably others that we didn't try), such as
> MergeRecord, QueryRecord and SplitJson, fail to process flowfiles (logs are
> in the Jira ticket NIFI-8760 <https://issues.apache.org/jira/browse/NIFI-8760>).
>
> Is anyone able to reproduce the issue, or is this a misconfiguration on
> our side? The nifi.properties and flow.xml.gz we used are available in the
> ticket. Unless I am missing something, the issue seems to come from this
> commit
> <https://github.com/apache/nifi/commit/528fce2407d092d4ced1a58fcc14d0bc6e660b89>,
> since it first appeared in 1.13.1 and the same flow works fine with 1.13.0.
>
> I am happy to contribute as much as I can if you confirm that this is not
> due to a misconfiguration.
>
> Thanks!
> Matthieu
>