Posted to dev@nifi.apache.org by Paresh Shah <Pa...@lifelock.com> on 2015/12/13 19:49:33 UTC

Handling of moving the contents of a large file through NiFi.

We have the following use case:
On a scheduled basis, we read a large number of records from an external system and move the records through the NiFi pipeline.

What we see is that FlowFiles are not moved to the relationship until the session is committed, and once the session is committed we are not able to transfer anything else on that session.

We see that GetFileTransfer moves the entire file contents using the “importFrom” API on the session. But since we need to handle the individual records in the pipeline, that approach does not work for our use case.
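
Roughly, the per-record pattern we are attempting looks like this (the
record source and relationship name are placeholders):

  while (recordSource.hasNext()) {          // hypothetical record source
      final byte[] record = recordSource.next();
      FlowFile flowFile = session.create();
      flowFile = session.write(flowFile, out -> out.write(record));
      session.transfer(flowFile, REL_SUCCESS);
  }
  session.commit();  // nothing reaches the relationship until this call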

Is there a different mechanism to do what we want? Any insights would be appreciated.

Thanks
Paresh

Re: Handling of moving the contents of a large file through NiFi.

Posted by Joe Witt <jo...@gmail.com>.
Paresh,

Ok, understood.  Just keep in mind that NiFi will not load the FlowFile
content into memory and can handle data as large as the content
repository will allow.  From there you can split the data into
individual records, and do so in a way that may allow for rather high
performance.
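
For example, a minimal sketch of a split in the spirit of SplitText
(record-per-line framing and the REL_SPLITS/REL_ORIGINAL relationship
names are my assumptions):

  // inside onTrigger; 'original' is the incoming FlowFile
  final List<long[]> boundaries = new ArrayList<>();
  session.read(original, in -> {
      final InputStream buffered = new BufferedInputStream(in);
      long pos = 0, start = 0;
      int b;
      while ((b = buffered.read()) != -1) {
          pos++;
          if (b == '\n') {                     // end of one record
              boundaries.add(new long[] { start, pos - start });
              start = pos;
          }
      }
      if (pos > start) {                       // trailing record, no newline
          boundaries.add(new long[] { start, pos - start });
      }
  });
  for (final long[] range : boundaries) {
      // clone() points at a byte range of the original content in the
      // content repository, so record bytes are never copied onto the heap
      FlowFile split = session.clone(original, range[0], range[1]);
      session.transfer(split, REL_SPLITS);
  }
  session.transfer(original, REL_ORIGINAL);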

If you want to read the content in chunks, it is probably best to make
a custom processor that uses a session per chunk or per set of chunks
(up to you), and you'll want to keep state about how far along you are
on that object, in case of a restart (see the sketch below).

You should definitely be able to use the session over and over again,
as described in the nifi-api javadoc for ProcessSession:

 "A process session instance may be used continuously. That is, after
each commit or rollback, the session can be used again."
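
A hedged sketch of that session-per-chunk idea (the file path, chunk
size, and relationship are illustrative; the offset below is held in
memory only, and a real processor would have to persist it durably so
a restart can resume mid-file):

  import java.io.IOException;
  import java.io.RandomAccessFile;
  import java.util.concurrent.atomic.AtomicLong;

  import org.apache.nifi.flowfile.FlowFile;
  import org.apache.nifi.processor.AbstractSessionFactoryProcessor;
  import org.apache.nifi.processor.ProcessContext;
  import org.apache.nifi.processor.ProcessSession;
  import org.apache.nifi.processor.ProcessSessionFactory;
  import org.apache.nifi.processor.Relationship;
  import org.apache.nifi.processor.exception.ProcessException;

  // getRelationships()/property descriptor plumbing omitted for brevity
  public class ChunkedIngest extends AbstractSessionFactoryProcessor {

      static final Relationship REL_SUCCESS =
              new Relationship.Builder().name("success").build();

      // in-memory offset only; persist this for restart safety
      private final AtomicLong offset = new AtomicLong(0L);

      @Override
      public void onTrigger(final ProcessContext context,
              final ProcessSessionFactory sessionFactory) throws ProcessException {
          // one session, reused across commits, per the javadoc quoted above
          final ProcessSession session = sessionFactory.createSession();
          final byte[] buffer = new byte[64 * 1024];
          try (RandomAccessFile file =
                  new RandomAccessFile("/data/large.input", "r")) {
              file.seek(offset.get());
              int read;
              while ((read = file.read(buffer)) > 0) {
                  final int length = read;
                  FlowFile chunk = session.create();
                  chunk = session.write(chunk,
                          out -> out.write(buffer, 0, length));
                  session.transfer(chunk, REL_SUCCESS);
                  session.commit();          // chunk is now visible downstream
                  offset.addAndGet(length);  // advance only after the commit
              }
          } catch (final IOException e) {
              session.rollback();
              throw new ProcessException(e);
          }
      }
  }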

What are you seeing in logs when you try to use the session again?

Thanks
Joe

On Sun, Dec 13, 2015 at 3:09 PM, Paresh Shah <Pa...@lifelock.com> wrote:
> The file is quite large, so reading all of it into the flow file is not
> an option. We want to process the individual records of the file; that
> is why we are creating “n” flow files.
>
> Paresh
>
> On 12/13/15, 11:11 AM, "Joe Witt" <jo...@gmail.com> wrote:
>
>>Paresh
>>
>>Is it feasible to read the large object into the flow and then split as
>>needed?  It sounds like you are reading from the original file in place
>>and making flowfiles from it.
>>
>>Perhaps you can share a screenshot of the flow?
>>
>>Thanks
>>Joe
>>On Dec 13, 2015 1:49 PM, "Paresh Shah" <Pa...@lifelock.com> wrote:
>>
>>> We have the following use case:
>>> On a scheduled basis, we read a large number of records from an
>>> external system and move the records through the NiFi pipeline.
>>>
>>> What we see is that FlowFiles are not moved to the relationship
>>> until the session is committed, and once the session is committed we
>>> are not able to transfer anything else on that session.
>>>
>>> We see that GetFileTransfer moves the entire file contents using the
>>> “importFrom” API on the session. But since we need to handle the
>>> individual records in the pipeline, that approach does not work for
>>> our use case.
>>>
>>> Is there a different mechanism to do what we want? Any insights
>>> would be appreciated.
>>>
>>> Thanks
>>> Paresh
>>>
>

Re: Handling of moving the contents of a large file through NiFi.

Posted by Paresh Shah <Pa...@lifelock.com>.
The file is quite large, so reading all of it into the flow file is not
an option. We want to process the individual records of the file; that
is why we are creating “n” flow files.

Paresh

On 12/13/15, 11:11 AM, "Joe Witt" <jo...@gmail.com> wrote:

>Paresh
>
>Is it feasible to read the large object into the flow and then split as
>needed?  It sounds like you are reading from the original file in place
>and making flowfiles from it.
>
>Perhaps you can share a screenshot of the flow?
>
>Thanks
>Joe
>On Dec 13, 2015 1:49 PM, "Paresh Shah" <Pa...@lifelock.com> wrote:
>
>> We have the following use case:
>> On a scheduled basis, we read a large number of records from an
>> external system and move the records through the NiFi pipeline.
>>
>> What we see is that FlowFiles are not moved to the relationship until
>> the session is committed, and once the session is committed we are not
>> able to transfer anything else on that session.
>>
>> We see that GetFileTransfer moves the entire file contents using the
>> “importFrom” API on the session. But since we need to handle the
>> individual records in the pipeline, that approach does not work for our
>> use case.
>>
>> Is there a different mechanism to do what we want? Any insights would
>> be appreciated.
>>
>> Thanks
>> Paresh
>>


Re: Handling of moving the contents of a large file through NiFi.

Posted by Joe Witt <jo...@gmail.com>.
Paresh

Is it feasible to read the large object into the flow and then split as
needed?  It sounds like you are reading from the original file in place
and making flowfiles from it.
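
For example, a minimal flow along those lines (GetFile is just one
option; GetSFTP or another source processor would work the same way):

  GetFile (pulls the large file in as a single FlowFile; the content is
           streamed to the content repository, not held in memory)
    -> SplitText (Line Split Count = 1, one FlowFile per record)
    -> ... per-record processing ...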

Perhaps you can share a screenshot of the flow?

Thanks
Joe
On Dec 13, 2015 1:49 PM, "Paresh Shah" <Pa...@lifelock.com> wrote:

> We have the following use case:
> On a scheduled basis, we read a large number of records from an
> external system and move the records through the NiFi pipeline.
>
> What we see is that FlowFiles are not moved to the relationship until
> the session is committed, and once the session is committed we are not
> able to transfer anything else on that session.
>
> We see that GetFileTransfer moves the entire file contents using the
> “importFrom” API on the session. But since we need to handle the
> individual records in the pipeline, that approach does not work for our
> use case.
>
> Is there a different mechanism to do what we want? Any insights would
> be appreciated.
>
> Thanks
> Paresh
>