Posted to users@nifi.apache.org by naga satish <cn...@gmail.com> on 2020/12/16 05:38:44 UTC

files larger than queue size limit

My team designed a NiFi flow to handle CSV files of around 15 GB. But
later we realised that files can be up to 500 GB. I set the queue size
limit to 25 GB. This is a one-time data load to S3. I'm converting each
CSV file to Parquet in NiFi using a ConvertRecord processor. What
happens in these situations? Can NiFi handle this kind of scenario?

FYI, my NiFi instance has 40 GB of memory and 2 TB of storage.

Regards
Satish

Re: files larger than queue size limit

Posted by naga satish <cn...@gmail.com>.
Mike, I'm using snappy compression before loading files into S3.


Re: files larger than queue size limit

Posted by Mike Thomsen <mi...@gmail.com>.
To add to that, you should compress the content before loading it into
S3, or you will pay a lot more in storage costs than you have to.
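
For a rough sense of the difference, here is a back-of-the-envelope
sketch in Python (the ~$0.023/GB-month S3 Standard rate and the ~4x
CSV-to-snappy-Parquet size reduction are assumptions, not measurements):

    # Back-of-the-envelope S3 storage cost for a 500 GB load.
    # Assumed S3 Standard rate (~$0.023/GB-month, us-east-1) and an
    # assumed ~4x size reduction from CSV to snappy-compressed Parquet.
    RATE_PER_GB_MONTH = 0.023
    raw_gb = 500
    assumed_ratio = 4

    print(f"raw CSV:        ${raw_gb * RATE_PER_GB_MONTH:.2f}/month")
    print(f"snappy Parquet: ${raw_gb / assumed_ratio * RATE_PER_GB_MONTH:.2f}/month")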


Re: files larger than queue size limit

Posted by Pierre Villard <pi...@gmail.com>.
Yes, it should work just fine. The relationship back pressure settings
are soft limits: back pressure is only checked when the upstream
processor is scheduled, so if it is not already engaged, the processor
can be triggered even if it then generates a huge flow file that pushes
the queue well past its threshold. The back pressure mechanism only
applies at trigger time.
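
To illustrate the idea, here is a minimal Python sketch of a soft limit
checked only at trigger time (the names are illustrative, not NiFi's
actual classes or implementation):

    # Illustrative sketch: a soft back pressure limit that is only
    # consulted before a processor is triggered. Not NiFi's real code.
    GB = 1024 ** 3

    class Queue:
        def __init__(self, limit_bytes):
            self.limit_bytes = limit_bytes
            self.used_bytes = 0

        def is_full(self):
            # Checked by the scheduler before triggering the upstream processor.
            return self.used_bytes >= self.limit_bytes

        def enqueue(self, nbytes):
            # No check here: one trigger can push the queue far past the limit.
            self.used_bytes += nbytes

    queue = Queue(limit_bytes=25 * GB)   # the 25 GB queue size limit

    def maybe_trigger(output_bytes):
        if queue.is_full():
            return False                 # back pressure: processor not scheduled
        queue.enqueue(output_bytes)      # runs to completion whatever the size
        return True

    print(maybe_trigger(500 * GB))  # True: queue was under the limit when triggered
    print(maybe_trigger(1))         # False: blocked until downstream drains the queue

So a 500 GB flow file can still land in a queue with a 25 GB limit; the
limit only prevents further triggers afterwards.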

Regarding memory, the record processors process data in a streaming
fashion; the data will never be fully loaded into memory.
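
The same streaming pattern can be seen outside NiFi, for example with
pyarrow, which reads the CSV in batches and writes Parquet
incrementally. This is an analogy for what a record-oriented processor
does, not NiFi code, and the file names are placeholders:

    # Stream a large CSV to snappy-compressed Parquet batch by batch,
    # keeping memory usage roughly constant regardless of file size.
    import pyarrow.csv as pa_csv
    import pyarrow.parquet as pq

    reader = pa_csv.open_csv("input.csv")   # streaming reader, block by block
    writer = None
    for batch in reader:                    # one RecordBatch at a time
        if writer is None:
            writer = pq.ParquetWriter("output.parquet", batch.schema,
                                      compression="snappy")
        writer.write_batch(batch)
    if writer is not None:
        writer.close()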

Generally speaking, NiFi is agnostic to data size and can deal with
files of any size, large or small.

Hope this helps,
Pierre

