Posted to dev@nifi.apache.org by Ryan Hendrickson <ry...@gmail.com> on 2021/04/22 00:34:29 UTC

NIFI-7646 - Improve performance of MergeContent

https://issues.apache.org/jira/browse/NIFI-7646 - Improve performance of
MergeContent / others that read content of many small FlowFiles

Hi,
   In reference to the ticket above, released in 1.13, the description
says "if the FlowFile is small, say 200 bytes, the result is that we
perform 2+ disk accesses to read those 200 bytes (even though 4K - 8K is a
typical block size and could be read in the same amount of time as those
200 bytes)."

   To clarify, if the FlowFiles are never more than 1K, and the block size
is 4K, does that mean this improvement will read 4 FlowFiles with the
resources of 1?

   This would be a 4:1 improvement.  Or in the 200-byte scenario, it would
be a 20:1 improvement?
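The ratios above come from dividing the block size by the FlowFile size. A minimal sketch of that arithmetic, assuming the best case: FlowFiles are stored back to back on disk and read strictly in order through a block-sized buffer, compared against a baseline of one read per FlowFile. `blocks_read` and the batch size are hypothetical illustrations, not NiFi APIs or measurements:

```python
import math

def blocks_read(flowfile_size: int, count: int, block_size: int) -> int:
    """Blocks fetched when `count` FlowFiles of `flowfile_size` bytes sit
    back to back on disk and are read in order (best case, hypothetical)."""
    return math.ceil(flowfile_size * count / block_size)

BLOCK = 4 * 1024  # 4K block size from the question
COUNT = 1000      # hypothetical batch of FlowFiles

for size in (1024, 200):
    reads = blocks_read(size, COUNT, BLOCK)
    # Baseline assumption: one read per FlowFile (the ticket says 2+).
    print(f"{size} B FlowFiles: {reads} block reads, ~{COUNT / reads:.1f}:1")
```

With 1K FlowFiles this gives 4.0:1, and with 200-byte FlowFiles about 20.4:1, in line with the 4:1 and 20:1 estimates; an 8K block would roughly double both.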

Thanks,
Ryan

Re: NIFI-7646 - Improve performance of MergeContent

Posted by Ryan Hendrickson <ry...@gmail.com>.
Thanks, Mark!

On Wed, Apr 21, 2021 at 8:48 PM Mark Payne <ma...@hotmail.com> wrote:

> Ryan,
>
> It gets a bit more complex than this, because the FlowFiles may not always
> be accessed/read sequentially in exactly the same order that they live on
> disk, and there are concurrent threads/disk accesses to consider, etc. But
> in the best-case scenarios, yes, that is accurate.
>
> Keep in mind, though, that what you are comparing there is the performance
> of the disk accesses/reads, and that is, of course, not the entire picture.
> There is a lot more going on under the covers, so if you see a performance
> improvement of 20x in reading the content, that won’t mean a 20x
> improvement in overall throughput.
>
> But it sure won’t hurt! :)
>
> -Mark
>
> Sent from my iPhone

Re: NIFI-7646 - Improve performance of MergeContent

Posted by Mark Payne <ma...@hotmail.com>.
Ryan,

It gets a bit more complex than this, because the FlowFiles may not always be accessed/read sequentially in exactly the same order that they live on disk, and there are concurrent threads/disk accesses to consider, etc. But in the best-case scenarios, yes, that is accurate.
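The point about access order can be sketched with a toy model. This is a hypothetical simulation, not NiFi's content repository code: FlowFiles are stored back to back, the reader keeps only the most recently fetched block in a one-block buffer, and we count block fetches when reading in on-disk order versus a shuffled order (a rough stand-in for out-of-order or interleaved concurrent access):

```python
import random

def count_block_fetches(order, flowfile_size, block_size):
    """Count block fetches when reading FlowFiles stored back to back
    (FlowFile i at offset i * flowfile_size) in the given order, keeping
    only the most recently fetched block in a one-block buffer."""
    fetches = 0
    cached_block = None
    for i in order:
        start = i * flowfile_size
        end = start + flowfile_size  # exclusive
        # A FlowFile may straddle a block boundary and touch two blocks.
        for block in range(start // block_size, (end - 1) // block_size + 1):
            if block != cached_block:
                fetches += 1
                cached_block = block
    return fetches

N, SIZE, BLOCK = 1000, 200, 4096  # hypothetical values
in_order = list(range(N))
shuffled = in_order[:]
random.Random(42).shuffle(shuffled)  # fixed seed for repeatability

print("on-disk order:", count_block_fetches(in_order, SIZE, BLOCK))
print("shuffled order:", count_block_fetches(shuffled, SIZE, BLOCK))
```

In on-disk order each of the 49 blocks is fetched once; in shuffled order the buffer almost never holds the next FlowFile, so fetches approach one or more per FlowFile and the best-case benefit largely disappears.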

Keep in mind, though, that what you are comparing there is the performance of the disk accesses/reads, and that is, of course, not the entire picture. There is a lot more going on under the covers, so if you see a performance improvement of 20x in reading the content, that won’t mean a 20x improvement in overall throughput.
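The gap between a faster read and overall throughput follows Amdahl's law: only the share of time spent reading content gets faster. A minimal sketch; the read-time shares below are made-up values for illustration, not NiFi measurements:

```python
def overall_speedup(read_fraction: float, read_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the content-reading share
    of total time (read_fraction) becomes read_speedup times faster."""
    return 1.0 / ((1.0 - read_fraction) + read_fraction / read_speedup)

# Hypothetical shares of processing time spent reading content.
for p in (0.25, 0.5, 0.9):
    print(f"read share {p:.0%}: overall {overall_speedup(p, 20):.2f}x")
```

Even if reading content took 90% of the time, a 20x faster read would yield only about 6.9x overall; at 50% it is about 1.9x.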

But it sure won’t hurt! :)

-Mark
