Posted to issues@nifi.apache.org by "Joseph Witt (Jira)" <ji...@apache.org> on 2019/09/20 03:13:00 UTC

[jira] [Commented] (NIFI-6313) PutGCSObject performance seems slow

    [ https://issues.apache.org/jira/browse/NIFI-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933999#comment-16933999 ] 

Joseph Witt commented on NIFI-6313:
-----------------------------------

[~markap14] the GCE massive scale testing you did - was it with this processor?

[~jasperknulst] this is important to progress but it needs a PR and review traction before we set the fix version.  

> PutGCSObject performance seems slow
> -----------------------------------
>
>                 Key: NIFI-6313
>                 URL: https://issues.apache.org/jira/browse/NIFI-6313
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework, Extensions
>    Affects Versions: 1.9.2
>            Reporter: Jasper Knulst
>            Priority: Major
>             Fix For: 1.10.0
>
>
> The PutGCSObject processor, which transfers files to a Google Cloud Platform bucket, has poor transfer speeds.
> It is impossible to put hard figures on the throughput, as it seems to depend on:
> - Network location of the NiFi node (inside Google Cloud or not)
> - Network bandwidth
> - NiFi node specs
>  
> In benchmarks on multiple NiFi clusters (ranging from test setups to production sites), throughput ranged from 8 MB/s to 800 MB/s.
> "Slow" here means slow in comparison to gsutil. Running gsutil directly from the NiFi node gives 5 to 8 times the throughput without 'parallel_composite_upload' and up to 16 times with it.
>  
> The Google Cloud Java API on which NiFi's GCS processors are built does not have the same optimizations as gsutil and may not be supported/maintained. The Storage.create method is even deprecated.
> Still, there must be ways to speed up transfers to GCS by implementing parallel composite uploads in chunks and exposing config options on the GCS processors.
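
For illustration, the chunked parallel upload suggested in the description could be sketched roughly as follows. This is a minimal sketch and not the processor's actual code: the class, the `Chunk` type, and the `ChunkUploader` hook are hypothetical names invented here, and the upload itself is left abstract. It only shows splitting a file into byte ranges and uploading them concurrently. The cap of 32 reflects the limit on source objects in a single GCS compose request. In a real processor, the temporary object names returned here would presumably be handed to the client library's compose call and then deleted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch only: plans and drives a chunked, parallel upload. All names are hypothetical. */
class ParallelCompositeSketch {

    /** A single GCS compose request accepts at most 32 source objects. */
    static final int MAX_COMPOSE_SOURCES = 32;

    /** A byte range [offset, offset + length) of the source file. */
    static final class Chunk {
        final int index;
        final long offset;
        final long length;

        Chunk(int index, long offset, long length) {
            this.index = index;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Hypothetical hook: uploads one chunk and returns the temporary object name. */
    interface ChunkUploader {
        String upload(Chunk chunk) throws Exception;
    }

    /** Splits fileSize bytes into near-equal chunks, capped at MAX_COMPOSE_SOURCES. */
    static List<Chunk> planChunks(long fileSize, int parallelism) {
        if (fileSize < 0 || parallelism < 1) {
            throw new IllegalArgumentException("fileSize must be >= 0 and parallelism >= 1");
        }
        // Never create more chunks than bytes, and always create at least one.
        int count = (int) Math.min(Math.min(parallelism, MAX_COMPOSE_SOURCES),
                                   Math.max(1L, fileSize));
        long base = fileSize / count;
        long remainder = fileSize % count;
        List<Chunk> chunks = new ArrayList<>();
        long offset = 0;
        for (int i = 0; i < count; i++) {
            // The first `remainder` chunks carry one extra byte each.
            long length = base + (i < remainder ? 1 : 0);
            chunks.add(new Chunk(i, offset, length));
            offset += length;
        }
        return chunks;
    }

    /** Uploads all chunks on a fixed-size pool; returns object names in chunk order. */
    static List<String> uploadAll(List<Chunk> chunks, ChunkUploader uploader, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (Chunk c : chunks) {
                pending.add(pool.submit(() -> uploader.upload(c)));
            }
            List<String> names = new ArrayList<>();
            for (Future<String> f : pending) {
                names.add(f.get()); // rethrows any upload failure
            }
            return names;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in uploader that only names the parts; a real one would stream bytes to GCS.
        List<Chunk> chunks = planChunks(1_000_000_000L, 8);
        List<String> parts = uploadAll(chunks, c -> "tmp/part-" + c.index, 4);
        System.out.println(parts);
    }
}
```

The chunk count and thread count are kept as separate parameters so that they could map onto processor properties, in line with the config options the description asks for.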



--
This message was sent by Atlassian Jira
(v8.3.4#803005)