Posted to issues@nifi.apache.org by "Matt Burgess (JIRA)" <ji...@apache.org> on 2018/06/14 01:13:00 UTC

[jira] [Commented] (NIFI-5307) Select Hive Processor takes longer time to write into CSV files

    [ https://issues.apache.org/jira/browse/NIFI-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511872#comment-16511872 ] 

Matt Burgess commented on NIFI-5307:
------------------------------------

The code in SelectHiveQL for CSV output currently builds each row as a string in memory. I think we can safely write the fields directly to the output stream in most cases. Exceptions may be fields that need escaping, etc., but individual fields should not need to be joined into a large string for output.
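
To illustrate the idea, here is a minimal sketch of writing fields straight to a Writer instead of joining them into a row string first. This is not the SelectHiveQL code; the class and method names (CsvStreamSketch, writeField, writeRow) are hypothetical, and the escaping rule (quote a field only if it contains a comma, double quote, or line break, and double any embedded quotes) is an assumption based on common CSV conventions.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from the NiFi code base.
public class CsvStreamSketch {

    // Writes one field directly to the stream, quoting only when needed.
    static void writeField(Writer out, String value) throws IOException {
        if (value == null) {
            return; // assumption: a null value becomes an empty field
        }
        boolean needsQuoting = value.indexOf(',') >= 0 || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0 || value.indexOf('\r') >= 0;
        if (!needsQuoting) {
            out.write(value); // common case: no intermediate string is built
            return;
        }
        out.write('"');
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '"') {
                out.write('"'); // double embedded quotes
            }
            out.write(c);
        }
        out.write('"');
    }

    // Writes one row field by field; the row is never joined into a single string.
    static void writeRow(Writer out, List<String> fields) throws IOException {
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                out.write(',');
            }
            writeField(out, fields.get(i));
        }
        out.write('\n');
    }

    public static void main(String[] args) throws IOException {
        // Buffering the underlying stream keeps the many small writes cheap.
        Writer out = new BufferedWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
        writeRow(out, Arrays.asList("id", "name"));
        writeRow(out, Arrays.asList("1", "Doe, Jane"));
        out.flush();
    }
}
```

Wrapping the output stream in a BufferedWriter is what makes the per-field writes inexpensive; whether this helps in the processor itself would need to be measured.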

> Select Hive Processor takes longer time to write into CSV files
> ---------------------------------------------------------------
>
>                 Key: NIFI-5307
>                 URL: https://issues.apache.org/jira/browse/NIFI-5307
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Configuration
>    Affects Versions: 1.6.0
>            Reporter: Tanmay Deshpande
>            Priority: Major
>
> Team,
> The current Select Hive SQL processor reads each row from the result set and writes it to the file. This could be improved by writing records in bulk. When tried with millions of rows, reading from Hive takes only a few milliseconds, but writing to the file takes several minutes.
>  
> Thanks,
> Tanmay
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)