Posted to issues@nifi.apache.org by "Wang Qingming (Jira)" <ji...@apache.org> on 2021/11/24 09:37:00 UTC
[jira] [Comment Edited] (NIFI-9224) Read or write files in fragments

    [ https://issues.apache.org/jira/browse/NIFI-9224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448470#comment-17448470 ] 

Wang Qingming edited comment on NIFI-9224 at 11/24/21, 9:36 AM:
----------------------------------------------------------------

Hello, I am also a NiFi user in China. We have developed a processor that splits large CSV files, and we plan to contribute it to the NiFi project later. It uses the open source commons-csv project:
 
 
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.5</version>
</dependency>
 
 
The core code is as follows, for reference.
 

// Declaring the reader and parser as resources ensures they are closed as well
try (InputStream in = session.read(incomingCSV);
     Reader reader = new BufferedReader(new InputStreamReader(in, charset));
     CSVParser parser = CSVFormat.EXCEL.withHeader(headers).withQuote(null).parse(reader)) {

    // Read the CSV file one record at a time
    for (CSVRecord record : parser) {
        // Handle other business logic here
    }

} catch (IOException e) {
    // Log the error
}
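
For readers who want to try the splitting idea without NiFi or commons-csv, here is a minimal sketch using only the Java standard library. It is not the processor mentioned above: the class name CsvChunker, the method split, and the parameter linesPerChunk are hypothetical names chosen for illustration. It assumes the first line is a header and that no field contains an embedded line break, which matches a parser configured with quoting disabled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of NiFi or commons-csv.
final class CsvChunker {

    /**
     * Streams lines from the source and passes them to the sink in chunks of
     * at most linesPerChunk data lines. Every chunk starts with a copy of the
     * header line. Returns the number of chunks emitted.
     */
    static int split(Reader source, int linesPerChunk, Consumer<List<String>> sink) {
        if (linesPerChunk <= 0) {
            throw new IllegalArgumentException("linesPerChunk must be positive");
        }
        int chunks = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String header = reader.readLine();
            if (header == null) {
                return 0; // empty input, nothing to emit
            }
            List<String> chunk = new ArrayList<>();
            chunk.add(header);
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                // The header occupies one slot, so a full chunk has linesPerChunk + 1 entries
                if (chunk.size() == linesPerChunk + 1) {
                    sink.accept(chunk);
                    chunks++;
                    chunk = new ArrayList<>();
                    chunk.add(header);
                }
            }
            // Emit the last, partially filled chunk if it holds any data lines
            if (chunk.size() > 1) {
                sink.accept(chunk);
                chunks++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }
}
```

Only one chunk is held in memory at a time, so memory use depends on linesPerChunk rather than on the file size. In a real processor the sink would presumably write each chunk to a new FlowFile; that part is left out here because it depends on the NiFi session API.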


> Read or write files in fragments
> -----------
>
>                 Key: NIFI-9224
>                 URL: https://issues.apache.org/jira/browse/NIFI-9224
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>    Affects Versions: 1.15.0
>            Reporter: Every
>            Priority: Major
>              Labels: fetchfile, fragment
>
> For very large files, FetchFile cannot read the whole file into NiFi at once, and we found no suitable processor that reads a file in fragments. We therefore propose extending NiFi with a service and processors that read files in line-based fragments and write files fragment by fragment, so that very large files can be processed with fewer resources.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)