Posted to issues@nifi.apache.org by "Dmitry Mashkov (JIRA)" <ji...@apache.org> on 2019/03/14 06:50:00 UTC

[jira] [Comment Edited] (NIFI-6093) SplitRecord processor doesn't propagate fragment* attributes to original relationship

    [ https://issues.apache.org/jira/browse/NIFI-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792403#comment-16792403 ] 

Dmitry Mashkov edited comment on NIFI-6093 at 3/14/19 6:49 AM:
---------------------------------------------------------------

Hi Matt,

 

My use case: I receive very large XML files in one go; one file is 8 GB, another 27 GB. Each file contains millions of records, so to make parsing more reliable I process them chunk by chunk. The first step splits the file into chunks of 1 million records. The next step splits each 1M-record chunk into chunks of 100k records. The last step splits each 100k-record chunk into chunks of 1000 records. I need to know when each chunking step is complete, so I use Wait/Notify, and the Wait processor needs to know how many notifications to expect for one chunk.

Please take a look at the sibling Split processors: they all copy the split information to the original relationship for exactly this purpose. You are right, of course, that fragment.index is useless on the original relationship, but fragment.count and fragment.identifier should be present.
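The following is a simplified, NiFi-free model of the counting that Wait relies on. It shows why the original FlowFile needs these attributes. In a typical setup, Wait's Release Signal Identifier is set to `${fragment.identifier}` and its Target Signal Count to `${fragment.count}`, and Notify signals once per processed split. The class and method names below (`WaitNotifySketch`, `notifySplitDone`, `canRelease`, `splitCount`) are hypothetical and only mimic that idea; this is not NiFi code. The model also shows the per-level split counts for the 1M, 100k, 1000 chunking described above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of Wait/Notify counting (not NiFi code): an original FlowFile is
 * released once the number of signals for its fragment.identifier reaches fragment.count.
 */
public class WaitNotifySketch {

    // Signal counters keyed by release signal identifier (here: the fragment.identifier value).
    private final Map<String, Integer> signals = new HashMap<>();

    /** Called once per processed split, like a Notify processor incrementing a counter. */
    public void notifySplitDone(String fragmentId) {
        signals.merge(fragmentId, 1, Integer::sum);
    }

    /**
     * Decides whether the original FlowFile may be released, based only on its attributes.
     * Without fragment.count on the original there is no target count to compare against.
     */
    public boolean canRelease(Map<String, String> originalAttributes) {
        String id = originalAttributes.get("fragment.identifier");
        String count = originalAttributes.get("fragment.count");
        if (id == null || count == null) {
            return false; // the situation reported in NIFI-6093: nothing to wait on
        }
        return signals.getOrDefault(id, 0) >= Integer.parseInt(count);
    }

    /** Number of splits when `records` records are split into pieces of `perSplit` records. */
    public static long splitCount(long records, long perSplit) {
        return (records + perSplit - 1) / perSplit; // ceiling division
    }

    public static void main(String[] args) {
        // Chunking levels from the comment: 1,000,000 -> 100,000 -> 1,000 records per split.
        System.out.println(splitCount(1_000_000, 100_000)); // 10
        System.out.println(splitCount(100_000, 1_000));     // 100

        WaitNotifySketch wait = new WaitNotifySketch();
        Map<String, String> original = new HashMap<>();
        original.put("fragment.identifier", "chunk-1");
        original.put("fragment.count", "3");

        for (int i = 0; i < 3; i++) {
            System.out.println(wait.canRelease(original)); // false on every iteration
            wait.notifySplitDone("chunk-1");
        }
        System.out.println(wait.canRelease(original)); // true after the third signal
    }
}
```

If either attribute is missing on the original, `canRelease` can never return true. In a real flow, that means the original waits until it expires.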

If you have more questions, feel free to ask.


> SplitRecord processor doesn't propagate fragment* attributes to original relationship
> -------------------------------------------------------------------------------------
>
>                 Key: NIFI-6093
>                 URL: https://issues.apache.org/jira/browse/NIFI-6093
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.9.0
>            Reporter: Dmitry Mashkov
>            Priority: Major
>
> Hello Team, 
> As already described in the summary, the SplitRecord processor does not add the fragment.* attributes to the FlowFile sent to the original relationship. As a result, it is impossible to use the Wait/Notify pattern to wait for the splits to be processed. 
> I think the following patch can be applied:
> {code:java}
> Index: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/SplitRecord.java
> IDEA additional info:
> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> <+>UTF-8
> ===================================================================
> --- nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/SplitRecord.java (date 1550371815000)
> +++ nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/SplitRecord.java (date 1551441180000)
> @@ -206,7 +206,8 @@
> return;
> }
> - session.transfer(original, REL_ORIGINAL);
> + final FlowFile originalFlowFile = FragmentAttributes.copyAttributesToOriginal(session, original, fragmentId, splits.size());
> + session.transfer(originalFlowFile, REL_ORIGINAL);
> // Add the fragment count to each split
> for(FlowFile split : splits) {
> session.putAttribute(split, FRAGMENT_COUNT, String.valueOf(splits.size()));
> {code}
>  
> Sincerely,
> Dmitry.
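The following is a simplified model of the attribute flow the proposed patch aims for. It assumes FlowFiles can be represented as plain attribute maps, and that `FragmentAttributes.copyAttributesToOriginal` puts `fragment.identifier` and `fragment.count` on the original. `SplitAttributesSketch`, `SplitResult` and `split` are hypothetical names, not the NiFi API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical, NiFi-free model of the attribute flow the NIFI-6093 patch aims for.
 * FlowFiles are represented as plain attribute maps; this is not the NiFi API.
 */
public class SplitAttributesSketch {

    /** Attribute maps of the original FlowFile and of each split. */
    public static final class SplitResult {
        public final Map<String, String> original;
        public final List<Map<String, String>> splits;

        SplitResult(Map<String, String> original, List<Map<String, String>> splits) {
            this.original = original;
            this.splits = splits;
        }
    }

    /** Splits `records` records into pieces of at most `perSplit` records each. */
    public static SplitResult split(Map<String, String> originalAttrs, int records, int perSplit) {
        if (perSplit <= 0) {
            throw new IllegalArgumentException("perSplit must be positive");
        }
        String fragmentId = UUID.randomUUID().toString();
        List<Map<String, String>> splits = new ArrayList<>();
        for (int start = 0, index = 0; start < records; start += perSplit, index++) {
            Map<String, String> child = new HashMap<>(originalAttrs);
            child.put("fragment.identifier", fragmentId);
            child.put("fragment.index", String.valueOf(index));
            splits.add(child);
        }
        // Add the fragment count to each split once the total is known.
        for (Map<String, String> child : splits) {
            child.put("fragment.count", String.valueOf(splits.size()));
        }
        // The proposed change: copy identifier and count (but not index) to the original.
        Map<String, String> original = new HashMap<>(originalAttrs);
        original.put("fragment.identifier", fragmentId);
        original.put("fragment.count", String.valueOf(splits.size()));
        return new SplitResult(original, splits);
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("filename", "big.xml");
        SplitResult result = split(attrs, 2_500, 1_000);
        System.out.println(result.splits.size());                  // 3
        System.out.println(result.original.get("fragment.count")); // 3
    }
}
```

In this model the original keeps its own attributes (such as `filename`). It gains only `fragment.identifier` and `fragment.count`, the pair a downstream Wait needs.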


