Posted to issues@nifi.apache.org by "Andrew Hulbert (JIRA)" <ji...@apache.org> on 2016/07/08 01:59:10 UTC

[jira] [Commented] (NIFI-1438) Unexpected results using MergeProcessor

    [ https://issues.apache.org/jira/browse/NIFI-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367093#comment-15367093 ] 

Andrew Hulbert commented on NIFI-1438:
--------------------------------------

I believe I'm noticing a similar issue. I would simply like to aggregate an hour's worth of files into a single file to write to disk, and either I'm not achieving that behavior or my processor is incorrectly configured. Either way, it's not as obvious how to do this as it could be. I could always create exactly this behavior (i.e. bin and concatenate files in order over x amount of time) in a new processor, but I was hoping there was an easier way.
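For illustration only, here is a minimal Python sketch of the behavior described above: files are grouped into fixed one-hour windows, and each window's contents are concatenated in time order. It is not a NiFi processor and uses no NiFi API. The `(timestamp, text)` input pairs and the `concat_by_hour` name are hypothetical stand-ins for incoming files.

```python
from datetime import datetime


def concat_by_hour(files):
    """Group (timestamp, text) pairs into hourly windows and concatenate
    each window's contents in timestamp order.

    `files` is a hypothetical in-memory stand-in for incoming files.
    """
    windows = {}
    # Sort by timestamp so both the windows and their parts come out in time order.
    for ts, text in sorted(files, key=lambda f: f[0]):
        # Truncate the timestamp to the start of its hour to get the window key.
        key = ts.replace(minute=0, second=0, microsecond=0)
        windows.setdefault(key, []).append(text)
    # One merged "file" per hour. Dicts keep insertion order, so windows stay chronological.
    return {key: "".join(parts) for key, parts in windows.items()}


files = [
    (datetime(2016, 7, 8, 1, 59), "b\n"),
    (datetime(2016, 7, 8, 1, 5), "a\n"),
    (datetime(2016, 7, 8, 2, 0), "c\n"),
]
merged = concat_by_hour(files)
```

Here `merged` maps the 01:00 window to `"a\nb\n"` and the 02:00 window to `"c\n"`. A real deployment would also need to decide when an hour is "closed", which this sketch leaves out.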

> Unexpected results using MergeProcessor 
> ----------------------------------------
>
>                 Key: NIFI-1438
>                 URL: https://issues.apache.org/jira/browse/NIFI-1438
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 0.4.1
>         Environment: OSX 10.10.5, Java 8u45
>            Reporter: Josh Harrison
>         Attachments: NIFI-1438-template.xml, nifi-merge-problem.xml, nifi-problem.tgz
>
>
> Hello, I'm opening a ticket in reference to the stack overflow question I had at http://stackoverflow.com/questions/34958347/mergecontent-with-nifi-inconsistent-length 
> To summarize, despite Aldrin's help, I have been unable to get the expected merge behavior out of a template like the attached one, ingesting data like the attached data.
> The goal is to ingest all of the zips in /tmp/nifidemo/source and extract the files contained therein, in which each line is a JSON object. Using JSON routing, I extract and route for further processing ONLY the items whose "tags" field contains the tag "xyz".
> These routed files should be aggregated by MergeContent into a bucket of at least 1000 lines, or into whatever has accumulated once the processor has been starved for 30 seconds, whichever occurs first.
> The behavior observed in my real template is replicated in this example: MergeContent appears to be routing to buckets based on the original file name rather than aggregating 1000 lines at a time as expected. Within a few seconds of the template being started, many files are written with unexpected line counts.
> More confusingly, this isn't a consistent pattern: the same files may be run repeatedly without producing the same number of lines in the result each time.
> The content of the input files was randomly generated so that approximately 10% of the objects would contain the tag "xyz" (with 5000 lines in each input file, there should be approximately 500 matching lines per file). There are result files that contain over 400 lines, but many contain 15-30 lines. There are also a number of files with a "uuid.json" style name, each containing one line.
> The attachment contains a generic template that replicates the problem. It seems to throw some errors, but they don't appear to be related to the problem I'm working on (my real template doesn't throw those failures, but still exhibits the same behavior).
> I am running NiFi 0.4.1 on a Mac OS X 10.10.5 system with JRE 8u45.
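The intended policy closes a bin at 1000 entries or after it has waited 30 seconds, whichever comes first. It can be modeled with a toy Python binner. This is a sketch under stated assumptions, not NiFi's MergeContent implementation. `SimpleBinner`, `offer`, and `poll` are hypothetical names. Age is measured from the bin's first entry, which is an assumption about how "starved for 30 seconds" maps to a timer. Each `offer` counts as one entry regardless of how many lines it carries.

```python
from dataclasses import dataclass, field


@dataclass
class Bin:
    created: float                              # time the bin's first entry arrived
    entries: list = field(default_factory=list)


class SimpleBinner:
    """Toy model of a 'min entries OR max bin age, whichever first' policy.

    Not NiFi's MergeContent implementation; all names are hypothetical.
    """

    def __init__(self, min_entries, max_age_s):
        self.min_entries = min_entries
        self.max_age_s = max_age_s
        self.bin = None                         # single open bin (an assumption)
        self.flushed = []                       # merged outputs, each a list of entries

    def offer(self, entry, now):
        self.poll(now)                          # age out the current bin before adding
        if self.bin is None:
            self.bin = Bin(created=now)
        self.bin.entries.append(entry)
        if len(self.bin.entries) >= self.min_entries:
            self._flush()

    def poll(self, now):
        # Called periodically even with no input, so a starved bin still flushes.
        if self.bin is not None and now - self.bin.created >= self.max_age_s:
            self._flush()

    def _flush(self):
        self.flushed.append(self.bin.entries)
        self.bin = None


binner = SimpleBinner(min_entries=1000, max_age_s=30)
for i in range(2500):
    binner.offer(i, now=0.0)                    # 2500 entries arrive at once
binner.poll(now=31.0)                           # timer fires on the leftover bin
sizes = [len(out) for out in binner.flushed]
```

In this run, `sizes` is `[1000, 1000, 500]`: two bins close on the entry count, and the remaining 500 entries are flushed only when the age check runs at t=31. Because the model keeps a single open bin, an output smaller than 1000 entries can only come from the age timer firing on a partially filled bin.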



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)