Posted to users@nifi.apache.org by Adam Williams <aa...@outlook.com> on 2015/09/24 22:36:30 UTC

Array into MongoDB

I have an array of JSON objects that I am trying to put into Mongo, but I keep hitting this error on the PutMongo processor:
ERROR [Timer-Driven Process Thread-1] o.a.nifi.processors.mongodb.PutMongo PutMongo[id=c576f8cc-6e21-4881-a7cd-6e3881838a91] Failed to insert StandardFlowFileRecord[uuid=2c670a40-7934-4bc6-b054-1cba23fe7b0f,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1443125646319-1, container=default, section=1], offset=0, length=208380820],offset=0,name=test.json,size=208380820] into MongoDB due to org.bson.BsonInvalidOperationException: readStartDocument can only be called when CurrentBSONType is DOCUMENT, not when CurrentBSONType is ARRAY.: org.bson.BsonInvalidOperationException: readStartDocument can only be called when CurrentBSONType is DOCUMENT, not when CurrentBSONType is ARRAY.

I tried to use the SplitJson processor to split the array into segments, but in my experience I can't pull out each JSON object.  The SplitJson processor just hangs and never produces logs or any output at all.  The structure of my data is:
[{"id":1, "stat":"something"},{"id":2, "stat":"anothersomething"}]
The JSON file itself is pretty large (>100 MB).
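
A minimal standalone sketch of the splitting step, assuming plain Python
with only the standard library (the file names are just placeholders): it
turns the top-level array into one JSON document per line, so that each
element can later be inserted as a single document rather than an array.

import json

# Hedged sketch, not NiFi code: like SplitJson, this still loads the whole
# array into memory, so RAM/heap sizing matters for very large files.
with open("test.json", "r", encoding="utf-8") as src:
    records = json.load(src)                  # expects [ {...}, {...}, ... ]

with open("test-split.ndjson", "w", encoding="utf-8") as dst:
    for record in records:
        dst.write(json.dumps(record) + "\n")  # one document per line

print("wrote %d documents" % len(records))
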
Thank you

Re: NiFi flow provides 0 output on large files

Posted by Ryan Ward <ry...@gmail.com>.
Do you have any file expiration set on any of the queues?

On Fri, Sep 25, 2015 at 8:30 AM, Aldrin Piri <al...@gmail.com> wrote:

> Jeff,
>
> With regards to:
>
> "Anything over, the GetFile and DDA_Processor shows data movement but the
> no other downstream processor shows movement."
>
> Are you referencing downstream processors starting immediately after the
> DDA_Processor (ConvertJsonToAvro) or starting immediately after the
> ConvertJsonToAvro processor?
>
> In the case of starting immediately after the DDA Processor, as it is a
> custom processor, we would need some additional information as to how this
> processor is behaving.  In the case of the second condition, if you have
> some additional context as to the format of the data that is problematic to
> what you are seeing (the effective "schema" of the JSON) would be helpful
> in tracking down the issue.
>
> Thanks!
> Aldrin
>
> On Fri, Sep 25, 2015 at 8:22 AM, Jeff <j....@gmail.com> wrote:
>
>> Hi Adam,
>>
>>
>> I have a flow that does the following;
>>
>> GetFile > DDA_Processor > ConvertJSONToAvro > UpdateAttribute > PutFile
>>
>> My source file has 182897 rows at 1001 bytes per row.  If I do any number
>> of rows under ~15000 an output file is created.  Anything over, the GetFile
>> and DDA_Processor shows data movement but the no other downstream processor
>> shows movement.
>>
>> I confirmed that it is not a data problem by processing a 10,000 row file
>> successfully, then concatenating 10,000 rows into one file twice.
>>
>> Thanks for your insight.
>>
>> Jeff
>>
>>
>>
>> On Sep 24, 2015, at 8:40 PM, Aldrin Piri <al...@gmail.com> wrote:
>>
>> Jeff,
>>
>> This seems to be a bit different as the processor is showing data as
>> having been written and there is a listing of one FlowFile of 381 MB being
>> transferred out from the processor.  Could you provide additional
>> information as to how data is not being sent out in the manner
>> anticipated?  If you can track the issue down more, let us know.  May be
>> helpful to create another message to help us track the issues separately as
>> we work through them.
>>
>> Thanks!
>>
>> Adam,
>>
>> Found a sizable JSON file to work against and have been doing some
>> initial exploration.  With the large files, it certainly is a nontrivial
>> process.  At cursory inspection, a good portion of processing seems to be
>> spent on validation.  There are some ways to tweak the strictness of this
>> with the supporting library, but will have to dive in a bit more.
>>
>>
>>
>> On Thu, Sep 24, 2015 at 8:14 PM, Jeff <j....@gmail.com> wrote:
>>
>>>
>>>
>>>
>>> I’m having a very similar problem.  The process picks up the file, a
>>> custom processor does it’s thing but no data is sent out.
>>>
>>> <unknown.gif>
>>>
>>>
>>>
>>
>

Re: NiFi flow provides 0 output on large files

Posted by Aldrin Piri <al...@gmail.com>.
Ryan,

Certainly something that needs to be addressed and has been voiced by a
number of the members in the community.  The flushing of a queue is an item
under the feature proposal for Interactive Queue Management [1].

[1]
https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management

On Fri, Sep 25, 2015 at 9:52 AM, Ryan Ward <ry...@gmail.com> wrote:

> This is actually very easy to overlook and miss. Often times we change the
> file expiration on a queue to simply empty the queue.
>
> Could we add in a right click empty queue option, with an are you sure
> prompt? Is there already a JIRA for this feature?
>
> Thanks,
> Ryan
>
> On Fri, Sep 25, 2015 at 9:12 AM, Jeff <j....@gmail.com> wrote:
>
>>
>> That was a rookie mistake.
>>
>> Indeed the JSON_to_Avro queue was set to 5 sec.  Is there information in
>> a log that states a flow file was expired?
>>
>> My ultimate goal is to put all of this data into a Confluent Kafka topic,
>> taking advantage of the schema registry. I do not believe the current
>> PutToKafka provides the ability to use this registry correct?   I’m curious
>> if anyone is working on PutToConfluentKafka processor?
>>
>> Thanks for your help.
>>
>> Jeff
>>
>> On Sep 25, 2015, at 7:52 AM, Matt Gilman <ma...@gmail.com> wrote:
>>
>> Jeff,
>>
>> What is the expiration setting on your connections? The little clock icon
>> indicates that they are configured to automatically expire flowfiles of a
>> certain age.
>>
>> Matt
>>
>> On Fri, Sep 25, 2015 at 8:50 AM, Jeff <j....@gmail.com> wrote:
>>
>>>
>>> Hi Aldrin,
>>>
>>> After the DDA_Processor
>>>
>>> The below image shows that the GetFile Processed 174.6 MB and the
>>> DDA_Processor is working on 1 file (the 1 in the upper right of the
>>> DDA_Processor box)
>>>
>>> <unknown.gif>
>>>
>>> The below image shows that the DDA_Processor is complete but data did
>>> not make it to ConvertJSONtoAvro.  No errors are being generated.
>>> DDA_Processor takes fixed width data and converts it to JSON.
>>>
>>> <unknown.gif>
>>>
>>> Thanks
>>>
>>>
>>> On Sep 25, 2015, at 7:30 AM, Aldrin Piri <al...@gmail.com> wrote:
>>>
>>> Jeff,
>>>
>>> With regards to:
>>>
>>> "Anything over, the GetFile and DDA_Processor shows data movement but
>>> the no other downstream processor shows movement."
>>>
>>> Are you referencing downstream processors starting immediately after the
>>> DDA_Processor (ConvertJsonToAvro) or starting immediately after the
>>> ConvertJsonToAvro processor?
>>>
>>> In the case of starting immediately after the DDA Processor, as it is a
>>> custom processor, we would need some additional information as to how this
>>> processor is behaving.  In the case of the second condition, if you have
>>> some additional context as to the format of the data that is problematic to
>>> what you are seeing (the effective "schema" of the JSON) would be helpful
>>> in tracking down the issue.
>>>
>>> Thanks!
>>> Aldrin
>>>
>>> On Fri, Sep 25, 2015 at 8:22 AM, Jeff <j....@gmail.com> wrote:
>>>
>>>> Hi Adam,
>>>>
>>>>
>>>> I have a flow that does the following;
>>>>
>>>> GetFile > DDA_Processor > ConvertJSONToAvro > UpdateAttribute > PutFile
>>>>
>>>> My source file has 182897 rows at 1001 bytes per row.  If I do any
>>>> number of rows under ~15000 an output file is created.  Anything over, the
>>>> GetFile and DDA_Processor shows data movement but the no other downstream
>>>> processor shows movement.
>>>>
>>>> I confirmed that it is not a data problem by processing a 10,000 row
>>>> file successfully, then concatenating 10,000 rows into one file twice.
>>>>
>>>> Thanks for your insight.
>>>>
>>>> Jeff
>>>> <Mail Attachment.gif>
>>>>
>>>>
>>>> On Sep 24, 2015, at 8:40 PM, Aldrin Piri <al...@gmail.com> wrote:
>>>>
>>>> Jeff,
>>>>
>>>> This seems to be a bit different as the processor is showing data as
>>>> having been written and there is a listing of one FlowFile of 381 MB being
>>>> transferred out from the processor.  Could you provide additional
>>>> information as to how data is not being sent out in the manner
>>>> anticipated?  If you can track the issue down more, let us know.  May be
>>>> helpful to create another message to help us track the issues separately as
>>>> we work through them.
>>>>
>>>> Thanks!
>>>>
>>>> Adam,
>>>>
>>>> Found a sizable JSON file to work against and have been doing some
>>>> initial exploration.  With the large files, it certainly is a nontrivial
>>>> process.  At cursory inspection, a good portion of processing seems to be
>>>> spent on validation.  There are some ways to tweak the strictness of this
>>>> with the supporting library, but will have to dive in a bit more.
>>>>
>>>>
>>>>
>>>> On Thu, Sep 24, 2015 at 8:14 PM, Jeff <j....@gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>> I’m having a very similar problem.  The process picks up the file, a
>>>>> custom processor does it’s thing but no data is sent out.
>>>>>
>>>>> <unknown.gif>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>

Re: NiFi flow provides 0 output on large files

Posted by Jeff <j....@gmail.com>.
Hi Joe, 

The Confluent Kafka platform I’ve been working with is open source.  

Below are some links but I’m not sure if this is what you are looking for. 

https://github.com/confluentinc

http://www.confluent.io/product

http://docs.confluent.io/1.0.1/platform.html


> On Sep 25, 2015, at 11:38 AM, Joe Witt <jo...@gmail.com> wrote:
> 
> If whatever it would mean is open source friendly it sounds like a
> fine idea.  Seems unlikely we'd need to have something vendor
> specific.  Jeff are there are docs you can direct us to for this?
> 
> On Fri, Sep 25, 2015 at 11:33 AM, Jeff <j....@gmail.com> wrote:
>> 
>> Thanks of this info on the JIRA
>> 
>> Does anyone have any input on the PutToConfluentKafka idea?
>> 
>> 
>> On Sep 25, 2015, at 8:55 AM, Matt Gilman <ma...@gmail.com> wrote:
>> 
>> Yep. JIRA is already created [1] as well as other features we'll be
>> supporting regarding queue management [2].
>> 
>> Matt
>> 
>> [1] https://issues.apache.org/jira/browse/NIFI-730
>> [2] https://issues.apache.org/jira/browse/NIFI-108
>> 
>> On Fri, Sep 25, 2015 at 9:52 AM, Ryan Ward <ry...@gmail.com> wrote:
>>> 
>>> This is actually very easy to overlook and miss. Often times we change the
>>> file expiration on a queue to simply empty the queue.
>>> 
>>> Could we add in a right click empty queue option, with an are you sure
>>> prompt? Is there already a JIRA for this feature?
>>> 
>>> Thanks,
>>> Ryan
>>> 
>>> On Fri, Sep 25, 2015 at 9:12 AM, Jeff <j....@gmail.com> wrote:
>>>> 
>>>> 
>>>> That was a rookie mistake.
>>>> 
>>>> Indeed the JSON_to_Avro queue was set to 5 sec.  Is there information in
>>>> a log that states a flow file was expired?
>>>> 
>>>> My ultimate goal is to put all of this data into a Confluent Kafka topic,
>>>> taking advantage of the schema registry. I do not believe the current
>>>> PutToKafka provides the ability to use this registry correct?   I’m curious
>>>> if anyone is working on PutToConfluentKafka processor?
>>>> 
>>>> Thanks for your help.
>>>> 
>>>> Jeff
>>>> 
>>>> On Sep 25, 2015, at 7:52 AM, Matt Gilman <ma...@gmail.com> wrote:
>>>> 
>>>> Jeff,
>>>> 
>>>> What is the expiration setting on your connections? The little clock icon
>>>> indicates that they are configured to automatically expire flowfiles of a
>>>> certain age.
>>>> 
>>>> Matt
>>>> 
>>>> On Fri, Sep 25, 2015 at 8:50 AM, Jeff <j....@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>> Hi Aldrin,
>>>>> 
>>>>> After the DDA_Processor
>>>>> 
>>>>> The below image shows that the GetFile Processed 174.6 MB and the
>>>>> DDA_Processor is working on 1 file (the 1 in the upper right of the
>>>>> DDA_Processor box)
>>>>> 
>>>>> <unknown.gif>
>>>>> 
>>>>> The below image shows that the DDA_Processor is complete but data did
>>>>> not make it to ConvertJSONtoAvro.  No errors are being generated.
>>>>> DDA_Processor takes fixed width data and converts it to JSON.
>>>>> 
>>>>> <unknown.gif>
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> 
>>>>> On Sep 25, 2015, at 7:30 AM, Aldrin Piri <al...@gmail.com> wrote:
>>>>> 
>>>>> Jeff,
>>>>> 
>>>>> With regards to:
>>>>> 
>>>>> "Anything over, the GetFile and DDA_Processor shows data movement but
>>>>> the no other downstream processor shows movement."
>>>>> 
>>>>> Are you referencing downstream processors starting immediately after the
>>>>> DDA_Processor (ConvertJsonToAvro) or starting immediately after the
>>>>> ConvertJsonToAvro processor?
>>>>> 
>>>>> In the case of starting immediately after the DDA Processor, as it is a
>>>>> custom processor, we would need some additional information as to how this
>>>>> processor is behaving.  In the case of the second condition, if you have
>>>>> some additional context as to the format of the data that is problematic to
>>>>> what you are seeing (the effective "schema" of the JSON) would be helpful in
>>>>> tracking down the issue.
>>>>> 
>>>>> Thanks!
>>>>> Aldrin
>>>>> 
>>>>> On Fri, Sep 25, 2015 at 8:22 AM, Jeff <j....@gmail.com> wrote:
>>>>>> 
>>>>>> Hi Adam,
>>>>>> 
>>>>>> 
>>>>>> I have a flow that does the following;
>>>>>> 
>>>>>> GetFile > DDA_Processor > ConvertJSONToAvro > UpdateAttribute > PutFile
>>>>>> 
>>>>>> My source file has 182897 rows at 1001 bytes per row.  If I do any
>>>>>> number of rows under ~15000 an output file is created.  Anything over, the
>>>>>> GetFile and DDA_Processor shows data movement but the no other downstream
>>>>>> processor shows movement.
>>>>>> 
>>>>>> I confirmed that it is not a data problem by processing a 10,000 row
>>>>>> file successfully, then concatenating 10,000 rows into one file twice.
>>>>>> 
>>>>>> Thanks for your insight.
>>>>>> 
>>>>>> Jeff
>>>>>> <Mail Attachment.gif>
>>>>>> 
>>>>>> 
>>>>>> On Sep 24, 2015, at 8:40 PM, Aldrin Piri <al...@gmail.com> wrote:
>>>>>> 
>>>>>> Jeff,
>>>>>> 
>>>>>> This seems to be a bit different as the processor is showing data as
>>>>>> having been written and there is a listing of one FlowFile of 381 MB being
>>>>>> transferred out from the processor.  Could you provide additional
>>>>>> information as to how data is not being sent out in the manner anticipated?
>>>>>> If you can track the issue down more, let us know.  May be helpful to create
>>>>>> another message to help us track the issues separately as we work through
>>>>>> them.
>>>>>> 
>>>>>> Thanks!
>>>>>> 
>>>>>> Adam,
>>>>>> 
>>>>>> Found a sizable JSON file to work against and have been doing some
>>>>>> initial exploration.  With the large files, it certainly is a nontrivial
>>>>>> process.  At cursory inspection, a good portion of processing seems to be
>>>>>> spent on validation.  There are some ways to tweak the strictness of this
>>>>>> with the supporting library, but will have to dive in a bit more.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Thu, Sep 24, 2015 at 8:14 PM, Jeff <j....@gmail.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> I’m having a very similar problem.  The process picks up the file, a
>>>>>>> custom processor does it’s thing but no data is sent out.
>>>>>>> 
>>>>>>> <unknown.gif>
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 


Re: NiFi flow provides 0 output on large files

Posted by Joe Witt <jo...@gmail.com>.
If whatever it would mean is open-source friendly, it sounds like a
fine idea.  Seems unlikely we'd need to have something vendor
specific.  Jeff, are there any docs you can direct us to for this?

On Fri, Sep 25, 2015 at 11:33 AM, Jeff <j....@gmail.com> wrote:
>
> Thanks of this info on the JIRA
>
> Does anyone have any input on the PutToConfluentKafka idea?
>
>
> On Sep 25, 2015, at 8:55 AM, Matt Gilman <ma...@gmail.com> wrote:
>
> Yep. JIRA is already created [1] as well as other features we'll be
> supporting regarding queue management [2].
>
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-730
> [2] https://issues.apache.org/jira/browse/NIFI-108
>
> On Fri, Sep 25, 2015 at 9:52 AM, Ryan Ward <ry...@gmail.com> wrote:
>>
>> This is actually very easy to overlook and miss. Often times we change the
>> file expiration on a queue to simply empty the queue.
>>
>> Could we add in a right click empty queue option, with an are you sure
>> prompt? Is there already a JIRA for this feature?
>>
>> Thanks,
>> Ryan
>>
>> On Fri, Sep 25, 2015 at 9:12 AM, Jeff <j....@gmail.com> wrote:
>>>
>>>
>>> That was a rookie mistake.
>>>
>>> Indeed the JSON_to_Avro queue was set to 5 sec.  Is there information in
>>> a log that states a flow file was expired?
>>>
>>> My ultimate goal is to put all of this data into a Confluent Kafka topic,
>>> taking advantage of the schema registry. I do not believe the current
>>> PutToKafka provides the ability to use this registry correct?   I’m curious
>>> if anyone is working on PutToConfluentKafka processor?
>>>
>>> Thanks for your help.
>>>
>>> Jeff
>>>
>>> On Sep 25, 2015, at 7:52 AM, Matt Gilman <ma...@gmail.com> wrote:
>>>
>>> Jeff,
>>>
>>> What is the expiration setting on your connections? The little clock icon
>>> indicates that they are configured to automatically expire flowfiles of a
>>> certain age.
>>>
>>> Matt
>>>
>>> On Fri, Sep 25, 2015 at 8:50 AM, Jeff <j....@gmail.com> wrote:
>>>>
>>>>
>>>> Hi Aldrin,
>>>>
>>>> After the DDA_Processor
>>>>
>>>> The below image shows that the GetFile Processed 174.6 MB and the
>>>> DDA_Processor is working on 1 file (the 1 in the upper right of the
>>>> DDA_Processor box)
>>>>
>>>> <unknown.gif>
>>>>
>>>> The below image shows that the DDA_Processor is complete but data did
>>>> not make it to ConvertJSONtoAvro.  No errors are being generated.
>>>> DDA_Processor takes fixed width data and converts it to JSON.
>>>>
>>>> <unknown.gif>
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Sep 25, 2015, at 7:30 AM, Aldrin Piri <al...@gmail.com> wrote:
>>>>
>>>> Jeff,
>>>>
>>>> With regards to:
>>>>
>>>> "Anything over, the GetFile and DDA_Processor shows data movement but
>>>> the no other downstream processor shows movement."
>>>>
>>>> Are you referencing downstream processors starting immediately after the
>>>> DDA_Processor (ConvertJsonToAvro) or starting immediately after the
>>>> ConvertJsonToAvro processor?
>>>>
>>>> In the case of starting immediately after the DDA Processor, as it is a
>>>> custom processor, we would need some additional information as to how this
>>>> processor is behaving.  In the case of the second condition, if you have
>>>> some additional context as to the format of the data that is problematic to
>>>> what you are seeing (the effective "schema" of the JSON) would be helpful in
>>>> tracking down the issue.
>>>>
>>>> Thanks!
>>>> Aldrin
>>>>
>>>> On Fri, Sep 25, 2015 at 8:22 AM, Jeff <j....@gmail.com> wrote:
>>>>>
>>>>> Hi Adam,
>>>>>
>>>>>
>>>>> I have a flow that does the following;
>>>>>
>>>>> GetFile > DDA_Processor > ConvertJSONToAvro > UpdateAttribute > PutFile
>>>>>
>>>>> My source file has 182897 rows at 1001 bytes per row.  If I do any
>>>>> number of rows under ~15000 an output file is created.  Anything over, the
>>>>> GetFile and DDA_Processor shows data movement but the no other downstream
>>>>> processor shows movement.
>>>>>
>>>>> I confirmed that it is not a data problem by processing a 10,000 row
>>>>> file successfully, then concatenating 10,000 rows into one file twice.
>>>>>
>>>>> Thanks for your insight.
>>>>>
>>>>> Jeff
>>>>> <Mail Attachment.gif>
>>>>>
>>>>>
>>>>> On Sep 24, 2015, at 8:40 PM, Aldrin Piri <al...@gmail.com> wrote:
>>>>>
>>>>> Jeff,
>>>>>
>>>>> This seems to be a bit different as the processor is showing data as
>>>>> having been written and there is a listing of one FlowFile of 381 MB being
>>>>> transferred out from the processor.  Could you provide additional
>>>>> information as to how data is not being sent out in the manner anticipated?
>>>>> If you can track the issue down more, let us know.  May be helpful to create
>>>>> another message to help us track the issues separately as we work through
>>>>> them.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Adam,
>>>>>
>>>>> Found a sizable JSON file to work against and have been doing some
>>>>> initial exploration.  With the large files, it certainly is a nontrivial
>>>>> process.  At cursory inspection, a good portion of processing seems to be
>>>>> spent on validation.  There are some ways to tweak the strictness of this
>>>>> with the supporting library, but will have to dive in a bit more.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Sep 24, 2015 at 8:14 PM, Jeff <j....@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I’m having a very similar problem.  The process picks up the file, a
>>>>>> custom processor does it’s thing but no data is sent out.
>>>>>>
>>>>>> <unknown.gif>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>

Re: NiFi flow provides 0 output on large files

Posted by Jeff <j....@gmail.com>.
Thanks for this info on the JIRA

Does anyone have any input on the PutToConfluentKafka idea?


> On Sep 25, 2015, at 8:55 AM, Matt Gilman <ma...@gmail.com> wrote:
> 
> Yep. JIRA is already created [1] as well as other features we'll be supporting regarding queue management [2].
> 
> Matt
> 
> [1] https://issues.apache.org/jira/browse/NIFI-730 <https://issues.apache.org/jira/browse/NIFI-730>
> [2] https://issues.apache.org/jira/browse/NIFI-108 <https://issues.apache.org/jira/browse/NIFI-108>
> 
> On Fri, Sep 25, 2015 at 9:52 AM, Ryan Ward <ryan.ward2@gmail.com <ma...@gmail.com>> wrote:
> This is actually very easy to overlook and miss. Often times we change the file expiration on a queue to simply empty the queue. 
> 
> Could we add in a right click empty queue option, with an are you sure prompt? Is there already a JIRA for this feature?
> 
> Thanks,
> Ryan
> 
> On Fri, Sep 25, 2015 at 9:12 AM, Jeff <j.007ba7@gmail.com <ma...@gmail.com>> wrote:
> 
> That was a rookie mistake.
> 
> Indeed the JSON_to_Avro queue was set to 5 sec.  Is there information in a log that states a flow file was expired?  
> 
> My ultimate goal is to put all of this data into a Confluent Kafka topic, taking advantage of the schema registry. I do not believe the current PutToKafka provides the ability to use this registry correct?   I’m curious if anyone is working on PutToConfluentKafka processor?
> 
> Thanks for your help.
> 
> Jeff
> 
>> On Sep 25, 2015, at 7:52 AM, Matt Gilman <matt.c.gilman@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Jeff,
>> 
>> What is the expiration setting on your connections? The little clock icon indicates that they are configured to automatically expire flowfiles of a certain age.
>> 
>> Matt
>> 
>> On Fri, Sep 25, 2015 at 8:50 AM, Jeff <j.007ba7@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Hi Aldrin, 
>> 
>> After the DDA_Processor
>> 
>> The below image shows that the GetFile Processed 174.6 MB and the DDA_Processor is working on 1 file (the 1 in the upper right of the DDA_Processor box)
>> 
>> <unknown.gif>
>> 
>> The below image shows that the DDA_Processor is complete but data did not make it to ConvertJSONtoAvro.  No errors are being generated.  DDA_Processor takes fixed width data and converts it to JSON.  
>> 
>> <unknown.gif>
>> 
>> Thanks
>> 
>> 
>>> On Sep 25, 2015, at 7:30 AM, Aldrin Piri <aldrinpiri@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> Jeff,
>>> 
>>> With regards to:
>>> 
>>> "Anything over, the GetFile and DDA_Processor shows data movement but the no other downstream processor shows movement."
>>> 
>>> Are you referencing downstream processors starting immediately after the DDA_Processor (ConvertJsonToAvro) or starting immediately after the ConvertJsonToAvro processor?
>>> 
>>> In the case of starting immediately after the DDA Processor, as it is a custom processor, we would need some additional information as to how this processor is behaving.  In the case of the second condition, if you have some additional context as to the format of the data that is problematic to what you are seeing (the effective "schema" of the JSON) would be helpful in tracking down the issue.
>>> 
>>> Thanks!
>>> Aldrin
>>> 
>>> On Fri, Sep 25, 2015 at 8:22 AM, Jeff <j.007ba7@gmail.com <ma...@gmail.com>> wrote:
>>> Hi Adam,
>>> 
>>> 
>>> I have a flow that does the following;
>>> 
>>> GetFile > DDA_Processor > ConvertJSONToAvro > UpdateAttribute > PutFile
>>> 
>>> My source file has 182897 rows at 1001 bytes per row.  If I do any number of rows under ~15000 an output file is created.  Anything over, the GetFile and DDA_Processor shows data movement but the no other downstream processor shows movement.  
>>> 
>>> I confirmed that it is not a data problem by processing a 10,000 row file successfully, then concatenating 10,000 rows into one file twice.  
>>> 
>>> Thanks for your insight.
>>> 
>>> Jeff
>>> <Mail Attachment.gif> 
>>> 
>>> 
>>>> On Sep 24, 2015, at 8:40 PM, Aldrin Piri <aldrinpiri@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>> Jeff,
>>>> 
>>>> This seems to be a bit different as the processor is showing data as having been written and there is a listing of one FlowFile of 381 MB being transferred out from the processor.  Could you provide additional information as to how data is not being sent out in the manner anticipated?  If you can track the issue down more, let us know.  May be helpful to create another message to help us track the issues separately as we work through them.
>>>> 
>>>> Thanks!
>>>> 
>>>> Adam,
>>>> 
>>>> Found a sizable JSON file to work against and have been doing some initial exploration.  With the large files, it certainly is a nontrivial process.  At cursory inspection, a good portion of processing seems to be spent on validation.  There are some ways to tweak the strictness of this with the supporting library, but will have to dive in a bit more.
>>>> 
>>>> 
>>>> 
>>>> On Thu, Sep 24, 2015 at 8:14 PM, Jeff <j.007ba7@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>> 
>>>> 
>>>> I’m having a very similar problem.  The process picks up the file, a custom processor does it’s thing but no data is sent out.
>>>> 
>>>> <unknown.gif>
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 


Re: NiFi flow provides 0 output on large files

Posted by Matt Gilman <ma...@gmail.com>.
Yep. JIRA is already created [1] as well as other features we'll be
supporting regarding queue management [2].

Matt

[1] https://issues.apache.org/jira/browse/NIFI-730
[2] https://issues.apache.org/jira/browse/NIFI-108

On Fri, Sep 25, 2015 at 9:52 AM, Ryan Ward <ry...@gmail.com> wrote:

> This is actually very easy to overlook and miss. Often times we change the
> file expiration on a queue to simply empty the queue.
>
> Could we add in a right click empty queue option, with an are you sure
> prompt? Is there already a JIRA for this feature?
>
> Thanks,
> Ryan
>
> On Fri, Sep 25, 2015 at 9:12 AM, Jeff <j....@gmail.com> wrote:
>
>>
>> That was a rookie mistake.
>>
>> Indeed the JSON_to_Avro queue was set to 5 sec.  Is there information in
>> a log that states a flow file was expired?
>>
>> My ultimate goal is to put all of this data into a Confluent Kafka topic,
>> taking advantage of the schema registry. I do not believe the current
>> PutToKafka provides the ability to use this registry correct?   I’m curious
>> if anyone is working on PutToConfluentKafka processor?
>>
>> Thanks for your help.
>>
>> Jeff
>>
>> On Sep 25, 2015, at 7:52 AM, Matt Gilman <ma...@gmail.com> wrote:
>>
>> Jeff,
>>
>> What is the expiration setting on your connections? The little clock icon
>> indicates that they are configured to automatically expire flowfiles of a
>> certain age.
>>
>> Matt
>>
>> On Fri, Sep 25, 2015 at 8:50 AM, Jeff <j....@gmail.com> wrote:
>>
>>>
>>> Hi Aldrin,
>>>
>>> After the DDA_Processor
>>>
>>> The below image shows that the GetFile Processed 174.6 MB and the
>>> DDA_Processor is working on 1 file (the 1 in the upper right of the
>>> DDA_Processor box)
>>>
>>> <unknown.gif>
>>>
>>> The below image shows that the DDA_Processor is complete but data did
>>> not make it to ConvertJSONtoAvro.  No errors are being generated.
>>> DDA_Processor takes fixed width data and converts it to JSON.
>>>
>>> <unknown.gif>
>>>
>>> Thanks
>>>
>>>
>>> On Sep 25, 2015, at 7:30 AM, Aldrin Piri <al...@gmail.com> wrote:
>>>
>>> Jeff,
>>>
>>> With regards to:
>>>
>>> "Anything over, the GetFile and DDA_Processor shows data movement but
>>> the no other downstream processor shows movement."
>>>
>>> Are you referencing downstream processors starting immediately after the
>>> DDA_Processor (ConvertJsonToAvro) or starting immediately after the
>>> ConvertJsonToAvro processor?
>>>
>>> In the case of starting immediately after the DDA Processor, as it is a
>>> custom processor, we would need some additional information as to how this
>>> processor is behaving.  In the case of the second condition, if you have
>>> some additional context as to the format of the data that is problematic to
>>> what you are seeing (the effective "schema" of the JSON) would be helpful
>>> in tracking down the issue.
>>>
>>> Thanks!
>>> Aldrin
>>>
>>> On Fri, Sep 25, 2015 at 8:22 AM, Jeff <j....@gmail.com> wrote:
>>>
>>>> Hi Adam,
>>>>
>>>>
>>>> I have a flow that does the following;
>>>>
>>>> GetFile > DDA_Processor > ConvertJSONToAvro > UpdateAttribute > PutFile
>>>>
>>>> My source file has 182897 rows at 1001 bytes per row.  If I do any
>>>> number of rows under ~15000 an output file is created.  Anything over, the
>>>> GetFile and DDA_Processor shows data movement but the no other downstream
>>>> processor shows movement.
>>>>
>>>> I confirmed that it is not a data problem by processing a 10,000 row
>>>> file successfully, then concatenating 10,000 rows into one file twice.
>>>>
>>>> Thanks for your insight.
>>>>
>>>> Jeff
>>>> <Mail Attachment.gif>
>>>>
>>>>
>>>> On Sep 24, 2015, at 8:40 PM, Aldrin Piri <al...@gmail.com> wrote:
>>>>
>>>> Jeff,
>>>>
>>>> This seems to be a bit different as the processor is showing data as
>>>> having been written and there is a listing of one FlowFile of 381 MB being
>>>> transferred out from the processor.  Could you provide additional
>>>> information as to how data is not being sent out in the manner
>>>> anticipated?  If you can track the issue down more, let us know.  May be
>>>> helpful to create another message to help us track the issues separately as
>>>> we work through them.
>>>>
>>>> Thanks!
>>>>
>>>> Adam,
>>>>
>>>> Found a sizable JSON file to work against and have been doing some
>>>> initial exploration.  With the large files, it certainly is a nontrivial
>>>> process.  At cursory inspection, a good portion of processing seems to be
>>>> spent on validation.  There are some ways to tweak the strictness of this
>>>> with the supporting library, but will have to dive in a bit more.
>>>>
>>>>
>>>>
>>>> On Thu, Sep 24, 2015 at 8:14 PM, Jeff <j....@gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>> I’m having a very similar problem.  The process picks up the file, a
>>>>> custom processor does it’s thing but no data is sent out.
>>>>>
>>>>> <unknown.gif>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>

Re: NiFi flow provides 0 output on large files

Posted by Ryan Ward <ry...@gmail.com>.
This is actually very easy to overlook and miss. Oftentimes we change the
file expiration on a queue simply to empty the queue.

Could we add a right-click "empty queue" option, with an "are you sure"
prompt? Is there already a JIRA for this feature?

Thanks,
Ryan

On Fri, Sep 25, 2015 at 9:12 AM, Jeff <j....@gmail.com> wrote:

>
> That was a rookie mistake.
>
> Indeed the JSON_to_Avro queue was set to 5 sec.  Is there information in a
> log that states a flow file was expired?
>
> My ultimate goal is to put all of this data into a Confluent Kafka topic,
> taking advantage of the schema registry. I do not believe the current
> PutToKafka provides the ability to use this registry correct?   I’m curious
> if anyone is working on PutToConfluentKafka processor?
>
> Thanks for your help.
>
> Jeff
>
> On Sep 25, 2015, at 7:52 AM, Matt Gilman <ma...@gmail.com> wrote:
>
> Jeff,
>
> What is the expiration setting on your connections? The little clock icon
> indicates that they are configured to automatically expire flowfiles of a
> certain age.
>
> Matt
>
> On Fri, Sep 25, 2015 at 8:50 AM, Jeff <j....@gmail.com> wrote:
>
>>
>> Hi Aldrin,
>>
>> After the DDA_Processor
>>
>> The below image shows that the GetFile Processed 174.6 MB and the
>> DDA_Processor is working on 1 file (the 1 in the upper right of the
>> DDA_Processor box)
>>
>> <unknown.gif>
>>
>> The below image shows that the DDA_Processor is complete but data did not
>> make it to ConvertJSONtoAvro.  No errors are being generated.
>> DDA_Processor takes fixed width data and converts it to JSON.
>>
>> <unknown.gif>
>>
>> Thanks
>>
>>
>> On Sep 25, 2015, at 7:30 AM, Aldrin Piri <al...@gmail.com> wrote:
>>
>> Jeff,
>>
>> With regards to:
>>
>> "Anything over, the GetFile and DDA_Processor shows data movement but
>> the no other downstream processor shows movement."
>>
>> Are you referencing downstream processors starting immediately after the
>> DDA_Processor (ConvertJsonToAvro) or starting immediately after the
>> ConvertJsonToAvro processor?
>>
>> In the case of starting immediately after the DDA Processor, as it is a
>> custom processor, we would need some additional information as to how this
>> processor is behaving.  In the case of the second condition, if you have
>> some additional context as to the format of the data that is problematic to
>> what you are seeing (the effective "schema" of the JSON) would be helpful
>> in tracking down the issue.
>>
>> Thanks!
>> Aldrin
>>
>> On Fri, Sep 25, 2015 at 8:22 AM, Jeff <j....@gmail.com> wrote:
>>
>>> Hi Adam,
>>>
>>>
>>> I have a flow that does the following;
>>>
>>> GetFile > DDA_Processor > ConvertJSONToAvro > UpdateAttribute > PutFile
>>>
>>> My source file has 182897 rows at 1001 bytes per row.  If I do any
>>> number of rows under ~15000 an output file is created.  Anything over, the
>>> GetFile and DDA_Processor shows data movement but the no other downstream
>>> processor shows movement.
>>>
>>> I confirmed that it is not a data problem by processing a 10,000 row
>>> file successfully, then concatenating 10,000 rows into one file twice.
>>>
>>> Thanks for your insight.
>>>
>>> Jeff
>>> <Mail Attachment.gif>
>>>
>>>
>>> On Sep 24, 2015, at 8:40 PM, Aldrin Piri <al...@gmail.com> wrote:
>>>
>>> Jeff,
>>>
>>> This seems to be a bit different as the processor is showing data as
>>> having been written and there is a listing of one FlowFile of 381 MB being
>>> transferred out from the processor.  Could you provide additional
>>> information as to how data is not being sent out in the manner
>>> anticipated?  If you can track the issue down more, let us know.  May be
>>> helpful to create another message to help us track the issues separately as
>>> we work through them.
>>>
>>> Thanks!
>>>
>>> Adam,
>>>
>>> Found a sizable JSON file to work against and have been doing some
>>> initial exploration.  With the large files, it certainly is a nontrivial
>>> process.  At cursory inspection, a good portion of processing seems to be
>>> spent on validation.  There are some ways to tweak the strictness of this
>>> with the supporting library, but will have to dive in a bit more.
>>>
>>>
>>>
>>> On Thu, Sep 24, 2015 at 8:14 PM, Jeff <j....@gmail.com> wrote:
>>>
>>>>
>>>>
>>>>
>>>> I’m having a very similar problem.  The process picks up the file, a
>>>> custom processor does it’s thing but no data is sent out.
>>>>
>>>> <unknown.gif>
>>>>
>>>>
>>>>
>>>
>>
>>
>
>

Re: NiFi flow provides 0 output on large files

Posted by Jeff <j....@gmail.com>.
That was a rookie mistake.

Indeed the JSON_to_Avro queue was set to 5 sec.  Is there information in a log that states a flow file was expired?  

My ultimate goal is to put all of this data into a Confluent Kafka topic, taking advantage of the schema registry. I do not believe the current PutToKafka provides the ability to use this registry, correct?  I’m curious if anyone is working on a PutToConfluentKafka processor?

Thanks for your help.

Jeff

> On Sep 25, 2015, at 7:52 AM, Matt Gilman <ma...@gmail.com> wrote:
> 
> Jeff,
> 
> What is the expiration setting on your connections? The little clock icon indicates that they are configured to automatically expire flowfiles of a certain age.
> 
> Matt
> 
> On Fri, Sep 25, 2015 at 8:50 AM, Jeff <j.007ba7@gmail.com <ma...@gmail.com>> wrote:
> 
> Hi Aldrin, 
> 
> After the DDA_Processor
> 
> The below image shows that the GetFile Processed 174.6 MB and the DDA_Processor is working on 1 file (the 1 in the upper right of the DDA_Processor box)
> 
> <unknown.gif>
> 
> The below image shows that the DDA_Processor is complete but data did not make it to ConvertJSONtoAvro.  No errors are being generated.  DDA_Processor takes fixed width data and converts it to JSON.  
> 
> <unknown.gif>
> 
> Thanks
> 
> 
>> On Sep 25, 2015, at 7:30 AM, Aldrin Piri <aldrinpiri@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Jeff,
>> 
>> With regards to:
>> 
>> "Anything over, the GetFile and DDA_Processor shows data movement but the no other downstream processor shows movement."
>> 
>> Are you referencing downstream processors starting immediately after the DDA_Processor (ConvertJsonToAvro) or starting immediately after the ConvertJsonToAvro processor?
>> 
>> In the case of starting immediately after the DDA Processor, as it is a custom processor, we would need some additional information as to how this processor is behaving.  In the case of the second condition, if you have some additional context as to the format of the data that is problematic to what you are seeing (the effective "schema" of the JSON) would be helpful in tracking down the issue.
>> 
>> Thanks!
>> Aldrin
>> 
>> On Fri, Sep 25, 2015 at 8:22 AM, Jeff <j.007ba7@gmail.com <ma...@gmail.com>> wrote:
>> Hi Adam,
>> 
>> 
>> I have a flow that does the following;
>> 
>> GetFile > DDA_Processor > ConvertJSONToAvro > UpdateAttribute > PutFile
>> 
>> My source file has 182897 rows at 1001 bytes per row.  If I do any number of rows under ~15000 an output file is created.  Anything over, the GetFile and DDA_Processor shows data movement but the no other downstream processor shows movement.  
>> 
>> I confirmed that it is not a data problem by processing a 10,000 row file successfully, then concatenating 10,000 rows into one file twice.  
>> 
>> Thanks for your insight.
>> 
>> Jeff
>> <Mail Attachment.gif> 
>> 
>> 
>>> On Sep 24, 2015, at 8:40 PM, Aldrin Piri <aldrinpiri@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> Jeff,
>>> 
>>> This seems to be a bit different as the processor is showing data as having been written and there is a listing of one FlowFile of 381 MB being transferred out from the processor.  Could you provide additional information as to how data is not being sent out in the manner anticipated?  If you can track the issue down more, let us know.  May be helpful to create another message to help us track the issues separately as we work through them.
>>> 
>>> Thanks!
>>> 
>>> Adam,
>>> 
>>> Found a sizable JSON file to work against and have been doing some initial exploration.  With the large files, it certainly is a nontrivial process.  At cursory inspection, a good portion of processing seems to be spent on validation.  There are some ways to tweak the strictness of this with the supporting library, but will have to dive in a bit more.
>>> 
>>> 
>>> 
>>> On Thu, Sep 24, 2015 at 8:14 PM, Jeff <j.007ba7@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> 
>>> 
>>> I’m having a very similar problem.  The process picks up the file, a custom processor does it’s thing but no data is sent out.
>>> 
>>> <unknown.gif>
>>> 
>>> 
>> 
>> 
> 
> 


Re: NiFi flow provides 0 output on large files

Posted by Matt Gilman <ma...@gmail.com>.
Jeff,

What is the expiration setting on your connections? The little clock icon
indicates that they are configured to automatically expire flowfiles of a
certain age.

Matt
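
(In this thread's case, that would mean: with the JSON_to_Avro connection set
to expire FlowFiles after 5 seconds, any FlowFile that waits in that queue
longer than 5 seconds while a large file is being converted would be silently
dropped, which would match the "data movement but no downstream output"
symptom described earlier.)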

On Fri, Sep 25, 2015 at 8:50 AM, Jeff <j....@gmail.com> wrote:

>
> Hi Aldrin,
>
> After the DDA_Processor
>
> The below image shows that the GetFile Processed 174.6 MB and the
> DDA_Processor is working on 1 file (the 1 in the upper right of the
> DDA_Processor box)
>
> [image: unknown.gif]
>
> The below image shows that the DDA_Processor is complete but data did not
> make it to ConvertJSONtoAvro.  No errors are being generated.
> DDA_Processor takes fixed width data and converts it to JSON.
>
> [image: unknown.gif]
>
> Thanks
>
>
> On Sep 25, 2015, at 7:30 AM, Aldrin Piri <al...@gmail.com> wrote:
>
> Jeff,
>
> With regards to:
>
> "Anything over, the GetFile and DDA_Processor shows data movement but the
> no other downstream processor shows movement."
>
> Are you referencing downstream processors starting immediately after the
> DDA_Processor (ConvertJsonToAvro) or starting immediately after the
> ConvertJsonToAvro processor?
>
> In the case of starting immediately after the DDA Processor, as it is a
> custom processor, we would need some additional information as to how this
> processor is behaving.  In the case of the second condition, if you have
> some additional context as to the format of the data that is problematic to
> what you are seeing (the effective "schema" of the JSON) would be helpful
> in tracking down the issue.
>
> Thanks!
> Aldrin
>
> On Fri, Sep 25, 2015 at 8:22 AM, Jeff <j....@gmail.com> wrote:
>
>> Hi Adam,
>>
>>
>> I have a flow that does the following;
>>
>> GetFile > DDA_Processor > ConvertJSONToAvro > UpdateAttribute > PutFile
>>
>> My source file has 182897 rows at 1001 bytes per row.  If I do any number
>> of rows under ~15000 an output file is created.  Anything over, the GetFile
>> and DDA_Processor shows data movement but the no other downstream processor
>> shows movement.
>>
>> I confirmed that it is not a data problem by processing a 10,000 row file
>> successfully, then concatenating 10,000 rows into one file twice.
>>
>> Thanks for your insight.
>>
>> Jeff
>> <Mail Attachment.gif>
>>
>>
>> On Sep 24, 2015, at 8:40 PM, Aldrin Piri <al...@gmail.com> wrote:
>>
>> Jeff,
>>
>> This seems to be a bit different as the processor is showing data as
>> having been written and there is a listing of one FlowFile of 381 MB being
>> transferred out from the processor.  Could you provide additional
>> information as to how data is not being sent out in the manner
>> anticipated?  If you can track the issue down more, let us know.  May be
>> helpful to create another message to help us track the issues separately as
>> we work through them.
>>
>> Thanks!
>>
>> Adam,
>>
>> Found a sizable JSON file to work against and have been doing some
>> initial exploration.  With the large files, it certainly is a nontrivial
>> process.  At cursory inspection, a good portion of processing seems to be
>> spent on validation.  There are some ways to tweak the strictness of this
>> with the supporting library, but will have to dive in a bit more.
>>
>>
>>
>> On Thu, Sep 24, 2015 at 8:14 PM, Jeff <j....@gmail.com> wrote:
>>
>>>
>>>
>>>
>>> I’m having a very similar problem.  The process picks up the file, a
>>> custom processor does it’s thing but no data is sent out.
>>>
>>> <unknown.gif>
>>>
>>>
>>>
>>
>
>

Re: NiFi flow provides 0 output on large files

Posted by Jeff <j....@gmail.com>.
Hi Aldrin, 

After the DDA_Processor

The image below shows that GetFile processed 174.6 MB and that the DDA_Processor is working on 1 file (the 1 in the upper right of the DDA_Processor box).



The image below shows that the DDA_Processor is complete, but data did not make it to ConvertJSONtoAvro.  No errors are being generated.  DDA_Processor takes fixed-width data and converts it to JSON.



Thanks


> On Sep 25, 2015, at 7:30 AM, Aldrin Piri <al...@gmail.com> wrote:
> 
> Jeff,
> 
> With regards to:
> 
> "Anything over, the GetFile and DDA_Processor shows data movement but the no other downstream processor shows movement."
> 
> Are you referencing downstream processors starting immediately after the DDA_Processor (ConvertJsonToAvro) or starting immediately after the ConvertJsonToAvro processor?
> 
> In the case of starting immediately after the DDA Processor, as it is a custom processor, we would need some additional information as to how this processor is behaving.  In the case of the second condition, if you have some additional context as to the format of the data that is problematic to what you are seeing (the effective "schema" of the JSON) would be helpful in tracking down the issue.
> 
> Thanks!
> Aldrin
> 
> On Fri, Sep 25, 2015 at 8:22 AM, Jeff <j.007ba7@gmail.com <ma...@gmail.com>> wrote:
> Hi Adam,
> 
> 
> I have a flow that does the following;
> 
> GetFile > DDA_Processor > ConvertJSONToAvro > UpdateAttribute > PutFile
> 
> My source file has 182897 rows at 1001 bytes per row.  If I do any number of rows under ~15000 an output file is created.  Anything over, the GetFile and DDA_Processor shows data movement but the no other downstream processor shows movement.  
> 
> I confirmed that it is not a data problem by processing a 10,000 row file successfully, then concatenating 10,000 rows into one file twice.  
> 
> Thanks for your insight.
> 
> Jeff
> <Mail Attachment.gif> 
> 
> 
>> On Sep 24, 2015, at 8:40 PM, Aldrin Piri <aldrinpiri@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Jeff,
>> 
>> This seems to be a bit different as the processor is showing data as having been written and there is a listing of one FlowFile of 381 MB being transferred out from the processor.  Could you provide additional information as to how data is not being sent out in the manner anticipated?  If you can track the issue down more, let us know.  May be helpful to create another message to help us track the issues separately as we work through them.
>> 
>> Thanks!
>> 
>> Adam,
>> 
>> Found a sizable JSON file to work against and have been doing some initial exploration.  With the large files, it certainly is a nontrivial process.  At cursory inspection, a good portion of processing seems to be spent on validation.  There are some ways to tweak the strictness of this with the supporting library, but will have to dive in a bit more.
>> 
>> 
>> 
>> On Thu, Sep 24, 2015 at 8:14 PM, Jeff <j.007ba7@gmail.com <ma...@gmail.com>> wrote:
>> 
>> 
>> 
>> I’m having a very similar problem.  The process picks up the file, a custom processor does it’s thing but no data is sent out.
>> 
>> <unknown.gif>
>> 
>> 
> 
> 


Re: NiFi flow provides 0 output on large files

Posted by Aldrin Piri <al...@gmail.com>.
Jeff,

With regards to:

"Anything over, the GetFile and DDA_Processor shows data movement but the
no other downstream processor shows movement."

Are you referencing downstream processors starting immediately after the
DDA_Processor (ConvertJsonToAvro) or starting immediately after the
ConvertJsonToAvro processor?

In the case of starting immediately after the DDA Processor, as it is a
custom processor, we would need some additional information as to how this
processor is behaving.  In the second case, any additional context as to
the format of the data that is problematic (the effective "schema" of the
JSON) would be helpful in tracking down the issue.

Thanks!
Aldrin

On Fri, Sep 25, 2015 at 8:22 AM, Jeff <j....@gmail.com> wrote:

> Hi Adam,
>
>
> I have a flow that does the following;
>
> GetFile > DDA_Processor > ConvertJSONToAvro > UpdateAttribute > PutFile
>
> My source file has 182897 rows at 1001 bytes per row.  If I do any number
> of rows under ~15000 an output file is created.  Anything over, the GetFile
> and DDA_Processor shows data movement but the no other downstream processor
> shows movement.
>
> I confirmed that it is not a data problem by processing a 10,000 row file
> successfully, then concatenating 10,000 rows into one file twice.
>
> Thanks for your insight.
>
> Jeff
>
>
>
> On Sep 24, 2015, at 8:40 PM, Aldrin Piri <al...@gmail.com> wrote:
>
> Jeff,
>
> This seems to be a bit different as the processor is showing data as
> having been written and there is a listing of one FlowFile of 381 MB being
> transferred out from the processor.  Could you provide additional
> information as to how data is not being sent out in the manner
> anticipated?  If you can track the issue down more, let us know.  May be
> helpful to create another message to help us track the issues separately as
> we work through them.
>
> Thanks!
>
> Adam,
>
> Found a sizable JSON file to work against and have been doing some initial
> exploration.  With the large files, it certainly is a nontrivial process.
> At cursory inspection, a good portion of processing seems to be spent on
> validation.  There are some ways to tweak the strictness of this with the
> supporting library, but will have to dive in a bit more.
>
>
>
> On Thu, Sep 24, 2015 at 8:14 PM, Jeff <j....@gmail.com> wrote:
>
>>
>>
>>
>> I’m having a very similar problem.  The process picks up the file, a
>> custom processor does it’s thing but no data is sent out.
>>
>> <unknown.gif>
>>
>>
>>
>

NiFi flow provides 0 output on large files

Posted by Jeff <j....@gmail.com>.
Hi Adam,


I have a flow that does the following;

GetFile > DDA_Processor > ConvertJSONToAvro > UpdateAttribute > PutFile

My source file has 182897 rows at 1001 bytes per row.  If I do any number of rows under ~15000, an output file is created.  Anything over that, GetFile and DDA_Processor show data movement but no other downstream processor shows movement.
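
(For scale, assuming 1 MB = 1,048,576 bytes: 182,897 rows x 1,001 bytes is
183,079,897 bytes, or roughly 174.6 MB for the full file, while the ~15,000-row
threshold corresponds to only about 14 MB.)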

I confirmed that it is not a data problem by processing a 10,000 row file successfully, then concatenating 10,000 rows into one file twice.  

Thanks for your insight.

Jeff
 


> On Sep 24, 2015, at 8:40 PM, Aldrin Piri <al...@gmail.com> wrote:
> 
> Jeff,
> 
> This seems to be a bit different as the processor is showing data as having been written and there is a listing of one FlowFile of 381 MB being transferred out from the processor.  Could you provide additional information as to how data is not being sent out in the manner anticipated?  If you can track the issue down more, let us know.  May be helpful to create another message to help us track the issues separately as we work through them.
> 
> Thanks!
> 
> Adam,
> 
> Found a sizable JSON file to work against and have been doing some initial exploration.  With the large files, it certainly is a nontrivial process.  At cursory inspection, a good portion of processing seems to be spent on validation.  There are some ways to tweak the strictness of this with the supporting library, but will have to dive in a bit more.
> 
> 
> 
> On Thu, Sep 24, 2015 at 8:14 PM, Jeff <j.007ba7@gmail.com <ma...@gmail.com>> wrote:
> 
> 
> 
> I’m having a very similar problem.  The process picks up the file, a custom processor does it’s thing but no data is sent out.
> 
> <unknown.gif>
> 
> 


Re: Array into MongoDB

Posted by Aldrin Piri <al...@gmail.com>.
Jeff,

This seems to be a bit different as the processor is showing data as having
been written and there is a listing of one FlowFile of 381 MB being
transferred out from the processor.  Could you provide additional
information as to how data is not being sent out in the manner
anticipated?  If you can track the issue down more, let us know.  May be
helpful to create another message to help us track the issues separately as
we work through them.

Thanks!

Adam,

Found a sizable JSON file to work against and have been doing some initial
exploration.  With the large files, it certainly is a nontrivial process.
At cursory inspection, a good portion of processing seems to be spent on
validation.  There are some ways to tweak the strictness of this with the
supporting library, but will have to dive in a bit more.



On Thu, Sep 24, 2015 at 8:14 PM, Jeff <j....@gmail.com> wrote:

>
>
>
> I’m having a very similar problem.  The process picks up the file, a
> custom processor does it’s thing but no data is sent out.
>
> [image: unknown.gif]
>
>
>
> On Sep 24, 2015, at 5:56 PM, Adam Williams <aa...@outlook.com>
> wrote:
>
> For JsonSplit i am using just "$" to try and get the array into individual
> objects.  It worked on a small subset, but a large seems to just hang.
>
> ------------------------------
> From: aldrinpiri@gmail.com
> Date: Thu, 24 Sep 2015 18:54:06 -0400
> Subject: Re: Array into MongoDB
> To: users@nifi.apache.org
>
> Bryan is correct about the backing library reading everything into memory
> to do the evaluation.
>
> Might I ask what the expression you are using?
>
> On Thu, Sep 24, 2015 at 6:44 PM, Adam Williams <aaronfwilliams@outlook.com
> > wrote:
>
> I tried it even with 6GB and no luck.  It's receiving the flowfiles, but
> nothing is happening after.  If i do it with a small subset (3 JSON
> objects) it works perfect.  When i throw the 180MB file it just spins, no
> logging, errors etc very odd.  Any thoughts?
>
> Thanks
>
> ------------------------------
> From: aaronfwilliams@outlook.com
> To: users@nifi.apache.org
> Subject: RE: Array into MongoDB
> Date: Thu, 24 Sep 2015 21:23:35 +0000
>
>
> Bryan,
>
> I think that is whats happening, fans spinning like crazy, this is my
> current bootstrap.conf.  I will bump it up, are there any other settings i
> should bump too?
>
> java.arg.2=-Xms512m
> java.arg.3=-Xmx2048m
>
> Thanks
>
> ------------------------------
> Date: Thu, 24 Sep 2015 17:20:27 -0400
> Subject: Re: Array into MongoDB
> From: bbende@gmail.com
> To: users@nifi.apache.org
>
> One other thing I thought of... I think the JSON processors read the
> entire FlowFile content into memory to do the splitting/evaluating, so I
> wonder if you are running into a memory issue with a 180MB JSON file.
>
> Are you running with the default configuration of 512mb set in
> conf/bootstrap.conf ?  If so it would be interesting to see what happens if
> you bump that up.
>
> On Thu, Sep 24, 2015 at 5:06 PM, Bryan Bende <bb...@gmail.com> wrote:
>
> Adam,
>
> Based on that message I suspect that MongoDB does not support sending in
> an array of documents since it looks like it expect the first character to
> be the start of a document and not an array.
>
> With regards to the SplitJson processor, if you set the JSON Path to $
> then it should split at the top-level and send out each of your two
> documents on the splits relationship.
>
> -Bryan
>
>
> On Thu, Sep 24, 2015 at 4:36 PM, Adam Williams <aaronfwilliams@outlook.com
> > wrote:
>
> I have an array of JSON object I am trying to put into Mongo, but I keep
> hitting this on the PutMongo processor:
>
> ERROR [Timer-Driven Process Thread-1] o.a.nifi.processors.mongodb.PutMongo
> PutMongo[id=c576f8cc-6e21-4881-a7cd-6e3881838a91] Failed to insert
> StandardFlowFileRecord[uuid=2c670a40-7934-4bc6-b054-1cba23fe7b0f,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1443125646319-1, container=default,
> section=1], offset=0,
> length=208380820],offset=0,name=test.json,size=208380820] into MongoDB due
> to org.bson.BsonInvalidOperationException: readStartDocument can only be
> called when CurrentBSONType is DOCUMENT, not when CurrentBSONType is
> ARRAY.: org.bson.BsonInvalidOperationException: readStartDocument can only
> be called when CurrentBSONType is DOCUMENT, not when CurrentBSONType is
> ARRAY.
>
>
> I tried to use the splitJson processor to split the array into segments,
> but to my experience I can't pull out each Json Obect.  The splitjson
> processor just hangs and never produces logs or any output at all.  The
> structure of my data is:
>
> [{"id":1, "stat":"something"},{"id":2, "stat":"anothersomething"}]
>
> The JSON file itself is pretty large (>100mb).
>
> Thank you
>
>
>
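
The heap settings referenced above live in conf/bootstrap.conf. A minimal
sketch of raising them (the exact values are only an example; the 6 GB figure
mirrors what was tried above):

java.arg.2=-Xms1g
java.arg.3=-Xmx6g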

Re: Array into MongoDB

Posted by Jeff <j....@gmail.com>.


I’m having a very similar problem.  The process picks up the file, a custom processor does its thing but no data is sent out.







RE: Array into MongoDB

Posted by Adam Williams <aa...@outlook.com>.
For SplitJson I am using just "$" to try to get the array split into individual objects.  It worked on a small subset, but a large file seems to just hang.
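For reference, on the two-element sample array from earlier in the thread, a
successful "$" split should produce two FlowFiles on the splits relationship,
roughly one per element (exact whitespace may differ):

    {"id":1, "stat":"something"}
    {"id":2, "stat":"anothersomething"}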


Re: Array into MongoDB

Posted by Aldrin Piri <al...@gmail.com>.
Bryan is correct about the backing library reading everything into memory
to do the evaluation.

Might I ask what expression you are using?
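
As an illustrative aside on the memory point above (this is not what
SplitJson does internally): a streaming parser can walk a large top-level
array one element at a time instead of materializing the whole file. A
minimal sketch, assuming Jackson is on the classpath and a file named
test.json:

    import com.fasterxml.jackson.core.JsonFactory;
    import com.fasterxml.jackson.core.JsonParser;
    import com.fasterxml.jackson.core.JsonToken;
    import com.fasterxml.jackson.databind.ObjectMapper;

    import java.io.File;

    public class StreamingSplitSketch {
        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            JsonFactory factory = mapper.getFactory();
            // Walk the top-level array token by token; only one element is
            // ever held in memory at a time.
            try (JsonParser parser = factory.createParser(new File("test.json"))) {
                if (parser.nextToken() != JsonToken.START_ARRAY) {
                    throw new IllegalStateException("Expected a top-level JSON array");
                }
                while (parser.nextToken() == JsonToken.START_OBJECT) {
                    // readTree consumes exactly one object from the stream.
                    String oneDocument = mapper.readTree(parser).toString();
                    System.out.println(oneDocument); // stand-in for handing it downstream
                }
            }
        }
    }

Evaluating a JsonPath over the whole content, as discussed in this thread,
needs the entire array in memory, which is why heap size matters for a
180MB file.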


RE: Array into MongoDB

Posted by Adam Williams <aa...@outlook.com>.
I tried it even with 6GB and had no luck.  It's receiving the FlowFiles, but nothing happens after that.  If I do it with a small subset (3 JSON objects) it works perfectly.  When I throw the 180MB file at it, it just spins: no logging, no errors, nothing at all.  Very odd.  Any thoughts?
Thanks


RE: Array into MongoDB

Posted by Adam Williams <aa...@outlook.com>.
Bryan,
I think that is what's happening; the fans are spinning like crazy.  This is my current bootstrap.conf.  I will bump it up; are there any other settings I should bump too?
java.arg.2=-Xms512m
java.arg.3=-Xmx2048m
Thanks


Re: Array into MongoDB

Posted by Bryan Bende <bb...@gmail.com>.
One other thing I thought of... I think the JSON processors read the entire
FlowFile content into memory to do the splitting/evaluating, so I wonder if
you are running into a memory issue with a 180MB JSON file.

Are you running with the default configuration of 512MB set in
conf/bootstrap.conf?  If so, it would be interesting to see what happens if
you bump that up.
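
For illustration only, a bumped-up heap in conf/bootstrap.conf might look
like the following (the values are examples, not recommendations; size them
to the RAM actually available on the host):

    # conf/bootstrap.conf
    java.arg.2=-Xms1024m
    java.arg.3=-Xmx4096m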


Re: Array into MongoDB

Posted by Bryan Bende <bb...@gmail.com>.
Adam,

Based on that message, I suspect that PutMongo does not support sending in
an array of documents, since it looks like it expects the first character to
be the start of a document and not an array.

With regards to the SplitJson processor, if you set the JSON Path to $ then
it should split at the top-level and send out each of your two documents on
the splits relationship.

-Bryan
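
A minimal sketch of the parse behaviour described above, assuming only the
MongoDB Java driver's bson library is on the classpath (this reproduces the
symptom; it is not the processor's actual code):

    import org.bson.Document;

    public class ArrayVsDocumentSketch {
        public static void main(String[] args) {
            String array = "[{\"id\":1, \"stat\":\"something\"},{\"id\":2, \"stat\":\"anothersomething\"}]";

            // A top-level array is not a single document, so parsing it this
            // way is expected to fail with a BsonInvalidOperationException
            // much like the PutMongo error earlier in the thread.
            try {
                Document.parse(array);
            } catch (RuntimeException e) {
                System.out.println("Rejected: " + e);
            }

            // Each element on its own parses fine, which is why splitting the
            // array first (e.g. SplitJson with $) should let PutMongo insert
            // the resulting FlowFiles individually.
            Document one = Document.parse("{\"id\":1, \"stat\":\"something\"}");
            System.out.println("Parsed: " + one.toJson());
        }
    }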

