Posted to users@nifi.apache.org by Adam Williams <aa...@outlook.com> on 2015/09/21 20:17:48 UTC

CSV to Mongo

Hello,

I'm moving from Storm to NiFi and trying to do a simple test: getting a large CSV file dumped into MongoDB.  The CSV file has a header with column names and it is structured; my only problem is dumping it into MongoDB.  At a high level, do the following processor steps look correct?  All I want is to pull the whole CSV file over to MongoDB without a regex or anything fancy (yet).  I always seem to eventually hit trouble with array index problems with the PutMongo processor:
GetFile --> ExtractText --> RouteOnAttribute (not a null line) --> PutMongo.
Does that seem to be the right way to do this in NiFi?
Thank you,
Adam

RE: CSV to Mongo

Posted by Adam Williams <aa...@outlook.com>.
Thank you Bryan & everyone.  I will check out that template; it looks perfect for me!


Re: CSV to Mongo

Posted by Joe Witt <jo...@gmail.com>.
There aren't any plans, but it is an awesome idea and would make a great JIRA.

Thanks
Joe

Re: CSV to Mongo

Posted by Jonathan Lyons <jo...@jaroop.com>.
Speaking of CSV to JSON conversion, is there any interest in implementing
schema inference in general, and specifically schema inference for CSV
files? This is something that was added to spark-csv recently (
https://github.com/databricks/spark-csv/pull/93). Any thoughts?


Re: CSV to Mongo

Posted by Bryan Bende <bb...@gmail.com>.
Adam,

If you are interested in the ExtractText+ReplaceText approach, I posted an
example template that shows how to convert a line from a CSV file to a JSON
document [1].

The first part of the flow is just for testing and generates a flow file
with the content set to "a,b,c,d", then the ExtractText pulls those values
into attributes (csv.1, csv.2, csv.3, csv.4) and ReplaceText uses them to
build a JSON document.
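
To give a feel for the configuration (the exact values live in the template; these are illustrative): ExtractText gets one user-defined property whose value is a regular expression with capture groups, along the lines of

    csv = ^(.*),(.*),(.*),(.*)$

so the groups populate csv.1 through csv.4, and ReplaceText's Replacement Value then assembles the JSON with expression language, along the lines of

    { "field1": "${csv.1}", "field2": "${csv.2}", "field3": "${csv.3}", "field4": "${csv.4}" }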

-Bryan

[1]
https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates
 (CsvToJson)



Re: CSV to Mongo

Posted by Bryan Bende <bb...@gmail.com>.
Yup, Joe beat me to it, but I was going to suggest those options...

In the second case, you would probably use SplitText to get each line of
the CSV as a FlowFile, then ExtractText to pull out every value of the line
into attributes, then ReplaceText would construct a JSON document using
expression language to access the attributes from ExtractText.
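
Roughly, that flow would be (a sketch, not a tested template):

    GetFile --> SplitText (Line Split Count = 1) --> ExtractText --> ReplaceText --> PutMongo

SplitText with a Line Split Count of 1 emits one FlowFile per CSV line, and its Header Line Count property can be used to drop the header row.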


Re: CSV to Mongo

Posted by Joe Witt <jo...@gmail.com>.
Adam, Bryan,

You could use the CSV-to-Avro processor and then follow it with the
Avro-to-JSON processor.  Alternatively, you could use ExtractText to pull
the fields out as attributes and then use ReplaceText to produce a JSON
output.
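
For the first option, the chain would look something like this (the Kite-based processor names, assuming an Avro schema that matches the CSV columns):

    GetFile --> ConvertCSVToAvro --> ConvertAvroToJSON --> PutMongo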

Thanks
Joe


RE: CSV to Mongo

Posted by Adam Williams <aa...@outlook.com>.
Bryan,
Thanks for the feedback.  I stripped the ExtractText and tried routing all unmatched traffic to Mongo as well, hence the CSV import problems.  Off the top of my head I do not think MongoDB allows CSV inserts through the Java client; we've always had to work with the JSON/document model for it.  For a CSV format, it would have to be similar to this idea: https://github.com/AdoptOpenJDK/javacountdown/blob/master/src/main/java/org/adoptopenjdk/javacountdown/ImportGeoData.java
So looking at the other processors in NiFi, is there a way then to move from a CSV format to JSON before putting to Mongo?


Re: CSV to Mongo

Posted by Bryan Bende <bb...@gmail.com>.
Adam,

I was able to import the full template, thanks. A couple of things...

The ExtractText processor works by adding user-defined properties  (the +
icon in the top-right of the properties window) where the property name is
a destination attribute and the value is a regular expression.
Right now there weren't any regular expressions defined so that processor
will always route the file to 'unmatched'. Generally you would probably
want to route the matched files to the next processor, and then
auto-terminate the unmatched relationship (assuming you want to filter out
non-matches).
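
As a minimal illustration (the property name and pattern here are invented for the example, not taken from your template): a property such as

    first.field = ^([^,\t]+)

would copy everything up to the first comma or tab into the attribute first.field.1 and route the flow file to 'matched'; with no properties defined at all, nothing can ever match, which is why everything went to 'unmatched'.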

Do you know if MongoDB supports inserting a CSV file through their Java
client? Do you have similar code that already does this in Storm?

I am honestly not that familiar with MongoDB, but in the PutMongo processor
it takes the incoming data and calls:
Document doc = Document.parse(new String(content, charset));

Looking at that Document.parse() method, it looks like it expects a JSON
document, so I just want to make sure that we expect CSV insertions to work
here.
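
A quick standalone way to see that behavior (plain MongoDB Java driver, outside NiFi; the class is just a scratch test):

    import org.bson.Document;

    public class ParseCheck {
        public static void main(String[] args) {
            // Valid JSON parses into a Document, which is what PutMongo inserts.
            Document ok = Document.parse("{ \"a\": 1, \"b\": \"two\" }");
            System.out.println(ok.toJson());

            // A raw CSV line is not JSON, so this should throw a
            // JsonParseException instead of producing a document.
            Document bad = Document.parse("a,b,c,d");
        }
    }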
In researching this, it looks like Mongo has a bulk-import utility that
handles CSV [1], but it is a command-line utility.
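
For completeness, the command-line route is along these lines (database, collection, and file names are placeholders):

    mongoimport --db mydb --collection events --type csv --headerline --file data.csv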

-Bryan

[1] http://docs.mongodb.org/manual/reference/program/mongoimport/



RE: CSV to Mongo

Posted by Adam Williams <aa...@outlook.com>.
Sorry about that, this should work.  I attached the template; the error is below:
2015-09-21 14:36:02,821 ERROR [Timer-Driven Process Thread-10] o.a.nifi.processors.mongodb.PutMongo PutMongo[id=480877a4-f349-4ef7-9538-8e3e3e108e06] Failed to insert StandardFlowFileRecord[uuid=bbd7048f-d5a1-4db4-b938-da64b67e810e,claim=org.apache.nifi.controller.repository.claim.StandardContentClaim@8893ae38,offset=0,name=GDELT.MASTERREDUCEDV2.TXT,size=6581409407] into MongoDB due to java.lang.NegativeArraySizeException: java.lang.NegativeArraySizeException
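
A plausible cause, going only by the size field in that record and not a confirmed reading of the PutMongo source: 6581409407 bytes is larger than Integer.MAX_VALUE, so any code that sizes a byte[] from the flow file length through an int cast wraps negative:

    long size = 6581409407L;           // flow file size from the log above
    int truncated = (int) size;        // overflows to -2008525185
    byte[] buf = new byte[truncated];  // throws java.lang.NegativeArraySizeException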

Date: Mon, 21 Sep 2015 15:12:43 -0400
Subject: Re: CSV to Mongo
From: bbende@gmail.com
To: users@nifi.apache.org

Adam, 
I imported the template and it looks like it only captured the PutMongo processor. Can you try deselecting everything on the graph and creating the template again so we can take a look at the rest of the flow? or if you have other stuff on your graph, select all of the processors you described so they all get captured.
Also, can you provide the stack trace for the exception you are seeing? The log is in NIFI_HOME/logs/nifi-app.log
Thanks,
Bryan

On Mon, Sep 21, 2015 at 3:03 PM, Bryan Bende <bb...@gmail.com> wrote:
Adam,
Thanks for attaching the template, we will take a look and see what is going on.
Thanks,
Bryan

On Mon, Sep 21, 2015 at 2:50 PM, Adam Williams <aa...@outlook.com> wrote:



Hey Joe,
Sure thing.  I attached the template; I'm just feeding the GDELT data set to the GetFile processor, which works.  The error I get is a NegativeArraySizeException.


> Date: Mon, 21 Sep 2015 14:24:50 -0400
> Subject: Re: CSV to Mongo
> From: joe.witt@gmail.com
> To: users@nifi.apache.org
> 
> Adam,
> 
> Regarding moving from Storm to NiFi, I'd say they make better teammates
> than competitors.  The use case outlined above should be quite easy
> for NiFi, but there are analytic/processing functions for which Storm is
> probably a better answer.  We're happy to help explore that with you as
> you progress.
> 
> If you ever run into an ArrayIndexOutOfBoundsException, then it will
> always be 100% a coding error.  Would you mind sending your
> flow.xml.gz over or making a template of the flow (assuming it
> contains nothing sensitive)?  If at all possible sample data which
> exposes the issue would be ideal.  As an alternative can you go ahead
> and send us the resulting stack trace/error that comes out?
> 
> We'll get this addressed.
> 
> Thanks
> Joe
> 
> On Mon, Sep 21, 2015 at 2:17 PM, Adam Williams
> <aa...@outlook.com> wrote:
> > Hello,
> >
> > I'm moving from Storm to NiFi and trying to do a simple test with getting a
> > large CSV file dumped into MongoDB.  The CSV file has a header with column
> > names and it is structured; my only problem is dumping it into MongoDB.  At
> > a high level, do the following processor steps look correct?  All I want is
> > to just pull the whole CSV file over to MongoDB without a regex or anything
> > fancy (yet).  I eventually always seem to hit trouble with array index
> > problems with the PutMongo processor:
> >
> > GetFile --> ExtractText --> RouteOnAttribute(not a null line) --> PutMongo.
> >
> > Does that seem to be the right way to do this in NiFi?
> >
> > Thank you,
> > Adam




Re: CSV to Mongo

Posted by Bryan Bende <bb...@gmail.com>.
Adam,

I imported the template and it looks like it only captured the PutMongo
processor. Can you try deselecting everything on the graph and creating the
template again so we can take a look at the rest of the flow? Or if you
have other stuff on your graph, select all of the processors you described
so they all get captured.

Also, can you provide the stack trace for the exception you are
seeing? The log is in NIFI_HOME/logs/nifi-app.log

Thanks,

Bryan


On Mon, Sep 21, 2015 at 3:03 PM, Bryan Bende <bb...@gmail.com> wrote:

> Adam,
>
> Thanks for attaching the template, we will take a look and see what is
> going on.
>
> Thanks,
>
> Bryan
>
>
> On Mon, Sep 21, 2015 at 2:50 PM, Adam Williams <aaronfwilliams@outlook.com
> > wrote:
>
>> Hey Joe,
>>
>> Sure thing.  I attached the template; I'm just feeding the GDELT data set
>> to the GetFile processor, which works.  The error I get is a NegativeArraySizeException.
>>
>>
>>
>> > Date: Mon, 21 Sep 2015 14:24:50 -0400
>> > Subject: Re: CSV to Mongo
>> > From: joe.witt@gmail.com
>> > To: users@nifi.apache.org
>>
>> >
>> > Adam,
>> >
>> > Regarding moving from Storm to NiFi, I'd say they make better teammates
>> > than competitors. The use case outlined above should be quite easy
>> > for NiFi, but there are analytic/processing functions for which Storm is
>> > probably a better answer. We're happy to help explore that with you as
>> > you progress.
>> >
>> > If you ever run into an ArrayIndexOutOfBoundsException, then it will
>> > always be 100% a coding error. Would you mind sending your
>> > flow.xml.gz over or making a template of the flow (assuming it
>> > contains nothing sensitive)? If at all possible sample data which
>> > exposes the issue would be ideal. As an alternative can you go ahead
>> > and send us the resulting stack trace/error that comes out?
>> >
>> > We'll get this addressed.
>> >
>> > Thanks
>> > Joe
>> >
>> > On Mon, Sep 21, 2015 at 2:17 PM, Adam Williams
>> > <aa...@outlook.com> wrote:
>> > > Hello,
>> > >
>> > > I'm moving from Storm to NiFi and trying to do a simple test with getting a
>> > > large CSV file dumped into MongoDB. The CSV file has a header with column
>> > > names and it is structured; my only problem is dumping it into MongoDB. At
>> > > a high level, do the following processor steps look correct? All I want is
>> > > to just pull the whole CSV file over to MongoDB without a regex or anything
>> > > fancy (yet). I eventually always seem to hit trouble with array index
>> > > problems with the PutMongo processor:
>> > >
>> > > GetFile --> ExtractText --> RouteOnAttribute(not a null line) --> PutMongo.
>> > >
>> > > Does that seem to be the right way to do this in NiFi?
>> > >
>> > > Thank you,
>> > > Adam
>>
>
>

Re: CSV to Mongo

Posted by Bryan Bende <bb...@gmail.com>.
Adam,

Thanks for attaching the template, we will take a look and see what is
going on.

Thanks,

Bryan


On Mon, Sep 21, 2015 at 2:50 PM, Adam Williams <aa...@outlook.com>
wrote:

> Hey Joe,
>
> Sure thing.  I attached the template; I'm just feeding the GDELT data set
> to the GetFile processor, which works.  The error I get is a NegativeArraySizeException.
>
>
>
> > Date: Mon, 21 Sep 2015 14:24:50 -0400
> > Subject: Re: CSV to Mongo
> > From: joe.witt@gmail.com
> > To: users@nifi.apache.org
>
> >
> > Adam,
> >
> > Regarding moving from Storm to NiFi, I'd say they make better teammates
> > than competitors. The use case outlined above should be quite easy
> > for NiFi, but there are analytic/processing functions for which Storm is
> > probably a better answer. We're happy to help explore that with you as
> > you progress.
> >
> > If you ever run into an ArrayIndexOutOfBoundsException, then it will
> > always be 100% a coding error. Would you mind sending your
> > flow.xml.gz over or making a template of the flow (assuming it
> > contains nothing sensitive)? If at all possible sample data which
> > exposes the issue would be ideal. As an alternative can you go ahead
> > and send us the resulting stack trace/error that comes out?
> >
> > We'll get this addressed.
> >
> > Thanks
> > Joe
> >
> > On Mon, Sep 21, 2015 at 2:17 PM, Adam Williams
> > <aa...@outlook.com> wrote:
> > > Hello,
> > >
> > > I'm moving from Storm to NiFi and trying to do a simple test with getting a
> > > large CSV file dumped into MongoDB. The CSV file has a header with column
> > > names and it is structured; my only problem is dumping it into MongoDB. At
> > > a high level, do the following processor steps look correct? All I want is
> > > to just pull the whole CSV file over to MongoDB without a regex or anything
> > > fancy (yet). I eventually always seem to hit trouble with array index
> > > problems with the PutMongo processor:
> > >
> > > GetFile --> ExtractText --> RouteOnAttribute(not a null line) --> PutMongo.
> > >
> > > Does that seem to be the right way to do this in NiFi?
> > >
> > > Thank you,
> > > Adam
>

RE: CSV to Mongo

Posted by Adam Williams <aa...@outlook.com>.
Hey Joe,
Sure thing.  I attached the template; I'm just feeding the GDELT data set to the GetFile processor, which works.  The error I get is a NegativeArraySizeException.


> Date: Mon, 21 Sep 2015 14:24:50 -0400
> Subject: Re: CSV to Mongo
> From: joe.witt@gmail.com
> To: users@nifi.apache.org
> 
> Adam,
> 
> Regarding moving from Storm to NiFi, I'd say they make better teammates
> than competitors.  The use case outlined above should be quite easy
> for NiFi, but there are analytic/processing functions for which Storm is
> probably a better answer.  We're happy to help explore that with you as
> you progress.
> 
> If you ever run into an ArrayIndexOutOfBoundsException, then it will
> always be 100% a coding error.  Would you mind sending your
> flow.xml.gz over or making a template of the flow (assuming it
> contains nothing sensitive)?  If at all possible sample data which
> exposes the issue would be ideal.  As an alternative can you go ahead
> and send us the resulting stack trace/error that comes out?
> 
> We'll get this addressed.
> 
> Thanks
> Joe
> 
> On Mon, Sep 21, 2015 at 2:17 PM, Adam Williams
> <aa...@outlook.com> wrote:
> > Hello,
> >
> > I'm moving from Storm to NiFi and trying to do a simple test with getting a
> > large CSV file dumped into MongoDB.  The CSV file has a header with column
> > names and it is structured; my only problem is dumping it into MongoDB.  At
> > a high level, do the following processor steps look correct?  All I want is
> > to just pull the whole CSV file over to MongoDB without a regex or anything
> > fancy (yet).  I eventually always seem to hit trouble with array index
> > problems with the PutMongo processor:
> >
> > GetFile --> ExtractText --> RouteOnAttribute(not a null line) --> PutMongo.
> >
> > Does that seem to be the right way to do this in NiFi?
> >
> > Thank you,
> > Adam

Re: CSV to Mongo

Posted by Joe Witt <jo...@gmail.com>.
Adam,

Regarding moving from Storm to NiFi, I'd say they make better teammates
than competitors.  The use case outlined above should be quite easy
for NiFi, but there are analytic/processing functions for which Storm is
probably a better answer.  We're happy to help explore that with you as
you progress.

If you ever run into an ArrayIndexOutOfBoundsException, then it will
always be 100% a coding error.  Would you mind sending your
flow.xml.gz over or making a template of the flow (assuming it
contains nothing sensitive)?  If at all possible sample data which
exposes the issue would be ideal.  As an alternative can you go ahead
and send us the resulting stack trace/error that comes out?

We'll get this addressed.

Thanks
Joe

On Mon, Sep 21, 2015 at 2:17 PM, Adam Williams
<aa...@outlook.com> wrote:
> Hello,
>
> I'm moving from Storm to NiFi and trying to do a simple test with getting a
> large CSV file dumped into MongoDB.  The CSV file has a header with column
> names and it is structured; my only problem is dumping it into MongoDB.  At
> a high level, do the following processor steps look correct?  All I want is
> to just pull the whole CSV file over to MongoDB without a regex or anything
> fancy (yet).  I eventually always seem to hit trouble with array index
> problems with the PutMongo processor:
>
> GetFile --> ExtractText --> RouteOnAttribute(not a null line) --> PutMongo.
>
> Does that seem to be the right way to do this in NiFi?
>
> Thank you,
> Adam
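
For readers following the thread, here is a rough sketch of one way to convert each CSV line to a JSON document before PutMongo, per the discussion above. The property names and values are illustrative and may differ across NiFi versions, and the column names are hypothetical placeholders rather than the real GDELT schema:

    GetFile --> SplitText --> ExtractText --> ReplaceText --> PutMongo

    SplitText
        Line Split Count: 1                 (one flow file per CSV line)

    ExtractText (add a dynamic property)
        csv = (.+),(.+),(.+)                (capture groups become attributes csv.1, csv.2, csv.3)

    ReplaceText
        Search Value: (?s:^.*$)             (match the entire flow file content)
        Replacement Value: {"fieldA": "${csv.1}", "fieldB": "${csv.2}", "fieldC": "${csv.3}"}

Splitting first also keeps each flow file far smaller than the 6.5 GB original, avoiding the buffering problem behind the NegativeArraySizeException.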