Posted to dev@nifi.apache.org by pradeepbill <pr...@gmail.com> on 2016/05/11 13:09:48 UTC

parquet format

Hi there, how can I write files to HDFS in Parquet format? Please advise.

Thanks 
Pradeep



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/parquet-format-tp10145.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: parquet format

Posted by pradeepbill <pr...@gmail.com>.
done Joe.

Re: parquet format

Posted by pradeepbill <pr...@gmail.com>.
Joe, the problem I see here is that the Parquet files are saved in the HDFS
folder where the dataset is located. How do I tell StoreInKiteDataset to
move them? I don't see any property for that.

Re: parquet format

Posted by Joe Witt <jo...@gmail.com>.
Nope.  Had to manually approve.

When you subscribe, what actually happens is that an email is sent. Check
your Sent folder to see if anything is there. If so, please forward it and
I will try to see what is happening.

Thanks
Joe

Re: parquet format

Posted by pradeepbill <pr...@gmail.com>.
Hi Joe, sorry for the trouble. I did subscribe now; can you see me subscribed?

Thanks
Pradeep

Re: parquet format

Posted by Joe Witt <jo...@gmail.com>.
Pradeep,

Note: You are still not subscribed to the mailing list, so your messages
still require manual approval. Can you please forward the email you sent
to subscribe to the mailing list to 'joewitt@apache.org'? If it contained
certain formatting, it may have been rejected.

Glad Ricky's guidance got you past that point. You can now use an
UpdateAttribute processor before the PutHDFS processor, with a NiFi
Expression Language format string that does what you describe. In
UpdateAttribute, create a property called 'hadoop.dir' and give it a value of
'/somepath/${now():format('yyyy/MM/dd/HH')}/'

Then in PutHDFS set the 'Directory' property to a value of '${hadoop.dir}'
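To make the resulting folder layout concrete, here is a small Python sketch that builds the same path for a given timestamp. It assumes that Python's strftime codes %Y/%m/%d/%H correspond to the Java-style 'yyyy/MM/dd/HH' pattern passed to format(); the helper name hadoop_dir is hypothetical. Keep in mind that now() is evaluated when UpdateAttribute processes each FlowFile, so the folder reflects processing time rather than when the data was originally generated, and that this pattern adds an hour level below the day.

```python
from datetime import datetime


def hadoop_dir(ts, base="/somepath"):
    """Hypothetical helper: the directory that
    '/somepath/${now():format('yyyy/MM/dd/HH')}/' would produce if now() returned ts."""
    # Java's yyyy/MM/dd/HH maps to Python's %Y/%m/%d/%H (zero-padded, 24-hour clock).
    return f"{base}/{ts.strftime('%Y/%m/%d/%H')}/"


if __name__ == "__main__":
    print(hadoop_dir(datetime(2016, 5, 13, 11, 13)))  # /somepath/2016/05/13/11/
```

A FlowFile processed at 11:13 on 2016/05/13 would therefore land in /somepath/2016/05/13/11/.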

Thanks
Joe


Re: parquet format

Posted by pradeepbill <pr...@gmail.com>.
Hi Ricky, quick update: I was able to save files in Parquet format by
following your steps exactly. Thanks for the help. Also, how can I save the
files in HDFS in year/month/day folders, so that files generated on
2016/05/13 go into the corresponding folders? Can PutHDFS do that?

Thanks
Pradeep

Re: parquet format

Posted by Ricky Saltzer <ri...@cloudera.com>.
Great, I'd love to hear whether you're successful with this.


-- 
Ricky Saltzer
http://www.cloudera.com

Re: parquet format

Posted by pradeepbill <pr...@gmail.com>.
Thanks for the explanation, Ricky. I think I've got the idea; I will proceed
with this approach and see how it works.

Re: parquet format

Posted by Ricky Saltzer <ri...@cloudera.com>.
You essentially need to create a dataset in HDFS using the kite-dataset
tool (http://kitesdk.org/docs/1.1.0/cli-reference.html#create). You use
Avro to define your schema, and then you tell Kite that you want the data
to be in Parquet format.

You will use the StoreInKiteDataset processor to write the data. Keep in
mind that the data must be given to the processor as Avro records. You can
use other processors (ConvertJSONToAvro, ConvertCSVToAvro, etc.) to marshal
your data into that format.

*Example:*
*-- Schema (user.avsc)*

{"namespace": "user.avsc",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}

*-- Create Dataset*

The example below creates a dataset on your local disk for testing. When
you're done testing, change the location URI from "file://" to "hdfs://"
to create the dataset in HDFS instead.

./kite-dataset create users --schema user.avsc --format parquet --location file:///tmp/parquet_users
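Because StoreInKiteDataset only accepts Avro records, every incoming record has to fit the schema above. The Python sketch below (standard library only; it is not part of NiFi or Kite, and the helper names are hypothetical) spells out what that means for this particular schema: "name" must be a string, while "favorite_number" and "favorite_color" are unions that also allow null. It only handles the primitive and union types used in user.avsc, and it treats a missing key as null, which is an assumption of the sketch; the actual rules applied by ConvertJSONToAvro may differ.

```python
import json

# The User schema from the example above, inlined so this sketch runs on its own.
USER_SCHEMA = json.loads("""
{"namespace": "user.avsc",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
""")

# Avro "int" is a signed 32-bit integer.
INT_MIN, INT_MAX = -(2 ** 31), 2 ** 31 - 1


def matches_primitive(value, avro_type):
    """Check one decoded JSON value against a single Avro primitive type name."""
    if avro_type == "null":
        return value is None
    if avro_type == "string":
        return isinstance(value, str)
    if avro_type == "int":
        # bool is a subclass of int in Python, so rule it out explicitly.
        return (isinstance(value, int) and not isinstance(value, bool)
                and INT_MIN <= value <= INT_MAX)
    raise ValueError(f"type not handled by this sketch: {avro_type!r}")


def fits_schema(record, schema):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        # A union such as ["int", "null"] is written as a JSON list of branches.
        branches = ftype if isinstance(ftype, list) else [ftype]
        # Assumption of this sketch: a missing key is treated as null.
        value = record.get(name)
        if not any(matches_primitive(value, b) for b in branches):
            problems.append(f"{name}: {value!r} does not match {ftype}")
    return problems


if __name__ == "__main__":
    print(fits_schema({"name": "Alyssa", "favorite_number": 256}, USER_SCHEMA))  # []
    print(fits_schema({"favorite_color": 3}, USER_SCHEMA))  # two problems
```

The first record fits; the second is missing the required string "name" and has a number where "favorite_color" expects a string or null.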



Re: parquet format

Posted by Joe Witt <jo...@gmail.com>.
Hello - can you please register for the dev@nifi.apache.org mailing
list? Otherwise I have to manually approve each email, which can
result in delays.

Just go here to do so: https://nifi.apache.org/mailing_lists.html

Thanks
Joe

Re: parquet format

Posted by pradeepbill <pr...@gmail.com>.
Thanks Ricky. I am a beginner here; can you point me to a link, please? An
example would help greatly.

Re: parquet format

Posted by Ricky Saltzer <ri...@cloudera.com>.
If I'm not mistaken, you should be able to do this using the Kite processor
(StoreInKiteDataset). If your Kite dataset in HDFS uses the Parquet format,
then Kite should automatically write the data as Parquet.

Re: parquet format

Posted by Joe Witt <jo...@gmail.com>.
Also responded to the same question on Stack Overflow:
http://stackoverflow.com/questions/37149331/apache-nifi-hdfs-parquet-format/37170672#37170672

Re: parquet format

Posted by Bryan Bende <bb...@gmail.com>.
Hi Pradeep,

Currently there isn't anything in NiFi that produces Parquet format,
although it has been mentioned before.

The HDFS processors just write the bytes of the FlowFile to HDFS, so we
would need a processor before them that is able to produce Parquet.

If you have experience with Parquet and are interested in contributing, let
us know; we'd be happy to help guide the process.

-Bryan
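Since the HDFS processors write the FlowFile bytes unchanged, a file in HDFS is Parquet only if something upstream produced Parquet. As a quick sanity check on a file copied out of HDFS (for example with "hdfs dfs -get"), the Python sketch below looks for the 4-byte magic number "PAR1" that the Parquet format places at both the start and the end of every file. It is a heuristic, not a parser, and the helper names are illustrative rather than part of any NiFi or Parquet API.

```python
import os

# Every Parquet file begins and ends with this 4-byte magic number.
PARQUET_MAGIC = b"PAR1"
# Smallest layout: leading magic + 4-byte footer length + trailing magic.
MIN_SIZE = 12


def looks_like_parquet(data):
    """Heuristic check on raw bytes; passing it does not prove the file is valid."""
    return (len(data) >= MIN_SIZE
            and data[:4] == PARQUET_MAGIC
            and data[-4:] == PARQUET_MAGIC)


def file_looks_like_parquet(path):
    """Same check for a local file, reading only its first and last 4 bytes."""
    if os.path.getsize(path) < MIN_SIZE:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


if __name__ == "__main__":
    print(looks_like_parquet(b'{"name": "Alyssa"}'))  # False: JSON bytes, not Parquet
```

Reading only the head and tail keeps the check cheap even for large files.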


Re: parquet format

Posted by pradeepbill <pr...@gmail.com>.
Thanks Bryan/Joe for the reply. I will go ahead with NiFi -> Spark
Streaming for now; I need to do a proof of concept here.

Thanks
Pradeep