Posted to dev@sqoop.apache.org by Marcel Kornacker <ma...@cloudera.com> on 2013/09/30 22:33:08 UTC

Re: [parquet-dev] Will parquet support ETL tools like Hadoop Sqoop

Cross-posting to Sqoop dev list.

On Mon, Sep 30, 2013 at 12:03 PM, DB Tsai <db...@dbtsai.com> wrote:
> Hi parquet developers,
>
> Is there any way to use ETL tools like Hadoop Sqoop with the Parquet
> format? If not, how do users currently dump data from a database into
> HDFS for further analysis?
>
> Thanks.
>
> Sincerely,
>
> DB Tsai
> -----------------------------------
> Web: http://www.dbtsai.com
>
> --
> http://parquet.github.com/
> ---
> You received this message because you are subscribed to the Google Groups "Parquet" group.
> To post to this group, send email to parquet-dev@googlegroups.com.

Re: [parquet-dev] Will parquet support ETL tools like Hadoop Sqoop

Posted by Venkat Ranganathan <vr...@hortonworks.com>.
The idea behind using HCatalog is that you stay future-proof and
format-agnostic: it works with RCFile, ORCFile, SequenceFile (which was
not supported in Sqoop exports until the HCatalog integration) and any
file format that has a Hive SerDe implemented.
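For instance, the file format can be selected purely through the HCatalog
storage stanza when Sqoop creates the table during import. A minimal
sketch, untested, with a hypothetical connection string, database and
table names:

```shell
# Sketch of an HCatalog-backed import; Sqoop creates the HCatalog table
# from the storage stanza, so switching formats is a one-line change.
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username etl -P \
  --table sales \
  --hcatalog-database default \
  --hcatalog-table sales \
  --create-hcatalog-table \
  --hcatalog-storage-stanza "stored as orcfile"
```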

Venkat




Re: [parquet-dev] Will parquet support ETL tools like Hadoop Sqoop

Posted by DB Tsai <db...@dbtsai.com>.
I know Sqoop supports Avro directly, so it would be nice to see Sqoop
support Parquet as well, since lots of organizations use Sqoop as their
ETL solution. I'll try HCatalog and report back.
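For reference, the direct Avro path mentioned above looks like this in
Sqoop 1.4.x (connection details and names are hypothetical):

```shell
# Sqoop's built-in Avro output: a single flag selects the file format.
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username etl -P \
  --table sales \
  --as-avrodatafile \
  --target-dir /user/etl/sales_avro
```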

Thanks.

Sincerely,

DB Tsai
-----------------------------------
Web: http://www.dbtsai.com
Phone : +1-650-383-8392



Re: [parquet-dev] Will parquet support ETL tools like Hadoop Sqoop

Posted by Venkat Ranganathan <vr...@hortonworks.com>.
I have not explicitly tested this, but if Parquet has Hive SerDes
written for use with Hive, it should be possible to use the HCatalog
integration to move data using Sqoop.
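Something along these lines, assuming a Hive table has already been
declared with the Parquet SerDe (all names hypothetical and untested):

```shell
# Import into a pre-created Hive/HCatalog table; the table's SerDe
# (here, Parquet) determines the on-disk format Sqoop writes.
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username etl -P \
  --table sales \
  --hcatalog-database default \
  --hcatalog-table sales_parquet
```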

https://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#_sqoop_hcatalog_integration

Thanks




Re: [parquet-dev] Will parquet support ETL tools like Hadoop Sqoop

Posted by Jason Altekruse <al...@gmail.com>.
DB Tsai,

I do not have experience with Sqoop, but it looks like the process
should be pretty straightforward. As far as I can see, Sqoop can only
export delimited text or SequenceFile
(http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#_file_formats).
That said, both of these formats are readable by Hive and Pig. If you do
not mind a two-pass conversion, you can use Sqoop to get your data into
HDFS in either format and then use Hive or Pig to read it and re-export
it into Parquet. Depending on your cluster setup and use case, I would
look at the various encodings and compression codecs offered by Parquet,
as these need to be chosen when you write the files. In most cases
compression will save you time when reading the data.
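The two-pass route above could be sketched like this; the schema, the
HDFS paths, and the 2013-era Parquet SerDe class names are all
assumptions and would need checking against your Hive and parquet-hive
versions:

```shell
# Pass 1: plain delimited import into HDFS.
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username etl -P \
  --table sales \
  --fields-terminated-by ',' \
  --target-dir /user/etl/sales_csv

# Pass 2: expose the text files to Hive, then rewrite them as Parquet.
hive -e "
CREATE EXTERNAL TABLE sales_raw (id INT, amount DOUBLE, ts STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/user/etl/sales_csv';
CREATE TABLE sales_parquet (id INT, amount DOUBLE, ts STRING)
  ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
  STORED AS
    INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
INSERT OVERWRITE TABLE sales_parquet SELECT * FROM sales_raw;
"
```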

Regards,
Jason Altekruse



