Posted to mapreduce-user@hadoop.apache.org by Kumar Jayapal <kj...@gmail.com> on 2015/05/28 00:40:19 UTC

how to use --as-parquetfile in sqoop import

Hi,

Can I use the --as-parquetfile argument while importing data to Hive?

I have checked the site

https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_basic_usage

I don't see this option mentioned anywhere.

Thanks
Jay

Re: how to use --as-parquetfile in sqoop import

Posted by Joshua Baxter <jo...@gmail.com>.
I've also found that Parquet as an output format doesn't work properly with
HCatalog import, and it can't handle timestamp or decimal types without
crashing. This was using the 1.4.5 and 1.4.6 clients and CDH 5.3.

Even when just importing as text fields to HDFS folders, ending up with
files of a suitable size (parquet files aren't splittable) that won't
require rewriting to redistribute evenly is a guessing game. You need to
know how much data you expect to pull out and how many mappers that means
you should specify, and you will also run into problems if the split-by
column does not have an even distribution, unless you are using the OraOop
connector, in which case splitting by a column is unnecessary and the
distribution is fairly uniform.
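The sizing guesswork described above boils down to simple ceiling arithmetic. A minimal sketch, where the 50 GiB extract size and 256 MiB target file size are illustrative assumptions, not values from this thread:

```shell
# Rough mapper count so each mapper writes roughly one file of the target size.
# Both figures below are assumptions for illustration only.
TOTAL_BYTES=$((50 * 1024 * 1024 * 1024))   # expected extract size: 50 GiB
TARGET_BYTES=$((256 * 1024 * 1024))        # desired size per output file: 256 MiB
# Ceiling division: round up so no file exceeds the target size.
NUM_MAPPERS=$(( (TOTAL_BYTES + TARGET_BYTES - 1) / TARGET_BYTES ))
echo "--num-mappers $NUM_MAPPERS"
```

The result would feed Sqoop's --num-mappers flag; as noted above, if the split-by column is skewed, actual file sizes will still vary per mapper.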

We found the most hassle-free method for us to pull from an Oracle DB was
to do an HCatalog import as SequenceFile, which correctly mapped the data
types, and then do an insert or create-table-as-select from the imported
table, converting to Parquet at that point instead. Impala is convenient
for the second step if available, as it manages Parquet output file sizes
without any effort, regardless of input data or requested output compression
type.
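The two-step route described above might look like the following sketch. All connection strings, database and table names are hypothetical, and it assumes Sqoop 1.4.4+ HCatalog support and impala-shell are available:

```shell
# Step 1: import from Oracle into an HCatalog-managed SequenceFile table,
# which preserves the mapped column types.
sqoop import \
  --connect jdbc:oracle:thin:@dbhost:1521/ORCL \
  --username etl -P \
  --table ORDERS \
  --hcatalog-database staging \
  --hcatalog-table orders_seq \
  --hcatalog-storage-stanza 'STORED AS SEQUENCEFILE'

# Step 2: convert to Parquet in Impala, which manages output file sizes.
impala-shell -q "
  CREATE TABLE staging.orders STORED AS PARQUET
  AS SELECT * FROM staging.orders_seq;"
```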

Jish
On 28 May 2015 01:49, "Brett Medalen" <bm...@hotmail.com> wrote:

> Not available until Sqoop 1.4.5 or 1.4.6

Re: how to use --as-parquetfile in sqoop import

Posted by Brett Medalen <bm...@hotmail.com>.
Not available until Sqoop 1.4.5 or 1.4.6


RE: how to use --as-parquetfile in sqoop import

Posted by "Xu, Qian A" <qi...@intel.com>.
Yes. This is a new feature in 1.4.6.

Please check out
http://ingest.tips/2015/05/20/a-roundtrip-from-mysql-to-hive/ or
http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html
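As a rough sketch of the flag in use with a Hive import (all connection details, credentials, and table names below are hypothetical):

```shell
# Import a table into Hive as Parquet; requires Sqoop 1.4.6.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl -P \
  --table orders \
  --hive-import \
  --hive-database staging \
  --hive-table orders \
  --as-parquetfile \
  --num-mappers 8
```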


Thanks
Stanley (Xu, Qian)

From: Kumar Jayapal [mailto:kjayapal17@gmail.com]
Sent: Thursday, May 28, 2015 6:40 AM
To: user@sqoop.apache.org; user@hadoop.apache.org
Subject: how to use --as-parquetfile in sqoop import
