Posted to user@hive.apache.org by Raimon Bosch <ra...@gmail.com> on 2011/11/11 19:40:12 UTC

load data from s3 to hive

Hi,

I have read that Hadoop natively supports the S3 filesystem, so you can
run operations like:

hadoop fs -ls s3n://mybucket/my_folder/

or:

hadoop fs -cp s3n://mybucket/my_folder /tmp/my_folder

I'm wondering why Hive is not able to perform similar operations. It would
be a very good feature to load data directly from S3 into Hive, something like:

hive -e "LOAD DATA LOCAL INPATH 's3n://mybucket/my_hive_table' INTO
TABLE my_hive_table PARTITION(dt='2011-11-11');"

Right now this is not possible. What do you think? Which classes should be
changed?

Re: load data from s3 to hive

Posted by Florin Diaconeasa <fl...@gmail.com>.

Hello,

First of all, Hadoop needs to use S3 as its primary file system. In the Hadoop
configuration file core-site.xml, set fs.default.name to a value of the
following form: s3n://your-bucket-name
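
As a rough sketch of that setting, the snippet below prints the property entries that would go inside the <configuration> element of core-site.xml. The bucket name and key values are placeholders. The two credential properties (fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey) are an assumption on top of the advice above, which only mentions fs.default.name.

```shell
#!/bin/sh
# Placeholder bucket name; replace with your own.
BUCKET="your-bucket-name"

# Print the <property> entries to place inside <configuration> in core-site.xml.
# $1 is the bucket name. The credential properties are an assumption and are
# not part of the original advice; the key values are placeholders.
print_s3n_properties() {
  cat <<EOF
<property>
  <name>fs.default.name</name>
  <value>s3n://$1</value>
</property>
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
EOF
}

print_s3n_properties "$BUCKET"
```

Printing the fragment rather than editing core-site.xml in place keeps the sketch harmless; merge the output into your existing file by hand.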

After this, here is how I did it in Hive 0.6, and I assume it still works:

alter table my_table add partition (p1=a,p2=b) location
"s3n://your-bucket-name/path-to-folder-for-partition"

This worked for me without any issues. I assume the other way you provided
should work as well, but probably there is an issue with the evaluation of
the query...
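
The add-partition approach above could be scripted roughly as follows. This is only a sketch: the table name, the dt partition column, and the folder layout under the bucket are borrowed from the original question or made up for illustration, and the table is assumed to already exist and be partitioned by dt.

```shell
#!/bin/sh
# Build the ALTER TABLE statement for one partition whose data lives in S3.
# Arguments: table name, dt value, S3 location (all placeholders here).
add_partition_sql() {
  printf "ALTER TABLE %s ADD PARTITION (dt='%s') LOCATION '%s';\n" "$1" "$2" "$3"
}

# Hypothetical layout: one folder per day under the table's folder.
SQL="$(add_partition_sql my_hive_table 2011-11-11 s3n://mybucket/my_hive_table/dt=2011-11-11)"

# Show the statement; to execute it, pass it to the Hive CLI instead:
#   hive -e "$SQL"
echo "$SQL"
```

Keeping the statement in a variable makes it easy to loop over several dates and register one partition per day.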

Florin



On 11 November 2011 22:11, jiang licht <li...@yahoo.com> wrote:

> Check if this link provides any help:
>
> http://aws.amazon.com/elasticmapreduce/faqs/#hive-2
>
> Read "Are there new features in Hive specific to Amazon Elastic
> MapReduce?"
>
> and
>
> http://aws.amazon.com/articles/2856
>
> Best regards,
> Michael

Re: load data from s3 to hive

Posted by jiang licht <li...@yahoo.com>.

Check if this link provides any help:

http://aws.amazon.com/elasticmapreduce/faqs/#hive-2

Read "Are there new features in Hive specific to Amazon Elastic MapReduce?"

and 

http://aws.amazon.com/articles/2856
 

Best regards,
Michael

