Posted to dev@hive.apache.org by Balaji Rao <sb...@gmail.com> on 2012/03/08 17:54:03 UTC

HIVE and S3

Hi,
I am posting this question to this list because I suspect there may
be a bug here.

I'm having problems with Hive on EC2 reading files on S3 that were
written by other tools.

I have a lot of files and folders on S3 that were created by s3cmd and
are used by Hive on Elastic MapReduce (EMR). The two work
interchangeably: files created by Hive on EMR can be read by s3cmd and
vice versa. However, I'm having problems with Hive/Hadoop running on
EC2. Both Hive 0.7 and 0.8 seem to create an additional folder named
"/" on S3.

For example, suppose I have a file s3://bucket/path/00000 created by
s3cmd or by Hive on EMR, and I try to create an external table over it
from Hive on EC2:

 create external table wc(site string, cnt int) row format delimited
fields terminated by '\t' stored as textfile location
's3://bucket/path'

Hive on EC2 does not recognize the S3 folders created by EMR. Instead,
I see a new folder named "/" in the bucket:

 <bucket> / "/" / path


When I look at the debug output, Hive seems to be sending an extra "/"
when creating the table.

Here is a debug message. Note that the path contains a "/" followed by
a "%2F" (a URL-encoded "/"). Is this a bug in the code?

hive> create external table wc(site string, cnt int) .... location
's3://masked/wcoverlay/';

      <StringToSign>GETWed, 07 Mar 2012 18:26:03
GMT/masked/%2Fwcoverlay</StringToSign><AWSAccessKeyId>.....
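
To illustrate what I think might be happening, here is a minimal Python
sketch. It is only a guess at the mechanism, not the actual Hive or
Hadoop code: the function names split_location and canonical_resource
are hypothetical, and it assumes that the object key is taken from the
URI path without stripping the leading "/" and is then URL-encoded when
the request is signed. Under those assumptions, the leading "/" in the
key shows up as "%2F", as in the message above.

```python
from urllib.parse import quote, urlparse


def split_location(location, strip_leading_slash):
    """Split an s3:// location into (bucket, key). Hypothetical helper."""
    parsed = urlparse(location)
    bucket = parsed.netloc
    key = parsed.path.rstrip("/")  # drop the trailing slash of the folder
    if strip_leading_slash:
        key = key.lstrip("/")      # what I would expect for a plain S3 key
    return bucket, key


def canonical_resource(bucket, key):
    """Build the resource part of the string to sign (assumed form)."""
    # safe="" makes quote() encode "/" as "%2F" as well.
    return "/" + bucket + "/" + quote(key, safe="")


location = "s3://masked/wcoverlay/"

bucket, key = split_location(location, strip_leading_slash=True)
print(canonical_resource(bucket, key))   # /masked/wcoverlay

bucket, key = split_location(location, strip_leading_slash=False)
print(canonical_resource(bucket, key))   # /masked/%2Fwcoverlay
```

If this guess is right, the key stored on S3 would be "/wcoverlay"
rather than "wcoverlay", which would explain the extra "/" folder.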


Am I missing something?

Thanks,
Balaji