Posted to dev@hive.apache.org by Balaji Rao <sb...@gmail.com> on 2012/03/08 17:54:03 UTC
HIVE and S3
Hi,
I had to post this question to this list because I feel there might
be a bug here.
I'm having problems with HIVE-EC2 reading files on S3 written by other tools.
I have a lot of files and folders on S3 that were created by s3cmd and are
used by Elastic MapReduce (HIVE). The two work interchangeably: files
created by HIVE-EMR can be read by s3cmd and vice versa. However, I'm
having problems with HIVE/Hadoop running on EC2. Both Hive 0.7 and 0.8
seem to create an additional folder "/" on S3.
For example, suppose I have a file s3://bucket/path/00000 created by s3cmd
or HIVE-EMR and I try to create an external table on HIVE-EC2:
create external table wc(site string, cnt int) row format delimited
fields terminated by '\t' stored as textfile location
's3://bucket/path'
This does not recognize the EMR-created S3 folders. Instead, I see a
new folder "/":
<bucket> / "/" / path
When I look at the debug information, HIVE seems to be sending an
extra "/" when creating a table.
Here is a debug message. If you look at the path, there is a "/" followed
by a "%2F". Is this probably a bug in the code?
hive> create external table wc(site string, cnt int) .... location
's3://masked/wcoverlay/';
<StringToSign>GETWed, 07 Mar 2012 18:26:03
GMT/masked/%2Fwcoverlay</StringToSign><AWSAccessKeyId>.....
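To illustrate what I suspect is happening, here is a minimal Python sketch. It is not Hive or Hadoop code; the function name canonical_resource and its strip_leading_slash flag are hypothetical, and the sketch assumes the client percent-encodes every character of the key, including "/". If the leading "/" of the URI path is kept as part of the object key, the resource in the string to sign matches the one in the debug message above.

```python
from urllib.parse import urlparse, quote


def canonical_resource(location, strip_leading_slash):
    """Build the /bucket/key resource for a table location (hypothetical helper)."""
    parsed = urlparse(location)
    bucket = parsed.netloc              # "masked"
    key = parsed.path.rstrip("/")       # "/wcoverlay" (leading slash still present)
    if strip_leading_slash:
        key = key.lstrip("/")           # "wcoverlay"
    # Assumption: the key is percent-encoded with no safe characters,
    # so a leading "/" in the key becomes "%2F".
    return "/" + bucket + "/" + quote(key, safe="")


# Leading slash kept in the key: same resource as in the debug message.
print(canonical_resource("s3://masked/wcoverlay/", strip_leading_slash=False))

# Leading slash stripped: the key that s3cmd and HIVE-EMR appear to use.
print(canonical_resource("s3://masked/wcoverlay/", strip_leading_slash=True))
```

If this is what happens, the key would begin with "/", which S3 tools display as a top-level folder named "/".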
Am I missing something?
Thanks,
Balaji