
Posted to user@hive.apache.org by Prashanth R <r....@gmail.com> on 2011/04/11 23:09:39 UTC

External table creation question

Hi,

The Hive documentation describes the EXTERNAL keyword as follows:

The EXTERNAL keyword lets you create a table and provide a LOCATION so that
Hive does not use a default location for this table. This comes in handy if
you already have data generated.

I have my data in a directory in an S3 bucket, and I am trying to
create a table like this:

CREATE EXTERNAL TABLE IF NOT EXISTS mslog (TIME_STAMP STRING, SEQ STRING)
LOCATION 's3://<bucket name>/processed/'

But the table isn't populated with the data at the S3 location. Am I
missing something here?


-- 
- Prash

Re: External table creation question

Posted by Avram Aelony <Av...@eharmony.com>.

Hi Prash,

Try this:

create external table mslog 
(  
   time_stamp string,
   seq string
)
row format delimited fields terminated by '\t'
stored as textfile
location 's3://your/bucket/path/'
;
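One notable difference from Prash's statement is the ROW FORMAT clause. When it is omitted, Hive's default text format separates fields with the \001 (Ctrl-A) character, so a tab-separated file ends up entirely in the first column. The Python sketch below approximates that splitting for a single line. It is a simplified model, assuming plain delimited text where missing trailing fields become NULL (None here) and extra fields are dropped; `split_like_hive` and the sample row are hypothetical.

```python
from typing import List, Optional

# Field delimiter Hive's default text format uses when no ROW FORMAT is given.
HIVE_DEFAULT_FIELD_DELIM = "\x01"  # Ctrl-A


def split_like_hive(line: str, delimiter: str, num_columns: int) -> List[Optional[str]]:
    """Approximate how one delimited text row maps onto a table's columns.

    Simplified model: missing trailing fields become None (NULL in Hive)
    and any fields beyond num_columns are dropped.
    """
    fields: List[Optional[str]] = list(line.rstrip("\n").split(delimiter))
    fields += [None] * (num_columns - len(fields))  # no-op if there are enough fields
    return fields[:num_columns]


if __name__ == "__main__":
    row = "2011-04-11 14:09:39\t42"  # hypothetical tab-separated row
    # Default delimiter: the whole row lands in the first column, the second is None.
    print(split_like_hive(row, HIVE_DEFAULT_FIELD_DELIM, 2))
    # Tab delimiter, as in Avram's ROW FORMAT clause: both columns are filled.
    print(split_like_hive(row, "\t", 2))
```

With the default delimiter the second column comes back empty; with '\t' the row splits into the two declared columns.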

Important: the S3 path your table points to should contain only files with the same schema. Hive applies the table's single schema to every file there, so files with a different set of columns produce misaligned or NULL values.
Also, check your logs if you suspect your data was not read successfully.
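Avram's same-schema rule can be checked before pointing a table at a location. Because Hive applies one schema to every file, a stray file with a different number of columns does not necessarily cause an error; it just yields wrong values. A minimal sketch, assuming the file contents are already available as lines of uncompressed, tab-delimited text (fetching them from S3 is left out); `field_counts`, `mismatched_files` and the sample paths are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def field_counts(lines: Iterable[str], delimiter: str = "\t") -> Counter:
    """Tally how many delimited fields each non-empty line contains."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line:
            counts[len(line.split(delimiter))] += 1
    return counts


def mismatched_files(files: Dict[str, Iterable[str]], expected_columns: int,
                     delimiter: str = "\t") -> List[str]:
    """Names of files with at least one line whose field count differs from the table's."""
    return sorted(
        name for name, lines in files.items()
        if any(n != expected_columns for n in field_counts(lines, delimiter))
    )


if __name__ == "__main__":
    # Hypothetical contents of the table's location, one entry per file.
    sample = {
        "processed/part-00000": ["2011-04-11 14:09:39\t42\n", "2011-04-11 14:09:40\t43\n"],
        "processed/debug.log": ["2011-04-11\t42\textra-column\n"],
    }
    print(mismatched_files(sample, expected_columns=2))
```

Here the two-column `mslog` table would only be safe to point at the location once `processed/debug.log` is moved elsewhere.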

Hope this helps,
~Avram


RE: External table creation question

Posted by "Christopher, Pat" <pa...@hp.com>.

Prash,

1. You probably want to use the s3n filesystem, not the s3 one. With s3, Hadoop stores files in its own block format, so you have to manage the file blocks yourself and objects uploaded by other tools aren't readable as-is. Switching to s3n is much easier.

2. This could be Hive failing to read the files. Hive is probably assuming that there are no readable files in 'processed', so it's reporting that you have no data. Is the data compressed? If so, the S3 file names need to end in the codec's extension (.gz, .bz2, etc.).
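Both of Pat's points can be turned into a quick pre-flight check. This Python sketch assumes you already have the table's LOCATION string and, for each file, its name and first few bytes. It flags an s3:// scheme, and files whose leading bytes match the gzip (1f 8b) or bzip2 ("BZh") magic numbers but whose names lack the .gz or .bz2 extension Hadoop uses to pick a codec. The function names are hypothetical, and the s3/s3n note reflects Apache Hadoop's filesystems of that era; other distributions may map s3:// differently.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlparse

# Leading "magic" bytes of gzip and bzip2 streams, with the file extension
# Hadoop's codecs associate with each format.
MAGIC_NUMBERS = {
    b"\x1f\x8b": ("gzip", ".gz"),
    b"BZh": ("bzip2", ".bz2"),
}


def location_warnings(location: str) -> List[str]:
    """Warn when a table LOCATION uses the s3:// (block filesystem) scheme."""
    if urlparse(location).scheme == "s3":
        return [f"{location}: s3:// is the block filesystem; consider s3n:// for plain S3 objects"]
    return []


def compression_warnings(files: Iterable[Tuple[str, bytes]]) -> List[str]:
    """Flag files that look compressed but lack the matching extension.

    `files` holds (name, first_bytes) pairs; only gzip and bzip2 are checked.
    """
    warnings = []
    for name, head in files:
        for magic, (fmt, ext) in MAGIC_NUMBERS.items():
            if head.startswith(magic) and not name.endswith(ext):
                warnings.append(f"{name}: looks like {fmt} data but does not end in {ext}")
    return warnings


if __name__ == "__main__":
    print(location_warnings("s3://your/bucket/path/"))
    print(compression_warnings([
        ("processed/mslog-1", b"\x1f\x8b\x08\x00"),     # gzip data without .gz
        ("processed/mslog-2.gz", b"\x1f\x8b\x08\x00"),  # correctly named
        ("processed/mslog-3", b"2011-04-11\t42"),       # plain text
    ]))
```

Only `processed/mslog-1` is flagged in the sample; renaming it with a .gz suffix would let Hadoop pick the gzip codec.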

Pat