Posted to user@hive.apache.org by Tony Burton <TB...@SportingIndex.com> on 2013/03/26 18:11:17 UTC

S3/EMR Hive: Load contents of a single file

Hi list,

I've been using Hive to perform queries on data hosted on AWS S3, and my tables point at data by specifying the directory in which the data is stored, e.g.

$ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile location 's3://mybucket/path/to/data';

where s3://mybucket/path/to/data is the "directory" that contains the files I'm interested in. My use case now is to create a table with data pointing to a specific file in a directory:

$ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile location 's3://mybucket/path/to/data/src1.txt';

and I get the error: "FAILED: Error in metadata: MetaException(message:Got exception: java.io.IOException Can't make directory for path 's3://spinmetrics/global/counter_Fixture.txt' since it is a file.)". OK, let's try to create the table without specifying the data source:

$ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile

OK, no problem. Now let's load the data:

$ LOAD DATA INPATH 's3://mybucket/path/to/data/src1.txt' INTO TABLE myData;

(referring to https://cwiki.apache.org/Hive/languagemanual-dml.html - "...filepath can refer to a file (in which case hive will move the file into the table)")

Error message is: " FAILED: Error in semantic analysis: Line 1:17 Path is not legal ''s3://mybucket/path/to/data/src1.txt": Move from: s3:// mybucket/path/to/data/src1.txt to: hdfs://10.48.97.97:9000/mnt/hive_081/warehouse/gfix is not valid. Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict."

So I check my default.fs.name and hive.metastore.warehouse.dir (which have never caused problems before):

$ set fs.default.name;
fs.default.name=hdfs://10.48.97.97:9000
$ set hive.metastore.warehouse.dir;
hive.metastore.warehouse.dir=/mnt/hive_081/warehouse

Clearly different, but which is correct? Is there an easier way to load a single file into a hive table? Or should I just put each file in a directory and proceed as before?

Thanks!

Tony







Tony Burton
Senior Software Engineer
e: tburton@sportingindex.com


Re: S3/EMR Hive: Load contents of a single file

Posted by Ramki Palle <ra...@gmail.com>.
First of all, you cannot point a table at a file. Each table has a
corresponding directory. If you want the table to contain the data from
only one file, simply copy that one file into the directory. The table does
not need to know the name of the file; it only matters whether the
structure of the data in the file matches the table structure.

When you query the table, it reads the data from whatever files are in
the corresponding directory.

Regards,
Ramki.
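
A minimal sketch of the approach Ramki describes, assuming hadoop fs on the EMR master can read and write s3:// paths (the src1/ prefix and the '~' field delimiter below are illustrative, not from the original post):

$ hadoop fs -mkdir s3://mybucket/path/to/data/src1/
$ hadoop fs -cp s3://mybucket/path/to/data/src1.txt s3://mybucket/path/to/data/src1/

$ create external table myData (str1 string, str2 string, count1 int) row format delimited fields terminated by '~' stored as textfile location 's3://mybucket/path/to/data/src1/';
$ select count(*) from myData;

The table is then scoped to the one-file directory, and any other files left under path/to/data/ will not affect it.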


RE: S3/EMR Hive: Load contents of a single file

Posted by Tony Burton <TB...@SportingIndex.com>.
Also, it might be worth pointing out that despite what I've written in my create table statement, this table is not partitioned - just one set of data contained in one textfile on s3. So perhaps this is more accurate:

$ create external table myData (str1 string, str2 string, count1 int) row format <format options> stored as textfile;

Tony
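
For completeness, combining the simplified create statement above with the ALTER TABLE step described in the quoted messages below gives a minimal end-to-end sequence (the '~' delimiter and the file path are illustrative):

$ create external table myData (str1 string, str2 string, count1 int) row format delimited fields terminated by '~' stored as textfile;
$ alter table myData set location 's3://mybucket/path/to/data/src1.txt';
$ select count(*) from myData;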




From: Sanjay Subramanian [mailto:Sanjay.Subramanian@wizecommerce.com]
Sent: 26 March 2013 17:42
To: user@hive.apache.org
Subject: Re: S3/EMR Hive: Load contents of a single file

Awesome! Keep the bees happy! Happy Hiving!

From: Tony Burton <TB...@SportingIndex.com>
Reply-To: user@hive.apache.org
Date: Tuesday, March 26, 2013 10:39 AM
To: user@hive.apache.org
Subject: RE: S3/EMR Hive: Load contents of a single file


Thanks for the quick reply Sanjay.

ALTER TABLE is the key, but slightly different to your suggestion. I create the table as before, but don't specify location:

$ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile;

Then use ALTER TABLE like this:

$ ALTER TABLE myData SET LOCATION 's3://mybucket/path/to/data/src1.txt';

Bingo, I can now run queries with myData in the same way I can when the LOCATION is a directory. Cool!

Tony







From: Sanjay Subramanian [mailto:Sanjay.Subramanian@wizecommerce.com]
Sent: 26 March 2013 17:22
To: user@hive.apache.org
Subject: Re: S3/EMR Hive: Load contents of a single file

Hi Tony

Can you create the table without any location?

After that you could do an ALTER TABLE to add a location and partition:

ALTER TABLE myData ADD PARTITION (partitionColumn1='$value1', partitionColumn2='$value2') LOCATION '/path/to/your/directory/in/hdfs';

An example Without Partitions
-----------------------------
ALTER TABLE myData SET LOCATION 'hdfs://10.48.97.97:9000/path/to/your/data/directory/in/hdfs';

While specifying location, you have to point to a directory. You cannot point to a file (IMHO).

Hope that helps

sanjay
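
As an illustrative sketch of the partitioned variant Sanjay describes (the partition column src and the per-file directories below are placeholders, and assume each file has first been copied into its own directory):

$ create external table myData (str1 string, str2 string, count1 int) partitioned by (src string) row format delimited fields terminated by '~' stored as textfile;
$ alter table myData add partition (src='src1') location 's3://mybucket/path/to/data/src1/';
$ alter table myData add partition (src='src2') location 's3://mybucket/path/to/data/src2/';
$ select src, count(*) from myData group by src;

Each partition then maps to one directory (and hence, here, one file), which is another way to keep per-file data separate within a single table.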


Re: MySQL instance on hadoop name node server in production ?

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Thanks Nitin…the mysql replication is a good point
Regards
sanjay


Re: MySQL instance on hadoop name node server in production ?

Posted by Nitin Pawar <ni...@gmail.com>.
Few writes, understood, but how many reads you see will depend on how big
your metastore grows.

Also, the amount of HDFS storage you are looking at will be a big factor here,
because the namenode holds the FSImage in memory, so the namenode needs a
larger memory allocation. If it is a live production cluster, your namenode
could well be under heavy load.

I would still suggest you set up the metastore on a couple of VMs with MySQL
replication enabled, so the reads can be distributed with load balancers
etc.
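
For reference, pointing Hive at a standalone MySQL metastore (replicated or not) comes down to the JDBC connection properties in hive-site.xml. A minimal sketch, with the host name, database and credentials as placeholders (the MySQL connector jar also needs to be on Hive's classpath):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-db-host:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>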





-- 
Nitin Pawar

Re: MySQL instance on hadoop name node server in production ?

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Thanks Nitin.
The MySQL instance is for the Hive metastore only, so: few writes, many reads.
Regards
sanjay


Re: MySQL instance on hadoop name node server in production ?

Posted by Nitin Pawar <ni...@gmail.com>.
Ideally, when you set up the namenode and jobtracker in any production
Hadoop cluster, those machines are dedicated to that purpose only.

How frequently you are going to run queries, and how much data HDFS is
going to hold, are the key factors in deciding this.




-- 
Nitin Pawar

RE: S3/EMR Hive: Load contents of a single file

Posted by Tony Burton <TB...@SportingIndex.com>.
No problem Keith - it was a worthwhile exercise for me to go back and double check everything was working as expected.





MySQL instance on hadoop name node server in production ?

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Hi all

I am planning to install a MySQL server (as the Hive metastore) on the same box
as my name node.
My name node has 16GB RAM and hopefully I can get 2TB.
Any problems with MySQL on the same node as the name node?

Thanks
sanjay




Re: S3/EMR Hive: Load contents of a single file

Posted by Keith Wiley <kw...@keithwiley.com>.
Okay, I also saw your previous response which analyzed queries against two tables built around two files in the same directory.  I guess I was simply wrong in my understanding that a Hive table is fundamentally associated with a directory instead of a file.  Turns out, it can be either one.  A directory table uses all files in the directory while a file table uses one specific file and properly avoids sibling files.  My bad.

Thanks for the careful analysis and clarification.  TIL!

Cheers!


________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"I used to be with it, but then they changed what it was.  Now, what I'm with
isn't it, and what's it seems weird and scary to me."
                                           --  Abe (Grandpa) Simpson
________________________________________________________________________________


RE: S3/EMR Hive: Load contents of a single file

Posted by Tony Burton <TB...@SportingIndex.com>.
A bit more info - do an extended description of the table:

$ desc extended gsrc1;

And the "location" field is "location:s3://mybucket/path/to/data/src1.txt"

Do the same on a table created with a location pointing at the directory and the same info gives (not surprisingly) "location:s3://mybucket/path/to/data/"






RE: S3/EMR Hive: Load contents of a single file

Posted by Tony Burton <TB...@SportingIndex.com>.
Thanks for the reply Keith.

> you could have dispensed with the additional "alter table" business and simply created the original table around the directory in the first place

Yep, but I have multiple files in that directory and wanted to create one table per file.

> Do you know for certain that it isn't using other files also in that directory as part of the same table
> or if it is currently empty, that if you add a new file to the directory after creating the table in your
> described fashion, it doesn't immediately become visible as part of the table?

I've got two files in my s3://mybucket/path/to/data/ directory, s3://mybucket/path/to/data/src1.txt and s3://mybucket/path/to/data/src2.txt - both contain lists of ~-separated date/count pairs, e.g. 20130101~12345. Both contain data for just the month of February this year.

Create two tables: 

$ create external table gsrc1 (gdate string, c int) row format delimited fields terminated by '~' stored as textfile;
$ alter table gsrc1 set location 's3://spinmetrics/global/src1.txt';
$ create external table gsrc2 (gdate string, c int) row format delimited fields terminated by '~' stored as textfile;
$ alter table gsrc2 set location 's3://spinmetrics/global/src2.txt';

Count(*) on each table:

$ select count(*) from gsrc1;
28
$ select count(*) from gsrc2;
28

OK, but both tables could be pointing at the same data. Check max, min and the first/last entries from both tables:

$ select min(c), max(c) from gsrc1;
2935    23130
$ select min(c), max(c) from gsrc2;
865953  2768868

$ select * from gsrc1 where gdate="20130201"
20130201        5153
$ select * from gsrc1 where gdate="20130228"
20130228        7051
$ select * from gsrc2 where gdate="20130201"
20130201        1472017
$ select * from gsrc2 where gdate="20130228"
20130228        1323241

And without pasting the whole data set in here, I am 100% confident that these values match the contents of the individual files in S3. Maybe other readers could try a similar exercise and present their results? Are there other tests I could try to further verify my findings?

Tony
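
One further check, as a sketch (the copy command and file name below are illustrative): drop a third file into the same directory after the tables have been created, then re-run the counts. If each table really is scoped to its single file, the results should be unchanged; if the tables were actually reading the whole directory, both gsrc1 and gsrc2 would pick up the new rows.

$ hadoop fs -cp s3://mybucket/path/to/data/src1.txt s3://mybucket/path/to/data/src3.txt
$ select count(*) from gsrc1;
$ select count(*) from gsrc2;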





-----Original Message-----
From: Keith Wiley [mailto:kwiley@keithwiley.com] 
Sent: 26 March 2013 19:40
To: user@hive.apache.org
Subject: Re: S3/EMR Hive: Load contents of a single file

Are you sure this is doing what you think it's doing?  Since Hive associates tables with directories (well external tables at least, I'm not very familiar with internal tables), my suspicion is that even if your approach described below works, what Hive actually did was use s3://mybucket/path/to/data/ as the table location...in which case you could have dispensed with the additional "alter table" business and simply created the original table around the directory in the first place...or I could be completely wrong.  Do you know for certain that it isn't using other files also in that directory as part of the same table...or if it is currently empty, that if you add a new file to the directory after creating the table in your described fashion, it doesn't immediately become visible as part of the table?  I eagerly await clarification.

On Mar 26, 2013, at 10:39 , Tony Burton wrote:

>  
> Thanks for the quick reply Sanjay.
>  
> ALTER TABLE is the key, but slightly different to your suggestion. I create the table as before, but don't specify location:
>  
> $ create external table myData (str1 string, str2 string, count1 int) 
> partitioned by <snip> row format <snip> stored as textfile;
>  
> Then use ALTER TABLE like this:
>  
> $ ALTER TABLE myData SET LOCATION ' 
> s3://mybucket/path/to/data/src1.txt ';
>  
> Bingo, I can now run queries with myData in the same way I can when the LOCATION is a directory. Cool!
>  
> Tony
>  
>  
>  
>  
>  
>  
>  
> From: Sanjay Subramanian [mailto:Sanjay.Subramanian@wizecommerce.com]
> Sent: 26 March 2013 17:22
> To: user@hive.apache.org
> Subject: Re: S3/EMR Hive: Load contents of a single file
>  
> Hi Tony
>  
> Can u create the table without any location. 
>  
> After that you could do an ALTER TABLE add location and partition
>  
> ALTER TABLE myData ADD PARTITION (partitionColumn1='$value1' , partitionColumn2='$value2') LOCATION '/path/to/your/directory/in/hdfs';"
> 
> 
> An example Without Partitions
> -----------------------------
> ALTER TABLE myData SET LOCATION 'hdfs://10.48.97.97:9000/path/to/your/data/directory/in/hdfs';"
> 
> 
> While specifying location, you have to point to a directory. You cannot point to a file (IMHO).
>  
> Hope that helps
>  
> sanjay
>  
> From: Tony Burton <TB...@SportingIndex.com>
> Reply-To: "user@hive.apache.org" <us...@hive.apache.org>
> Date: Tuesday, March 26, 2013 10:11 AM
> To: "user@hive.apache.org" <us...@hive.apache.org>
> Subject: S3/EMR Hive: Load contents of a single file
>  
> Hi list,
>  
> I've been using hive to perform queries on data hosted on AWS S3, and 
> my tables point at data by specifying the directory in which the data 
> is stored, eg
>  
> $ create external table myData (str1 string, str2 string, count1 int) 
> partitioned by <snip> row format <snip> stored as textfile location 
> 's3://mybucket/path/to/data';
>  
> where s3://mybucket/path/to/data is the "directory" that contains the files I'm interested in. My use case now is to create a table with data pointing to a specifc file in a directory:
>  
> $ create external table myData (str1 string, str2 string, count1 int) 
> partitioned by <snip> row format <snip> stored as textfile location 
> 's3://mybucket/path/to/data/src1.txt';
>            
> and I get the error: "FAILED: Error in metadata: MetaException(message:Got exception: java.io.IOException Can't make directory for path 's3://spinmetrics/global/counter_Fixture.txt' since it is a file.)". Ok, lets try to create the table without specifying the data source:
>  
> $ create external table myData (str1 string, str2 string, count1 int) 
> partitioned by <snip> row format <snip> stored as textfile
>  
> Ok, no problem. Now lets load the data
>  
> $ LOAD DATA INPATH 's3://mybucket/path/to/data/src1.txt' INTO TABLE 
> myData;
>  
> (referring to https://cwiki.apache.org/Hive/languagemanual-dml.html - 
> "...filepath can refer to a file (in which case hive will move the 
> file into the table)")
>  
> Error message is: " FAILED: Error in semantic analysis: Line 1:17 Path is not legal ''s3://mybucket/path/to/data/src1.txt": Move from: s3:// mybucket/path/to/data/src1.txt to:hdfs://10.48.97.97:9000/mnt/hive_081/warehouse/gfix is not valid. Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict."
>  
> So I check my default.fs.name and hive.metastore.warehouse.dir (which have never caused problems before):
>  
> $ set fs.default.name;
> fs.default.name=hdfs://10.48.97.97:9000
> $ set hive.metastore.warehouse.dir;
> hive.metastore.warehouse.dir=/mnt/hive_081/warehouse
>  
> Clearly different, but which is correct? Is there an easier way to load a single file into a hive table? Or should I just put each file in a directory and proceed as before?
>  
> Thanks!
>  
> Tony
>  
>  
>  
>  
>  
>  
>  
> Tony Burton
> Senior Software Engineer
> e: tburton@sportingindex.com


________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be self-contented is to be vile and ignorant, and that to aspire is better than to be blindly and impotently happy."
                                           --  Edwin A. Abbott, Flatland
________________________________________________________________________________





Re: S3/EMR Hive: Load contents of a single file

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Awesome! Keep the bees happy! Happy Hiving!

From: Tony Burton <TB...@SportingIndex.com>>
Reply-To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Date: Tuesday, March 26, 2013 10:39 AM
To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Subject: RE: S3/EMR Hive: Load contents of a single file


Thanks for the quick reply Sanjay.

ALTER TABLE is the key, but slightly different to your suggestion. I create the table as before, but don’t specify location:

$ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile;

Then use ALTER TABLE like this:

$ ALTER TABLE myData SET LOCATION 's3://mybucket/path/to/data/src1.txt';

Bingo, I can now run queries with myData in the same way I can when the LOCATION is a directory. Cool!

Tony







From: Sanjay Subramanian [mailto:Sanjay.Subramanian@wizecommerce.com]
Sent: 26 March 2013 17:22
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: S3/EMR Hive: Load contents of a single file

Hi Tony

Can you create the table without any location?

After that you could do an ALTER TABLE to add a location and partition:

ALTER TABLE myData ADD PARTITION (partitionColumn1='$value1' , partitionColumn2='$value2') LOCATION '/path/to/your/directory/in/hdfs';


An example Without Partitions
-----------------------------
ALTER TABLE myData SET LOCATION 'hdfs://10.48.97.97:9000/path/to/your/data/directory/in/hdfs';


While specifying location, you have to point to a directory. You cannot point to a file (IMHO).

Hope that helps

sanjay

From: Tony Burton <TB...@SportingIndex.com>>
Reply-To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Date: Tuesday, March 26, 2013 10:11 AM
To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Subject: S3/EMR Hive: Load contents of a single file

Hi list,

I've been using hive to perform queries on data hosted on AWS S3, and my tables point at data by specifying the directory in which the data is stored, eg

$ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile location 's3://mybucket/path/to/data';

where s3://mybucket/path/to/data is the "directory" that contains the files I'm interested in. My use case now is to create a table with data pointing to a specifc file in a directory:

$ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile location 's3://mybucket/path/to/data/src1.txt';

and I get the error: "FAILED: Error in metadata: MetaException(message:Got exception: java.io.IOException Can't make directory for path 's3://spinmetrics/global/counter_Fixture.txt' since it is a file.)". Ok, lets try to create the table without specifying the data source:

$ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile

Ok, no problem. Now lets load the data

$ LOAD DATA INPATH 's3://mybucket/path/to/data/src1.txt' INTO TABLE myData;

(referring to https://cwiki.apache.org/Hive/languagemanual-dml.html - "...filepath can refer to a file (in which case hive will move the file into the table)")

Error message is: " FAILED: Error in semantic analysis: Line 1:17 Path is not legal ''s3://mybucket/path/to/data/src1.txt": Move from: s3:// mybucket/path/to/data/src1.txt to: hdfs://10.48.97.97:9000/mnt/hive_081/warehouse/gfix is not valid. Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict."

So I check my default.fs.name and hive.metastore.warehouse.dir (which have never caused problems before):

$ set fs.default.name;
fs.default.name=hdfs://10.48.97.97:9000
$ set hive.metastore.warehouse.dir;
hive.metastore.warehouse.dir=/mnt/hive_081/warehouse

Clearly different, but which is correct? Is there an easier way to load a single file into a hive table? Or should I just put each file in a directory and proceed as before?

Thanks!

Tony







Tony Burton
Senior Software Engineer
e: tburton@sportingindex.com<ma...@sportingindex.com>

RE: S3/EMR Hive: Load contents of a single file

Posted by Tony Burton <TB...@SportingIndex.com>.
Thanks for the quick reply Sanjay.

ALTER TABLE is the key, but slightly different to your suggestion. I create the table as before, but don't specify location:

$ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile;

Then use ALTER TABLE like this:

$ ALTER TABLE myData SET LOCATION 's3://mybucket/path/to/data/src1.txt';

Bingo, I can now run queries with myData in the same way I can when the LOCATION is a directory. Cool!
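
(For reference, the directory-per-file alternative raised in the original post would look roughly like this; the table name myData2, the partition column src and the tab-delimited row format are illustrative assumptions, not the actual schema in this thread:)

$ create external table myData2 (str1 string, str2 string, count1 int)
    partitioned by (src string)
    row format delimited fields terminated by '\t'
    stored as textfile;
$ ALTER TABLE myData2 ADD PARTITION (src='src1') LOCATION 's3://mybucket/path/to/data/src1/';
$ ALTER TABLE myData2 ADD PARTITION (src='src2') LOCATION 's3://mybucket/path/to/data/src2/';
$ SELECT * FROM myData2 WHERE src='src1' LIMIT 10;   -- partition pruning means only files under .../src1/ are read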

Tony







From: Sanjay Subramanian [mailto:Sanjay.Subramanian@wizecommerce.com]
Sent: 26 March 2013 17:22
To: user@hive.apache.org
Subject: Re: S3/EMR Hive: Load contents of a single file

Hi Tony

Can you create the table without any location?

After that you could do an ALTER TABLE to add a location and partition:

ALTER TABLE myData ADD PARTITION (partitionColumn1='$value1' , partitionColumn2='$value2') LOCATION '/path/to/your/directory/in/hdfs';


An example Without Partitions
-----------------------------
ALTER TABLE myData SET LOCATION 'hdfs://10.48.97.97:9000/path/to/your/data/directory/in/hdfs';


While specifying location, you have to point to a directory. You cannot point to a file (IMHO).

Hope that helps

sanjay

From: Tony Burton <TB...@SportingIndex.com>>
Reply-To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Date: Tuesday, March 26, 2013 10:11 AM
To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Subject: S3/EMR Hive: Load contents of a single file

Hi list,

I've been using hive to perform queries on data hosted on AWS S3, and my tables point at data by specifying the directory in which the data is stored, eg

$ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile location 's3://mybucket/path/to/data';

where s3://mybucket/path/to/data is the "directory" that contains the files I'm interested in. My use case now is to create a table with data pointing to a specifc file in a directory:

$ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile location 's3://mybucket/path/to/data/src1.txt';

and I get the error: "FAILED: Error in metadata: MetaException(message:Got exception: java.io.IOException Can't make directory for path 's3://spinmetrics/global/counter_Fixture.txt' since it is a file.)". Ok, lets try to create the table without specifying the data source:

$ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile

Ok, no problem. Now lets load the data

$ LOAD DATA INPATH 's3://mybucket/path/to/data/src1.txt' INTO TABLE myData;

(referring to https://cwiki.apache.org/Hive/languagemanual-dml.html - "...filepath can refer to a file (in which case hive will move the file into the table)")

Error message is: " FAILED: Error in semantic analysis: Line 1:17 Path is not legal ''s3://mybucket/path/to/data/src1.txt": Move from: s3:// mybucket/path/to/data/src1.txt to: hdfs://10.48.97.97:9000/mnt/hive_081/warehouse/gfix is not valid. Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict."

So I check my default.fs.name and hive.metastore.warehouse.dir (which have never caused problems before):

$ set fs.default.name;
fs.default.name=hdfs://10.48.97.97:9000
$ set hive.metastore.warehouse.dir;
hive.metastore.warehouse.dir=/mnt/hive_081/warehouse

Clearly different, but which is correct? Is there an easier way to load a single file into a hive table? Or should I just put each file in a directory and proceed as before?

Thanks!

Tony







Tony Burton
Senior Software Engineer
e: tburton@sportingindex.com<ma...@sportingindex.com>

Re: S3/EMR Hive: Load contents of a single file

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Hi Tony

Can you create the table without any location?

After that you could do an ALTER TABLE to add a location and partition:

ALTER TABLE myData ADD PARTITION (partitionColumn1='$value1' , partitionColumn2='$value2') LOCATION '/path/to/your/directory/in/hdfs';

An example Without Partitions
-----------------------------
ALTER TABLE myData SET LOCATION 'hdfs://10.48.97.97:9000/path/to/your/data/directory/in/hdfs';

While specifying location, you have to point to a directory. You cannot point to a file (IMHO).
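
(Adapted to the S3 paths in this thread, that pattern might look like the following; the dt partition column and the directory layout are assumed for illustration, not taken from the actual bucket:)

$ ALTER TABLE myData SET LOCATION 's3://mybucket/path/to/data/';                                              -- whole table over the directory
$ ALTER TABLE myData ADD PARTITION (dt='2013-03-26') LOCATION 's3://mybucket/path/to/data/dt=2013-03-26/';    -- or one directory per partition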

Hope that helps

sanjay

From: Tony Burton <TB...@SportingIndex.com>>
Reply-To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Date: Tuesday, March 26, 2013 10:11 AM
To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Subject: S3/EMR Hive: Load contents of a single file

Hi list,

I've been using hive to perform queries on data hosted on AWS S3, and my tables point at data by specifying the directory in which the data is stored, eg

$ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile location 's3://mybucket/path/to/data';

where s3://mybucket/path/to/data is the "directory" that contains the files I'm interested in. My use case now is to create a table with data pointing to a specifc file in a directory:

$ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile location 's3://mybucket/path/to/data/src1.txt';

and I get the error: "FAILED: Error in metadata: MetaException(message:Got exception: java.io.IOException Can't make directory for path 's3://spinmetrics/global/counter_Fixture.txt' since it is a file.)". Ok, lets try to create the table without specifying the data source:

$ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile

Ok, no problem. Now lets load the data

$ LOAD DATA INPATH 's3://mybucket/path/to/data/src1.txt' INTO TABLE myData;

(referring to https://cwiki.apache.org/Hive/languagemanual-dml.html - "...filepath can refer to a file (in which case hive will move the file into the table)")

Error message is: " FAILED: Error in semantic analysis: Line 1:17 Path is not legal ''s3://mybucket/path/to/data/src1.txt": Move from: s3:// mybucket/path/to/data/src1.txt to: hdfs://10.48.97.97:9000/mnt/hive_081/warehouse/gfix is not valid. Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict."

So I check my default.fs.name and hive.metastore.warehouse.dir (which have never caused problems before):

$ set fs.default.name;
fs.default.name=hdfs://10.48.97.97:9000
$ set hive.metastore.warehouse.dir;
hive.metastore.warehouse.dir=/mnt/hive_081/warehouse

Clearly different, but which is correct? Is there an easier way to load a single file into a hive table? Or should I just put each file in a directory and proceed as before?

Thanks!

Tony







Tony Burton
Senior Software Engineer
e: tburton@sportingindex.com<ma...@sportingindex.com>
