Posted to user@hive.apache.org by chen keven <ck...@gmail.com> on 2009/07/21 00:53:01 UTC
load data into table
I'm trying to load data into a table using the command below. However, I only
get a bunch of NULLs in the fields. The data fields are separated by tabs.
CREATE TABLE IF NOT EXISTS userweight(source INT, dist INT, weight DOUBLE)
row format delimited fields terminated by " \t";
load data local inpath "/tmp/Graph/edges_tag_jaccard_directed_2006.dat" into
table userweight
--
Thank you,
Keven Chen
Re: loading data from HDFS or local file to
Posted by Manhee Jo <jo...@nttdocomo.com>.
Excellent! Thank you, Zheng.
----- Original Message -----
From: "Zheng Shao" <zs...@gmail.com>
To: <hi...@hadoop.apache.org>
Sent: Friday, July 24, 2009 6:38 PM
Subject: Re: loading data from HDFS or local file to
Re: loading data from HDFS or local file to
Posted by Zheng Shao <zs...@gmail.com>.
Hi Manhee,
You don't need to do "load" for an external table. You already
specified the location of the external table in the "create external
table" command, so you can directly use that external table.
Zheng
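The point above can be sketched as follows; this is an illustrative sketch reusing the table and path from the thread, not a command anyone in the thread ran:

```sql
-- An external table reads files in place under its LOCATION,
-- so no LOAD DATA step is needed (or allowed from that same directory).
CREATE EXTERNAL TABLE extab (key INT, val STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/test/';

-- Any file already under /user/hive/warehouse/test/ is queryable immediately:
SELECT key, val FROM extab LIMIT 10;
```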
On Wed, Jul 22, 2009 at 7:12 PM, Manhee Jo<jo...@nttdocomo.com> wrote:
--
Yours,
Zheng
Re: loading data from HDFS or local file to
Posted by Manhee Jo <jo...@nttdocomo.com>.
Hi Zheng,
I've tried to load a sample file after creating an external table like
below.
hive> create external table extab (key int, val string)
> row format delimited fields terminated by '\t'
> lines terminated by '\n'
> location '/user/hive/warehouse/test/';
Here, /user/hive/warehouse/test contains an HDFS file which I am going to
load into table extab. This was OK. On load, though,
hive> load data inpath '/user/hive/warehouse/test/kv1.txt'
> overwrite into table extab;
I got an error like the one below:
FAILED: Error in semantic analysis: line 2:17 Path is not legal
'/user/hive/warehouse/test/kv1.txt':
Move from: hdfs://vm2:9000/user/hive/warehouse/test/kv1.txt to:
/user/hive/warehouse/test/ is not valid.
Please check that values for params "default.fs.name" and
"hive.metastore.warehouse.dir" do not conflict.
I've changed the directories to different ones, but to no avail. Can you
suggest any solutions?
By the way, is "default.fs.name" right? I could find "fs.default.name" but
not "default.fs.name".
Thank you,
Manhee
----- Original Message -----
From: "Zheng Shao" <zs...@gmail.com>
To: <hi...@hadoop.apache.org>
Sent: Thursday, July 23, 2009 5:49 AM
Subject: Re: loading data from HDFS or local file to
Re: loading data from HDFS or local file to
Posted by Manhee Jo <jo...@nttdocomo.com>.
Thank you!
----- Original Message -----
From: "Zheng Shao" <zs...@gmail.com>
To: <hi...@hadoop.apache.org>
Sent: Thursday, July 23, 2009 5:49 AM
Subject: Re: loading data from HDFS or local file to
Re: loading data from HDFS or local file to
Posted by Zheng Shao <zs...@gmail.com>.
If the huge file is already on HDFS (LOAD DATA without LOCAL), Hive
will just *move* the file into the table's directory (note: that means the
user won't be able to see the file in its original directory afterwards).
If you don't want that to happen, you might want to use "CREATE
EXTERNAL TABLE .... LOCATION '/user/myname/myfiledir';"
If the huge file is on the local file system, you will have to use LOAD
DATA with LOCAL, and Hive will copy the file.
Zheng
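The three cases described above can be sketched as follows; the table names and file paths here are placeholders, not from the thread:

```sql
-- Case 1: file already on HDFS. LOAD DATA (no LOCAL) *moves* the file
-- into the table's directory; it disappears from its original location.
LOAD DATA INPATH '/user/myname/huge_file.dat' INTO TABLE mytable;

-- Case 2: leave the HDFS file where it is by creating an external
-- table over its directory instead of loading it.
CREATE EXTERNAL TABLE myext (key INT, val STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/myname/myfiledir';

-- Case 3: file on the local file system. LOAD DATA LOCAL *copies*
-- it into HDFS; the local original is untouched.
LOAD DATA LOCAL INPATH '/home/myname/huge_file.dat' INTO TABLE mytable;
```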
On Wed, Jul 22, 2009 at 12:25 AM, Manhee Jo<jo...@nttdocomo.com> wrote:
--
Yours,
Zheng
loading data from HDFS or local file to
Posted by Manhee Jo <jo...@nttdocomo.com>.
Hi all,
What really happens when a huge file (e.g. some tens of TB) is loaded with
"LOAD DATA (LOCAL) INPATH ... INTO TABLE"? Does Hive need to scan the
entire file before processing anything, even something very simple (e.g. a
select)?
If so, are there any ways to decrease the number of disk accesses? Is
partitioning a way to do it?
Many Thanks,
Manhee
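The partitioning idea raised in the question can be sketched as follows; the table, column, and paths are hypothetical, since the thread itself doesn't work the example out. With a partitioned table, a query that filters on the partition column reads only the matching partition directory rather than the whole data set:

```sql
-- Hypothetical sketch: load each day's data into its own partition so
-- that queries filtering on dt scan only that partition's files.
CREATE TABLE logs (key INT, val STRING)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

LOAD DATA INPATH '/user/myname/logs_20090722.dat'
INTO TABLE logs PARTITION (dt='2009-07-22');

-- Reads only the dt='2009-07-22' directory, not the whole table.
SELECT COUNT(1) FROM logs WHERE dt = '2009-07-22';
```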
Re: load data into table
Posted by Zheng Shao <zs...@gmail.com>.
Hi Chen,
Can you double check the format of the file? Is it plain text?
CREATE TABLE IF NOT EXISTS userweight(source INT, dist INT, weight DOUBLE)
ROW FORMAT delimited fields terminated by " \t"
STORED AS TEXTFILE;
The optional "STORED AS" clause tells Hive the format of your file.
Hive supports TEXTFILE and SEQUENCEFILE natively.
If your file has a custom format, you need to write your own
file format classes.
Please take a look at the example added by:
https://issues.apache.org/jira/browse/HIVE-639
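One detail worth checking, as an assumption not confirmed in the thread: the delimiter string " \t" begins with a space, and Hive's delimited SerDe honors only a single delimiter character, so the table may be splitting rows on spaces rather than tabs. Tab-separated lines would then land whole in the first column, and casting them to INT/DOUBLE would produce exactly the NULLs described. A sketch of the DDL with a plain tab delimiter:

```sql
-- Sketch assuming the data is plain text with single-tab separators.
-- Note the delimiter is '\t' alone, not " \t" (space + tab).
CREATE TABLE IF NOT EXISTS userweight (source INT, dist INT, weight DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/tmp/Graph/edges_tag_jaccard_directed_2006.dat'
INTO TABLE userweight;
```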
Zheng
On Mon, Jul 20, 2009 at 3:53 PM, chen keven<ck...@gmail.com> wrote:
--
Yours,
Zheng