Posted to user@hive.apache.org by Aaron Kimball <aa...@cloudera.com> on 2009/01/21 04:56:11 UTC

Error loading data from HDFS into Hive

Hi all,

I generated some data using a MapReduce process (output in HDFS) and want to
play with it in Hive.

The output is a set of part-nnnn files in a directory.

I created the table and tried to run the statement:
LOAD DATA INPATH 'hdfs://namenode.server.addr.com:9000/user/aaron/ip_addr_tables/0/part-00000' INTO TABLE ip_locations;

It failed with:

FAILED: Error in semantic analysis: Path is not legal 'hdfs://namenode.server.addr.com:9000/user/aaron/ip_addr_tables/0/part-00000': Cannot load data across filesystems, use load data local
Time taken: 1.51 seconds

The fully-qualified hdfs URI I am using matches exactly against the
fs.default.name in $HADOOP_HOME/conf/hadoop-site.xml. Can anyone suggest
what I might be doing wrong, or where I should look for more information?

I also tried just using "/user/aaron/ip_addr_tables/0/part-00000" and
"ip_addr_tables/0/part-00000" instead.

Thanks,
- Aaron

Re: Error loading data from HDFS into Hive

Posted by Aaron Kimball <aa...@cloudera.com>.
Hi Joydeep,

Thanks for suggesting that - I had created the table before putting the full
URI in the warehouse.dir property, so it captured the borked warehouse path.
Dropping and recreating the table worked. Thanks! :)

- Aaron


RE: Error loading data from HDFS into Hive

Posted by Joydeep Sen Sarma <js...@facebook.com>.
Can you run DESCRIBE EXTENDED on the ip_locations table?

It will show a location string. It's possible that the location spec in it does not have the full URI (perhaps the table was created before warehouse.dir was filled in?)

Some of these issues were addressed in a JIRA that Prasad fixed a couple of days back (where, I think, the metastore uses the namenode scheme/authority by default if warehouse.dir is not filled in).
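
To illustrate why a table location without the full URI could trigger "Cannot load data across filesystems", here is a minimal Python sketch. It is not Hive's actual implementation; it only assumes that the check compares the scheme and authority of the source path with those of the table location. The function name same_filesystem and both table locations are hypothetical, and the sketch does not resolve bare paths against fs.default.name.

```python
from urllib.parse import urlsplit


def same_filesystem(source_uri: str, table_location: str) -> bool:
    """Return True when both URIs share the same scheme and authority.

    Hypothetical helper: an approximation of the kind of comparison the
    error message implies, not code taken from Hive.
    """
    src = urlsplit(source_uri)
    dst = urlsplit(table_location)
    return (src.scheme, src.netloc) == (dst.scheme, dst.netloc)


# Source path from the failing LOAD DATA statement in this thread.
source = "hdfs://namenode.server.addr.com:9000/user/aaron/ip_addr_tables/0/part-00000"

# Hypothetical location captured before warehouse.dir held a full URI.
stale_location = "/user/hive/warehouse/ip_locations"

# Hypothetical location after the table is recreated with the full URI set.
full_location = "hdfs://namenode.server.addr.com:9000/user/hive/warehouse/ip_locations"

print(same_filesystem(source, stale_location))
print(same_filesystem(source, full_location))
```

Under these assumptions the first comparison fails because the stale location carries no scheme or authority, while the second succeeds once the location includes the same namenode host and port as the source.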



Re: Error loading data from HDFS into Hive

Posted by Aaron Kimball <aa...@cloudera.com>.
I should also add that I have set hive.metastore.warehouse.dir in
conf/hive-default.xml to the full URI of /user/hive/warehouse (same
HDFS host).

- Aaron
