Posted to user@hive.apache.org by Vijay <te...@gmail.com> on 2009/07/27 22:24:24 UTC

I'm having to do LOAD DATA LOCAL INPATH two times to add data

Hi,

I'm pretty new to Hadoop/Hive. I have everything running pretty well on a
single server. I have a simple table defined in Hive for access logs and
was trying to import log files with the LOAD DATA LOCAL INPATH command.
Here's what my command looks like:

LOAD DATA LOCAL INPATH '../test/test.log' INTO TABLE accesslog PARTITION
(dt='2009-07-14');

For some reason I'm having to execute this command twice in order for the
log file to show up in HDFS. When I first issue this command, a directory
with 'dt=2009-07-14' gets created but there is nothing under it. Then I
issue the command a second time and the file test.log gets uploaded. The data
is NOT inserted twice. Hive does not give any error the first time either.
The output is the same both times:

Copying data from file:../test/test.log
Loading data to table accesslog partition {dt=2009-07-15}

Can anybody tell me what the problem might be? I hope I'm being clear.

Thanks,
Vijay

Re: I'm having to do LOAD DATA LOCAL INPATH two times to add data

Posted by Zheng Shao <zs...@gmail.com>.
Hi Vijay,

What version of Hive are you using?
Can you attach /tmp/<your_unix_user>/hive.log so we can see what might
be happening?
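
For anyone following along, here is a minimal sketch of how the relevant part
of that log could be collected, assuming a POSIX shell and the log location
given above. The helper names (hive_log_path, save_hive_log_excerpt), the
200-line cutoff, and the output file name are hypothetical choices made for
illustration; none of them are part of Hive.

```shell
#!/bin/sh
# Build the expected log path for a Unix user (defaults to the current user).
# hive_log_path is a hypothetical helper, not a Hive command.
hive_log_path() {
    user="${1:-$(id -un)}"
    printf '/tmp/%s/hive.log\n' "$user"
}

# Copy the last 200 lines of the log into a file that can be attached to a
# reply. The line count and the excerpt file name are arbitrary assumptions.
save_hive_log_excerpt() {
    log="$(hive_log_path "${1:-}")"
    if [ -r "$log" ]; then
        tail -n 200 "$log" > hive-log-excerpt.txt
    else
        echo "No readable log at $log" >&2
        return 1
    fi
}
```

Running save_hive_log_excerpt right after the first LOAD DATA attempt, before
issuing the command again, should keep the excerpt focused on the attempt that
left the partition directory empty.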

Zheng


-- 
Yours,
Zheng