Posted to user@hive.apache.org by Mark <st...@gmail.com> on 2010/12/16 00:33:34 UTC

Tables and importing

Can someone explain what actually happens when you create a table and
import data into it using "LOAD DATA INPATH..."?

I noticed that when I load data from files that already exist in HDFS,
Hive actually removes the original file from its location and moves it
under the /user/hive directory. Is there any way I can prevent this from
happening, or is this just the way things work? At this point, is the
file modified in any way? I have some other Hadoop jobs that rely on
this data. Should I just update those jobs to operate on the data within
these directories? Thanks
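
For illustration, a minimal sketch of the behaviour being asked about; the
table name, column, and HDFS path below are hypothetical:

    -- Managed (non-external) table: Hive owns the storage.
    CREATE TABLE logs (line STRING);

    -- LOAD DATA INPATH moves the HDFS file into the table's warehouse
    -- directory (e.g. /user/hive/warehouse/logs/); the file contents
    -- themselves are not modified.
    LOAD DATA INPATH '/data/incoming/logs.txt' INTO TABLE logs;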

Re: Tables and importing

Posted by Viral Bajaria <vi...@gmail.com>.
I had another question about LOAD DATA LOCAL INPATH.

It does an exec.CopyTask first and then an exec.MoveTask, but I think it
is leaving a file handle open: when I run a thread that bulk-inserts data
into partitions, the open file count for the user under which my Hive
Thrift server runs keeps increasing until it reaches the maximum, after
which I start getting "connection refused".

I am currently still on Hive 0.5.0 and have checked out the SVN repository,
but can't figure out exactly where the files are being left open.

Is anyone aware of similar problems?

Thanks
Viral
On Wed, Dec 15, 2010 at 3:51 PM, Leo Alekseyev <dn...@gmail.com> wrote:

> You can use CREATE EXTERNAL TABLE... LOCATION.
>
> See http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL and examples
> thereof.  When you LOAD DATA INPATH, the directory gets moved to the
> Hive warehouse dir; it does not get modified.
>
> On Wed, Dec 15, 2010 at 3:33 PM, Mark <st...@gmail.com> wrote:
> > Can someone explain what actually happens when you create a table and
> > import data into it using "LOAD DATA INPATH..."?
> >
> > I noticed that when I load data from files that already exist in HDFS,
> > Hive actually removes the original file from its location and moves it
> > under the /user/hive directory. Is there any way I can prevent this from
> > happening, or is this just the way things work? At this point, is the
> > file modified in any way? I have some other Hadoop jobs that rely on
> > this data. Should I just update those jobs to operate on the data within
> > these directories? Thanks
> >
>

Re: Tables and importing

Posted by Leo Alekseyev <dn...@gmail.com>.
You can use CREATE EXTERNAL TABLE... LOCATION.

See http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL and examples
thereof.  When you LOAD DATA INPATH, the directory gets moved to the
Hive warehouse dir; it does not get modified.
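
For illustration, a minimal sketch of the external-table approach described
above; the table name, columns, delimiter, and location are hypothetical:

    -- The table points at data that stays where it already is in HDFS;
    -- no LOAD DATA is needed, and dropping the table later does not
    -- delete the underlying files.
    CREATE EXTERNAL TABLE logs (line STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/incoming/logs/';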

On Wed, Dec 15, 2010 at 3:33 PM, Mark <st...@gmail.com> wrote:
> Can someone explain what actually happens when you create a table and
> import data into it using "LOAD DATA INPATH..."?
>
> I noticed that when I load data from files that already exist in HDFS,
> Hive actually removes the original file from its location and moves it
> under the /user/hive directory. Is there any way I can prevent this from
> happening, or is this just the way things work? At this point, is the
> file modified in any way? I have some other Hadoop jobs that rely on
> this data. Should I just update those jobs to operate on the data within
> these directories? Thanks
>