Posted to user@hive.apache.org by vaibhav negi <ss...@gmail.com> on 2010/07/26 09:32:32 UTC

HIVE: How to Load CSV File?

Hi,

Is there some way to load a CSV file into Hive?

Vaibhav Negi

Re: HIVE: How to Load CSV File?

Posted by vaibhav negi <ss...@gmail.com>.
Thanks, all. I have a good idea of it now.

Vaibhav Negi


On Tue, Jul 27, 2010 at 12:12 PM, Bennie Schut <bs...@ebuddy.com> wrote:

>  Hi,
>
>
>
> HDFS can be better compared with something like ext3 than with MySQL. You
> can use "hadoop fs" to look at the files on HDFS just like you would
> look at the "/mysql" dir on ext3. HDFS internally splits these files into
> chunks of 64M (configurable), and each chunk ends up on the underlying
> Linux filesystem. You can configure the location of these chunks in a
> config file called hdfs-site.xml with a property called "dfs.data.dir", which
> defaults to "${hadoop.tmp.dir}/dfs/data".
>
> I doubt there are many use cases where looking at these individual chunks
> is useful though.
>
> If you want to see how much space something is using, use
> something like this:
>
> hadoop fs -du /user/hive/warehouse/
>
>
>
> Just keep in mind if you have a replication factor of 3 on your setup it
> means you are using 3x the physical space the -du command is telling you
> (roughly).
>
>
>
> I hope that helps.
>
>
>
> Bennie.
>
>
>  ------------------------------
>
> From: vaibhav negi [mailto:sssssssenator@gmail.com]
> Sent: Tuesday, July 27, 2010 8:04 AM
> To: hive-user@hadoop.apache.org
> Subject: Re: HIVE: How to Load CSV File?
>
>
>
> Hi ,
>
> By actual physical path, I mean the full path in the Linux / directory tree.
> For MySQL, for example, there is a /mysql directory; inside it I can see the
> files for individual tables and also what lies inside those files.
>
>
>
> Vaibhav Negi
>
>  2010/7/26 Alex Rovner <ar...@contextweb.com>
>
> Hadoop fs -du command will show you the size of the files. What do you mean
> by physical?
>
> Sent from my iPhone
>
>
> On Jul 26, 2010, at 6:43 AM, "vaibhav negi" <ss...@gmail.com>
> wrote:
>
>  Hi,
>
> The hadoop dfs command shows the logical path /user/hive/warehouse. How can I
> see where this directory exists physically?
>
>
>
> Vaibhav Negi
>
> On Mon, Jul 26, 2010 at 2:45 PM, Amogh Vasekar <amogh@yahoo-inc.com> wrote:
>
> Hi,
> The default HWI (Hive web interface) provides some basic metadata, but I
> don't think file sizes are included. In any case, you can query using the
> common hadoop dfs commands. The default warehouse directory is as set in
> your Hive conf XML.
>
> Amogh
>
>
>
>
> On 7/26/10 2:30 PM, "vaibhav negi" <sssssssenator@gmail.com> wrote:
>
>  Hi,
>
> Thanks, Amogh.
> How can I browse the actual physical location of Hive tables, just like I see
> MySQL tables in the mysql directory? I want to check the actual disk space
> consumed by Hive tables.
>
>
>
> Vaibhav Negi
>
>
> On Mon, Jul 26, 2010 at 1:55 PM, Amogh Vasekar <amogh@yahoo-inc.com> wrote:
>
>  Hi,
> You can create an external table pointing to data already on hdfs and
> specifying the delimiter-
> CREATE EXTERNAL TABLE page_view_stg(viewTime INT, userid BIGINT,
>                     page_url STRING, referrer_url STRING,
>                     ip STRING COMMENT 'IP Address of the User',
>                     country STRING COMMENT 'country of origination')
>     COMMENT 'This is the staging page view table'
>     ROW FORMAT DELIMITED FIELDS TERMINATED BY '44' LINES TERMINATED BY '12'
>     STORED AS TEXTFILE
>     LOCATION '/user/data/staging/page_view';
>
> http://wiki.apache.org/hadoop/Hive/Tutorial#Creating_Tables for more
>
> HTH,
> Amogh
>
>
> On 7/26/10 1:02 PM, "vaibhav negi" <sssssssenator@gmail.com> wrote:
>
> Hi,
>
> Is there some way to load a CSV file into Hive?
>
> Vaibhav Negi
>
>
>
>
>
>
>

RE: HIVE: How to Load CSV File?

Posted by Bennie Schut <bs...@ebuddy.com>.
Hi,

HDFS can be better compared with something like ext3 than with MySQL. You can use "hadoop fs" to look at the files on HDFS just like you would look at the "/mysql" dir on ext3. HDFS internally splits these files into chunks of 64M (configurable), and each chunk ends up on the underlying Linux filesystem. You can configure the location of these chunks in a config file called hdfs-site.xml with a property called "dfs.data.dir", which defaults to "${hadoop.tmp.dir}/dfs/data".
I doubt there are many use cases where looking at these individual chunks is useful, though.
If you want to see how much space something is using, use something like this:
hadoop fs -du /user/hive/warehouse/
hadoop fs -du /user/hive/warehouse/

Just keep in mind if you have a replication factor of 3 on your setup it means you are using 3x the physical space the -du command is telling you (roughly).
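
To make that concrete, here is a minimal sketch (the property name is from stock Hadoop 0.20; the local paths below are just example values):

<!-- hdfs-site.xml: where each datanode keeps its block files on the local disk -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/1/dfs/data,/data/2/dfs/data</value>
</property>

# logical size per table directory, before replication
hadoop fs -du /user/hive/warehouse

# rough physical footprint = logical size x dfs.replication (3 by default)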

I hope that helps.

Bennie.

________________________________
From: vaibhav negi [mailto:sssssssenator@gmail.com]
Sent: Tuesday, July 27, 2010 8:04 AM
To: hive-user@hadoop.apache.org
Subject: Re: HIVE: How to Load CSV File?

Hi ,

By actual physical path, I mean the full path in the Linux / directory tree. For MySQL, for example, there is a /mysql directory.
Inside it I can see the files for individual tables and also what lies inside those files.



Vaibhav Negi

2010/7/26 Alex Rovner <ar...@contextweb.com>
Hadoop fs -du command will show you the size of the files. What do you mean by physical?

Sent from my iPhone

On Jul 26, 2010, at 6:43 AM, "vaibhav negi" <ss...@gmail.com> wrote:
Hi,

The hadoop dfs command shows the logical path /user/hive/warehouse. How can I see where this directory exists physically?



Vaibhav Negi

On Mon, Jul 26, 2010 at 2:45 PM, Amogh Vasekar <amogh@yahoo-inc.com> wrote:
Hi,
The default HWI (Hive web interface) provides some basic metadata, but I don't think file sizes are included. In any case, you can query using the common hadoop dfs commands. The default warehouse directory is as set in your Hive conf XML.

Amogh



On 7/26/10 2:30 PM, "vaibhav negi" <sssssssenator@gmail.com> wrote:
Hi,

Thanks, Amogh.
How can I browse the actual physical location of Hive tables, just like I see MySQL tables in the mysql directory? I want to check the actual disk space consumed by Hive tables.



Vaibhav Negi


On Mon, Jul 26, 2010 at 1:55 PM, Amogh Vasekar <amogh@yahoo-inc.com> wrote:
Hi,
You can create an external table pointing to data already on hdfs and specifying the delimiter-
CREATE EXTERNAL TABLE page_view_stg(viewTime INT, userid BIGINT,
                    page_url STRING, referrer_url STRING,
                    ip STRING COMMENT 'IP Address of the User',
                    country STRING COMMENT 'country of origination')
    COMMENT 'This is the staging page view table'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '44' LINES TERMINATED BY '12'
    STORED AS TEXTFILE
    LOCATION '/user/data/staging/page_view';

http://wiki.apache.org/hadoop/Hive/Tutorial#Creating_Tables for more

HTH,
Amogh


On 7/26/10 1:02 PM, "vaibhav negi" <sssssssenator@gmail.com> wrote:
Hi,

Is there some way to load a CSV file into Hive?

Vaibhav Negi




Re: HIVE: How to Load CSV File?

Posted by bc Wong <bc...@cloudera.com>.
On Mon, Jul 26, 2010 at 11:04 PM, vaibhav negi <ss...@gmail.com> wrote:
> Hi ,
>
> By actual physical path, I mean the full path in the Linux / directory tree.
> For MySQL, for example, there is a /mysql directory; inside it I can see the
> files for individual tables and also what lies inside those files.

You can take a look at Beeswax. There's a video demo here:
<http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3b2-hue/>
It easily links the table back to the HDFS browser and you can see
file sizes and more.

Cheers,
-- 
bc Wong
Cloudera Software Engineer

Re: HIVE: How to Load CSV File?

Posted by vaibhav negi <ss...@gmail.com>.
Hi ,

By actual physical path, I mean the full path in the Linux / directory tree.
For MySQL, for example, there is a /mysql directory; inside it I can see the
files for individual tables and also what lies inside those files.



Vaibhav Negi


2010/7/26 Alex Rovner <ar...@contextweb.com>

> Hadoop fs -du command will show you the size of the files. What do you mean
> by physical?
>
> Sent from my iPhone
>
> On Jul 26, 2010, at 6:43 AM, "vaibhav negi" <ss...@gmail.com>
> wrote:
>
> Hi,
>
> The hadoop dfs command shows the logical path /user/hive/warehouse. How can I
> see where this directory exists physically?
>
>
>
> Vaibhav Negi
>
>
> On Mon, Jul 26, 2010 at 2:45 PM, Amogh Vasekar <amogh@yahoo-inc.com> wrote:
>
>>  Hi,
>> The default HWI (Hive web interface) provides some basic metadata, but I
>> don't think file sizes are included. In any case, you can query using the
>> common hadoop dfs commands. The default warehouse directory is as set in
>> your Hive conf XML.
>>
>> Amogh
>>
>>
>>
>> On 7/26/10 2:30 PM, "vaibhav negi" <sssssssenator@gmail.com> wrote:
>>
>> Hi,
>>
>> Thanks, Amogh.
>> How can I browse the actual physical location of Hive tables, just like I see
>> MySQL tables in the mysql directory? I want to check the actual disk space
>> consumed by Hive tables.
>>
>>
>>
>> Vaibhav Negi
>>
>>
>> On Mon, Jul 26, 2010 at 1:55 PM, Amogh Vasekar <amogh@yahoo-inc.com> wrote:
>>
>> Hi,
>> You can create an external table pointing to data already on hdfs and
>> specifying the delimiter-
>> CREATE EXTERNAL TABLE page_view_stg(viewTime INT, userid BIGINT,
>>                     page_url STRING, referrer_url STRING,
>>                     ip STRING COMMENT 'IP Address of the User',
>>                     country STRING COMMENT 'country of origination')
>>     COMMENT 'This is the staging page view table'
>>     ROW FORMAT DELIMITED FIELDS TERMINATED BY '44' LINES TERMINATED BY
>> '12'
>>     STORED AS TEXTFILE
>>     LOCATION '/user/data/staging/page_view';
>>
>> http://wiki.apache.org/hadoop/Hive/Tutorial#Creating_Tables for more
>>
>> HTH,
>> Amogh
>>
>>
>>
>> On 7/26/10 1:02 PM, "vaibhav negi" <sssssssenator@gmail.com> wrote:
>>
>> Hi,
>>
>> Is there some way to load a CSV file into Hive?
>>
>> Vaibhav Negi
>>
>>
>>
>>
>

Re: HIVE: How to Load CSV File?

Posted by Alex Rovner <ar...@contextweb.com>.
Hadoop fs -du command will show you the size of the files. What do you mean by physical?

Sent from my iPhone

On Jul 26, 2010, at 6:43 AM, "vaibhav negi" <ss...@gmail.com> wrote:

> Hi,
> 
> The hadoop dfs command shows the logical path /user/hive/warehouse. How can I see where this directory exists physically?
> 
>  
> 
> Vaibhav Negi
> 
> 
> On Mon, Jul 26, 2010 at 2:45 PM, Amogh Vasekar <am...@yahoo-inc.com> wrote:
> Hi,
> The default HWI (Hive web interface) provides some basic metadata, but I don't think file sizes are included. In any case, you can query using the common hadoop dfs commands. The default warehouse directory is as set in your Hive conf XML.
> 
> Amogh
> 
> 
> 
> On 7/26/10 2:30 PM, "vaibhav negi" <ss...@gmail.com> wrote:
> 
> Hi,
> 
> Thanks, Amogh.
> How can I browse the actual physical location of Hive tables, just like I see MySQL tables in the mysql directory? I want to check the actual disk space consumed by Hive tables.
> 
> 
> 
> Vaibhav Negi
> 
> 
> On Mon, Jul 26, 2010 at 1:55 PM, Amogh Vasekar <am...@yahoo-inc.com> wrote:
> Hi,
> You can create an external table pointing to data already on hdfs and specifying the delimiter-
> CREATE EXTERNAL TABLE page_view_stg(viewTime INT, userid BIGINT,
>                     page_url STRING, referrer_url STRING,
>                     ip STRING COMMENT 'IP Address of the User',
>                     country STRING COMMENT 'country of origination')
>     COMMENT 'This is the staging page view table'
>     ROW FORMAT DELIMITED FIELDS TERMINATED BY '44' LINES TERMINATED BY '12'
>     STORED AS TEXTFILE
>     LOCATION '/user/data/staging/page_view';
> 
> http://wiki.apache.org/hadoop/Hive/Tutorial#Creating_Tables   for more
> 
> HTH,
> Amogh
> 
> 
> 
> On 7/26/10 1:02 PM, "vaibhav negi" <sssssssenator@gmail.com> wrote:
> 
> Hi,
> 
> Is there some way to load a CSV file into Hive?
> 
> Vaibhav Negi
> 
> 
> 
> 

Re: HIVE: How to Load CSV File?

Posted by vaibhav negi <ss...@gmail.com>.
Hi,

The hadoop dfs command shows the logical path /user/hive/warehouse. How can I
see where this directory exists physically?



Vaibhav Negi


On Mon, Jul 26, 2010 at 2:45 PM, Amogh Vasekar <am...@yahoo-inc.com> wrote:

>  Hi,
> The default HWI (Hive web interface) provides some basic metadata, but I
> don't think file sizes are included. In any case, you can query using the
> common hadoop dfs commands. The default warehouse directory is as set in
> your Hive conf XML.
>
> Amogh
>
>
>
> On 7/26/10 2:30 PM, "vaibhav negi" <ss...@gmail.com> wrote:
>
> Hi,
>
> Thanks, Amogh.
> How can I browse the actual physical location of Hive tables, just like I see
> MySQL tables in the mysql directory? I want to check the actual disk space
> consumed by Hive tables.
>
>
>
> Vaibhav Negi
>
>
> On Mon, Jul 26, 2010 at 1:55 PM, Amogh Vasekar <am...@yahoo-inc.com>
> wrote:
>
> Hi,
> You can create an external table pointing to data already on hdfs and
> specifying the delimiter-
> CREATE EXTERNAL TABLE page_view_stg(viewTime INT, userid BIGINT,
>                     page_url STRING, referrer_url STRING,
>                     ip STRING COMMENT 'IP Address of the User',
>                     country STRING COMMENT 'country of origination')
>     COMMENT 'This is the staging page view table'
>     ROW FORMAT DELIMITED FIELDS TERMINATED BY '44' LINES TERMINATED BY '12'
>     STORED AS TEXTFILE
>     LOCATION '/user/data/staging/page_view';
>
> http://wiki.apache.org/hadoop/Hive/Tutorial#Creating_Tables   for more
>
> HTH,
> Amogh
>
>
>
> On 7/26/10 1:02 PM, "vaibhav negi" <sssssssenator@gmail.com> wrote:
>
> Hi,
>
> Is there some way to load a CSV file into Hive?
>
> Vaibhav Negi
>
>
>
>

Re: HIVE: How to Load CSV File?

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi,
The default HWI (Hive web interface) provides some basic metadata, but I don't think file sizes are included. In any case, you can query using the common hadoop dfs commands. The default warehouse directory is as set in your Hive conf XML.
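
For example, with the stock defaults (this property lives in hive-default.xml and can be overridden in hive-site.xml; adjust the path if your setup differs):

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>

# browse the per-table directories under the warehouse root
hadoop fs -ls /user/hive/warehouse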

Amogh


On 7/26/10 2:30 PM, "vaibhav negi" <ss...@gmail.com> wrote:

Hi,

Thanks, Amogh.
How can I browse the actual physical location of Hive tables, just like I see MySQL tables in the mysql directory? I want to check the actual disk space consumed by Hive tables.



Vaibhav Negi


On Mon, Jul 26, 2010 at 1:55 PM, Amogh Vasekar <am...@yahoo-inc.com> wrote:
Hi,
You can create an external table pointing to data already on hdfs and specifying the delimiter-
CREATE EXTERNAL TABLE page_view_stg(viewTime INT, userid BIGINT,
                    page_url STRING, referrer_url STRING,
                    ip STRING COMMENT 'IP Address of the User',
                    country STRING COMMENT 'country of origination')
    COMMENT 'This is the staging page view table'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '44' LINES TERMINATED BY '12'
    STORED AS TEXTFILE
    LOCATION '/user/data/staging/page_view';

http://wiki.apache.org/hadoop/Hive/Tutorial#Creating_Tables   for more

HTH,
Amogh



On 7/26/10 1:02 PM, "vaibhav negi" <sssssssenator@gmail.com> wrote:

Hi,

Is there some way to load a CSV file into Hive?

Vaibhav Negi




Re: HIVE: How to Load CSV File?

Posted by vaibhav negi <ss...@gmail.com>.
Hi,

Thanks, Amogh.
How can I browse the actual physical location of Hive tables, just like I see
MySQL tables in the mysql directory? I want to check the actual disk space
consumed by Hive tables.



Vaibhav Negi


On Mon, Jul 26, 2010 at 1:55 PM, Amogh Vasekar <am...@yahoo-inc.com> wrote:

>  Hi,
> You can create an external table pointing to data already on hdfs and
> specifying the delimiter-
> CREATE EXTERNAL TABLE page_view_stg(viewTime INT, userid BIGINT,
>                     page_url STRING, referrer_url STRING,
>                     ip STRING COMMENT 'IP Address of the User',
>                     country STRING COMMENT 'country of origination')
>     COMMENT 'This is the staging page view table'
>     ROW FORMAT DELIMITED FIELDS TERMINATED BY '44' LINES TERMINATED BY '12'
>     STORED AS TEXTFILE
>     LOCATION '/user/data/staging/page_view';
>
> http://wiki.apache.org/hadoop/Hive/Tutorial#Creating_Tables   for more
>
> HTH,
> Amogh
>
>
>
> On 7/26/10 1:02 PM, "vaibhav negi" <ss...@gmail.com> wrote:
>
> Hi,
>
> Is there some way to load a CSV file into Hive?
>
> Vaibhav Negi
>
>

Re: HIVE: How to Load CSV File?

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi,
You can create an external table pointing to data already on hdfs and specifying the delimiter-
CREATE EXTERNAL TABLE page_view_stg(viewTime INT, userid BIGINT,
                    page_url STRING, referrer_url STRING,
                    ip STRING COMMENT 'IP Address of the User',
                    country STRING COMMENT 'country of origination')
    COMMENT 'This is the staging page view table'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '44' LINES TERMINATED BY '12'
    STORED AS TEXTFILE
    LOCATION '/user/data/staging/page_view';

http://wiki.apache.org/hadoop/Hive/Tutorial#Creating_Tables   for more
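
If the CSV file is still on the local filesystem rather than already on HDFS, another option (a sketch only; the table name and file path below are made-up examples) is a regular delimited table plus LOAD DATA:

CREATE TABLE page_view_csv(viewTime INT, userid BIGINT,
                    page_url STRING, referrer_url STRING,
                    ip STRING, country STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

-- LOCAL copies the file from the client machine into the table's warehouse directory
LOAD DATA LOCAL INPATH '/tmp/page_views.csv' INTO TABLE page_view_csv;

Keep in mind that the delimited format simply splits on the comma, so CSV files with quoted fields that contain embedded commas are not handled correctly this way.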

HTH,
Amogh


On 7/26/10 1:02 PM, "vaibhav negi" <ss...@gmail.com> wrote:

Hi,

Is there some way to load a CSV file into Hive?

Vaibhav Negi