Posted to user@hive.apache.org by Manhee Jo <jo...@nttdocomo.com> on 2009/05/21 08:05:03 UTC

Can Hive recognize commented out line in data files while loading?

Is it possible for Hive to recognize commented rows in a file when it loads
a CSV file?

For example, say the contents of test.csv are:

#123
#Red, Brown, Black, Blue
3, AB, 5, 3
2, AA, 1, 4
...

In Hive, how can I ignore the first two lines while loading?


Thanks,
Manhee 



Re: Can Hive recognize commented out line in data files while loading?

Posted by Raghu Murthy <rm...@facebook.com>.
I guess the question was about loading the data. In the load command, we
currently just copy over the data without parsing it (via the CopyTask).

Even though we could choose to skip commented rows at query time (via
SerDes), it's probably more efficient to do it once while loading.

In addition, it would be good to provide more features to the load command
like verifying schema and loading to multiple partitions based on columns in
the rows. Can you file a jira for this? I can take a shot at implementing
these features.

On 5/20/09 11:26 PM, "Zheng Shao" <zs...@gmail.com> wrote:

> The Hive internal Serdes do not allow this format yet. We will need to change
> Hive code to make that happen.
> Specifically, it's the LazySimpleSerDe class.
> 
> Zheng
> 
> On Wed, May 20, 2009 at 11:05 PM, Manhee Jo <jo...@nttdocomo.com> wrote:
>> Is it possible for hive to recognize commented rows in a file when it loads a
>> csv file?
>> 
>> For example, say contents of test.csv is,
>> 
>> #123
>> #Red, Brown, Black, Blue
>> 3, AB, 5, 3
>> 2, AA, 1, 4
>> ...
>> 
>> In hive, how to ignore first two lines while loading?
>> 
>> 
>> Thanks,
>> Manhee 
>> 
> 
> 


Re: Can Hive recognize commented out line in data files while loading?

Posted by Zheng Shao <zs...@gmail.com>.
The Hive internal SerDes do not support this format yet. We would need to
change the Hive code to make that happen, specifically the
LazySimpleSerDe class.

Zheng

On Wed, May 20, 2009 at 11:05 PM, Manhee Jo <jo...@nttdocomo.com> wrote:

> Is it possible for hive to recognize commented rows in a file when it loads
> a csv file?
>
> For example, say contents of test.csv is,
>
> #123
> #Red, Brown, Black, Blue
> 3, AB, 5, 3
> 2, AA, 1, 4
> ...
>
> In hive, how to ignore first two lines while loading?
>
>
> Thanks,
> Manhee
>
>


-- 
Yours,
Zheng

Re: Can Hive recognize commented out line in data files while loading?

Posted by Prasad Chakka <pc...@facebook.com>.
Hive doesn't do any transformation while loading, at least not yet. You can load the data into a temporary table and then run 'insert overwrite table <tab> select * from <tmp_tab> where <predicate that filters out comments>'.
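
As a rough sketch of that workaround (not a tested recipe): the table names test_staging and test_clean, the column names c1..c4, and the file path are all hypothetical, and every column is declared STRING on the assumption that comment lines should load without type problems. The predicate assumes comments always start with '#' in the first field.

```sql
-- Hypothetical names throughout; adjust to the real schema.
-- Staging table: all columns are STRING so comment lines load as-is.
CREATE TABLE test_staging (c1 STRING, c2 STRING, c3 STRING, c4 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- The load step copies the file into the table without parsing it,
-- so the '#' lines are still present in test_staging afterwards.
LOAD DATA LOCAL INPATH './test.csv' OVERWRITE INTO TABLE test_staging;

-- Final table with the same layout.
CREATE TABLE test_clean (c1 STRING, c2 STRING, c3 STRING, c4 STRING);

-- Copy only the rows whose first field does not start with '#'.
INSERT OVERWRITE TABLE test_clean
SELECT * FROM test_staging
WHERE c1 NOT LIKE '#%';
```

Two caveats with this sketch: rows whose first field is NULL are also dropped, because the NOT LIKE comparison evaluates to NULL for them, and the sample file has a space after each comma, so the loaded values keep that leading space unless they are trimmed in the SELECT.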


________________________________
From: Manhee Jo <jo...@nttdocomo.com>
Reply-To: <hi...@hadoop.apache.org>
Date: Wed, 20 May 2009 23:05:03 -0700
To: <hi...@hadoop.apache.org>
Subject: Can Hive recognize commented out line in data files while loading?

Is it possible for hive to recognize commented rows in a file when it loads
a csv file?

For example, say contents of test.csv is,

#123
#Red, Brown, Black, Blue
3, AB, 5, 3
2, AA, 1, 4
...

In hive, how to ignore first two lines while loading?


Thanks,
Manhee