Posted to user@hive.apache.org by Raghu Murthy <rm...@facebook.com> on 2009/05/21 08:37:50 UTC

Re: Can Hive recognize commented out line in data files while loading?

I guess the question was about loading the data. In the load command, we
currently just copy over the data without parsing it (via the CopyTask).

Even though we could choose to skip commented rows at query time (via
SerDes), it's probably more efficient to do it once while loading.
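
Until the load command can do this, one workaround is to strip the
commented rows before running the load. The following is only a minimal
sketch of that idea, not anything Hive does itself: the function name
strip_comment_lines and the "#" marker are assumptions for illustration,
and the sample rows are the ones from the original question.

```python
import io


def strip_comment_lines(src, dst, marker="#"):
    """Copy lines from src to dst, skipping lines that start with marker.

    Hypothetical helper for illustration; not part of Hive.
    Returns a (kept, skipped) tuple of line counts.
    """
    kept = 0
    skipped = 0
    for line in src:
        # Treat a line as a comment if its first non-blank character
        # sequence is the marker.
        if line.lstrip().startswith(marker):
            skipped += 1
            continue
        dst.write(line)
        kept += 1
    return kept, skipped


# Sample data taken from the test.csv example in the question.
sample = io.StringIO("#123\n#Red, Brown, Black, Blue\n3, AB, 5, 3\n2, AA, 1, 4\n")
cleaned = io.StringIO()
counts = strip_comment_lines(sample, cleaned)
```

With real files, src and dst would be file objects opened on the raw file
and on the cleaned copy, and the cleaned copy is what gets loaded.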

In addition, it would be good to add more features to the load command,
such as verifying the schema and loading into multiple partitions based on
column values in the rows. Can you file a JIRA for this? I can take a shot
at implementing these features.
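
To make the two proposed features concrete, here is a rough sketch of what
"verify the schema" and "load into multiple partitions based on a column"
could mean for a delimited file. It is an illustration under assumptions,
not a design for the load command: the schema check is reduced to a column
count, and partition_rows, expected_columns and partition_index are
hypothetical names.

```python
import csv
import io
from collections import defaultdict


def partition_rows(src, expected_columns, partition_index):
    """Group rows by the value of one column, setting aside malformed rows.

    Hypothetical helper for illustration; "schema verification" here only
    checks that each row has the expected number of columns.
    """
    partitions = defaultdict(list)
    rejected = []
    # skipinitialspace drops the blank that follows each comma in the sample.
    for row in csv.reader(src, skipinitialspace=True):
        if len(row) != expected_columns:
            rejected.append(row)
            continue
        partitions[row[partition_index]].append(row)
    return dict(partitions), rejected


# Two well-formed rows from the question plus one made-up short row.
rows = io.StringIO("3, AB, 5, 3\n2, AA, 1, 4\n7, AB\n")
partitions, rejected = partition_rows(rows, expected_columns=4, partition_index=1)
```

Each key of partitions would correspond to one target partition, and the
rejected rows could be reported back to the user instead of being loaded
silently.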

On 5/20/09 11:26 PM, "Zheng Shao" <zs...@gmail.com> wrote:

> The Hive internal SerDes do not support this format yet. We will need to
> change the Hive code to make that happen.
> Specifically, the change belongs in the LazySimpleSerDe class.
> 
> Zheng
> 
> On Wed, May 20, 2009 at 11:05 PM, Manhee Jo <jo...@nttdocomo.com> wrote:
>> Is it possible for Hive to recognize commented rows in a file when it loads a
>> CSV file?
>> 
>> For example, say the contents of test.csv are:
>> 
>> #123
>> #Red, Brown, Black, Blue
>> 3, AB, 5, 3
>> 2, AA, 1, 4
>> ...
>> 
>> In Hive, how can I ignore the first two lines while loading?
>> 
>> 
>> Thanks,
>> Manhee 
>> 
> 
>