Posted to dev@hive.apache.org by Johan Oskarsson <jo...@oskarsson.nu> on 2009/02/24 13:34:11 UTC

Lines terminated by

I've been trying to use a text file with the field separator \001 and
line separator \002\n in Hive, similar to what's described here
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL.

I've set the "lines terminated by" to \002 but when selecting data the
last column still includes that character, if the last col is a string.
If it's an int the value fails to parse and is left as null.

Is this a known issue? I can't find a ticket for it.
I assume the TextInputFormat takes care of the \n so that I only need to
use \002 as the termination.
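
To illustrate the symptom described above, here is a minimal Python sketch
(not Hive code). It assumes the reader splits records on "\n" only, so the
\002 stays attached to the last field. The helper names parse_row and
strip_marker are hypothetical, and stripping the marker before loading is
only one possible workaround, not something Hive does itself.

```python
FIELD_SEP = "\x01"  # the \001 field separator
LINE_MARK = "\x02"  # the \002 that precedes "\n" in the file


def parse_row(line, types):
    """Split one newline-delimited record and cast each field.

    A field that fails to cast becomes None, mimicking the null seen
    for int columns.
    """
    row = []
    for raw, cast in zip(line.split(FIELD_SEP), types):
        try:
            row.append(cast(raw))
        except ValueError:
            row.append(None)
    return row


def strip_marker(line):
    """Drop a single trailing \\002 so only the newline ends the record."""
    return line[:-1] if line.endswith(LINE_MARK) else line


# One artistsong-style record as it looks after splitting the file on "\n".
record = FIELD_SEP.join(["20090224", "11", "42"]) + LINE_MARK

as_read = parse_row(record, [int, int, int])
cleaned = parse_row(strip_marker(record), [int, int, int])

print(as_read)  # last int column fails to parse
print(cleaned)  # all three columns parse
```

With a string as the last column the cast does not fail, so the \002 is
simply kept at the end of the value.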

Example create queries

create table artistsong (export_date bigint, artist_id int, song_id int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' lines terminated by '\002'

and this one:

create table artist (export_date bigint, artist_id int, name string,
is_actual_artist int, view_url string) ROW FORMAT DELIMITED FIELDS
TERMINATED BY '\001' lines terminated by '\002'

/Johan

Re: Lines terminated by

Posted by Zheng Shao <zs...@gmail.com>.

I don't think there is one for a custom line separator in TextInputFormat
right now.

BTW, the current logic in TextInputFormat for line breaks is pretty
complicated because it needs to deal with "\n", "\r\n", "\n\r" etc.
Of course that does not preclude us from adding a custom line separator, but
it may make the solution a little less obvious.

Zheng

On Tue, Feb 24, 2009 at 8:24 AM, Johan Oskarsson <jo...@oskarsson.nu> wrote:

> Ok, I'll create a Hive jira for it then. I can't find a Hadoop one for
> adding custom line separators in the TextInputFormat either; if there is
> none I'll create that too.
>
> /Johan


-- 
Yours,
Zheng

Re: Lines terminated by

Posted by Johan Oskarsson <jo...@oskarsson.nu>.

Ok, I'll create a Hive jira for it then. I can't find a Hadoop one for
adding custom line separators in the TextInputFormat either; if there is
none I'll create that too.

/Johan

Zheng Shao wrote:
> Yes, this is a known issue. "lines terminated by" is not supported yet
> because the text input format does not allow configurable line
> separators.
> 
> 
> Zheng

Re: Lines terminated by

Posted by Zheng Shao <zs...@gmail.com>.

Yes, this is a known issue. "lines terminated by" is not supported yet
because the text input format does not allow configurable line
separators.


Zheng



-- 
Yours,
Zheng