Posted to user@hive.apache.org by 周梦想 <ab...@gmail.com> on 2013/03/11 05:04:53 UTC
how to handle variable format data of text file?
I have files like this:
03/11/13 10:59:52 00000ec0 1009 180538126 92041 2300 0 0 7 21|47|20|33|11
0:2775
03/11/13 10:59:52 00000744 1010 178343610 92042 350 1 0 -1 NULL NULL 22 45
The fields are separated by a single space:
date time threadid gid userid [variable-format data, grouped into fields
separated by spaces]
I'd like to create a table like:
hive> create external table handresult (hdate string,htime string, thid
string, gid int, userid string,ldata string) row format delimited fields
terminated by " ";
OK
But the table above only contains part of each line:
select * from handresult;
03/11/13 10:59:52 00000ec0 1009 180538126 92041
03/11/13 10:59:52 00000744 1010 178343610 92042
I can't get the remaining data, such as "2300 0 0 7 21|47|20|33|11 0:2775 ".
The ldata part varies in length and format: it may be separated by " " or
hold an array, and we will parse it differently for each gid.
How can I do this?
Thanks,
Andy Zhou
Re: how to handle variable format data of text file?
Posted by Ramki Palle <ra...@gmail.com>.
One approach you can try is to make ldata a map field, since it contains
variably formatted data, and write a UDF to extract whatever information you
need.
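
To make the idea concrete, here is a rough sketch of the kind of per-gid parsing logic such a UDF might carry. It is written in Python only to show the logic; it is not Hive's UDF API. The parser names, the PARSERS table, and the rule assigned to gid 1010 are all hypothetical, since the thread does not say what each gid's layout means. It returns a list rather than a map because the field names are not known.

```python
def parse_generic(ldata):
    """Split on spaces; turn pipe-delimited tokens into lists of strings."""
    fields = []
    for token in ldata.split():
        fields.append(token.split("|") if "|" in token else token)
    return fields


def parse_with_nulls(ldata):
    """Hypothetical layout: plain tokens, with the literal NULL mapped to None."""
    return [None if token == "NULL" else token for token in ldata.split()]


# Hypothetical dispatch table keyed by gid; the real layouts are not in the thread.
PARSERS = {1010: parse_with_nulls}


def parse_ldata(gid, ldata):
    """Pick the parser registered for this gid, falling back to the generic one."""
    parser = PARSERS.get(gid, parse_generic)
    return parser(ldata)


print(parse_ldata(1009, "92041 2300 0 0 7 21|47|20|33|11 0:2775"))
print(parse_ldata(1010, "92042 350 1 0 -1 NULL NULL 22 45"))
```

Keeping one small function per gid means a new layout only needs a new entry in the table.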
Regards,
Ramki.
On Mon, Mar 18, 2013 at 1:23 AM, Zhiwen Sun <pe...@gmail.com> wrote:
> As you defined in the create table HQL, fields are delimited by a blank
> space, so the rest of the data is omitted.
>
> If you want to keep the rest of the data at the end of the line, I suggest
> you use the org.apache.hadoop.hive.contrib.serde2.RegexSerDe row format
> instead of the default delimited format.
>
>
> Zhiwen Sun
Re: how to handle variable format data of text file?
Posted by Zhiwen Sun <pe...@gmail.com>.
As you defined in the create table HQL, fields are delimited by a blank
space. Every space therefore starts a new field, and the fields beyond the
table's six columns are omitted.
If you want to keep the rest of the data at the end of the line, I suggest
you use the org.apache.hadoop.hive.contrib.serde2.RegexSerDe row format
instead of the default delimited format.
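
As a rough illustration of the regex idea, the sketch below tries out one candidate pattern in Python before handing it to Hive. The pattern itself, the helper name split_line, and the assumption that the wrapped sample in the question is a single record are mine, not from the thread. RegexSerDe takes its pattern through the "input.regex" SerDe property, and as far as I know the contrib version expects every column to be declared as string, so gid would need a cast in queries.

```python
import re

# Five space-free leading fields, then everything that remains as one group.
# This is an assumed pattern, not one given in the thread.
LINE_RE = re.compile(r"(\S+) (\S+) (\S+) (\S+) (\S+) (.*)")

# Column names taken from the handresult table in the question.
COLUMNS = ("hdate", "htime", "thid", "gid", "userid", "ldata")


def split_line(line):
    """Return a dict of the six columns, or None if the line does not match."""
    match = LINE_RE.fullmatch(line.rstrip("\n"))
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


# First sample record from the question, joined back into one line.
sample = ("03/11/13 10:59:52 00000ec0 1009 180538126 "
          "92041 2300 0 0 7 21|47|20|33|11 0:2775")
row = split_line(sample)
print(row["gid"])
print(row["ldata"])
```

With this pattern the last group keeps its inner spaces, so ldata arrives as one string that can be parsed per gid afterwards.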
Zhiwen Sun