Posted to user@hive.apache.org by Richard <co...@163.com> on 2012/05/21 11:44:14 UTC
user define data format
Hi, I want to use Hive on some data in the following format:
<doc>\0x01
field1=val1\0x01
field2=val2\0x01
...
</doc>\0x01
The lines between <doc> and </doc> form a single record. How should I define the table?
Thanks.
Richard
Re: user define data format
Posted by Edward Capriolo <ed...@gmail.com>.
A crafty trick would be to use streaming as a pre-processing step and
only emit a record once you see its end tag (</doc>).
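A minimal sketch of that idea in Python, not a tested production script. It assumes that "\0x01" in the sample stands for the single Ctrl-A byte (0x01) at the end of each line, and that the tags sit on lines of their own. The names flatten_records and main are made up for illustration. Lines are buffered after <doc>, and the record is written out as one line only when </doc> arrives.

```python
import sys

# Assumption: "\0x01" in the sample data denotes the Ctrl-A byte (0x01).
SEP = "\x01"


def flatten_records(lines):
    """Yield one joined line per <doc> ... </doc> block."""
    fields = None  # None means we are currently outside a record
    for raw in lines:
        # Drop the newline and the trailing Ctrl-A terminator.
        line = raw.rstrip("\n").rstrip(SEP)
        if line == "<doc>":
            fields = []  # start buffering a new record
        elif line == "</doc>":
            if fields is not None:
                # End tag seen: only now emit the whole record.
                yield SEP.join(fields)
            fields = None
        elif fields is not None and line:
            fields.append(line)  # a "field=value" line inside a record


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read raw lines from stdin, write one record per line to stdout."""
    for record in flatten_records(stdin):
        stdout.write(record + "\n")
```

A script built from this would call main() at the bottom and be run over the raw files before they are loaded. The output then has one record per line, with the field=value pairs still separated by Ctrl-A, so Hive's default newline record delimiter applies again.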
On Tue, May 22, 2012 at 12:10 PM, Mark Grover <mg...@oanda.com> wrote:
> Hi Richard,
> What Bejoy said is correct. However, another way to get around it would be to pre-process your data so that the text between <doc> and </doc> contains no newlines. Then you should be able to treat each record as a single string and parse it out relatively easily.
>
> Mark
Re: user define data format
Posted by Mark Grover <mg...@oanda.com>.
Hi Richard,
What Bejoy said is correct. However, another way to get around it would be to pre-process your data so that the text between <doc> and </doc> contains no newlines. Then you should be able to treat each record as a single string and parse it out relatively easily.
Mark
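To illustrate the "parse it out" step, here is a small Python sketch. It assumes a record has already been flattened onto one line with its field=value pairs separated by the Ctrl-A byte (0x01), which is one reading of "\0x01" in the sample. The name parse_record is hypothetical.

```python
# Assumption: fields in a flattened record are separated by Ctrl-A (0x01).
SEP = "\x01"


def parse_record(flat):
    """Split a flattened record into a {field: value} dict."""
    # Split on the first "=" only, so values may themselves contain "=".
    pairs = (item.split("=", 1) for item in flat.split(SEP) if "=" in item)
    return {key: value for key, value in pairs}
```

For example, parse_record("field1=val1\x01field2=val2") gives a dict with the keys field1 and field2. Pieces without an "=" are skipped rather than treated as errors, which is a design choice of this sketch and may not suit data that needs strict validation.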
----- Original Message -----
From: "Bejoy Ks" <be...@yahoo.com>
To: user@hive.apache.org
Sent: Monday, May 21, 2012 7:22:58 AM
Subject: Re: user define data format
Hi Richard
In Hive, the default record delimiter is the newline character. In your sample data set, a single row/record is spread across multiple lines. AFAIK, the only possible option here is to write a custom SerDe for your data.
Regards
Bejoy KS
Re: user define data format
Posted by Bejoy Ks <be...@yahoo.com>.
Hi Richard
In Hive, the default record delimiter is the newline character. In your sample data set, a single row/record is spread across multiple lines. AFAIK, the only possible option here is to write a custom SerDe for your data.
Regards
Bejoy KS