Posted to user@hive.apache.org by Richard <co...@163.com> on 2012/05/21 11:44:14 UTC

user define data format

Hi, I want to use Hive on some data in the following format:
<doc>\0x01
field1=val1\0x01
field2=val2\0x01
...
</doc>\0x01

The lines between <doc> and </doc> form a single record. How should I define the table?

Thanks.
Richard

Re: user define data format

Posted by Edward Capriolo <ed...@gmail.com>.
A crafty trick would be to add a pre-processing step that uses streaming
and emits a record only once it sees the end tag.
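
The buffering idea above could be sketched as a small filter script. This is only an illustration under assumptions: it treats "\0x01" in the sample as the single Ctrl-A character (0x01) at the end of each line, and the names flatten_records and main are hypothetical, not part of Hive.

```python
import sys

FIELD_SEP = "\x01"  # assumption: "\0x01" in the sample means the Ctrl-A character


def flatten_records(lines):
    """Yield one single-line string per <doc>...</doc> block."""
    fields = None  # None means we are currently outside a record
    for raw in lines:
        # Drop the line ending, then the trailing separator.
        line = raw.strip().rstrip(FIELD_SEP)
        if line == "<doc>":
            fields = []  # start buffering a new record
        elif line == "</doc>":
            if fields is not None:
                # End tag seen: emit the buffered record as one line.
                yield FIELD_SEP.join(fields)
            fields = None
        elif fields is not None and line:
            fields.append(line)


def main():
    # Read multi-line records from stdin, write one record per line to stdout.
    for record in flatten_records(sys.stdin):
        sys.stdout.write(record + "\n")
```

Calling main() at the bottom of the script would let it run as a stdin-to-stdout filter. A block that never reaches its end tag is dropped rather than emitted half-finished, which is one possible policy among several.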

On Tue, May 22, 2012 at 12:10 PM, Mark Grover <mg...@oanda.com> wrote:
> Hi Richard,
> What Bejoy said is correct. However, another way around it would be to pre-process your data so that the text between <doc> and </doc> contains no newlines. You should then be able to treat each record as a single string and parse it relatively easily.
>
> Mark

Re: user define data format

Posted by Mark Grover <mg...@oanda.com>.
Hi Richard,
What Bejoy said is correct. However, another way around it would be to pre-process your data so that the text between <doc> and </doc> contains no newlines. You should then be able to treat each record as a single string and parse it relatively easily.
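
As a rough illustration of the "parse it out" step, the sketch below splits one flattened record into key/value pairs. It assumes the pre-processing left each record on one line with the field=value pairs still separated by the Ctrl-A character (0x01). The name parse_record is hypothetical.

```python
FIELD_SEP = "\x01"  # assumption: fields are still separated by Ctrl-A after flattening


def parse_record(line):
    """Turn 'field1=val1\\x01field2=val2' into a dict of field names to values."""
    fields = {}
    for item in line.rstrip("\n").split(FIELD_SEP):
        if not item:
            continue  # skip empty pieces, e.g. from a trailing separator
        key, sep, value = item.partition("=")
        if sep:  # ignore pieces that contain no '=' at all
            fields[key] = value
    return fields
```

Only the first "=" in each piece is treated as the separator, so values that themselves contain "=" are kept intact.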

Mark


----- Original Message -----
From: "Bejoy Ks" <be...@yahoo.com>
To: user@hive.apache.org
Sent: Monday, May 21, 2012 7:22:58 AM
Subject: Re: user define data format



Hi Richard 


In Hive, the default record delimiter is the newline character. In your sample data set, a single row/record is spread across multiple lines. AFAIK, the only possible option here is to write a custom SerDe for your data.


Regards 
Bejoy KS 






Re: user define data format

Posted by Bejoy Ks <be...@yahoo.com>.
Hi Richard

      In Hive, the default record delimiter is the newline character. In your sample data set, a single row/record is spread across multiple lines. AFAIK, the only possible option here is to write a custom SerDe for your data.
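
To make the problem concrete, here is a small sketch of what a newline-delimited reader would see for the sample record. It is plain Python rather than Hive code, and it assumes "\0x01" in the sample stands for the Ctrl-A character (0x01). The name rows_seen_by_line_reader is hypothetical.

```python
# One logical record from the sample, with "\0x01" taken to be Ctrl-A (assumption).
RAW = "<doc>\x01\nfield1=val1\x01\nfield2=val2\x01\n</doc>\x01\n"


def rows_seen_by_line_reader(text):
    """Split on newlines only, the way a line-oriented reader would."""
    return [line for line in text.split("\n") if line]


rows = rows_seen_by_line_reader(RAW)
# Each physical line becomes its own row, so one record turns into several rows.
print(len(rows))
```

The tags and the fields each end up as separate rows, which is why the record boundaries have to be handled before or below the table definition.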

Regards
Bejoy KS


________________________________
 From: Richard <co...@163.com>
To: "user@hive.apache.org" <us...@hive.apache.org> 
Sent: Monday, May 21, 2012 3:14 PM
Subject: user define data format
 

 Hi, I want to use Hive on some data in the following format:
<doc>\0x01
field1=val1\0x01
field2=val2\0x01
...
</doc>\0x01

the lines between <doc> and </doc> are a record. How should I define the table?

thanks.
Richard