Posted to user@hive.apache.org by iwannaplay games <fu...@gmail.com> on 2012/11/20 08:59:38 UTC

populating xml data in hive

Hi All,

I have a CSV file (fields separated by |) where the data looks like this:

id    data                                      date
1     apple                                     24-nov-2011
2     mango                                     26-nov-2011
3     <?xml version="1.0" encoding="utf-8"?>
      <a>fruits</a>                             28-nov-2011
4     papaya                                    30-nov-2011


Since the data field for id=3 contains a newline, Hive reads only the first
line of it and treats the second line as a separate row. I want the full XML
value to be loaded into the data column of the Hive table.

It seems Hive doesn't support LINES TERMINATED BY '|'.

How should XML data be handled in Hive?

Thanks & Regards
Prabhjot

Re: populating xml data in hive

Posted by Nitin Pawar <ni...@gmail.com>.
You can simply write a MapReduce job that does this preprocessing for you.
Its output will then be readily usable by the Hive table.

Re: populating xml data in hive

Posted by Bejoy KS <be...@yahoo.com>.
You can use your own custom MapReduce code. Just check the record type and, if it is XML, preprocess it to remove the newlines.

Regards
Bejoy KS

Sent from handheld, please excuse typos.
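
As a rough illustration of "check the record type and, if it is XML, preprocess it", here is a minimal Python sketch rather than an actual MapReduce job. It assumes every XML value starts with "<?xml", is never the last field of a row, and never contains a | character. The names XML_FIELD and flatten_xml_fields are hypothetical, and the whole input is held in memory, so this only suits small samples.

```python
import re

# Assumption: an XML value starts at "<?xml" and runs up to the next "|".
# [^|]* also matches newlines, so the match spans the broken lines.
XML_FIELD = re.compile(r"<\?xml[^|]*")


def flatten_xml_fields(text: str) -> str:
    """Replace newlines inside XML values with a single space.

    Rows without an XML value are left untouched.
    """
    return XML_FIELD.sub(
        lambda m: re.sub(r"\s*\n\s*", " ", m.group(0)),
        text,
    )


sample_text = (
    "1|apple|24-nov-2011\n"
    "2|mango|26-nov-2011\n"
    '3|<?xml version="1.0" encoding="utf-8"?>\n'
    "<a>fruits</a>|28-nov-2011\n"
    "4|papaya|30-nov-2011\n"
)
flattened = flatten_xml_fields(sample_text)
```

Only the matched XML spans are rewritten, so the millions of ordinary rows pass through unchanged.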


Re: populating xml data in hive

Posted by iwannaplay games <fu...@gmail.com>.
How do I preprocess the data when there are millions of records, of
which only a few thousand contain XML data?



Re: populating xml data in hive

Posted by Nitin Pawar <ni...@gmail.com>.
Hive currently supports only the newline character as a record separator. If
your column values contain newlines, you will need to preprocess the data and
remove the newlines from those values.
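
One possible way to do that preprocessing is sketched below in Python. It is not taken from the thread and rests on assumptions: every row has exactly three |-separated fields (id, data, date), and | never appears inside a value. The function name join_broken_records and its parameters are hypothetical. It streams line by line, so the few thousand broken rows are repaired without holding the millions of normal rows in memory.

```python
from typing import Iterable, Iterator


def join_broken_records(
    lines: Iterable[str], delimiter: str = "|", num_fields: int = 3
) -> Iterator[str]:
    """Yield one single-line record per logical row.

    A row is considered complete once it contains num_fields - 1
    delimiters; until then, following lines are glued on with a space.
    """
    pending = ""
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Append a continuation line to the partial record, if any.
        pending = f"{pending} {line.strip()}" if pending else line
        if pending.count(delimiter) >= num_fields - 1:
            yield pending
            pending = ""
    if pending:
        # Trailing incomplete record: pass it through as it is.
        yield pending


sample = [
    "1|apple|24-nov-2011\n",
    "2|mango|26-nov-2011\n",
    '3|<?xml version="1.0" encoding="utf-8"?>\n',
    "<a>fruits</a>|28-nov-2011\n",
    "4|papaya|30-nov-2011\n",
]
fixed = list(join_broken_records(sample))
```

With the sample above, the two physical lines of row 3 come out as one line, which a newline-delimited Hive table can then read as a single row.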