Posted to user@hive.apache.org by Himanish Kushary <hi...@gmail.com> on 2012/08/15 17:43:25 UTC

Reducer throwing warning during join operations. Defaulting int columns to 0

Hi,

I have uploaded a few CSV files from Windows into Hive and configured a few
external tables on top of them. When I run a join on two of these tables,
one of the int columns gets changed to 0. The structures of the tables are
as follows:


Table-1     Table-2

Id(int)     id(int)    datetime     eid(int)
-------     -------    ----------   --------
1           1          2011-02-01   3
2           1          2011-03-01   4
3           2          2011-04-01   5
            4          2011-05-01   6
            6          2011-06-01   7


The join query is:

select a.* from Table-2 a join Table-1 b on (a.id=b.id);

The output is:

1  2011-02-01   0
1  2011-03-01   0
2  2011-04-01   0


I checked the logs and noticed the following warning:

WARN org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct: Extra bytes detected at the end of the row! Ignoring similar problems.

Could this be causing it?

When I turn on hive.auto.convert.join=true, the error goes away, since there
is no reduce phase. The output is:

1  2011-02-01   3
1  2011-03-01   4
2  2011-04-01   5

Could somebody please help me figure out why we get the wrong results when
the join runs through the reducer?
-- 
Thanks

Re: Reducer throwing warning during join operations. Defaulting int columns to 0

Posted by Himanish Kushary <hi...@gmail.com>.
Hi,

To address this issue, for now I have changed all the fields in my
external tables to the STRING data type. The joins on the external tables
are working fine now. I will try to change the data types while
transforming the data into a Hive managed table and re-execute the joins
on the new tables.
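One way to prepare for that conversion, sketched here in Python rather than HiveQL, is to check which STRING values would not parse cleanly as integers before moving them into int columns. The function `convert_column`, the pattern `INT_PATTERN` and the sample rows are hypothetical. The check is deliberately stricter than Python's own `int()`, which silently accepts surrounding whitespace, so padded values and stray carriage returns get reported instead of hidden.

```python
import re

# An optional sign followed by ASCII digits, and nothing else (no spaces, no CR).
INT_PATTERN = re.compile(r"[+-]?[0-9]+")


def convert_column(rows, column):
    """Convert one STRING column to int, reporting values that do not parse.

    `rows` is a list of lists of strings and `column` a 0-based index.
    Returns (converted_rows, problems): a value that fails to parse becomes
    None in the output and is reported as (row_index, repr(value)), so hidden
    characters such as a trailing carriage return are visible.
    Range checks against Hive's INT limits are omitted.
    """
    converted, problems = [], []
    for i, row in enumerate(rows):
        new_row = list(row)  # copy so the input rows are left untouched
        value = row[column]
        if INT_PATTERN.fullmatch(value):
            new_row[column] = int(value)
        else:
            new_row[column] = None
            problems.append((i, repr(value)))
        converted.append(new_row)
    return converted, problems
```

Running it over the raw eid column of a sample (for instance rows whose last field is "3", "4\r" and " 5") converts the first value and reports the other two with their hidden characters shown by repr().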

Any other suggestions for handling this issue?

Thanks

On Wed, Aug 15, 2012 at 1:20 PM, Himanish Kushary <hi...@gmail.com> wrote:


-- 
Thanks & Regards
Himanish

Re: Reducer throwing warning during join operations. Defaulting int columns to 0

Posted by Himanish Kushary <hi...@gmail.com>.
Thanks Nitin, but to take care of that I had already cleaned the CSV files of
leading and trailing spaces before putting them into HDFS. I also ran the
dos2unix command on the CSV files.
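Because cleanup steps like these are easy to miss on one of several files, a quick byte-level check can confirm them. The Python sketch below is only an assumption about what is worth checking: it counts remaining carriage-return bytes, counts lines with a comma-separated field that still has leading or trailing spaces or tabs, and reports a UTF-8 byte-order mark, which some Windows editors write at the start of a file. The function name `scan_csv_bytes` is hypothetical, and a comma delimiter is assumed.

```python
# UTF-8 byte-order mark that some Windows editors write at the start of a file.
UTF8_BOM = b"\xef\xbb\xbf"


def scan_csv_bytes(data: bytes) -> dict:
    """Report leftovers that dos2unix and whitespace trimming should remove.

    Counts carriage-return bytes anywhere in the data, counts lines with at
    least one field padded by spaces or tabs, and flags a leading UTF-8 BOM.
    """
    padded_lines = 0
    for line in data.split(b"\n"):
        body = line.rstrip(b"\r")  # look past a CRLF terminator
        # Assumes comma-separated fields; a field that differs from its
        # stripped form has leading or trailing spaces/tabs.
        if any(field != field.strip(b" \t") for field in body.split(b",")):
            padded_lines += 1
    return {
        "cr_bytes": data.count(b"\r"),
        "padded_lines": padded_lines,
        "has_bom": data.startswith(UTF8_BOM),
    }
```

For example, calling `scan_csv_bytes(open("table2.csv", "rb").read())` on a local copy of a file (the file name is hypothetical) should report zero CR bytes, zero padded lines and no BOM if the cleanup fully worked.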

The joins perform properly only if I define the external table with all
fields as the STRING data type. Even when I initially load the data into a
table with all STRING fields and later copy the data to a different table
with the proper data types, the joins give wrong results on the new table
as well.


On Wed, Aug 15, 2012 at 1:14 PM, Nitin Pawar <ni...@gmail.com> wrote:


-- 
Thanks & Regards
Himanish

Re: Reducer throwing warning during join operations. Defaulting int columns to 0

Posted by Nitin Pawar <ni...@gmail.com>.
It might be the case that there are a few empty spaces at the end of
each row, which are being handled when you are reading and writing from
disk.

But when you set autoconvert, it looks like one of these tables is
really small, so the join is converted into a map-side join,
which means the entire table is loaded into map memory and there is no
need for a reduce phase.
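The difference between the two plans can be pictured with a toy Python sketch; it is not Hive's implementation, and the function names are hypothetical. In the map-side version the small table is loaded into an in-memory hash table and the big table is streamed past it, so no reduce step is needed. In the reduce-side version rows from both tables are grouped by the join key, roughly as a shuffle would do, and each group is joined separately. For correct input both strategies return the same rows, so when Hive gives different answers for the two plans, the problem lies in one execution path rather than in the query itself.

```python
from collections import defaultdict


def map_side_join(big, small, key_big=0, key_small=0):
    """Toy map-side (broadcast hash) join: no shuffle, no reduce step."""
    # Load the whole small table into an in-memory hash table first.
    lookup = defaultdict(list)
    for srow in small:
        lookup[srow[key_small]].append(srow)
    # Each "mapper" then streams the big table and probes the hash table.
    return [brow + srow for brow in big for srow in lookup.get(brow[key_big], [])]


def reduce_side_join(left, right, key_left=0, key_right=0):
    """Toy reduce-side join: tag rows by source, group by key, join per key."""
    groups = defaultdict(lambda: ([], []))
    for lrow in left:   # "map" output tagged as coming from the left table
        groups[lrow[key_left]][0].append(lrow)
    for rrow in right:  # "map" output tagged as coming from the right table
        groups[rrow[key_right]][1].append(rrow)
    result = []
    # Each key goes to one "reducer"; keys are visited in sorted order.
    for key in sorted(groups):
        lefts, rights = groups[key]
        result.extend(lrow + rrow for lrow in lefts for rrow in rights)
    return result


# Sample rows from the thread: Table-2 is (id, datetime, eid), Table-1 is (Id,).
table2 = [
    (1, "2011-02-01", 3),
    (1, "2011-03-01", 4),
    (2, "2011-04-01", 5),
    (4, "2011-05-01", 6),
    (6, "2011-06-01", 7),
]
table1 = [(1,), (2,), (3,)]

print(map_side_join(table2, table1) == reduce_side_join(table2, table1))  # prints True
```

Keeping only the first three fields of each result row gives the a.* output reported as correct in the original message (1 2011-02-01 3, 1 2011-03-01 4, 2 2011-04-01 5).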

On Wed, Aug 15, 2012 at 9:13 PM, Himanish Kushary <hi...@gmail.com> wrote:

-- 
Nitin Pawar