You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Pradeep Kamath <pr...@yahoo-inc.com> on 2010/07/08 01:46:46 UTC

Complex types, lateral view and RCFile

Hi,

  I have data with complex types (map, struct, array of maps) stored as
a text file. I am able to successfully create an external table based on
this data and further build a lateral view on it:

 

hive -e 'select rownum, bag_item from complex_text LATERAL VIEW
explode(bagofmap) explodedTable AS bag_item ;'

1       {"k1":"v1","k2":"v2"}

1       {"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}

2       {"a1":"b1","a2":"b2"}

2       {"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}

 

Here is how I created this table:

CREATE external TABLE if not exists complex_text (

mymap map<string, string>,

mytuple struct<num:int,str:string,dbl:double>,

bagofmap array<map<string,string>>,

rownum int

)

row format DELIMITED FIELDS TERMINATED BY  '\001' COLLECTION ITEMS
TERMINATED BY '\002'

          MAP KEYS TERMINATED BY '\003' 

 

stored as textfile

location '/user/pradeepk/complex_text';

 

The data contents are (^C stands for ctrl-C .i.e '\003'):

mymapk1^Cmymapv1^Bmymapk2^Cmymapv2^A1^Bhello^B2.5^Ak1^Dv1^Ck2^Dv2^Bk3^Dv
3^Ck4^Dv4^Ck5^Dv5^Ck6^Dv6^A1

mymapk3^Cmymapv3^Bmymapk4^Cmymapv4^A2^Bbye^B3.5^Aa1^Db1^Ca2^Db2^Ba3^Db3^
Ca4^Db4^Ca5^Db5^Ca6^Db6^A2

 

Now I created a table using RCFile for storage based on the above table
as follows:

create table complex_rcfile

stored as RCFile

location '/user/pradeepk/complex_rcfile'

as select

    mymap,

    mytuple,

    bagofmap,

    rownum

from

    complex_text;

 

The same query against this table gives incorrect results (nulls for the
rownum column):

hive -e 'select rownum, bag_item from complex_rcfile LATERAL VIEW
explode(bagofmap) explodedTable AS bag_item ;'

NULL    {"a1":"b1","a2":"b2"}

NULL    {"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}

NULL    {"k1":"v1","k2":"v2"}

NULL    {"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}

 

I have a feeling the delimiters are not being correctly interpreted in
RCFile format. Strangely a non lateral view query works fine:

hive -e 'select rownum, bagofmap from complex_rcfile;'

2
[{"a1":"b1","a2":"b2"},{"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}]

1
[{"k1":"v1","k2":"v2"},{"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}]

 

Any pointers?

 

Thanks,

Pradeep


Re: Complex types, lateral view and RCFile

Posted by yongqiang he <he...@gmail.com>.
It seem hive-1418 did not solve the problem completely. will open a
jira for this.

On Thu, Jul 8, 2010 at 5:23 PM, Pradeep Kamath <pr...@yahoo-inc.com> wrote:
> Yes indeed! – it did work fine with that setting – I wonder why it shows
> with RCFile and not with text though?
>
>
>
> ________________________________
>
> From: Paul Yang [mailto:pyang@facebook.com]
> Sent: Thursday, July 08, 2010 4:14 PM
>
> To: hive-user@hadoop.apache.org; heyongqiangict@gmail.com
> Subject: RE: Complex types, lateral view and RCFile
>
>
>
> This seems like an issue with the column pruner – can you try ‘set
> hive.optimize.cp=false’ and then re-run the query?
>
>
>
> From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com]
> Sent: Thursday, July 08, 2010 3:57 PM
> To: hive-user@hadoop.apache.org; heyongqiangict@gmail.com
> Subject: RE: Complex types, lateral view and RCFile
>
>
>
> Any pointers?
>
>
>
> Thanks,
>
> Pradeep
>
>
>
> ________________________________
>
> From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com]
> Sent: Wednesday, July 07, 2010 4:47 PM
> To: hive-user@hadoop.apache.org
> Subject: Complex types, lateral view and RCFile
>
>
>
> Hi,
>
>   I have data with complex types (map, struct, array of maps) stored as a
> text file. I am able to successfully create an external table based on this
> data and further build a lateral view on it:
>
>
>
> hive -e 'select rownum, bag_item from complex_text LATERAL VIEW
> explode(bagofmap) explodedTable AS bag_item ;'
>
> 1       {"k1":"v1","k2":"v2"}
>
> 1       {"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}
>
> 2       {"a1":"b1","a2":"b2"}
>
> 2       {"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}
>
>
>
> Here is how I created this table:
>
> CREATE external TABLE if not exists complex_text (
>
> mymap map<string, string>,
>
> mytuple struct<num:int,str:string,dbl:double>,
>
> bagofmap array<map<string,string>>,
>
> rownum int
>
> )
>
> row format DELIMITED FIELDS TERMINATED BY  '\001' COLLECTION ITEMS
> TERMINATED BY '\002'
>
>           MAP KEYS TERMINATED BY '\003'
>
>
>
> stored as textfile
>
> location '/user/pradeepk/complex_text';
>
>
>
> The data contents are (^C stands for ctrl-C .i.e ‘\003’):
>
> mymapk1^Cmymapv1^Bmymapk2^Cmymapv2^A1^Bhello^B2.5^Ak1^Dv1^Ck2^Dv2^Bk3^Dv3^Ck4^Dv4^Ck5^Dv5^Ck6^Dv6^A1
>
> mymapk3^Cmymapv3^Bmymapk4^Cmymapv4^A2^Bbye^B3.5^Aa1^Db1^Ca2^Db2^Ba3^Db3^Ca4^Db4^Ca5^Db5^Ca6^Db6^A2
>
>
>
> Now I created a table using RCFile for storage based on the above table as
> follows:
>
> create table complex_rcfile
>
> stored as RCFile
>
> location '/user/pradeepk/complex_rcfile'
>
> as select
>
>     mymap,
>
>     mytuple,
>
>     bagofmap,
>
>     rownum
>
> from
>
>     complex_text;
>
>
>
> The same query against this table gives incorrect results (nulls for the
> rownum column):
>
> hive -e 'select rownum, bag_item from complex_rcfile LATERAL VIEW
> explode(bagofmap) explodedTable AS bag_item ;'
>
> NULL    {"a1":"b1","a2":"b2"}
>
> NULL    {"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}
>
> NULL    {"k1":"v1","k2":"v2"}
>
> NULL    {"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}
>
>
>
> I have a feeling the delimiters are not being correctly interpreted in
> RCFile format. Strangely a non lateral view query works fine:
>
> hive -e 'select rownum, bagofmap from complex_rcfile;’
>
> 2       [{"a1":"b1","a2":"b2"},{"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}]
>
> 1       [{"k1":"v1","k2":"v2"},{"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}]
>
>
>
> Any pointers?
>
>
>
> Thanks,
>
> Pradeep

Fwd: Complex types, lateral view and RCFile

Posted by yongqiang he <he...@gmail.com>.
forgot hive-user

---------- Forwarded message ----------
From: yongqiang he <he...@gmail.com>
Date: Thu, Jul 8, 2010 at 7:53 PM
Subject: Re: Complex types, lateral view and RCFile
To: Pradeep Kamath <pr...@yahoo-inc.com>


The reason is that RCFile use the column prunning results more
aggressively (to determine which columns needed to be read). For other
formats, all columns will be needed.

Paul, i think you have a jira for this issue?

On Thu, Jul 8, 2010 at 5:23 PM, Pradeep Kamath <pr...@yahoo-inc.com> wrote:
> Yes indeed! – it did work fine with that setting – I wonder why it shows
> with RCFile and not with text though?
>
>
>
> ________________________________
>
> From: Paul Yang [mailto:pyang@facebook.com]
> Sent: Thursday, July 08, 2010 4:14 PM
>
> To: hive-user@hadoop.apache.org; heyongqiangict@gmail.com
> Subject: RE: Complex types, lateral view and RCFile
>
>
>
> This seems like an issue with the column pruner – can you try ‘set
> hive.optimize.cp=false’ and then re-run the query?
>
>
>
> From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com]
> Sent: Thursday, July 08, 2010 3:57 PM
> To: hive-user@hadoop.apache.org; heyongqiangict@gmail.com
> Subject: RE: Complex types, lateral view and RCFile
>
>
>
> Any pointers?
>
>
>
> Thanks,
>
> Pradeep
>
>
>
> ________________________________
>
> From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com]
> Sent: Wednesday, July 07, 2010 4:47 PM
> To: hive-user@hadoop.apache.org
> Subject: Complex types, lateral view and RCFile
>
>
>
> Hi,
>
>   I have data with complex types (map, struct, array of maps) stored as a
> text file. I am able to successfully create an external table based on this
> data and further build a lateral view on it:
>
>
>
> hive -e 'select rownum, bag_item from complex_text LATERAL VIEW
> explode(bagofmap) explodedTable AS bag_item ;'
>
> 1       {"k1":"v1","k2":"v2"}
>
> 1       {"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}
>
> 2       {"a1":"b1","a2":"b2"}
>
> 2       {"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}
>
>
>
> Here is how I created this table:
>
> CREATE external TABLE if not exists complex_text (
>
> mymap map<string, string>,
>
> mytuple struct<num:int,str:string,dbl:double>,
>
> bagofmap array<map<string,string>>,
>
> rownum int
>
> )
>
> row format DELIMITED FIELDS TERMINATED BY  '\001' COLLECTION ITEMS
> TERMINATED BY '\002'
>
>           MAP KEYS TERMINATED BY '\003'
>
>
>
> stored as textfile
>
> location '/user/pradeepk/complex_text';
>
>
>
> The data contents are (^C stands for ctrl-C .i.e ‘\003’):
>
> mymapk1^Cmymapv1^Bmymapk2^Cmymapv2^A1^Bhello^B2.5^Ak1^Dv1^Ck2^Dv2^Bk3^Dv3^Ck4^Dv4^Ck5^Dv5^Ck6^Dv6^A1
>
> mymapk3^Cmymapv3^Bmymapk4^Cmymapv4^A2^Bbye^B3.5^Aa1^Db1^Ca2^Db2^Ba3^Db3^Ca4^Db4^Ca5^Db5^Ca6^Db6^A2
>
>
>
> Now I created a table using RCFile for storage based on the above table as
> follows:
>
> create table complex_rcfile
>
> stored as RCFile
>
> location '/user/pradeepk/complex_rcfile'
>
> as select
>
>     mymap,
>
>     mytuple,
>
>     bagofmap,
>
>     rownum
>
> from
>
>     complex_text;
>
>
>
> The same query against this table gives incorrect results (nulls for the
> rownum column):
>
> hive -e 'select rownum, bag_item from complex_rcfile LATERAL VIEW
> explode(bagofmap) explodedTable AS bag_item ;'
>
> NULL    {"a1":"b1","a2":"b2"}
>
> NULL    {"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}
>
> NULL    {"k1":"v1","k2":"v2"}
>
> NULL    {"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}
>
>
>
> I have a feeling the delimiters are not being correctly interpreted in
> RCFile format. Strangely a non lateral view query works fine:
>
> hive -e 'select rownum, bagofmap from complex_rcfile;’
>
> 2       [{"a1":"b1","a2":"b2"},{"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}]
>
> 1       [{"k1":"v1","k2":"v2"},{"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}]
>
>
>
> Any pointers?
>
>
>
> Thanks,
>
> Pradeep

RE: Complex types, lateral view and RCFile

Posted by Pradeep Kamath <pr...@yahoo-inc.com>.
Yes indeed! - it did work fine with that setting - I wonder why it shows
with RCFile and not with text though?

 

________________________________

From: Paul Yang [mailto:pyang@facebook.com] 
Sent: Thursday, July 08, 2010 4:14 PM
To: hive-user@hadoop.apache.org; heyongqiangict@gmail.com
Subject: RE: Complex types, lateral view and RCFile

 

This seems like an issue with the column pruner - can you try 'set
hive.optimize.cp=false' and then re-run the query?

 

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 
Sent: Thursday, July 08, 2010 3:57 PM
To: hive-user@hadoop.apache.org; heyongqiangict@gmail.com
Subject: RE: Complex types, lateral view and RCFile

 

Any pointers?

 

Thanks,

Pradeep

 

________________________________

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 
Sent: Wednesday, July 07, 2010 4:47 PM
To: hive-user@hadoop.apache.org
Subject: Complex types, lateral view and RCFile

 

Hi,

  I have data with complex types (map, struct, array of maps) stored as
a text file. I am able to successfully create an external table based on
this data and further build a lateral view on it:

 

hive -e 'select rownum, bag_item from complex_text LATERAL VIEW
explode(bagofmap) explodedTable AS bag_item ;'

1       {"k1":"v1","k2":"v2"}

1       {"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}

2       {"a1":"b1","a2":"b2"}

2       {"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}

 

Here is how I created this table:

CREATE external TABLE if not exists complex_text (

mymap map<string, string>,

mytuple struct<num:int,str:string,dbl:double>,

bagofmap array<map<string,string>>,

rownum int

)

row format DELIMITED FIELDS TERMINATED BY  '\001' COLLECTION ITEMS
TERMINATED BY '\002'

          MAP KEYS TERMINATED BY '\003' 

 

stored as textfile

location '/user/pradeepk/complex_text';

 

The data contents are (^C stands for ctrl-C .i.e '\003'):

mymapk1^Cmymapv1^Bmymapk2^Cmymapv2^A1^Bhello^B2.5^Ak1^Dv1^Ck2^Dv2^Bk3^Dv
3^Ck4^Dv4^Ck5^Dv5^Ck6^Dv6^A1

mymapk3^Cmymapv3^Bmymapk4^Cmymapv4^A2^Bbye^B3.5^Aa1^Db1^Ca2^Db2^Ba3^Db3^
Ca4^Db4^Ca5^Db5^Ca6^Db6^A2

 

Now I created a table using RCFile for storage based on the above table
as follows:

create table complex_rcfile

stored as RCFile

location '/user/pradeepk/complex_rcfile'

as select

    mymap,

    mytuple,

    bagofmap,

    rownum

from

    complex_text;

 

The same query against this table gives incorrect results (nulls for the
rownum column):

hive -e 'select rownum, bag_item from complex_rcfile LATERAL VIEW
explode(bagofmap) explodedTable AS bag_item ;'

NULL    {"a1":"b1","a2":"b2"}

NULL    {"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}

NULL    {"k1":"v1","k2":"v2"}

NULL    {"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}

 

I have a feeling the delimiters are not being correctly interpreted in
RCFile format. Strangely a non lateral view query works fine:

hive -e 'select rownum, bagofmap from complex_rcfile;'

2
[{"a1":"b1","a2":"b2"},{"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}]

1
[{"k1":"v1","k2":"v2"},{"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}]

 

Any pointers?

 

Thanks,

Pradeep


RE: Complex types, lateral view and RCFile

Posted by Paul Yang <py...@facebook.com>.
This seems like an issue with the column pruner - can you try 'set hive.optimize.cp=false' and then re-run the query?

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com]
Sent: Thursday, July 08, 2010 3:57 PM
To: hive-user@hadoop.apache.org; heyongqiangict@gmail.com
Subject: RE: Complex types, lateral view and RCFile

Any pointers?

Thanks,
Pradeep

________________________________
From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com]
Sent: Wednesday, July 07, 2010 4:47 PM
To: hive-user@hadoop.apache.org
Subject: Complex types, lateral view and RCFile

Hi,
  I have data with complex types (map, struct, array of maps) stored as a text file. I am able to successfully create an external table based on this data and further build a lateral view on it:

hive -e 'select rownum, bag_item from complex_text LATERAL VIEW explode(bagofmap) explodedTable AS bag_item ;'
1       {"k1":"v1","k2":"v2"}
1       {"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}
2       {"a1":"b1","a2":"b2"}
2       {"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}

Here is how I created this table:
CREATE external TABLE if not exists complex_text (
mymap map<string, string>,
mytuple struct<num:int,str:string,dbl:double>,
bagofmap array<map<string,string>>,
rownum int
)
row format DELIMITED FIELDS TERMINATED BY  '\001' COLLECTION ITEMS TERMINATED BY '\002'
          MAP KEYS TERMINATED BY '\003'

stored as textfile
location '/user/pradeepk/complex_text';

The data contents are (^C stands for ctrl-C .i.e '\003'):
mymapk1^Cmymapv1^Bmymapk2^Cmymapv2^A1^Bhello^B2.5^Ak1^Dv1^Ck2^Dv2^Bk3^Dv3^Ck4^Dv4^Ck5^Dv5^Ck6^Dv6^A1
mymapk3^Cmymapv3^Bmymapk4^Cmymapv4^A2^Bbye^B3.5^Aa1^Db1^Ca2^Db2^Ba3^Db3^Ca4^Db4^Ca5^Db5^Ca6^Db6^A2

Now I created a table using RCFile for storage based on the above table as follows:
create table complex_rcfile
stored as RCFile
location '/user/pradeepk/complex_rcfile'
as select
    mymap,
    mytuple,
    bagofmap,
    rownum
from
    complex_text;

The same query against this table gives incorrect results (nulls for the rownum column):
hive -e 'select rownum, bag_item from complex_rcfile LATERAL VIEW explode(bagofmap) explodedTable AS bag_item ;'
NULL    {"a1":"b1","a2":"b2"}
NULL    {"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}
NULL    {"k1":"v1","k2":"v2"}
NULL    {"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}

I have a feeling the delimiters are not being correctly interpreted in RCFile format. Strangely a non lateral view query works fine:
hive -e 'select rownum, bagofmap from complex_rcfile;'
2       [{"a1":"b1","a2":"b2"},{"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}]
1       [{"k1":"v1","k2":"v2"},{"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}]

Any pointers?

Thanks,
Pradeep

RE: Complex types, lateral view and RCFile

Posted by Pradeep Kamath <pr...@yahoo-inc.com>.
Any pointers?

 

Thanks,

Pradeep

 

________________________________

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 
Sent: Wednesday, July 07, 2010 4:47 PM
To: hive-user@hadoop.apache.org
Subject: Complex types, lateral view and RCFile

 

Hi,

  I have data with complex types (map, struct, array of maps) stored as
a text file. I am able to successfully create an external table based on
this data and further build a lateral view on it:

 

hive -e 'select rownum, bag_item from complex_text LATERAL VIEW
explode(bagofmap) explodedTable AS bag_item ;'

1       {"k1":"v1","k2":"v2"}

1       {"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}

2       {"a1":"b1","a2":"b2"}

2       {"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}

 

Here is how I created this table:

CREATE external TABLE if not exists complex_text (

mymap map<string, string>,

mytuple struct<num:int,str:string,dbl:double>,

bagofmap array<map<string,string>>,

rownum int

)

row format DELIMITED FIELDS TERMINATED BY  '\001' COLLECTION ITEMS
TERMINATED BY '\002'

          MAP KEYS TERMINATED BY '\003' 

 

stored as textfile

location '/user/pradeepk/complex_text';

 

The data contents are (^C stands for ctrl-C .i.e '\003'):

mymapk1^Cmymapv1^Bmymapk2^Cmymapv2^A1^Bhello^B2.5^Ak1^Dv1^Ck2^Dv2^Bk3^Dv
3^Ck4^Dv4^Ck5^Dv5^Ck6^Dv6^A1

mymapk3^Cmymapv3^Bmymapk4^Cmymapv4^A2^Bbye^B3.5^Aa1^Db1^Ca2^Db2^Ba3^Db3^
Ca4^Db4^Ca5^Db5^Ca6^Db6^A2

 

Now I created a table using RCFile for storage based on the above table
as follows:

create table complex_rcfile

stored as RCFile

location '/user/pradeepk/complex_rcfile'

as select

    mymap,

    mytuple,

    bagofmap,

    rownum

from

    complex_text;

 

The same query against this table gives incorrect results (nulls for the
rownum column):

hive -e 'select rownum, bag_item from complex_rcfile LATERAL VIEW
explode(bagofmap) explodedTable AS bag_item ;'

NULL    {"a1":"b1","a2":"b2"}

NULL    {"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}

NULL    {"k1":"v1","k2":"v2"}

NULL    {"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}

 

I have a feeling the delimiters are not being correctly interpreted in
RCFile format. Strangely a non lateral view query works fine:

hive -e 'select rownum, bagofmap from complex_rcfile;'

2
[{"a1":"b1","a2":"b2"},{"a3":"b3","a4":"b4","a5":"b5","a6":"b6"}]

1
[{"k1":"v1","k2":"v2"},{"k3":"v3","k4":"v4","k5":"v5","k6":"v6"}]

 

Any pointers?

 

Thanks,

Pradeep