You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hive.apache.org by Josh Ferguson <jo...@besquared.net> on 2008/12/02 10:35:55 UTC

Index Expressions

I'm getting a bunch of errors during my reduce step while doing a join  
between two string fields for some reason.

java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate  
index expression on string
	at  
org 
.apache 
.hadoop 
.hive 
.ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:64)
	at  
org 
.apache 
.hadoop 
.hive 
.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
	at  
org 
.apache 
.hadoop 
.hive 
.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
	at  
org 
.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java: 
67)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262)
	at  
org 
.apache 
.hadoop 
.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java: 
257)
	at  
org 
.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java: 
477)
	at  
org 
.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java: 
467)
	at  
org 
.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java: 
467)
	at  
org 
.apache 
.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java: 
507)
	at  
org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java: 
489)
	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java: 
140)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
	at org.apache.hadoop.mapred.Child.main(Child.java:155)

The Query is
SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON  
activities.actor_id = users.id WHERE activities.dataset='poke' AND  
activities.properties['verb'] = 'Dance';
hive> DESCRIBE activities;
OK
actor_id	string
actee_id	string
properties	map<string,string>
account	string
application	string
dataset	string
hour	int

hive> DESCRIBE users;
OK
id	string
properties	map<string,string>
account	string
application	string
dataset	string
hour	int

Thanks

Josh

Re: Index Expressions

Posted by Zheng Shao <zs...@gmail.com>.

Then it's specific to the join operator. Most probably after join operator
everything is converted to string (which is incorrect)..
Can you file a jira issue for that?

Zheng

On Tue, Dec 2, 2008 at 8:18 PM, Josh Ferguson <jo...@besquared.net> wrote:

> Indeed it does
> Josh
>
> On Dec 2, 2008, at 8:07 PM, Zheng Shao wrote:
>
> Does the following simpler query work?
>
> SELECT activities.* FROM activities WHERE activities.dataset='poke' AND activities.properties['verb'] = 'Dance';
>
>
> Zheng
>
> On Tue, Dec 2, 2008 at 7:44 PM, Josh Ferguson <jo...@besquared.net> wrote:
>
>> These tables were already defined with those
>> CREATE TABLE activities
>> (actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>)
>> PARTITIONED BY (account STRING, application STRING, dataset STRING, hour
>> INT)
>> CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS
>> ROW FORMAT DELIMITED
>> COLLECTION ITEMS TERMINATED BY '44'
>> MAP KEYS TERMINATED BY '58'
>> STORED AS TEXTFILE;
>>
>> Detailed Table Information:
>> Table(tableName:activities,dbName:default,owner:Josh,createTime:1228208598,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:actor_id,type:string,comment:null),
>> FieldSchema(name:actee_id,type:string,comment:null),
>> FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/activities,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[actor_id,
>> actee_id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
>> FieldSchema(name:application,type:string,comment:null),
>> FieldSchema(name:dataset,type:string,comment:null),
>> FieldSchema(name:hour,type:int,comment:null)],parameters:{})
>>
>>
>> CREATE TABLE users
>> (id STRING, properties MAP<STRING, STRING>)
>> PARTITIONED BY (account STRING, application STRING, dataset STRING, hour
>> INT)
>> CLUSTERED BY (id) INTO 32 BUCKETS
>> ROW FORMAT DELIMITED
>> COLLECTION ITEMS TERMINATED BY '44'
>> MAP KEYS TERMINATED BY '58'
>> STORED AS TEXTFILE;
>>
>> Detailed Table Information:
>> Table(tableName:users,dbName:default,owner:Josh,createTime:1228208633,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:id,type:string,comment:null),
>> FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/users,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
>> FieldSchema(name:application,type:string,comment:null),
>> FieldSchema(name:dataset,type:string,comment:null),
>> FieldSchema(name:hour,type:int,comment:null)],parameters:{})
>>
>> The Query Was:
>>
>> SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON activities.actor_id = users.id WHERE activities.dataset='poke' AND activities.properties['verb'] = 'Dance';
>>
>>
>> Josh Ferguson
>>
>> On Dec 2, 2008, at 1:44 AM, Zheng Shao wrote:
>>
>>  Hi Josh,
>>
>> This is a known problem.
>>
>> If any of the columns are map<xxx,xxx> or list<xxx,xxx>, you have to
>> specify the ITEM terminator and KEY terminator. Please see the complete
>> CREATE table command syntax.
>>
>> Let me know if it works or not.
>>
>> Zheng
>> *From:* Josh Ferguson [mailto:josh@besquared.net <jo...@besquared.net>]
>> *Sent:* Tuesday, December 02, 2008 1:36 AM
>> *To:* hive-user@hadoop.apache.org
>> *Subject:* Index Expressions
>>
>> I'm getting a bunch of errors during my reduce step while doing a join
>> between two string fields for some reason.
>>
>> java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate index
>> expression on string
>>
>>           at org.apache.hadoop.hive.ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:64)
>>
>>           at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
>>
>>           at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
>>
>>           at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67)
>>
>>           at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262)
>>
>>           at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:257)
>>
>>           at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:477)
>>
>>           at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
>>
>>           at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
>>
>>           at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507)
>>
>>           at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:489)
>>
>>           at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:140)
>>
>>           at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
>>
>>           at org.apache.hadoop.mapred.Child.main(Child.java:155)
>>
>>
>>
>> The Query is
>>
>> SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON activities.actor_id = users.id WHERE activities.dataset='poke' AND activities.properties['verb'] = 'Dance';
>>
>> hive> DESCRIBE activities;
>> OK
>> actor_id        string
>> actee_id        string
>> properties      map<string,string>
>> account string
>> application     string
>> dataset string
>> hour    int
>>
>> hive> DESCRIBE users;
>> OK
>> id      string
>> properties      map<string,string>
>> account string
>> application     string
>> dataset string
>> hour    int
>>
>> Thanks
>>
>> Josh
>>
>>
>>
>>
>
>
> --
> Yours,
> Zheng
>
>
>


-- 
Yours,
Zheng

Re: Index Expressions

Posted by Josh Ferguson <jo...@besquared.net>.

Indeed it does

Josh

On Dec 2, 2008, at 8:07 PM, Zheng Shao wrote:

> Does the following simpler query work?
>
> SELECT activities.* FROM activities WHERE activities.dataset='poke'  
> AND activities.properties['verb'] = 'Dance';
>
>
>
> Zheng
>
> On Tue, Dec 2, 2008 at 7:44 PM, Josh Ferguson <jo...@besquared.net>  
> wrote:
> These tables were already defined with those
>
> CREATE TABLE activities
> (actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>)
> PARTITIONED BY (account STRING, application STRING, dataset STRING,  
> hour INT)
> CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '44'
> MAP KEYS TERMINATED BY '58'
> STORED AS TEXTFILE;
>
> Detailed Table Information:
> Table(tableName:activities,dbName:default,owner:Josh,createTime: 
> 1228208598,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols: 
> [FieldSchema(name:actor_id,type:string,comment:null),  
> FieldSchema(name:actee_id,type:string,comment:null),  
> FieldSchema 
> (name:properties,type:map<string,string>,comment:null)],location:/ 
> user/hive/warehouse/ 
> activities 
> ,inputFormat:org 
> .apache 
> .hadoop 
> .mapred 
> .TextInputFormat 
> ,outputFormat:org 
> .apache 
> .hadoop 
> .hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets: 
> 32 
> ,serdeInfo:SerDeInfo 
> (name:null 
> ,serializationLib:org 
> .apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters: 
> {colelction 
> .delim 
> = 
> 44 
> ,mapkey 
> .delim 
> = 
> 58 
> ,serialization 
> .format 
> = 
> org 
> .apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols: 
> [actor_id, actee_id],sortCols:[],parameters:{}),partitionKeys: 
> [FieldSchema(name:account,type:string,comment:null),  
> FieldSchema(name:application,type:string,comment:null),  
> FieldSchema(name:dataset,type:string,comment:null),  
> FieldSchema(name:hour,type:int,comment:null)],parameters:{})
>
>
> CREATE TABLE users
> (id STRING, properties MAP<STRING, STRING>)
> PARTITIONED BY (account STRING, application STRING, dataset STRING,  
> hour INT)
> CLUSTERED BY (id) INTO 32 BUCKETS
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '44'
> MAP KEYS TERMINATED BY '58'
> STORED AS TEXTFILE;
>
> Detailed Table Information:
> Table(tableName:users,dbName:default,owner:Josh,createTime: 
> 1228208633,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols: 
> [FieldSchema(name:id,type:string,comment:null),  
> FieldSchema 
> (name:properties,type:map<string,string>,comment:null)],location:/ 
> user/hive/warehouse/ 
> users 
> ,inputFormat:org 
> .apache 
> .hadoop 
> .mapred 
> .TextInputFormat 
> ,outputFormat:org 
> .apache 
> .hadoop 
> .hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets: 
> 32 
> ,serdeInfo:SerDeInfo 
> (name:null 
> ,serializationLib:org 
> .apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters: 
> {colelction 
> .delim 
> = 
> 44 
> ,mapkey 
> .delim 
> = 
> 58 
> ,serialization 
> .format 
> = 
> org 
> .apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols: 
> [id],sortCols:[],parameters:{}),partitionKeys: 
> [FieldSchema(name:account,type:string,comment:null),  
> FieldSchema(name:application,type:string,comment:null),  
> FieldSchema(name:dataset,type:string,comment:null),  
> FieldSchema(name:hour,type:int,comment:null)],parameters:{})
>
> The Query Was:
>
>> SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users  
>> ON activities.actor_id = users.id WHERE activities.dataset='poke'  
>> AND activities.properties['verb'] = 'Dance';
>
>
> Josh Ferguson
>
> On Dec 2, 2008, at 1:44 AM, Zheng Shao wrote:
>
>> Hi Josh,
>>
>> This is a known problem.
>>
>> If any of the columns are map<xxx,xxx> or list<xxx,xxx>, you have  
>> to specify the ITEM terminator and KEY terminator. Please see the  
>> complete CREATE table command syntax.
>>
>> Let me know if it works or not.
>>
>> Zheng
>> From: Josh Ferguson [mailto:josh@besquared.net]
>> Sent: Tuesday, December 02, 2008 1:36 AM
>> To: hive-user@hadoop.apache.org
>> Subject: Index Expressions
>>
>> I'm getting a bunch of errors during my reduce step while doing a  
>> join between two string fields for some reason.
>>
>> java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate  
>> index expression on string
>>           at  
>> org 
>> .apache 
>> .hadoop 
>> .hive 
>> .ql 
>> .exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:64)
>>           at  
>> org 
>> .apache 
>> .hadoop 
>> .hive 
>> .ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java: 
>> 72)
>>           at  
>> org 
>> .apache 
>> .hadoop 
>> .hive 
>> .ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java: 
>> 72)
>>           at  
>> org 
>> .apache 
>> .hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67)
>>           at  
>> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262)
>>           at  
>> org 
>> .apache 
>> .hadoop 
>> .hive 
>> .ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:257)
>>           at  
>> org 
>> .apache 
>> .hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:477)
>>           at  
>> org 
>> .apache 
>> .hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
>>           at  
>> org 
>> .apache 
>> .hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
>>           at  
>> org 
>> .apache 
>> .hadoop 
>> .hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507)
>>           at  
>> org 
>> .apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java: 
>> 489)
>>           at  
>> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java: 
>> 140)
>>           at  
>> org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
>>           at org.apache.hadoop.mapred.Child.main(Child.java:155)
>>
>> The Query is
>> SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users  
>> ON activities.actor_id = users.id WHERE activities.dataset='poke'  
>> AND activities.properties['verb'] = 'Dance';
>> hive> DESCRIBE activities;
>> OK
>> actor_id        string
>> actee_id        string
>> properties      map<string,string>
>> account string
>> application     string
>> dataset string
>> hour    int
>>
>> hive> DESCRIBE users;
>> OK
>> id      string
>> properties      map<string,string>
>> account string
>> application     string
>> dataset string
>> hour    int
>>
>> Thanks
>>
>> Josh
>>
>
>
>
>
> -- 
> Yours,
> Zheng

Re: Index Expressions

Posted by Zheng Shao <zs...@gmail.com>.

Does the following simpler query work?

SELECT activities.* FROM activities WHERE activities.dataset='poke'
AND activities.properties['verb'] = 'Dance';


Zheng

On Tue, Dec 2, 2008 at 7:44 PM, Josh Ferguson <jo...@besquared.net> wrote:

> These tables were already defined with those
> CREATE TABLE activities
> (actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>)
> PARTITIONED BY (account STRING, application STRING, dataset STRING, hour
> INT)
> CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '44'
> MAP KEYS TERMINATED BY '58'
> STORED AS TEXTFILE;
>
> Detailed Table Information:
> Table(tableName:activities,dbName:default,owner:Josh,createTime:1228208598,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:actor_id,type:string,comment:null),
> FieldSchema(name:actee_id,type:string,comment:null),
> FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/activities,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[actor_id,
> actee_id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
> FieldSchema(name:application,type:string,comment:null),
> FieldSchema(name:dataset,type:string,comment:null),
> FieldSchema(name:hour,type:int,comment:null)],parameters:{})
>
>
> CREATE TABLE users
> (id STRING, properties MAP<STRING, STRING>)
> PARTITIONED BY (account STRING, application STRING, dataset STRING, hour
> INT)
> CLUSTERED BY (id) INTO 32 BUCKETS
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '44'
> MAP KEYS TERMINATED BY '58'
> STORED AS TEXTFILE;
>
> Detailed Table Information:
> Table(tableName:users,dbName:default,owner:Josh,createTime:1228208633,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:id,type:string,comment:null),
> FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/users,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
> FieldSchema(name:application,type:string,comment:null),
> FieldSchema(name:dataset,type:string,comment:null),
> FieldSchema(name:hour,type:int,comment:null)],parameters:{})
>
> The Query Was:
>
> SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON activities.actor_id = users.id WHERE activities.dataset='poke' AND activities.properties['verb'] = 'Dance';
>
>
> Josh Ferguson
>
> On Dec 2, 2008, at 1:44 AM, Zheng Shao wrote:
>
> Hi Josh,
>
> This is a known problem.
>
> If any of the columns are map<xxx,xxx> or list<xxx,xxx>, you have to
> specify the ITEM terminator and KEY terminator. Please see the complete
> CREATE table command syntax.
>
> Let me know if it works or not.
>
> Zheng
> *From:* Josh Ferguson [mailto:josh@besquared.net <jo...@besquared.net>]
> *Sent:* Tuesday, December 02, 2008 1:36 AM
> *To:* hive-user@hadoop.apache.org
> *Subject:* Index Expressions
>
> I'm getting a bunch of errors during my reduce step while doing a join
> between two string fields for some reason.
>
> java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate index
> expression on string
>
>           at org.apache.hadoop.hive.ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:64)
>
>           at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
>
>           at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
>
>           at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67)
>
>           at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262)
>
>           at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:257)
>
>           at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:477)
>
>           at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
>
>           at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
>
>           at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507)
>
>           at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:489)
>
>           at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:140)
>
>           at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
>
>           at org.apache.hadoop.mapred.Child.main(Child.java:155)
>
>
>
> The Query is
>
> SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON activities.actor_id = users.id WHERE activities.dataset='poke' AND activities.properties['verb'] = 'Dance';
>
> hive> DESCRIBE activities;
> OK
> actor_id        string
> actee_id        string
> properties      map<string,string>
> account string
> application     string
> dataset string
> hour    int
>
> hive> DESCRIBE users;
> OK
> id      string
> properties      map<string,string>
> account string
> application     string
> dataset string
> hour    int
>
> Thanks
>
> Josh
>
>
>
>


-- 
Yours,
Zheng

Re: Index Expressions

Posted by Josh Ferguson <jo...@besquared.net>.

These tables were already defined with those

CREATE TABLE activities
(actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>)
PARTITIONED BY (account STRING, application STRING, dataset STRING,  
hour INT)
CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE;

Detailed Table Information:
Table(tableName:activities,dbName:default,owner:Josh,createTime: 
1228208598,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols: 
[FieldSchema(name:actor_id,type:string,comment:null),  
FieldSchema(name:actee_id,type:string,comment:null),  
FieldSchema 
(name:properties,type:map<string,string>,comment:null)],location:/user/ 
hive/warehouse/ 
activities 
,inputFormat:org 
.apache 
.hadoop 
.mapred 
.TextInputFormat 
,outputFormat:org 
.apache 
.hadoop 
.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets: 
32 
,serdeInfo:SerDeInfo 
(name:null 
,serializationLib:org 
.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters: 
{colelction 
.delim 
= 
44 
,mapkey 
.delim 
= 
58 
,serialization 
.format 
= 
org 
.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols: 
[actor_id, actee_id],sortCols:[],parameters:{}),partitionKeys: 
[FieldSchema(name:account,type:string,comment:null),  
FieldSchema(name:application,type:string,comment:null),  
FieldSchema(name:dataset,type:string,comment:null),  
FieldSchema(name:hour,type:int,comment:null)],parameters:{})


CREATE TABLE users
(id STRING, properties MAP<STRING, STRING>)
PARTITIONED BY (account STRING, application STRING, dataset STRING,  
hour INT)
CLUSTERED BY (id) INTO 32 BUCKETS
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE;

Detailed Table Information:
Table(tableName:users,dbName:default,owner:Josh,createTime: 
1228208633,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols: 
[FieldSchema(name:id,type:string,comment:null),  
FieldSchema 
(name:properties,type:map<string,string>,comment:null)],location:/user/ 
hive/warehouse/ 
users 
,inputFormat:org 
.apache 
.hadoop 
.mapred 
.TextInputFormat 
,outputFormat:org 
.apache 
.hadoop 
.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets: 
32 
,serdeInfo:SerDeInfo 
(name:null 
,serializationLib:org 
.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters: 
{colelction 
.delim 
= 
44 
,mapkey 
.delim 
= 
58 
,serialization 
.format 
= 
org 
.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols: 
[id],sortCols:[],parameters:{}),partitionKeys: 
[FieldSchema(name:account,type:string,comment:null),  
FieldSchema(name:application,type:string,comment:null),  
FieldSchema(name:dataset,type:string,comment:null),  
FieldSchema(name:hour,type:int,comment:null)],parameters:{})

The Query Was:

> SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users  
> ON activities.actor_id = users.id WHERE activities.dataset='poke'  
> AND activities.properties['verb'] = 'Dance';


Josh Ferguson

On Dec 2, 2008, at 1:44 AM, Zheng Shao wrote:

> Hi Josh,
>
> This is a known problem.
>
> If any of the columns are map<xxx,xxx> or list<xxx,xxx>, you have to  
> specify the ITEM terminator and KEY terminator. Please see the  
> complete CREATE table command syntax.
>
> Let me know if it works or not.
>
> Zheng
> From: Josh Ferguson [mailto:josh@besquared.net]
> Sent: Tuesday, December 02, 2008 1:36 AM
> To: hive-user@hadoop.apache.org
> Subject: Index Expressions
>
> I'm getting a bunch of errors during my reduce step while doing a  
> join between two string fields for some reason.
>
> java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate  
> index expression on string
>           at  
> org 
> .apache 
> .hadoop 
> .hive 
> .ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java: 
> 64)
>           at  
> org 
> .apache 
> .hadoop 
> .hive 
> .ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
>           at  
> org 
> .apache 
> .hadoop 
> .hive 
> .ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
>           at  
> org 
> .apache 
> .hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67)
>           at  
> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262)
>           at  
> org 
> .apache 
> .hadoop 
> .hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java: 
> 257)
>           at  
> org 
> .apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java: 
> 477)
>           at  
> org 
> .apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java: 
> 467)
>           at  
> org 
> .apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java: 
> 467)
>           at  
> org 
> .apache 
> .hadoop 
> .hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507)
>           at  
> org 
> .apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java: 
> 489)
>           at  
> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java: 
> 140)
>           at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java: 
> 430)
>           at org.apache.hadoop.mapred.Child.main(Child.java:155)
>
> The Query is
> SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users  
> ON activities.actor_id = users.id WHERE activities.dataset='poke'  
> AND activities.properties['verb'] = 'Dance';
> hive> DESCRIBE activities;
> OK
> actor_id        string
> actee_id        string
> properties      map<string,string>
> account string
> application     string
> dataset string
> hour    int
>
> hive> DESCRIBE users;
> OK
> id      string
> properties      map<string,string>
> account string
> application     string
> dataset string
> hour    int
>
> Thanks
>
> Josh
>

RE: Index Expressions

Posted by Zheng Shao <zs...@facebook.com>.

Hi Josh,

This is a known problem.

If any of the columns are map<xxx,xxx> or list<xxx,xxx>, you have to specify the ITEM terminator and KEY terminator. Please see the complete CREATE table command syntax.

Let me know if it works or not.

Zheng
From: Josh Ferguson [mailto:josh@besquared.net]
Sent: Tuesday, December 02, 2008 1:36 AM
To: hive-user@hadoop.apache.org
Subject: Index Expressions

I'm getting a bunch of errors during my reduce step while doing a join between two string fields for some reason.

java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate index expression on string

          at org.apache.hadoop.hive.ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:64)

          at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)

          at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)

          at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67)

          at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262)

          at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:257)

          at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:477)

          at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)

          at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)

          at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507)

          at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:489)

          at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:140)

          at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)

          at org.apache.hadoop.mapred.Child.main(Child.java:155)



The Query is

SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON activities.actor_id = users.id WHERE activities.dataset='poke' AND activities.properties['verb'] = 'Dance';
hive> DESCRIBE activities;
OK
actor_id        string
actee_id        string
properties      map<string,string>
account string
application     string
dataset string
hour    int

hive> DESCRIBE users;
OK
id      string
properties      map<string,string>
account string
application     string
dataset string
hour    int

Thanks

Josh