Posted to user@hive.apache.org by Andrew Psaltis <An...@Webtrends.com> on 2013/05/03 19:20:10 UTC

ORC with Map Column Type using Hive 0.11.0-RC1

Hello,
I am trying to evaluate Hive 0.11.0-RC1; in particular, I am very interested in the ORC storage mechanism. We need one column in a table to be a Map<String,String>, and from what I have read this is supported by the ORC format. However, when trying to do a select on a table with a Map column I get an exception (the stack trace and more details are below). Here is what I have done to test this:


Environment:
OS X 10.8, Java 1.6
Hadoop 1.0.4 running in local mode
Hive 0.11.0-RC1 (from http://people.apache.org/~hashutosh/hive-0.11.0-rc1/) using Derby for the metastore

Steps to reproduce

  *   Create table:

create table tempWithMap(name String, props Map<String,String>);

  *   Load data into the table from a local file (e.g. with LOAD DATA LOCAL INPATH).
  *   Create an ORC-backed table from the TextFile-backed table:

create table orcTableWithMap stored as ORC as select * from tempWithMap;

  *   Select the name column from the orcTableWithMap table:

select name from orcTableWithMap;


I then get the following output:


Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201304301245_0033, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201304301245_0033
Kill Command = /Users/hadoop/hadoop-1.0.4/libexec/../bin/hadoop job  -kill job_201304301245_0033
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-05-03 10:55:52,796 Stage-1 map = 0%,  reduce = 0%
2013-05-03 10:56:22,889 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201304301245_0033 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201304301245_0033
Examining task ID: task_201304301245_0033_m_000002 (and more) from job job_201304301245_0033

Task with the most failures(4):
-----
Task ID:
  task_201304301245_0033_m_000000

URL:
  http://localhost:50030/taskdetails.jsp?jobid=job_201304301245_0033&tipid=task_201304301245_0033_m_000000
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:522)
at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:90)
... 22 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144)
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.<init>(ObjectInspectorConverters.java:307)
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138)
at org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:270)
at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:482)
... 23 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec


Seeing this exception raised at least two questions in my mind:

  1.  I was under the impression, albeit perhaps wrong, that with ORC only the columns being selected would be deserialized. If that is true, why would the map be deserialized when my query was for the string column and the map is not needed to satisfy it?
  2.  Is there something I am doing wrong here? If not, what can I do to help track down the source of the problem? I have tried this test using a map<int,int> and get the same results. Also, I have been trying to run the ORC JUnit tests with Eclipse but have been having a dickens of a time getting that to work.

Any input/insight would be greatly appreciated.


Thanks in advance,
Andrew


Re: ORC with Map Column Type using Hive 0.11.0-RC1

Posted by Andrew Psaltis <An...@Webtrends.com>.
Owen,
Thanks for the quick response, filing the Jira, and explaining the way the reading of the columns works.

RE: > Ironically, doing select * from the table works fine.

I just retested this to make sure I was not mistaken, and indeed it works. The only thing I can gather is that a different code path is used when no M/R jobs are involved, since a plain select * does not launch any M/R jobs. Of course, I may be totally off base here.


Thanks again,
Andrew


From: Owen O'Malley <om...@apache.org>
Reply-To: "user@hive.apache.org" <us...@hive.apache.org>
Date: Friday, May 3, 2013 2:20 PM
To: "user@hive.apache.org" <us...@hive.apache.org>
Subject: Re: ORC with Map Column Type using Hive 0.11.0-RC1




On Fri, May 3, 2013 at 10:20 AM, Andrew Psaltis <An...@webtrends.com> wrote:
Hello,
I am trying to evaluate Hive 0.11.0-RC1; in particular, I am very interested in the ORC storage mechanism. We need one column in a table to be a Map<String,String>, and from what I have read this is supported by the ORC format. However, when trying to do a select on a table with a Map column I get an exception (the stack trace and more details are below). Here is what I have done to test this:

I've created https://issues.apache.org/jira/browse/HIVE-4494 for this issue.


  1.  I was under the impression, albeit perhaps wrong, that with ORC only the columns being selected would be deserialized. If that is true, then why would the map be deserialized when my query was for the column that is a string type and the map is not needed to satisfy the query?

ORC only reads and deserializes the columns that Hive requests, but the rows it returns must still contain all of the columns; the values for ignored columns are always null. You are hitting a bug in setting up the ObjectInspectors for reading the columns.
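To make that contract concrete, here is a toy, self-contained sketch of the idea (illustrative only; the class and field names below are made up, and this is not actual ORC reader code): only the requested columns are decoded, but every returned row still has one slot per table column, with null in the unrequested slots.

```java
// Toy illustration of column projection: only requested columns are
// "deserialized", but every row still has one slot per table column.
// Not real ORC code; all names here are illustrative.
class ProjectedRowReader {
    private final Object[][] columnData; // columnData[col][row], already decoded
    private final boolean[] included;    // which columns the query asked for

    ProjectedRowReader(Object[][] columnData, boolean[] included) {
        this.columnData = columnData;
        this.included = included;
    }

    // Returns a full-width row; unrequested columns stay null.
    Object[] readRow(int row) {
        Object[] out = new Object[columnData.length];
        for (int col = 0; col < columnData.length; col++) {
            if (included[col]) {
                out[col] = columnData[col][row];
            }
        }
        return out;
    }
}
```

With a two-column table such as (name, map) and only name requested, readRow hands back the name value plus a null for the map slot, which is why the map column's ObjectInspector still has to be wired up even for a select of name alone.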

Ironically, doing select * from the table works fine.

  2.  Is there something I am doing wrong here? If not, what can I do to help track down the source of the problem? I have tried this test using a map<int,int> and get the same results. Also, I have been trying to run the ORC JUnit tests with Eclipse but have been having a dickens of a time getting that to work.

At first pass, it looks like I need to make OrcMapObjectInspector implement SettableMapObjectInspector.
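For readers following along, a rough, self-contained sketch of the shape such a fix might take. The SettableMap interface below is a simplified stand-in for Hive's real SettableMapObjectInspector (in org.apache.hadoop.hive.serde2.objectinspector), and the implementation is illustrative, not the actual patch:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for Hive's SettableMapObjectInspector contract.
// Illustrative only; not the real Hive 0.11 interface.
interface SettableMap {
    Object create();                                  // new, empty map object
    Object put(Object map, Object key, Object value); // add or replace an entry
    Object remove(Object map, Object key);            // drop an entry
    Object clear(Object map);                         // remove all entries
}

// Sketch of the fix shape: back the map inspector with a plain
// java.util.Map so Hive's converters can write into it.
class OrcSettableMapSketch implements SettableMap {
    @SuppressWarnings("unchecked")
    private static Map<Object, Object> asMap(Object map) {
        return (Map<Object, Object>) map;
    }

    @Override public Object create() {
        return new HashMap<Object, Object>();
    }

    @Override public Object put(Object map, Object key, Object value) {
        asMap(map).put(key, value);
        return map;
    }

    @Override public Object remove(Object map, Object key) {
        asMap(map).remove(key);
        return map;
    }

    @Override public Object clear(Object map) {
        asMap(map).clear();
        return map;
    }
}
```

The ClassCastException above comes from ObjectInspectorConverters.getConverter, which casts the output inspector to the settable variant; once the ORC map inspector implements the settable contract, that cast should succeed.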

-- Owen

Re: ORC with Map Column Type using Hive 0.11.0-RC1

Posted by Owen O'Malley <om...@apache.org>.
On Fri, May 3, 2013 at 10:20 AM, Andrew Psaltis <Andrew.Psaltis@webtrends.com> wrote:

>  Hello,
> I am trying to evaluate Hive 0.11.0-RC1, in particular I am very
> interested in the ORC storage mechanism. We have a need to have one column
> be a Map<String,String> in a table and from what I have read this is
> supported with the ORC format, however when trying to do a select on a
> table with a Map column I get an exception (the stack trace and more
> details are below). Here is what I have done to test this:
>

I've created https://issues.apache.org/jira/browse/HIVE-4494 for this issue.


>    1. I was under the impression, albeit perhaps wrong, that with ORC
>    only the columns being selected would be deserialized. If that is true,
>    then why would the map be deserialized when my query was for the column
>    that is a string type and the map is not needed to satisfy the query?
>
ORC only reads and deserializes the columns that Hive requests, but the rows it returns must still contain all of the columns; the values for ignored columns are always null. You are hitting a bug in setting up the ObjectInspectors for reading the columns.

Ironically, doing select * from the table works fine.

>
>    2. Is there something I am doing wrong here? If not what can I do to
>    help track down the source of the problem? I have tried this test using a
>    map<int,int> and get the same results. Also, I have been trying to run the
>    ORC Junit tests with Eclipse but have been having a dickens of time getting
>    that to work.
>
At first pass, it looks like I need to make OrcMapObjectInspector implement SettableMapObjectInspector.

-- Owen