Posted to dev@hive.apache.org by Sasanka Vemavarapu <sa...@hotmail.com> on 2013/12/06 13:00:13 UTC
RegexSerDe Issue
Hi,
I have a Hive table defined with ROW FORMAT SERDE (RegexSerDe). I loaded data into it and can successfully query all columns with select *, but a query that selects specific columns fails. Could you please help me understand why I am unable to query individual columns and, if possible, suggest a solution?
Steps to reproduce the issue:
1. Create a text file 'confirmation_sample' with the following data:
xxx$40052568xxx/99900000002
xxx$40052568xxx/99900000004
xxx$40052568xxx/99900000006
xxx$40052568xxx/99900000008
xxx$40052568xxx/99900000010
xxx$40052568xxx/99900000012
2. Create a Hive table using the following script:
create table confirmation (account1 STRING, account2 STRING, account3 STRING, account4 STRING, account5 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "((xxx[\$])([0-9]{8})(xxx/)(.*))", "output.format.string" = "account1:%1$s account2:%2$s account3:%3$s account4:%4$s account5:%5$s")
STORED AS TEXTFILE;
3. Load the data from 'confirmation_sample' into the confirmation table using the following Hive command:
load data local inpath '/home/sasanka/usecases/data/confirmation_sample' overwrite into table confirmation;
4. Query the confirmation table using select *. (This command works and returns the output given below.)
hive> select * from confirmation limit 6;
OK
xxx$40052568xxx/99900000002 xxx$ 40052568 xxx/ 99900000002
xxx$40052568xxx/99900000004 xxx$ 40052568 xxx/ 99900000004
xxx$40052568xxx/99900000006 xxx$ 40052568 xxx/ 99900000006
xxx$40052568xxx/99900000008 xxx$ 40052568 xxx/ 99900000008
xxx$40052568xxx/99900000010 xxx$ 40052568 xxx/ 99900000010
xxx$40052568xxx/99900000012 xxx$ 40052568 xxx/ 99900000012
Time taken: 0.135 seconds
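As a sanity check on the pattern itself, the sketch below applies the same input.regex to one sample line and maps the five capture groups to the five column names. It uses Python's re module as a stand-in for the Java regex engine that RegexSerDe relies on, and it assumes the pattern reaches the engine exactly as written in the DDL. The names parse_line and COLUMNS are hypothetical and exist only for this illustration, and returning None for a non-matching line is an illustrative choice, not a statement about how Hive reports such rows.

```python
import re

# Pattern copied from the "input.regex" SerDe property in the DDL above.
INPUT_REGEX = r"((xxx[\$])([0-9]{8})(xxx/)(.*))"

# Column names from the CREATE TABLE statement, in declaration order.
COLUMNS = ["account1", "account2", "account3", "account4", "account5"]


def parse_line(line):
    """Map the capture groups of INPUT_REGEX to the table's columns.

    Hypothetical helper: the whole line must match the pattern.
    Returns a dict of column name to captured text, or None when
    the line does not match.
    """
    match = re.fullmatch(INPUT_REGEX, line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


if __name__ == "__main__":
    row = parse_line("xxx$40052568xxx/99900000002")
    for name in COLUMNS:
        print(name, "=", row[name])
```

Because the outermost parentheses form group 1, account1 holds the entire line and account2 through account5 hold the four inner groups, which agrees with the select * output shown in step 4.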
5. Query the confirmation table using select account1 from ... (this fails):
sasanka@sasanka:~$ select account1 from confirmation;
bash: syntax error near unexpected token `from'
sasanka@sasanka:~$ hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/sasanka/hive_job_log_e11c1bd0-a200-4879-b15f-fdde82a1717f_1920840164.txt
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
hive> select account1 from confirmation;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201312061236_0020, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201312061236_0020
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201312061236_0020
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-12-06 17:27:30,247 Stage-1 map = 0%, reduce = 0%
2013-12-06 17:28:01,497 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201312061236_0020 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201312061236_0020
Examining task ID: task_201312061236_0020_m_000002 (and more) from job job_201312061236_0020
Task with the most failures(4):
-----
Task ID:
task_201312061236_0020_m_000000
URL:
http://localhost:50030/taskdetails.jsp?jobid=job_201312061236_0020&tipid=task_201312061236_0020_m_000000
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Thanks,
Sasanka.