Posted to dev@hive.apache.org by Sasanka Vemavarapu <sa...@hotmail.com> on 2013/12/06 13:00:13 UTC

RegexSerDe Issue

Hi,

I have a Hive table defined with ROW FORMAT SERDE (RegexSerDe). After loading data into it, select * returns all columns correctly, but a query that selects a specific column fails. Could you please help me understand why I am unable to query individual columns and, if possible, suggest a solution?

Steps to replicate the issue:

1. Create a text file 'confirmation_sample' with the following data:

xxx$40052568xxx/99900000002
xxx$40052568xxx/99900000004
xxx$40052568xxx/99900000006
xxx$40052568xxx/99900000008
xxx$40052568xxx/99900000010
xxx$40052568xxx/99900000012

2. Create a Hive table using the following script:

create table confirmation (account1 STRING, account2 STRING, account3 STRING, account4 STRING, account5 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "((xxx[\$])([0-9]{8})(xxx/)(.*))",
  "output.format.string" = "account1:%1$s account2:%2$s account3:%3$s account4:%4$s account5:%5$s"
)
STORED AS TEXTFILE;

3. Load the data from 'confirmation_sample' into the confirmation table using the following Hive command:

load data local inpath '/home/sasanka/usecases/data/confirmation_sample' overwrite into table confirmation;

4. Query the confirmation table using select * (this works and returns the output below):
     
     hive>
    > select * from confirmation limit 6;
    OK
    xxx$40052568xxx/99900000002    xxx$    40052568    xxx/    99900000002
    xxx$40052568xxx/99900000004    xxx$    40052568    xxx/    99900000004
    xxx$40052568xxx/99900000006    xxx$    40052568    xxx/    99900000006
    xxx$40052568xxx/99900000008    xxx$    40052568    xxx/    99900000008
    xxx$40052568xxx/99900000010    xxx$    40052568    xxx/    99900000010
    xxx$40052568xxx/99900000012    xxx$    40052568    xxx/    99900000012
    Time taken: 0.135 seconds
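
For readers following along, here is a minimal sketch of how the five capture groups in "input.regex" line up with the five declared columns, which is why account1 holds the whole line in the output above. It uses Python's re module as a stand-in for the Java regular expressions that Hive applies, on the assumption that this particular pattern behaves the same in both. The names INPUT_REGEX, COLUMNS and split_row are hypothetical and exist only for this illustration. It shows the column mapping only and does not reproduce the failure in step 5.

```python
import re

# Pattern copied from the "input.regex" SERDEPROPERTIES value in step 2.
INPUT_REGEX = r"((xxx[\$])([0-9]{8})(xxx/)(.*))"

# Column names from the create table statement, in declaration order.
COLUMNS = ["account1", "account2", "account3", "account4", "account5"]


def split_row(line):
    """Return a column -> value dict for one input line, or None when the
    whole line does not match the pattern (hypothetical helper)."""
    match = re.fullmatch(INPUT_REGEX, line)
    if match is None:
        return None
    # Capture group i (1-based) feeds the i-th declared column.
    return dict(zip(COLUMNS, match.groups()))


row = split_row("xxx$40052568xxx/99900000002")
for name in COLUMNS:
    print(name, "=", row[name])
```

Because the outermost parentheses wrap the entire pattern, group 1 is the full line and the remaining four groups are its pieces. Dropping that outer pair would leave four groups for five columns, so the column list would have to change as well.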

5. Query the confirmation table using select account1 from ... (this fails):

sasanka@sasanka:~$ select account1 from confirmation;
bash: syntax error near unexpected token `from'
sasanka@sasanka:~$ hive 
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/sasanka/hive_job_log_e11c1bd0-a200-4879-b15f-fdde82a1717f_1920840164.txt
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
hive> select account1 from confirmation;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201312061236_0020, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201312061236_0020
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_201312061236_0020
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-12-06 17:27:30,247 Stage-1 map = 0%,  reduce = 0%
2013-12-06 17:28:01,497 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201312061236_0020 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201312061236_0020
Examining task ID: task_201312061236_0020_m_000002 (and more) from job job_201312061236_0020

Task with the most failures(4): 
-----
Task ID:
  task_201312061236_0020_m_000000

URL:
  http://localhost:50030/taskdetails.jsp?jobid=job_201312061236_0020&tipid=task_201312061236_0020_m_000000
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched: 
Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec



Thanks,
Sasanka.