Posted to user@hive.apache.org by Aniket Daoo <ad...@infocepts.com> on 2012/09/18 17:14:54 UTC
Issue with Inserting/Selecting Data From a ROW FORMAT SERDE table
Hi,
I have a ROW FORMAT SERDE table created using the following DDL.
CREATE external TABLE multivalset_u6
(
col1 string,
col2 string,
col3 string,
col4 string,
col5 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES
(
"input.regex" = "(.*)\\t(.*)\\t~([0-9]{6})~(.*)~(.*)",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
)
STORED AS TEXTFILE
LOCATION '/user/admin/u6/parsed/';
The table's LOCATION above contains the file to be read by this table. I need the original file to be parsed and stored as a tab-delimited file with five columns.
Sample row from original file:
02-15-2012-11:34:56 873801356593332362 ~3261961~1~10.0
Sample row from the expected parsed file:
02-15-2012-11:34:56 873801356593332362 3261961 1 10.0
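To make the intended mapping explicit, here is roughly what I expect each capture group of input.regex to produce for the sample row (the column values are taken from the expected parsed row above; this is my expectation, I have not verified that the regex actually matches the original row):
-- Expected mapping of input.regex capture groups to columns (my expectation, not verified):
--   group 1 -> col1 = 02-15-2012-11:34:56
--   group 2 -> col2 = 873801356593332362
--   group 3 -> col3 = 3261961
--   group 4 -> col4 = 1
--   group 5 -> col5 = 10.0
SELECT col1, col2, col3, col4, col5
FROM multivalset_u6
LIMIT 1;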
To do this, I was trying to create a table with five columns at another location and insert data from multivalset_u6 into it (a sketch of the target table and INSERT statement appears after the error output below). While doing so, I encountered the following message on the console.
Ended Job = job_201209171421_0029 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201209171421_0029_m_000002 (and more) from job job_201209171421_0029
Exception in thread "Thread-47" java.lang.RuntimeException: Error while reading from task log url
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: http://10.40.35.54:9103/tasklog?taskid=attempt_201209171421_0029_m_000000_0&start=-8193
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
at java.net.URL.openStream(URL.java:1010)
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
... 3 more
Counters:
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
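For reference, the target table and INSERT I was attempting look roughly like this; the target table name and its location here are illustrative rather than the exact ones I used:
-- Target table: five tab-delimited string columns (name and LOCATION are illustrative)
CREATE EXTERNAL TABLE multivalset_u6_parsed
(
col1 string,
col2 string,
col3 string,
col4 string,
col5 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/admin/u6/parsed_out/';

-- Copy the columns parsed by the RegexSerDe table into the tab-delimited table
INSERT OVERWRITE TABLE multivalset_u6_parsed
SELECT col1, col2, col3, col4, col5
FROM multivalset_u6;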
I have observed that when I execute SELECT * FROM multivalset_u6, I get the output with all columns parsed as expected. However, when I select individual columns, e.g. SELECT col1, col2, col3, col4, col5 FROM multivalset_u6, a similar error message appears.
Am I missing something here? Is there a way to work around this?
Thanks,
Aniket
Re: Issue with Inserting/Selecting Data From a ROW FORMAT SERDE table
Posted by MiaoMiao <li...@gmail.com>.
The URI provided in your log is wrong; check
http://10.40.35.54:9103/tasklog?attemptid=attempt_201209171421_0029_m_000000_0&start=-8193
(attemptid instead of taskid). Shouldn't this misleading URI be considered a bug?