Posted to user@tez.apache.org by Артем Великородный <ar...@gmail.com> on 2017/02/06 11:33:54 UTC

Hive 2.1.1 on Tez engine with Sqoop - Error while running Sqoop import as Parquet files

I tried to import some data into Hive as Parquet through Sqoop with this
command:

sqoop import --connect jdbc:mysql://node1:3306/sqoop --username root
--password 123456 --table devidents --hive-import --hive-table
galinqewra --create-hive-table -m 1 --as-parquetfile

In mapred-site.xml I set mapreduce.framework.name to yarn-tez,
and in hive-site.xml I set hive.execution.engine to tez.
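
For reference, a minimal sketch of the two property entries described above
(only the property names and values come from this mail; the surrounding XML
is the standard Hadoop/Hive *-site.xml boilerplate):

    <!-- mapred-site.xml -->
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn-tez</value>
    </property>

    <!-- hive-site.xml -->
    <property>
      <name>hive.execution.engine</name>
      <value>tez</value>
    </property>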

It fails with this exception:

17/02/03 01:07:45 INFO client.TezClient: Submitting DAG to YARN,
applicationId=application_1486051443218_0001,
dagName=codegen_devidents.jar
17/02/03 01:07:46 INFO impl.YarnClientImpl: Submitted application
application_1486051443218_0001
17/02/03 01:07:46 INFO client.TezClient: The url to track the Tez AM:
http://node1:8088/proxy/application_1486051443218_0001/
17/02/03 01:07:59 INFO mapreduce.Job: The url to track the job:
http://node1:8088/proxy/application_1486051443218_0001/
17/02/03 01:07:59 INFO mapreduce.Job: Running job: job_1486051443218_0001
17/02/03 01:08:00 INFO mapreduce.Job: Job job_1486051443218_0001
running in uber mode : false
17/02/03 01:08:00 INFO mapreduce.Job:  map 0% reduce 0%
17/02/03 01:08:27 INFO mapreduce.Job: Job job_1486051443218_0001
failed with state FAILED due to: Vertex failed, vertexName=initialmap,
vertexId=vertex_1486051443218_0001_1_00, diagnostics=[Task failed,
taskId=task_1486051443218_0001_1_00_000000, diagnostics=[TaskAttempt 0
failed, info=[Error: Error while running task ( failure ) :
attempt_1486051443218_0001_1_00_000000_0:org.kitesdk.data.DatasetNotFoundException:
Descriptor location does not exist:
hdfs:/tmp/default/.temp/job_14860514432180_0001/mr/job_14860514432180_0001/.metadata
	at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.checkExists(FileSystemMetadataProvider.java:562)
	at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.find(FileSystemMetadataProvider.java:605)
	at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.load(FileSystemMetadataProvider.java:114)
	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:197)
	at org.kitesdk.data.spi.AbstractDatasetRepository.load(AbstractDatasetRepository.java:40)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadJobDataset(DatasetKeyOutputFormat.java:591)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptDataset(DatasetKeyOutputFormat.java:602)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptView(DatasetKeyOutputFormat.java:615)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.getRecordWriter(DatasetKeyOutputFormat.java:448)
	at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:399)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable._callInternal(LogicalIOProcessorRuntimeTask.java:533)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:516)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:501)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1
killedTasks:0, Vertex vertex_1486051443218_0001_1_00 [initialmap]
killed/failed due to:OWN_TASK_FAILURE]. DAG did not succeed due to
VERTEX_FAILURE. failedVertices:1 killedVertices:0
17/02/03 01:08:27 INFO mapreduce.Job: Counters: 0
17/02/03 01:08:27 WARN mapreduce.Counters: Group FileSystemCounters is
deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
17/02/03 01:08:27 INFO mapreduce.ImportJobBase: Transferred 0 bytes in
63.4853 seconds (0 bytes/sec)
17/02/03 01:08:27 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
17/02/03 01:08:27 INFO mapreduce.ImportJobBase: Retrieved 0 records.
17/02/03 01:08:27 ERROR tool.ImportTool: Error during import: Import job failed!

The Hive table is created, but there is no data in it.

If I start the job in plain MapReduce mode it completes successfully (a
sketch of how to force that for a single run follows below); it also passes
if I run it without '--as-parquetfile'.
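
As a minimal sketch (assuming Sqoop passes Hadoop's generic -D options
through to the launched job when they are placed right after the tool name),
a single run can be forced back onto plain MapReduce without editing
mapred-site.xml; the rest of the command is unchanged from above:

    sqoop import \
      -Dmapreduce.framework.name=yarn \
      --connect jdbc:mysql://node1:3306/sqoop --username root --password 123456 \
      --table devidents --hive-import --hive-table galinqewra \
      --create-hive-table -m 1 --as-parquetfile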

Any suggestions?

Re: Hive 2.1.1 on Tez engine with Sqoop - Error while running Sqoop import as Parquet files

Posted by Siddharth Seth <ss...@apache.org>.
It's tough to say what is going on here. Who generates the .metadata file?
What are the contents of that directory on HDFS? (This looks like the
MapReduce staging directory; see the sketch below.)
Tez mode for MR jobs works in most cases, but it is not completely
compatible with all MR jobs.
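
For example, one way to check (assuming the cluster's default filesystem and
the temp path from the stack trace above) is a recursive listing of the Kite
temp dataset location:

    hdfs dfs -ls -R /tmp/default/.temp

If the job-specific subdirectory or its .metadata descriptor is missing
there, that would match the DatasetNotFoundException in the log.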
