Posted to user@sqoop.apache.org by Sowjanya Kakarala <so...@agrible.com> on 2018/05/03 16:05:43 UTC

org.apache.hadoop.ipc.RemoteException: when writing data from Sqoop to Hive

Hi everyone,

I have a general question about how mappers are assigned to running jobs. I have
looked through the AWS documentation and the Apache Hadoop site, but I am still a
little confused by the error I am getting.

Scenario:
I am running Sqoop commands (75873280 jobs) on AWS EC2, with one master node and
three data nodes. Each instance has 120GB of storage capacity, and my data is
about 10-20GB in total. Replication is set to 1, "mapreduce.tasktracker.map.tasks.maximum"
and "mapreduce.job.maps" are both set to 8 in mapred-site.xml, and my Sqoop
command sets the number of mappers to 4.
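
For reference, the relevant settings look roughly like this (the mapper limits
are in mapred-site.xml; replication is the dfs.replication value, which I
understand lives in hdfs-site.xml):

    <!-- mapred-site.xml -->
    <property>
      <name>mapreduce.tasktracker.map.tasks.maximum</name>
      <value>8</value>
    </property>
    <property>
      <name>mapreduce.job.maps</name>
      <value>8</value>
    </property>

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>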

When I run these Sqoop jobs from a script file, they work for roughly 60000 jobs
and then I start getting the error below (a simplified sketch of the script
follows the error):


`Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
/user/hive/warehouse/test.db/test_12km_p/_SCRATCH0.03130391930415266/time_stamp=2016-05-11/_temporary/1/_temporary/attempt_1525190353973_2740_m_000002_0/part-m-00002
could only be replicated to 0 nodes instead of minReplication (=1).  There
are 3 datanode(s) running and no node(s) are excluded in this operation.`
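
The script that drives these imports is roughly along these lines; this is a
simplified sketch, and the connection string, credentials, source table, and
dates file below are placeholders rather than my real values:

    #!/bin/bash
    # Sketch of the driver script: one Sqoop import per time_stamp partition value.
    # dates.txt, dbhost/sourcedb, dbuser, and source_table are placeholders.
    while read ts; do
      sqoop import \
        --connect jdbc:mysql://dbhost:3306/sourcedb \
        --username dbuser --password-file /user/hadoop/.sqoop-password \
        --table source_table \
        --where "time_stamp = '${ts}'" \
        --hive-import \
        --hive-table test.test_12km_p \
        --hive-partition-key time_stamp \
        --hive-partition-value "${ts}" \
        --num-mappers 4
    done < dates.txt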


If I rerun the job after some time, without restarting the Hadoop daemons, it
runs fine for a while and then throws the same error again at some point.


Is there any permanent fix for this issue?

Thanks for any suggestions.