Posted to common-user@hadoop.apache.org by Saravanan Nagarajan <sa...@gmail.com> on 2015/12/18 06:18:18 UTC

Issue while running Sqoop scripts in parallel

Hi,

Need your expert guidance to resolve a Sqoop script error. We are using
Sqoop and invoking TDCH (Teradata Connector for Hadoop) to archive data
from Teradata into Hadoop Hive tables.

We have created a generic Sqoop script that accepts the source database,
view name, and target name as input parameters and loads the data into
Hive tables. If I invoke the same script in parallel with different sets
of parameters, both instances fail with the error below.



"

Error:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
Failed to create file [/user/svc_it/temp_055830/part-m-00000] for
[DFSClient_attempt_1439568235974_1009_m_000000_1_909694314_1] for client
[39.6.64.13], because this file is already being created by
[DFSClient_attempt_1439568235974_1010_m_000000_0_-997224822_1] on
[39.6.64.13]

"


The issue occurs in the map step, while the task is trying to write its
output file to HDFS. It looks like one instance is trying to overwrite the
files created by the other instance, because the temp folder created by
the mapper has the same name in both runs (/user/svc_it/temp_055830).
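
For reference, the parallel invocation looks roughly like this (the
script name and parameter values below are illustrative, not our actual
ones):

  # Two runs of the same generic wrapper, started in parallel:
  ./archive_to_hive.sh SRC_DB_1 VIEW_A target_table_a &
  ./archive_to_hive.sh SRC_DB_2 VIEW_B target_table_b &
  wait

  # Both jobs resolve to the same HDFS staging directory, so their map
  # tasks race to create the same /user/svc_it/temp_055830/part-m-00000
  # file, and the second create() fails with AlreadyBeingCreatedException.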



Please let me know how to fix this issue.


Thanks,
NS Saravanan
https://www.linkedin.com/in/saravanan303

Re: Issue while running Sqoop scripts in parallel

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hello Saravanan,

HDFS implements a single-writer model for its files, so if 2 clients concurrently try to open the same file path for write or append, then one of them will receive an error.  It looks to me like tasks from 2 different job submissions collided on the same path.  I think you're on the right track investigating why this application used the same temp directory.  Is the temp directory something that is controlled by the parameters that you pass to your script?  Do you know how the "055830" gets determined in this example?
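
As a quick illustration of the kind of fix to look for: if your wrapper
script (rather than TDCH internals) chooses that staging path, deriving
it from something unique per invocation avoids the race. A rough sketch,
assuming a bash wrapper that passes the path to Sqoop via --target-dir;
the parameter layout and paths here are hypothetical:

  #!/bin/bash
  # Hypothetical layout: src_db=$1, view=$2, hive_table=$3.
  # A timestamp alone (e.g. HHMMSS, which is what "055830" looks like it
  # could be) collides when two runs start in the same second; adding the
  # shell PID and a random suffix makes the path unique per invocation.
  TEMP_DIR="/user/svc_it/temp_$(date +%Y%m%d%H%M%S)_$$_${RANDOM}"

  sqoop import \
    --connect "jdbc:teradata://td-host/DATABASE=${1}" \
    --table "${2}" \
    --target-dir "${TEMP_DIR}" \
    --hive-import \
    --hive-table "${3}"

  # Remove the staging directory once the Hive load completes.
  hdfs dfs -rm -r -f "${TEMP_DIR}"

If the "temp_055830" name is generated inside TDCH itself rather than by
your script, then the fix would instead be in the connector's
configuration rather than the wrapper.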

--Chris Nauroth

From: Saravanan Nagarajan <sa...@gmail.com>
Date: Thursday, December 17, 2015 at 9:18 PM
To: "user@sqoop.apache.org" <us...@sqoop.apache.org>, "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Issue while running Sqoop scripts in parallel
