Posted to user@sqoop.apache.org by "Sethuramaswamy, Suresh " <su...@credit-suisse.com> on 2014/08/14 16:57:40 UTC

Sqoop Import parallel sessions - Question

Team,

We need to initiate a Sqoop import for one month of records per session; to read a full year of data, I have to run 12 such imports in parallel.

While doing this, I keep getting the error <SCHEMA>.<TABLENAME> folder already exists. This happens because all of the sessions are initiated under the same uid, so they all use the same temporary mapred HDFS folder under the user's home directory until each job completes.

Is there a better way to accomplish this?


Thanks
Suresh Sethuramaswamy





RE: Sqoop Import parallel sessions - Question

Posted by "Sethuramaswamy, Suresh " <su...@credit-suisse.com>.
Thanks for the suggestions, Gwen Shapira.





Re: Sqoop Import parallel sessions - Question

Posted by Gwen Shapira <gs...@cloudera.com>.
Sqoop needs to write to an output directory that doesn't exist yet. Since both
of your jobs try to write to the same directory, one of them will complain that
the directory already exists.

You can use the --warehouse-dir or --target-dir parameter to make sure
each job writes to its own directory; see the sketch below.
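
For example, a minimal sketch of the January job from this thread with a
per-month --target-dir added. The staging path under the user's HDFS home
is an assumption, and the angle-bracket placeholders stand in for the real
connection details and names, exactly as in the original commands:

  # stage January's rows in a month-specific directory instead of the default
  sqoop import --connect jdbc:oracle:thin:@<<ORACLE DB DETAILS>> \
    --table <Table_name> \
    --where "date between '01-JAN-2013' and '31-JAN-2013'" \
    -m 1 --hive-import --hive-table <hive tablename> \
    --target-dir /user/<uid>/sqoop_staging/<Table_name>_2013_01 \
    --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
    --null-string '\\N' --null-non-string '\\N' --hive-drop-import-delims

Giving the December job --target-dir /user/<uid>/sqoop_staging/<Table_name>_2013_12
(and likewise for the other months) means the twelve parallel imports never
contend for the same output directory.
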
Alternatively, you can use the --hive-partition-key and --hive-partition-value
parameters to import the data into separate Hive partitions, which makes sense
from a table-design perspective too; a second sketch follows.
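
A sketch of that variant, assuming a hypothetical month_key partition column
on the Hive table; a per-month --target-dir is still included, since parallel
runs would otherwise collide on the default staging directory:

  # load January into its own Hive partition, month_key='2013_01'
  sqoop import --connect jdbc:oracle:thin:@<<ORACLE DB DETAILS>> \
    --table <Table_name> \
    --where "date between '01-JAN-2013' and '31-JAN-2013'" \
    -m 1 --hive-import --hive-table <hive tablename> \
    --hive-partition-key month_key \
    --hive-partition-value 2013_01 \
    --target-dir /user/<uid>/sqoop_staging/<Table_name>_2013_01 \
    --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
    --null-string '\\N' --null-non-string '\\N' --hive-drop-import-delims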


RE: Sqoop Import parallel sessions - Question

Posted by "Sethuramaswamy, Suresh " <su...@credit-suisse.com>.
Sure.

This is my command. When I run two such commands in parallel, I get the exception shown below.

sqoop import --connect jdbc:oracle:thin:@<<ORACLE DB DETAILS>>  --table <Table_name>   --where "date between '01-JAN-2013' and '30-JAN-2013'" -m 1 --hive-import  --hive-table <hive tablename>  --compression-codec org.apache.hadoop.io.compress.SnappyCodec --null-string '\\N' --null-non-string '\\N' --hive-drop-import-delims;

...
...

..

sqoop import --connect jdbc:oracle:thin:@<<ORACLE DB DETAILS>>  --table <Table_name>   --where "date between '01-DEC-2013' and '31-DEC-2013'" -m 1 --hive-import  --hive-table <hive tablename>  --compression-codec org.apache.hadoop.io.compress.SnappyCodec --null-string '\\N' --null-non-string '\\N' --hive-drop-import-delims;



Exception: 


14/08/14 12:04:57 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory <SCHEMA>.<TABLENAME> already exists
        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:987)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:948)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:948)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:582)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:612)
        at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
        at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:247)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:614)
        at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:436)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:413)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:506)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:240)




Re: Sqoop Import parallel sessions - Question

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
It would be helpful if you could share your entire Sqoop commands and the exact exception with its stack trace.

Jarcec
