Posted to user@sqoop.apache.org by Artem Ervits <ar...@nyp.org> on 2013/02/12 16:26:50 UTC

Calling Sqoop incremental job from Map/Reduce code

Hello all,

I'd like to know if there's a way to execute an incremental job from a map/reduce program. If there is a way, please point to a user guide I can take a look at to achieve it. In case it is possible, does Sqoop need to be installed on every node of the Hadoop cluster? I'm aware of the fact that Oozie would be able to achieve this but I was wondering if there are other ways. Right now I have a script that first calls the Sqoop job and then executes the M/R job.

Thank you.

Artem Ervits
New York Presbyterian Hospital



--------------------

This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged.  If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited.  If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message.  Thank you.








RE: Calling Sqoop incremental job from Map/Reduce code

Posted by Artem Ervits <ar...@nyp.org>.
Thank you Jarek, I can now see where I went wrong with my idea.

-----Original Message-----
From: Jarek Jarcec Cecho [mailto:jarcec@apache.org] 
Sent: Tuesday, February 12, 2013 10:44 AM
To: user@sqoop.apache.org
Subject: Re: Calling Sqoop incremental job from Map/Reduce code








Re: Calling Sqoop incremental job from Map/Reduce code

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
Hi Artem,
would you mind describing your use case in more detail? I'm especially interested in what you mean by executing it from a map/reduce program.

Sqoop itself will spawn a MapReduce job, so executing it from another map/reduce job does not make much sense, as the load would multiply. Imagine 50 mappers, each of which spawns a Sqoop job that in turn spawns 50 mappers: that's 50 * 50 = 2500 running map tasks, which would most likely kill your remote database. Thus it might be more appropriate to execute Sqoop prior to running your MapReduce job, as you've mentioned you're already doing.
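
For illustration only, the "Sqoop first, then the M/R job" ordering could be sketched as a small Python wrapper. This is a minimal sketch under stated assumptions, not the script used in this thread: it assumes the incremental import was already saved as a Sqoop job (so it can be started with "sqoop job --exec"), that both the sqoop and hadoop commands are on the PATH, and the helper names build_pipeline and run_pipeline are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence


def build_pipeline(sqoop_job: str, jar: str, main_class: str,
                   job_args: Sequence[str] = ()) -> List[List[str]]:
    """Return the two commands in the order they must run."""
    return [
        # Step 1: run the saved Sqoop job (assumed to be an incremental import).
        ["sqoop", "job", "--exec", sqoop_job],
        # Step 2: run the MapReduce job only after the import has finished.
        ["hadoop", "jar", jar, main_class, *job_args],
    ]


def run_pipeline(commands: Sequence[Sequence[str]],
                 runner: Callable = subprocess.run) -> int:
    """Run the commands one after another and stop at the first failure.

    Returns the number of commands that completed. With the default runner,
    a non-zero exit status raises subprocess.CalledProcessError, so the
    MapReduce job never starts if the Sqoop import failed.
    """
    completed = 0
    for cmd in commands:
        runner(cmd, check=True)
        completed += 1
    return completed
```

With hypothetical names, run_pipeline(build_pipeline("nightly_import", "analysis.jar", "com.example.Analysis", ["/data/in", "/data/out"])) would run the import once from the client machine and then submit the M/R job, so only a single Sqoop job talks to the database.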

As for your question whether Sqoop needs to be installed on each node: it does not. Hadoop provides a facility called DistributedCache [1] that allows you to distribute arbitrary files with your job. The benefit is that jars will be automatically added to the application classpath.

Jarcec

Links:
1: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html
