Posted to user@sqoop.apache.org by Artem Ervits <ar...@nyp.org> on 2013/02/22 05:07:46 UTC

question about -target-dir

Hello all,

I'm running Sqoop 1.4.2. I'm running into a weird issue when I execute the following command:

sqoop job --exec IncrementalLoad -- -target-dir /PROD/SQOOP/$(date +%Y%m%d%H%M%S)

The table I'm loading has 31 million rows. When the job completes, the target-dir is still something like _sqoop/21225255558v_tblNameSqoop instead of the designated directory. It works as expected on smaller datasets, but for some reason not with this much data. Also, I just added two nodes to the 4-node cluster; why does Sqoop by default choose to use 4 input paths and not 6?

Thank you.

Artem Ervits
Artem@nyp.org
New York Presbyterian Hospital



--------------------

This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged.  If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited.  If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message.  Thank you.








RE: question about -target-dir

Posted by Artem Ervits <ar...@nyp.org>.
Thank you, these are indeed some of the things I will need to look at.

-----Original Message-----
From: Venkat Ranganathan [mailto:vranganathan@hortonworks.com] 
Sent: Thursday, March 07, 2013 12:11 PM
To: user@sqoop.apache.org
Subject: Re: question about -target-dir

Maybe you are traversing some routers that drop idle connections. You can set TCPKeepAlive, ServerAliveInterval and ServerAliveCountMax in the ssh config on the client, and similarly ClientAliveInterval and ClientAliveCountMax in the sshd config on the server.

The default TCP keepalive time is 2 hours on some systems (a carry-over
from older defaults). Yours may have it too. Set TCPKeepAlive to yes,
the alive interval to 30 and the alive count max to 10.

This means that after 30 seconds without traffic, a keepalive packet is sent every 30 seconds, up to 10 times, to determine whether the connection is still alive.


Another issue that could affect you is the login timeout in shells.
If you are using bash, set TMOUT to 0 in your .bashrc, or unset it.

Venkat





Re: question about -target-dir

Posted by Venkat Ranganathan <vr...@hortonworks.com>.
Maybe you are traversing some routers that drop idle connections. You
can set TCPKeepAlive, ServerAliveInterval and ServerAliveCountMax in
the ssh config on the client, and similarly ClientAliveInterval and
ClientAliveCountMax in the sshd config on the server.

The default TCP keepalive time is 2 hours on some systems (a carry-over
from older defaults). Yours may have it too. Set TCPKeepAlive to yes,
the alive interval to 30 and the alive count max to 10.

This means that after 30 seconds without traffic, a keepalive packet is
sent every 30 seconds, up to 10 times, to determine whether the
connection is still alive.
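
As a rough illustration of the client-side half of these settings, the options can also be passed on the ssh command line with -o instead of editing the config file. This is only a sketch: the host name and the helper function are hypothetical, and TCPKeepAlive is given as yes because OpenSSH treats it as an on/off switch. With an interval of 30 and a count of 10, an unresponsive connection is given up after roughly 300 seconds.

```shell
#!/bin/sh
# Sketch only: build an ssh command line that carries the keepalive settings.
# ServerAliveInterval and ServerAliveCountMax are OpenSSH client options;
# the function name and the host below are made up for illustration.
ssh_keepalive_cmd() {
    host=$1
    printf 'ssh -o TCPKeepAlive=yes -o ServerAliveInterval=30 -o ServerAliveCountMax=10 %s\n' "$host"
}

# Print the command instead of running it, so nothing connects anywhere.
ssh_keepalive_cmd "user@gateway.example.com"

# The server-side counterpart would go into sshd_config, for example:
#   ClientAliveInterval 30
#   ClientAliveCountMax 10
```

The same three client options can be placed under a Host entry in ~/.ssh/config so that they apply to every session to that machine.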


Another issue that could affect you is the login timeout in shells.
If you are using bash, set TMOUT to 0 in your .bashrc, or unset it.
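
For the shell timeout, a minimal sketch of what could go into .bashrc, assuming TMOUT has not been marked readonly by a system-wide profile (in that case the assignment fails and an administrator has to change it):

```shell
# Sketch for ~/.bashrc: report any idle timeout that is currently set,
# then disable it. In bash, a TMOUT greater than zero ends an idle
# interactive shell after that many seconds; 0 turns the timeout off.
if [ -n "${TMOUT:-}" ] && [ "${TMOUT}" != "0" ]; then
    echo "idle timeout was ${TMOUT}s, disabling it"
fi
TMOUT=0
```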

Venkat

On Thu, Mar 7, 2013 at 8:28 AM, Artem Ervits <ar...@nyp.org> wrote:
> It has something to do with my ssh connection: the ssh session terminates before the job is finished. Since $(date +%Y%m%d%H%M%S) happens as the last step and the session is dead by then, it has no way of knowing the final target dir. Is there a way to avoid it? I am using the screen tool to start the ssh session, then run the sqoop command and detach the session, so that the session is never terminated.

RE: question about -target-dir

Posted by Artem Ervits <ar...@nyp.org>.
It has something to do with my ssh connection: the ssh session terminates before the job is finished. Since $(date +%Y%m%d%H%M%S) happens as the last step and the session is dead by then, it has no way of knowing the final target dir. Is there a way to avoid it? I am using the screen tool to start the ssh session, then run the sqoop command and detach the session, so that the session is never terminated.
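
One way to take the ssh session out of the picture, as an alternative to screen, is to start the client with nohup so that it is not killed when the terminal hangs up. The sketch below also builds the directory name up front: the shell expands $(date ...) once, before sqoop is started, so the name can be stored in a variable and logged. The helper name and the log file name are hypothetical; the sqoop arguments are copied from the original post.

```shell
#!/bin/sh
# Sketch only: build a timestamped target directory under a base path.
# The shell evaluates $(date ...) here, before any job is launched.
make_target_dir() {
    printf '%s/%s\n' "$1" "$(date +%Y%m%d%H%M%S)"
}

TARGET_DIR=$(make_target_dir /PROD/SQOOP)
echo "target dir: ${TARGET_DIR}"

# Launch only where sqoop is installed. nohup lets the client process
# survive a dropped ssh session; output goes to a hypothetical log file.
if command -v sqoop >/dev/null 2>&1; then
    nohup sqoop job --exec IncrementalLoad -- -target-dir "${TARGET_DIR}" \
        > sqoop-import.log 2>&1 &
fi
```

Because the directory name is printed before the job starts, it can be recovered from the terminal or the log even if the connection is lost later.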

-----Original Message-----
From: Jarek Jarcec Cecho [mailto:jarcec@apache.org] 
Sent: Saturday, February 23, 2013 1:15 PM
To: user@sqoop.apache.org
Subject: Re: question about -target-dir

Hi Artem,
would you mind sharing the entire Sqoop log and the associated MapReduce job configuration object? The latter can be retrieved in the form of an XML file from the JobTracker web UI.

Jarcec





Re: question about -target-dir

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
Hi Artem,
would you mind sharing the entire Sqoop log and the associated MapReduce job configuration object? The latter can be retrieved in the form of an XML file from the JobTracker web UI.

Jarcec

On Fri, Feb 22, 2013 at 04:07:46AM +0000, Artem Ervits wrote:
> Hello all,
> 
> I'm running Sqoop 1.4.2. I'm running into a weird issue when I execute the following command:
> 
> sqoop job --exec IncrementalLoad -- -target-dir /PROD/SQOOP/$(date +%Y%m%d%H%M%S)
> 
> The table I'm loading has 31 million rows. When the job completes, the target-dir is still something like _sqoop/21225255558v_tblNameSqoop instead of the designated directory. It works as expected on smaller datasets, but for some reason not with this much data. Also, I just added two nodes to the 4-node cluster; why does Sqoop by default choose to use 4 input paths and not 6?
> 
> Thank you.
> 
> Artem Ervits
> Artem@nyp.org
> New York Presbyterian Hospital