Posted to user@sqoop.apache.org by Syed Akram <ak...@zohocorp.com> on 2015/02/10 13:39:15 UTC

Sqoop2 import for multiple databases

Hi,

      I have one database with 600 tables, and I have multiple databases like this.


When I start a Sqoop import of one database, I follow these steps (a rough code sketch follows the list):


1. Get a SqoopClient object.
2. Get an MConnection object for each database.
3. Using the SqoopClient and MConnection objects, create a job for each table in that database (e.g., 600 tables in one database means 600 jobs sharing a single connection id).
4. Submit each of those jobs.
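For reference, a minimal sketch of what those four steps can look like against the Sqoop2 1.99.x Java client API. The server URL, connector form input names, output directory, and table list are made-up placeholders; SqoopClient.getConnection and SqoopClient.newJob appear in the thread dump later in this thread, but the remaining method names are assumptions based on the 1.99.x client guide and may differ by minor version:

import org.apache.sqoop.client.SqoopClient;
import org.apache.sqoop.model.MConnection;
import org.apache.sqoop.model.MJob;
import org.apache.sqoop.model.MSubmission;

public class SingleDatabaseImport {
    public static void main(String[] args) {
        // Step 1: one client talking to the Sqoop2 server (URL is a placeholder).
        SqoopClient client = new SqoopClient("http://sqoop2-host:12000/sqoop/");

        // Step 2: the MConnection for this database, created beforehand;
        // only its id is needed to build jobs.
        long connectionId = 1;
        MConnection connection = client.getConnection(connectionId);
        System.out.println("Importing via connection " + connection.getName());

        // Steps 3 and 4: one import job per table, all sharing the same connection id.
        String[] tables = { "table_001", "table_002" /* ... up to 600 */ };
        for (String table : tables) {
            MJob job = client.newJob(connectionId, MJob.Type.IMPORT);
            job.setName("import-" + table);
            // Form input names ("table.tableName", "output.outputDirectory") follow
            // the generic JDBC connector / HDFS framework forms in 1.99.x and are
            // assumptions here, not copied from this thread.
            job.getConnectorPart().getStringInput("table.tableName").setValue(table);
            job.getFrameworkPart().getStringInput("output.outputDirectory")
               .setValue("/user/import/" + table);
            client.createJob(job);

            MSubmission submission = client.startSubmission(job.getPersistenceId());
            System.out.println(table + " submitted, status: " + submission.getStatus());
        }
    }
}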


This is the process I am following; please let me know if I am going wrong anywhere or if something needs to be optimized.


I'm setting "security.maxConnections=50" 


But the problem here is that it creates only 11 connections each time, imports those tables to HDFS, and then sleeps for almost 15 minutes (I don't know why it sleeps for 15 minutes).
Please suggest any ideas or optimizations that are needed.


Also, how do I go about importing multiple databases, and multiple tables from those databases, in parallel? A rough sketch of one way to fan the submissions out is included below.
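The pool-1-thread-N names in the thread dump later in this thread suggest the submissions are already driven from a thread pool. As a point of comparison, a minimal sketch of that kind of fan-out follows; everything in it is illustrative (the job ids, pool size, and server URL are placeholders, startSubmission is assumed from the 1.99.x client API, and whether a single SqoopClient is safe to share across threads is not confirmed by this thread):

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.sqoop.client.SqoopClient;

public class ParallelImports {
    public static void main(String[] args) throws InterruptedException {
        final SqoopClient client = new SqoopClient("http://sqoop2-host:12000/sqoop/");

        // Job ids created beforehand (one per table, across all databases),
        // as in the earlier sketch. The ids here are placeholders.
        List<Long> jobIds = Arrays.asList(101L, 102L, 103L, 201L, 202L);

        // Bounded pool so the Sqoop2 server, YARN, and the source RDBMS are
        // not hit with hundreds of simultaneous submissions.
        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (final Long jobId : jobIds) {
            pool.submit(new Runnable() {
                public void run() {
                    // startSubmission should only block for the HTTP round trip;
                    // the import itself runs asynchronously on the cluster.
                    System.out.println("job " + jobId + " -> "
                            + client.startSubmission(jobId).getStatus());
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}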


Please let me know.


Thanks




Re: Sqoop2 import for multiple databases

Posted by Abraham Elmahrek <ab...@cloudera.com>.
Hey Syed,

It sounds like you'll need to do a bit of digging to figure this out. I'm
still not sure where the problem is; to find it, you'll have to watch what is
happening in multiple systems:

   - Yarn
   - HDFS
   - RDBMS
   - Sqoop

Some thoughts:

   - Check your Resource Manager UI to see whether the jobs are having trouble
   moving from the "Accepted" to the "Running" state. If that takes more than a
   couple of minutes, something is very wrong.
   - Is the Sqoop2 service reporting the submissions as "BOOTING" for an
   excessive amount of time? (A small polling sketch follows this list.)
   - Which database are you using? Does it provide table-level or database-level
   locking? Are the tables locked?
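A minimal sketch of how one might watch for that from the client side. The getSubmissionStatus call and the status names are assumptions based on the 1.99.x client API, and the URL and job id are placeholders:

import org.apache.sqoop.client.SqoopClient;
import org.apache.sqoop.model.MSubmission;

public class SubmissionWatcher {
    public static void main(String[] args) throws InterruptedException {
        SqoopClient client = new SqoopClient("http://sqoop2-host:12000/sqoop/");
        long jobId = 101L;  // placeholder job id

        // Print the submission status every 10 seconds so it is obvious how
        // long the job sits in BOOTING before it starts RUNNING.
        long start = System.currentTimeMillis();
        while (true) {
            MSubmission submission = client.getSubmissionStatus(jobId);
            String status = String.valueOf(submission.getStatus());
            long elapsed = (System.currentTimeMillis() - start) / 1000;
            System.out.println(elapsed + "s: job " + jobId + " is " + status);
            if (!"BOOTING".equals(status) && !"RUNNING".equals(status)) {
                break;  // finished (SUCCEEDED, FAILED, ...) or some other state
            }
            Thread.sleep(10000);
        }
    }
}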

-Abe


Re: Sqoop2 import for multiple databases

Posted by Syed Akram <ak...@zohocorp.com>.
Edit:

I took a thread dump; it looks like this:


"pool-1-thread-5" prio=10 tid=0x00002b45a4029000 nid=0x4d80 runnable [0x00002b459dc32000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:152)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked <0x00000000fbb263e8> (a java.io.BufferedInputStream)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
        - locked <0x00000000fbb26498> (a sun.net.www.protocol.http.HttpURLConnection)
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:240)
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
        at com.sun.jersey.api.client.Client.handle(Client.java:648)
        at org.apache.sqoop.client.request.Request$ServerExceptionFilter.handle(Request.java:86)
        at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
        at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
        at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:503)
        at org.apache.sqoop.client.request.Request.get(Request.java:63)
        at org.apache.sqoop.client.request.ConnectionRequest.read(ConnectionRequest.java:42)
        at org.apache.sqoop.client.request.SqoopRequests.readConnection(SqoopRequests.java:102)
        at org.apache.sqoop.client.SqoopClient.getConnection(SqoopClient.java:302)
        at org.apache.sqoop.client.SqoopClient.newJob(SqoopClient.java:361)




There are 11 threads, and all 11 have the same stack trace shown above (each is blocked reading the HTTP response from the Sqoop2 server while SqoopClient.newJob fetches the connection).


