Posted to user@sqoop.apache.org by "lizhanqiang@inspur.com" <li...@inspur.com> on 2014/09/09 05:33:16 UTC

the confusion of --split-by parameter

Hi all,
   In Sqoop we can specify the --split-by parameter, which determines which field is used to split the records among the map tasks.
But if the data in the split field is skewed, the workload will be unbalanced across the maps. I want to know why Sqoop does not use
select count(*) from table / num-maps to determine each map's workload. As far as I know, a base class of DataDrivenDBInputFormat
implements select count(*) from table / num-maps. So why does Sqoop override this?
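
To make the skew concern concrete, here is a minimal sketch that cuts the [min, max] range of a key column into equal-width ranges and counts the rows that land in each one. It is a simplified model written for illustration, not Sqoop's actual code; the function name range_split_counts and the sample data are assumptions.

```python
def range_split_counts(values, num_maps):
    """Count rows per split when [min, max] is cut into equal-width ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_maps
    counts = [0] * num_maps
    for v in values:
        # The last range is closed so that the maximum value is included.
        idx = min(int((v - lo) / width), num_maps - 1) if width else 0
        counts[idx] += 1
    return counts


# Hypothetical skewed key column: 97 small ids plus three large outliers.
skewed_ids = list(range(1, 98)) + [5000, 9000, 10000]
print(range_split_counts(skewed_ids, 4))

# Evenly distributed keys for comparison.
print(range_split_counts(list(range(100)), 4))
```

With the skewed column almost every row falls into the first range, while evenly distributed keys give four equal splits.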



Re: RE: the confusion of --split-by parameter

Posted by "lizhanqiang@inspur.com" <li...@inspur.com>.
Hi there,
 Thank you for your reply. The OraOop connector is excellent.
 I tried what you discussed. MySQL's LIMIT scans the full table, but if I use ORDER BY before LIMIT it scans only (limit + offset) records.





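Best regards
---------------------------------------------
Li Zhanqiang
Inspur (Beijing) Electronic Information Industry Co., Ltd.
System Software Department
Address: 2nd Floor, North Building, Building S05, Inspur Science Park, No. 1036 Langchao Road
Mobile: 15315572926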
 
From: David Robson
Date: 2014-09-10 11:33
To: user@sqoop.apache.org
Subject: RE: Re: the confusion of --split-by parameter
With regard to Oracle: with the addition of the direct connector you can split by ROWID or by partition. This is much faster than using min/max boundaries.

I do not know the internals of MySQL, but limit/offset queries would most likely need to sort the data, so they would potentially add overhead.

What database are you using? I guess the current approach of splitting by the minimum and maximum value of the column could be considered the generic way of doing it, and each database should then implement a custom method. We wrote the direct connector for Oracle to take advantage of Oracle's features and make it better. If someone could work out a better way of doing it for, say, MySQL or PostgreSQL, they could enhance the connector for that particular database. I know there are connectors for various databases, but I can't comment on whether it could be done more efficiently, as I have only focused on the Oracle connector. You could try enhancing a connector for the database you are looking at and submit it as a patch if you find a more efficient method.

If you are using Oracle, you should try the direct connector in 1.4.5 (formerly known as OraOop), as this doesn't require a split-by column.
 
From: Abraham Elmahrek [mailto:abe@cloudera.com] 
Sent: Wednesday, 10 September 2014 12:00 PM
To: user@sqoop.apache.org
Subject: Re: Re: the confusion of --split-by parameter
 
Good point. The only thing I can think of is that offsets might be slower (since the DB has to scan and keep a count internally) and the expectation that certain ranges of data end up in certain files (though I doubt this one). I'll defer this one to the broader community as I'm not sure myself.
 
On Tue, Sep 9, 2014 at 5:31 PM, lizhanqiang@inspur.com <li...@inspur.com> wrote:
 
 
Hey, brother.
  Glad to hear from you! I think we can use limit/offset (if the database supports this operation), or we can use a subselect (if the database does not support limit/offset).
For example:
For MySQL: select * from table limit 0,5; select * from table limit 5,5; ...
For Oracle we can use rownum.
I just cannot understand why Sqoop overrides the operation above. This override can lead to data skew.
 
From: Abraham Elmahrek
Date: 2014-09-10 00:38
To: user@sqoop.apache.org
Subject: Re: the confusion of --split-by parameter
Hey there,
 
For databases, there needs to be a way to actually infer boundaries for a particular column. Simply performing a "select *" would not be enough because we would not know how to query the database.
 
-Abe
 
On Mon, Sep 8, 2014 at 8:33 PM, lizhanqiang@inspur.com <li...@inspur.com> wrote:
Hi all,
   In Sqoop we can specify the --split-by parameter, which determines which field is used to split the records among the map tasks.
But if the data in the split field is skewed, the workload will be unbalanced across the maps. I want to know why Sqoop does not use
select count(*) from table / num-maps to determine each map's workload. As far as I know, a base class of DataDrivenDBInputFormat
implements select count(*) from table / num-maps. So why does Sqoop override this?
 
 
 
 

RE: Re: the confusion of --split-by parameter

Posted by David Robson <Da...@software.dell.com>.
With regard to Oracle: with the addition of the direct connector you can split by ROWID or by partition. This is much faster than using min/max boundaries.

I do not know the internals of MySQL, but limit/offset queries would most likely need to sort the data, so they would potentially add overhead.

What database are you using? I guess the current approach of splitting by the minimum and maximum value of the column could be considered the generic way of doing it, and each database should then implement a custom method. We wrote the direct connector for Oracle to take advantage of Oracle's features and make it better. If someone could work out a better way of doing it for, say, MySQL or PostgreSQL, they could enhance the connector for that particular database. I know there are connectors for various databases, but I can't comment on whether it could be done more efficiently, as I have only focused on the Oracle connector. You could try enhancing a connector for the database you are looking at and submit it as a patch if you find a more efficient method.

If you are using Oracle, you should try the direct connector in 1.4.5 (formerly known as OraOop), as this doesn't require a split-by column.


Re: Re: the confusion of --split-by parameter

Posted by Abraham Elmahrek <ab...@cloudera.com>.
Good point. The only thing I can think of is that offsets might be slower
(since the DB has to scan and keep a count internally) and the expectation
that certain ranges of data end up in certain files (though I doubt this
one). I'll defer this one to the broader community as I'm not sure myself.


Re: Re: the confusion of --split-by parameter

Posted by "lizhanqiang@inspur.com" <li...@inspur.com>.

Hey, brother.
  Glad to hear from you! I think we can use limit/offset (if the database supports this operation), or we can use a subselect (if the database does not support limit/offset).
For example:
For MySQL: select * from table limit 0,5; select * from table limit 5,5; ...
For Oracle we can use rownum.
I just cannot understand why Sqoop overrides the operation above. This override can lead to data skew.
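
The limit/offset idea can be sketched as follows: given a row count (for example from select count(*)), build one MySQL-style query per map task. This is only an illustration of the proposal, not something Sqoop does; the function name limit_offset_queries and the table and column names are hypothetical. An ORDER BY is included because without a fixed ordering the database gives no guarantee that the pages are disjoint.

```python
def limit_offset_queries(table, order_col, row_count, num_maps):
    """Build one 'ORDER BY ... LIMIT offset, count' query per map task."""
    base, extra = divmod(row_count, num_maps)
    queries, offset = [], 0
    for i in range(num_maps):
        # Spread the remainder over the first `extra` map tasks.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # fewer rows than map tasks
        queries.append(
            f"SELECT * FROM {table} ORDER BY {order_col} LIMIT {offset}, {size}"
        )
        offset += size
    return queries


# Hypothetical table `orders` with 10 rows, split across 3 map tasks.
for query in limit_offset_queries("orders", "id", 10, 3):
    print(query)
```

Every map task gets nearly the same number of rows regardless of how the key values are distributed, at the cost of the database having to order and skip rows for each query.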
 

Re: Importing Table and Column comments

Posted by pratik khadloya <ti...@gmail.com>.
Sorry, I am not sure about that. I looked into Sqoop's code and did not
find any reference to that.
Try looking into OraOop.
There is also some info here:
https://community.oracle.com/thread/227790?start=0&tstart=0

On Tue, Sep 30, 2014 at 8:36 AM, Venkat, Ankam <Ankam.Venkat@centurylink.com> wrote:

> Thanks Pratik.
>
> Looks like “Remarks” can be used in Oracle JDBC as well.
>
> But how do we use this in Sqoop?
>
> Regards,
> Venkat
>
> *From:* pratik khadloya [mailto:tispratik@gmail.com]
> *Sent:* Monday, September 29, 2014 9:48 AM
> *To:* user@sqoop.apache.org
> *Subject:* Re: Importing Table and Column comments
>
> I realized that I was wrong about JDBC not providing the column comments
> information.
> For MySQL, it does provide that information through the MySQL connector.
> You might have to check the one for your database.
> Also check https://issues.apache.org/jira/browse/SQOOP-1456
>
> Regards,
> Pratik
>
> On Fri, Sep 12, 2014 at 10:18 AM, pratik khadloya <ti...@gmail.com>
> wrote:
>
> Sorry, JDBC does not support USER_COL_COMMENTS either (I think this is
> specific to Oracle).
> I don't think there is any tool currently available, other than Sqoop,
> which can export MySQL tables to Hive.
>
> You might have to write your own custom code for backfilling the comments
> after Sqoop is done with its import.
> Your custom tool can use something like
> http://stackoverflow.com/a/6752206/238880
>
> ~Pratik
>
> On Fri, Sep 12, 2014 at 7:34 AM, Venkat, Ankam <
> Ankam.Venkat@centurylink.com> wrote:
>
> Thanks Pratik.
>
> How about fetching comments from USER_COL_COMMENTS?
>
> If not Sqoop, is there any other way to fetch this data?
>
> Regards,
> Venkat
>
> *From:* pratik khadloya [mailto:tispratik@gmail.com]
> *Sent:* Thursday, September 11, 2014 6:09 PM
> *To:* user@sqoop.apache.org
> *Subject:* Re: Importing Table and Column comments
>
> The problem is that JDBC does not have support for column comments.
> It is not a part of their API.
> Ref:
> http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html
>
> ~Pratik
>
> On Thu, Sep 11, 2014 at 3:13 PM, Venkat, Ankam <
> Ankam.Venkat@centurylink.com> wrote:
>
> Need this data for our enterprise metadata repository.
>
> Any workaround for this?
>
> Regards,
> Venkat
>
> *From:* pratik khadloya [mailto:tispratik@gmail.com]
> *Sent:* Thursday, September 11, 2014 3:13 PM
> *To:* user@sqoop.apache.org
> *Subject:* Re: Importing Table and Column comments
>
> I don't think that happens today. I was wondering about the same thing.
>
> On Thu, Sep 11, 2014 at 10:43 AM, Venkat, Ankam <
> Ankam.Venkat@centurylink.com> wrote:
>
> Is there a way to import table and column comments and store info on Hive?
>
> Regards,
> Venkat Ankam

RE: Importing Table and Column comments

Posted by "Venkat, Ankam" <An...@centurylink.com>.
Thanks Pratik.

Looks like “Remarks” can be used in Oracle JDBC as well.

But, how do we use this in SQOOP?

Regards,
Venkat


Re: Importing Table and Column comments

Posted by pratik khadloya <ti...@gmail.com>.
I realized that I was wrong about JDBC not providing the column comments
information.
For MySQL, it does provide that information through the MySQL connector.
You might have to check the one for your database.
Also check https://issues.apache.org/jira/browse/SQOOP-1456

Regards,
Pratik


Re: Importing Table and Column comments

Posted by pratik khadloya <ti...@gmail.com>.
Sorry, JDBC does not support USER_COL_COMMENTS either (I think this is
specific to Oracle).
I don't think there is any tool currently available, other than Sqoop,
which can export MySQL tables to Hive.

You might have to write your own custom code for backfilling the comments
after Sqoop is done with its import.
Your custom tool can use something like
http://stackoverflow.com/a/6752206/238880


~Pratik
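
As a rough sketch of the backfilling step suggested above, a custom tool could turn the comments it has collected into Hive DDL and run the statements after the import finishes. The function name hive_comment_ddl, the sample table, and the input shape (column name, Hive type, comment) are assumptions made for illustration; how the comments are fetched from the source database (for example from information_schema.COLUMNS in MySQL or USER_COL_COMMENTS in Oracle) is left out, and the string escaping should be checked against your Hive version.

```python
def hive_comment_ddl(table, columns, table_comment=None):
    """Return Hive DDL statements that attach comments to an imported table.

    columns: iterable of (column_name, hive_type, comment) tuples.
    """
    def quote(text):
        # Escape backslashes and single quotes for a Hive string literal.
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

    statements = []
    if table_comment:
        statements.append(
            f"ALTER TABLE {table} SET TBLPROPERTIES ('comment' = {quote(table_comment)})"
        )
    for name, hive_type, comment in columns:
        if not comment:
            continue  # nothing to backfill for this column
        # CHANGE COLUMN keeps the same name and type and only adds the comment.
        statements.append(
            f"ALTER TABLE {table} CHANGE COLUMN `{name}` `{name}` {hive_type} "
            f"COMMENT {quote(comment)}"
        )
    return statements


# Hypothetical metadata gathered from the source database.
columns = [("id", "INT", "Primary key"), ("name", "STRING", "")]
for statement in hive_comment_ddl("customers", columns, "Customer master"):
    print(statement)
```

The Hive type has to be repeated in each CHANGE COLUMN statement, so the tool needs the types of the imported Hive table as well as the source comments.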


RE: Importing Table and Column comments

Posted by "Venkat, Ankam" <An...@centurylink.com>.
Thanks Pratik.

How about fetching comments from USER_COL_COMMENTS?

If not Sqoop, is there any other way to fetch this data?

Regards,
Venkat


Re: Importing Table and Column comments

Posted by pratik khadloya <ti...@gmail.com>.
The problem is that JDBC does not have support for column comments.
It is not a part of their API.
Ref:
http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html

~Pratik


RE: Importing Table and Column comments

Posted by "Venkat, Ankam" <An...@centurylink.com>.
Need this data for our enterprise metadata repository.

Any workaround for this?

Regards,
Venkat


Re: Importing Table and Column comments

Posted by pratik khadloya <ti...@gmail.com>.
I don't think that happens today. I was wondering about the same thing.


Importing Table and Column comments

Posted by "Venkat, Ankam" <An...@centurylink.com>.
Is there a way to import table and column comments and store info on Hive?

Regards,
Venkat Ankam

Re: the confusion of --split-by parameter

Posted by Abraham Elmahrek <ab...@cloudera.com>.
Hey there,

For databases, there needs to be a way to actually infer boundaries for a
particular column. Simply performing a "select *" would not be enough
because we would not know how to query the database.

-Abe
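
To illustrate what inferring boundaries means in practice, here is a simplified sketch of range splitting on an integer column: one query finds the minimum and maximum, and each map task then gets a WHERE clause covering one slice of that range. The function name split_predicates and the table and column names are hypothetical, and Sqoop's real splitters handle more types and edge cases than this.

```python
def split_predicates(column, lo, hi, num_maps):
    """Cut the integer range [lo, hi] into num_maps contiguous WHERE clauses."""
    step = (hi - lo) / num_maps
    bounds = [lo + round(step * i) for i in range(num_maps)] + [hi]
    predicates = []
    for i in range(num_maps):
        start, end = bounds[i], bounds[i + 1]
        # Only the last range is closed, so the maximum value is not lost.
        op = "<=" if i == num_maps - 1 else "<"
        predicates.append(f"{column} >= {start} AND {column} {op} {end}")
    return predicates


# Step 1: a bounding query supplies lo and hi (values assumed here).
print("SELECT MIN(id), MAX(id) FROM orders")

# Step 2: each map task reads one slice of the range.
for predicate in split_predicates("id", 0, 100, 4):
    print(f"SELECT * FROM orders WHERE {predicate}")
```

Each slice can be fetched with a simple range predicate on the split column, which is why a column with known boundaries is needed.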
