Posted to user@sqoop.apache.org by Raymond Xie <xi...@gmail.com> on 2018/07/11 14:05:38 UTC

sqoop import big table with no integer key

Good day,

In my situation my table has billions of rows and it does not come with an integer
column as its key, which means that if I use Sqoop to do the import (into Hive)
I would not be able to use multiple mappers.
Because the table is so big, it is not realistic to add an extra integer
field to it.

I did come across a post from Hortonworks which seems to suggest it is
possible, but the comments there warn that:

1. There is no guarantee that Sqoop will split your records evenly over your
mappers.
2. For a huge number of rows the options above can cause duplicates in the
result set.

https://community.hortonworks.com/questions/26961/sqoop-split-by-on-a-string-varchar-column.html
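
The kind of command the post discusses looks roughly like this; the connection
string, table and column names below are just placeholders:

# split by a varchar column instead of an integer key
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --table my_big_table \
  --split-by some_varchar_column \
  --hive-import \
  --num-mappers 8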


Any thoughts?


Thank you very much.

*------------------------------------------------*
*Sincerely yours,*


*Raymond*

Re: sqoop import big table with no integer key

Posted by Nicolas Paris <ni...@gmail.com>.
Hi Raymond,

PostgreSQL also has a direct mode that does not need any split-by column.
Direct mode is usually a good idea since it does not load the source database
with multiple threads and has better performance.
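
A rough sketch of what I mean (host, database and table names are placeholders):

# --direct streams the table out with COPY instead of plain JDBC reads
sqoop import \
  --direct \
  --connect jdbc:postgresql://dbhost:5432/mydb \
  --username myuser -P \
  --table my_big_table \
  --target-dir /data/my_big_table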


Re: sqoop import big table with no integer key

Posted by Szabolcs Vasas <va...@apache.org>.
Hi Raymond,

Yes, I can confirm that splitting by a string field can cause issues in
Sqoop; that is why the org.apache.sqoop.splitter.allow_text_splitter property
was introduced (see SQOOP-2910).
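
If you do go down that route, the property has to be passed as a generic
Hadoop argument right after the tool name, roughly like this (connection
details and names are placeholders):

# -D options must come before the Sqoop-specific arguments
sqoop import \
  -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --table my_big_table \
  --split-by some_varchar_column \
  --hive-import
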
Which RDBMS do you use? If you use Oracle then you are lucky, because the
Oracle direct connector does not require a split-by column; otherwise I am
afraid there is currently no real solution to this problem.
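
For the Oracle case the direct connector is enabled with the --direct flag and
handles the splitting itself, so no --split-by is needed. Roughly (the
connection string and table name are placeholders):

# the table is given in OWNER.TABLE_NAME form
sqoop import \
  --direct \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username myuser -P \
  --table MYSCHEMA.MY_BIG_TABLE \
  --hive-import \
  --num-mappers 8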

Regards,
Szabolcs
