Posted to user@sqoop.apache.org by Selvam Raman <se...@gmail.com> on 2016/09/23 14:08:43 UTC

sqoop import for UUID(primary key)

Hi,

In Sqoop, if I have a numeric primary key and a number of parallel
tasks, the work is divided as (max - min) / number of tasks to pull the
data from the table to HDFS.

Suppose the primary key is a UUID (an alphanumeric value). How will the
load be distributed?
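
The numeric case described above can be sketched in a few lines of Python. This is only an illustration of the (max - min) / number-of-tasks arithmetic from the question, not Sqoop's actual splitter code; the function name split_ranges, the column name id and the sample key range are made up for the example.

```python
from typing import List, Tuple


def split_ranges(min_key: int, max_key: int, num_tasks: int) -> List[Tuple[int, int]]:
    """Divide the inclusive key range [min_key, max_key] into contiguous,
    non-overlapping inclusive ranges, at most one per parallel task."""
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    if max_key < min_key:
        raise ValueError("max_key must not be smaller than min_key")

    span = max_key - min_key
    ranges = []
    for i in range(num_tasks):
        lo = min_key + (span * i) // num_tasks
        if i == num_tasks - 1:
            hi = max_key  # the last range includes the maximum key
        else:
            hi = min_key + (span * (i + 1)) // num_tasks - 1
        if lo <= hi:  # skip empty ranges when there are more tasks than keys
            ranges.append((lo, hi))
    return ranges


# Hypothetical table with ids 1..100 imported by 4 parallel tasks.
for lo, hi in split_ranges(1, 100, 4):
    print(f"task reads rows WHERE id >= {lo} AND id <= {hi}")
```

The ranges are equal in width only in key space. If the key values are unevenly distributed between min and max, some tasks would still receive far more rows than others.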

Thank you for your help.

-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"

Re: sqoop import for UUID(primary key)

Posted by Markus Kemper <ma...@cloudera.com>.

Hello Selvam,

You can use a single map task (no split).  If you are ingesting from Oracle, you can use --direct, which does not use column keys to generate splits.
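
A minimal sketch of the two options above, written as a Python helper that only assembles the command line and runs nothing. The flags --connect, --table, --target-dir, --num-mappers and --direct are standard Sqoop 1 import options; the JDBC URL, table name and target directory are placeholders invented for the example, and credentials are omitted.

```python
import shlex
from typing import List


def sqoop_import_args(connect: str, table: str, target_dir: str,
                      num_mappers: int = 1, direct: bool = False) -> List[str]:
    """Build the argument list for a `sqoop import` run (nothing is executed)."""
    args = [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
    ]
    if direct:
        # Direct mode asks Sqoop to use a database-specific fast path where one exists.
        args.append("--direct")
    args += ["--num-mappers", str(num_mappers)]
    return args


# Placeholder values, not taken from the thread.
JDBC_URL = "jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL"

# Option 1: a single map task, so no split column is needed.
single_mapper = sqoop_import_args(JDBC_URL, "EVENTS", "/data/events")

# Option 2: direct mode with several map tasks.
direct_mode = sqoop_import_args(JDBC_URL, "EVENTS", "/data/events",
                                num_mappers=4, direct=True)

print(shlex.join(single_mapper))
print(shlex.join(direct_mode))
```

A single mapper avoids the split problem entirely but makes the import serial. Whether direct mode is available, and how it divides the work, depends on the database and connector in use.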

Thanks, Markus

> On Sep 25, 2016, at 10:14, Selvam Raman <se...@gmail.com> wrote:
> 
> I have 1 TB of data in the database. The primary keys are alphanumeric.
> How can I use Sqoop now?
> 
> Is it possible to use Sqoop for this import?
> 
> Thanks,
> Selvam R
> +91-97877-87724

RE: sqoop import for UUID(primary key)

Posted by "Ravi, Chandramouli" <Ch...@vantiv.com>.

I am repeating the same advice, but in more detail.

Any other numeric index that gives an even split of the data can be used as the split key.
Otherwise, use a single mapper.

I have tried a date field as the split key, which is alphanumeric.
Sqoop cannot compute the split ranges accurately, and I have seen unreadable split range values when the splits are calculated on Oracle.
So there is no way to know whether the data coming back is good or not.
Even if it looks good, I don't know whether all of the data is coming back or extra data is coming back.

So I changed to a different index on a numeric field, which may not be my first choice of solution but is close to what I need.

If you don't have any numeric index that gives even splits, try to build one on the source database.
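
One hypothetical way to picture such a numeric split column for UUID keys is to treat each UUID as the 128-bit integer it encodes and reduce it to a small bucket number. The sketch below uses only Python's standard uuid module; the bucket count and the function name bucket_for are assumptions for illustration, and nothing here is a Sqoop feature. The equivalent column would still have to be created and indexed on the source database, which is database-specific and not shown.

```python
import uuid
from collections import Counter


def bucket_for(key: str, num_buckets: int) -> int:
    """Map a UUID string to a bucket number in [0, num_buckets)."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    return uuid.UUID(key).int % num_buckets


# Random (version 4) UUIDs should spread roughly evenly across the buckets.
sample_keys = [str(uuid.uuid4()) for _ in range(10_000)]
counts = Counter(bucket_for(k, 8) for k in sample_keys)
for bucket in sorted(counts):
    print(bucket, counts[bucket])
```

A column holding such bucket numbers is numeric, so the usual range arithmetic over (max - min) / number of tasks could apply to it.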

Thanks,
Chandra


Re: sqoop import for UUID(primary key)

Posted by Selvam Raman <se...@gmail.com>.

I have 1 TB of data in the database. The primary keys are alphanumeric.
How can I use Sqoop now?

Is it possible to use Sqoop for this import?

Thanks,
Selvam R
+91-97877-87724
On Sep 23, 2016 3:17 PM, "Markus Kemper" <ma...@cloudera.com> wrote:

> As Ravi noted, non-numeric keys are not reliable and can result in both
> duplicate and missing rows.  When using a non-numeric key for
> split-by, you should see a warning in the debug console output.

Re: sqoop import for UUID(primary key)

Posted by Markus Kemper <ma...@cloudera.com>.

As Ravi noted, non-numeric keys are not reliable and can result in both
duplicate and missing rows.  When using a non-numeric key for
split-by, you should see a warning in the debug console output.


Markus Kemper
Customer Operations Engineer
www.cloudera.com <http://www.cloudera.com>


On Fri, Sep 23, 2016 at 10:11 AM, Ravi, Chandramouli <
Chandramouli.Ravi@vantiv.com> wrote:

> It won't work well when the primary key is alphanumeric. I think the data will be
> skewed or won't come back as expected, creating unbalanced split files.
>
> Specify a different numeric index as the split key if a numeric primary key is not
> present.

RE: sqoop import for UUID(primary key)

Posted by "Ravi, Chandramouli" <Ch...@vantiv.com>.

It won't work well when the primary key is alphanumeric. I think the data will be skewed or won't come back as expected, creating unbalanced split files.
Specify a different numeric index as the split key if a numeric primary key is not present.

 **NOTICE: This e-mail message, including any attachments hereto, is for the sole use of the intended recipient(s) and may contain confidential and/or privileged information.  If you are not the intended recipient(s), any unauthorized review, use, copying, disclosure or distribution is prohibited.  If you are not the intended recipient(s), please contact the sender by reply e-mail immediately and destroy the original and all copies (including electronic versions) of this message and any of its attachments.