Posted to user@sqoop.apache.org by Vineet Mishra <cl...@gmail.com> on 2014/11/21 14:57:10 UTC

Handling Special Character while Sqoop Import

Hi,

I am running a Sqoop import with MySQL as the source. I recently found that
the data coming from MySQL contains some special characters and even
control characters, which lose their meaning once they are moved into the
Sqoop data files.

I am looking for a way to handle these special characters or, if possible,
to prune the unwanted data out of my target dataset.

Looking for a resolution at the earliest!

Thanks!

Re: Handling Special Character while Sqoop Import

Posted by Vineet Mishra <cl...@gmail.com>.
Well, it seems to be an issue with the MySQL client configuration on the
datanodes where Sqoop runs the M/R job.

I ran a test on my local machine, loading the same data into MySQL and
doing a Sqoop import to HDFS, and there the data came through to HDFS
correctly.

This indicates that the issue is in the MySQL client configuration, which
I need to fix by setting the character set to utf-8 (I had thought the
default character set would already be utf-8).
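
A minimal Python sketch of why the client character set matters. It is an
illustration only, not Sqoop or MySQL code: the sample string is the one
quoted elsewhere in this thread, and latin-1 is just an assumed example of
a non-UTF-8 client setting (the actual setting on the datanodes is not
stated here).

```python
# Illustration only: the same UTF-8 bytes interpreted under another charset.
original = "सुरेन्द्र"

# UTF-8 bytes decoded by a client that assumes latin-1 (assumed example).
garbled = original.encode("utf-8").decode("latin-1")

# If nothing else touched the bytes, this kind of damage is reversible.
repaired = garbled.encode("latin-1").decode("utf-8")

# Converting the text itself to latin-1 is lossy: Devanagari has no mapping,
# so every character is replaced by "?".
lossy = original.encode("latin-1", errors="replace")

print(repr(garbled))
print(repaired == original)
print(lossy)
```

The first case (bytes intact, wrong interpretation) can be undone; the
second (characters replaced during conversion) cannot, which is why the
charset has to be right before the data leaves MySQL.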


But the latter part of the question remains: how do I handle control
characters present in the data? I don't know in advance what the data may
contain (I have already encountered control characters), and choosing a
control character as the delimiter does not help if the data itself
contains that character.

Looking for the standard solution.
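
One common way around a delimiter that can also occur in the data is to
escape it on write and honour the escape on read; Sqoop's --escaped-by and
--optionally-enclosed-by import options exist for this kind of situation.
The Python sketch below only illustrates the idea. encode_record and
decode_record are hypothetical helpers, not Sqoop's implementation, and
the backslash escape character is an assumption.

```python
def encode_record(fields, delim="\x01", esc="\\"):
    """Join fields with delim, escaping any esc or delim inside a field."""
    escaped = [
        f.replace(esc, esc + esc).replace(delim, esc + delim) for f in fields
    ]
    return delim.join(escaped)


def decode_record(line, delim="\x01", esc="\\"):
    """Split a line written by encode_record back into its fields."""
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == esc and i + 1 < len(line):
            # Escaped character: take the next character literally.
            current.append(line[i + 1])
            i += 2
        elif ch == delim:
            # Unescaped delimiter: the current field ends here.
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


# The first field contains Ctrl-A itself, the second a backslash.
row = ["id\x017", "C:\\path", "plain"]
line = encode_record(row)
print(decode_record(line) == row)
```

Whatever reads the files afterwards has to understand the same escaping
rule, so this only helps if the consumer can be configured accordingly.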

Thanks!

Re: Handling Special Character while Sqoop Import

Posted by Vineet Mishra <cl...@gmail.com>.
Hi Abe,

Thanks for your mail. The MySQL table is defined as utf-8, and the data
displays correctly there, as shown below:

*Data in mysql : *सुरेन्द्र कुमार पाण्डेय

But when I move the same data through a Sqoop import it gets corrupted, as
shown in my earlier message in this thread.

I also tried adding the parameters
*useUnicode=true&characterEncoding=utf8* to the MySQL connection string
and passing *--direct -- --default-character-set=utf8* to the Sqoop
import, but still no luck.
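
For reference, a sketch of how those settings fit into a Sqoop invocation,
written as a small Python helper that assembles the argument list.
build_sqoop_import is a hypothetical helper, and the host, database, table
and target directory are placeholder values, not taken from this thread;
only the charset parameters and the --direct pass-through come from the
message above.

```python
from urllib.parse import urlencode


def build_sqoop_import(host, database, table, target_dir, direct=False):
    """Assemble a `sqoop import` argument list with UTF-8 JDBC settings."""
    # Connector/J parameters mentioned in the thread.
    params = urlencode({"useUnicode": "true", "characterEncoding": "utf8"})
    args = [
        "sqoop", "import",
        "--connect", f"jdbc:mysql://{host}/{database}?{params}",
        "--table", table,
        "--target-dir", target_dir,
    ]
    if direct:
        # Everything after the bare "--" is handed to the underlying tool.
        args += ["--direct", "--", "--default-character-set=utf8"]
    return args


# Placeholder values for illustration only.
cmd = build_sqoop_import(
    "dbhost.example.com", "mydb", "customers", "/data/customers", direct=True
)
print(" ".join(cmd))
```

Passing the list to subprocess.run(cmd) avoids shell quoting. When the
command is typed into a shell instead, the --connect value needs quotes,
because an unquoted & ends the command and drops characterEncoding.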

Additionally, the data contains some control characters such as Ctrl-A
(0x01) and Ctrl-M, which collide with the field delimiter of the Sqoop
import, since that is set to exactly Ctrl-A. Is there a delimiter that can
safely coexist with any special or control character that may appear in
the data?
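
Sqoop documents the import options --hive-drop-import-delims and
--hive-delims-replacement, which drop or replace \n, \r and \01 inside
string fields. Roughly what that amounts to, as a Python sketch:
drop_delims is a hypothetical helper, not Sqoop code, and whether these
options suit a plain (non-Hive) import should be checked against the Sqoop
user guide for the version in use.

```python
import re

# \n, \r (Ctrl-M) and \x01 (Ctrl-A): the characters those options act on.
_DELIMS = re.compile(r"[\n\r\x01]")


def drop_delims(value, replacement=""):
    """Remove (or replace) delimiter characters inside one field value."""
    return _DELIMS.sub(replacement, value)


# Sample row, for illustration only.
row = ["42", "first\x01second", "line one\r\nline two"]

# Replace each offending character with a space, then join with Ctrl-A.
clean = [drop_delims(field, " ") for field in row]
line = "\x01".join(clean)

print(line.split("\x01"))
```

This keeps the record splittable on Ctrl-A at the cost of altering the
data, so it fits the "pruning" route rather than a lossless import.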

Looking forward to a quick response.

Thanks!


Re: Handling Special Character while Sqoop Import

Posted by Abraham Elmahrek <ab...@cloudera.com>.
This could be happening in one of two places: loading to HDFS, or
extracting from MySQL. Sqoop should load everything as UTF-8 by default,
which supports Hindi.

What is your default character set in MySQL? Could you copy/paste your
my.cnf? Also, what version of MySQL are you running?

Re: Handling Special Character while Sqoop Import

Posted by Vineet Mishra <cl...@gmail.com>.
Hi Abe,

With the above statement I mean that the data residing in MySQL is
different from what has been imported via Sqoop.

Let me give an example:

*Data in mysql : *सुरेन्द्र कुमार पाण्डेय
*Data in HDFS(Sqoop import) : * M-`M-$M-8M-`M-%M-

So this is the kind of change I am running into, and it completely loses
the meaning of the data.
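
The HDFS sample above looks like the "M-" notation that tools such as
cat -v use for bytes with the high bit set. Assuming that is what it is
(the thread does not say how the file was viewed), the bytes can be
reconstructed and checked. decode_meta_notation below is a hypothetical
helper written for this check; it assumes ASCII input and cannot tell a
literal "M-" or "^" in the data from the notation.

```python
def decode_meta_notation(text):
    """Turn cat -v style text back into raw bytes.

    "M-x" is the byte for x with the high bit set; "^x" is a control byte.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        high = 0
        if text.startswith("M-", i) and i + 2 < len(text):
            high = 0x80
            i += 2
        if text[i] == "^" and i + 1 < len(text):
            ch = text[i + 1]
            # "^?" is DEL (0x7F); "^A" is 0x01, "^M" is 0x0D, and so on.
            out.append(high | (0x7F if ch == "?" else ord(ch) ^ 0x40))
            i += 2
        else:
            out.append(high | ord(text[i]))
            i += 1
    return bytes(out)


# Only the first three tokens; the sample in the message is cut off.
sample = "M-`M-$M-8"
raw = decode_meta_notation(sample)
print(raw.hex(), raw.decode("utf-8"))
```

Under that assumption the first three tokens are the bytes e0 a4 b8, which
is the UTF-8 encoding of the first letter of the MySQL value. That would
suggest at least the start of the file holds valid UTF-8 and the viewer
was not rendering it, though it does not rule out a charset problem
elsewhere in the pipeline.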

Any help would be appreciated.

Thanks again!

Re: Handling Special Character while Sqoop Import

Posted by Abraham Elmahrek <ab...@cloudera.com>.
Hey there,

Could you explain what you mean by "losing its meaning"? It's possible you
may need to set the character set:
http://dev.mysql.com/doc/connector-j/en/connector-j-reference-charsets.html.

-Abe
