You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@sqoop.apache.org by Glen Hein <gl...@gmail.com> on 2015/03/10 17:34:47 UTC
having trouble with --hive-delims-replacement
I have an oracle db were some of the records contain \n and \r characters. I
m trying to use --hive-delims-replacement to convert them to spaces. I've
had mixed results. It seems that when the \n and \r appear at the beginning
of the fields data, then they are converted as expected. But when they are
embedded in the middle of the field data, then they are not being replaced.
Here's a hexdump showing the unconverted \n and \r characters:
000000c0 6e 75 6c 6c 01 48 65 6c 6c 6f 2c 0d 0a 20 20 20
|null.Hello,.. |
000000d0 20 20 20 4d 79 20 6e 61 6d 65 20 69 73 20 4d 61 | My name is
Ma|
000000e0 72 79 20 48 75 6e 74 2c 20 49 20 61 6d 20 31 39 |ry Hunt, I am
19|
000000f0 20 79 65 61 72 73 20 6f 6c 64 20 61 6e 64 20 6c | years old and
l|
00000100 69 76 65 20 69 6e 20 43 61 6c 69 66 6f 72 6e 69 |ive in
Californi|
00000110 61 2e 20 49 20 67 72 61 64 75 61 74 65 64 20 68 |a. I graduated
h|
00000120 69 67 68 20 73 63 68 6f 6f 6c 20 6f 6e 20 32 30 |igh school on
20|
00000130 31 31 20 61 6e 64 20 64 65 63 69 64 65 64 20 74 |11 and decided
t|
00000140 6f 20 74 61 6b 65 20 61 20 79 65 61 72 20 6f 66 |o take a year
of|
00000150 66 20 74 6f 20 73 65 65 20 69 66 20 77 68 61 74 |f to see if
what|
00000160 20 49 20 63 68 6f 73 65 20 61 73 20 61 20 66 75 | I chose as a
fu|
00000170 74 75 72 65 20 63 61 72 65 65 72 20 69 73 20 77 |ture career is
w|
00000180 68 61 74 20 49 20 72 65 61 6c 79 20 77 61 6e 74 |hat I realy
want|
00000190 65 64 20 74 6f 20 64 6f 2e 20 4d 79 20 79 65 61 |ed to do. My
yea|
000001a0 72 20 69 73 20 75 70 20 61 6e 64 20 49 20 73 74 |r is up and I
st|
000001b0 69 6c 6c 20 77 61 6e 74 20 74 6f 20 62 65 20 61 |ill want to be
a|
000001c0 20 74 65 61 63 68 65 72 2e 20 49 20 6c 6f 6f 6b | teacher. I
look|
000001d0 20 66 6f 77 61 72 64 20 74 6f 20 6c 65 61 72 6e | foward to
learn|
000001e0 69 6e 67 20 61 6c 6c 20 74 68 61 74 20 49 20 63 |ing all that I
c|
000001f0 61 6e 20 61 6e 64 20 62 65 69 6e 67 20 61 6e 20 |an and being
an |
00000200 65 6c 65 6d 65 6e 74 72 79 20 73 63 68 6f 6f 6c |elementry
school|
00000210 20 74 65 61 63 68 65 72 2e 01 6e 75 6c 6c 01 31 |
teacher..null.1|
You can see in the first row of data there is a "01" that starts the field,
and then a few characters later the 0d and 0a. I'm passing the following
option to sqoop:
<arg>--hive-delims-replacement</arg>
<arg>ABCDEFG</arg>
I've tried several different values for the actual replacement string, but
the results are always the same.
Is there a better way to replace the unwanted characters?
Thanks,
Glen
Re: having trouble with --hive-delims-replacement
Posted by Abraham Elmahrek <ab...@cloudera.com>.
No idea if CLOB handling has changed. I'll let someone else in the
community answer that one.
On that note... here's an interesting google search:
https://www.google.com/search?q=site%3Aissues.apache.org%20sqoop%20clob
On Wed, Mar 11, 2015 at 11:42 AM, Glen Hein <gl...@gmail.com> wrote:
> I meant to say that the older system is running WITHOUT the extra mapping
> arguments.
>
>
> On Wed, Mar 11, 2015 at 10:20 AM, Glen Hein <gl...@gmail.com> wrote:
>
>> I've narrowed down the problem to CLOB fields. I did figure out that if I
>> add the option "--map-column-java BIO=String
>> " (BIO is the name of the CLOB field), then newlines are filtered as
>> expected.
>>
>> We do have an older system running Sqoop 1.3.0 that seems to filter the
>> newlines for the CLOB fields with adding the --map-column-java. Do you know
>> if 1.4.3 differs from 1.3.0 on how CLOB are handled?
>>
>> -Glen
>>
>> On Wed, Mar 11, 2015 at 10:03 AM, Abraham Elmahrek <ab...@cloudera.com>
>> wrote:
>>
>>> Hey man,
>>>
>>> Maybe regexp_replace + free-form query?
>>>
>>>
>>> http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_free_form_query_imports
>>> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions130.htm
>>>
>>> -Abe
>>>
>>> On Tue, Mar 10, 2015 at 9:34 AM, Glen Hein <gl...@gmail.com> wrote:
>>>
>>>>
>>>> I have an oracle db were some of the records contain \n and \r
>>>> characters. I
>>>> m trying to use --hive-delims-replacement to convert them to spaces.
>>>> I've had mixed results. It seems that when the \n and \r appear at the
>>>> beginning of the fields data, then they are converted as expected. But when
>>>> they are embedded in the middle of the field data, then they are not being
>>>> replaced. Here's a hexdump showing the unconverted \n and \r characters:
>>>>
>>>> 000000c0 6e 75 6c 6c 01 48 65 6c 6c 6f 2c 0d 0a 20 20 20
>>>> |null.Hello,.. |
>>>> 000000d0 20 20 20 4d 79 20 6e 61 6d 65 20 69 73 20 4d 61 | My name
>>>> is Ma|
>>>> 000000e0 72 79 20 48 75 6e 74 2c 20 49 20 61 6d 20 31 39 |ry Hunt, I
>>>> am 19|
>>>> 000000f0 20 79 65 61 72 73 20 6f 6c 64 20 61 6e 64 20 6c | years old
>>>> and l|
>>>> 00000100 69 76 65 20 69 6e 20 43 61 6c 69 66 6f 72 6e 69 |ive in
>>>> Californi|
>>>> 00000110 61 2e 20 49 20 67 72 61 64 75 61 74 65 64 20 68 |a. I
>>>> graduated h|
>>>> 00000120 69 67 68 20 73 63 68 6f 6f 6c 20 6f 6e 20 32 30 |igh school
>>>> on 20|
>>>> 00000130 31 31 20 61 6e 64 20 64 65 63 69 64 65 64 20 74 |11 and
>>>> decided t|
>>>> 00000140 6f 20 74 61 6b 65 20 61 20 79 65 61 72 20 6f 66 |o take a
>>>> year of|
>>>> 00000150 66 20 74 6f 20 73 65 65 20 69 66 20 77 68 61 74 |f to see
>>>> if what|
>>>> 00000160 20 49 20 63 68 6f 73 65 20 61 73 20 61 20 66 75 | I chose
>>>> as a fu|
>>>> 00000170 74 75 72 65 20 63 61 72 65 65 72 20 69 73 20 77 |ture
>>>> career is w|
>>>> 00000180 68 61 74 20 49 20 72 65 61 6c 79 20 77 61 6e 74 |hat I
>>>> realy want|
>>>> 00000190 65 64 20 74 6f 20 64 6f 2e 20 4d 79 20 79 65 61 |ed to do.
>>>> My yea|
>>>> 000001a0 72 20 69 73 20 75 70 20 61 6e 64 20 49 20 73 74 |r is up
>>>> and I st|
>>>> 000001b0 69 6c 6c 20 77 61 6e 74 20 74 6f 20 62 65 20 61 |ill want
>>>> to be a|
>>>> 000001c0 20 74 65 61 63 68 65 72 2e 20 49 20 6c 6f 6f 6b | teacher.
>>>> I look|
>>>> 000001d0 20 66 6f 77 61 72 64 20 74 6f 20 6c 65 61 72 6e | foward to
>>>> learn|
>>>> 000001e0 69 6e 67 20 61 6c 6c 20 74 68 61 74 20 49 20 63 |ing all
>>>> that I c|
>>>> 000001f0 61 6e 20 61 6e 64 20 62 65 69 6e 67 20 61 6e 20 |an and
>>>> being an |
>>>> 00000200 65 6c 65 6d 65 6e 74 72 79 20 73 63 68 6f 6f 6c |elementry
>>>> school|
>>>> 00000210 20 74 65 61 63 68 65 72 2e 01 6e 75 6c 6c 01 31 |
>>>> teacher..null.1|
>>>>
>>>> You can see in the first row of data there is a "01" that starts the
>>>> field, and then a few characters later the 0d and 0a. I'm passing the
>>>> following option to sqoop:
>>>>
>>>> <arg>--hive-delims-replacement</arg>
>>>> <arg>ABCDEFG</arg>
>>>>
>>>> I've tried several different values for the actual replacement string,
>>>> but the results are always the same.
>>>>
>>>> Is there a better way to replace the unwanted characters?
>>>>
>>>> Thanks,
>>>> Glen
>>>>
>>>>
>>>
>>
>
Re: having trouble with --hive-delims-replacement
Posted by Glen Hein <gl...@gmail.com>.
I meant to say that the older system is running WITHOUT the extra mapping
arguments.
On Wed, Mar 11, 2015 at 10:20 AM, Glen Hein <gl...@gmail.com> wrote:
> I've narrowed down the problem to CLOB fields. I did figure out that if I
> add the option "--map-column-java BIO=String
> " (BIO is the name of the CLOB field), then newlines are filtered as
> expected.
>
> We do have an older system running Sqoop 1.3.0 that seems to filter the
> newlines for the CLOB fields with adding the --map-column-java. Do you know
> if 1.4.3 differs from 1.3.0 on how CLOB are handled?
>
> -Glen
>
> On Wed, Mar 11, 2015 at 10:03 AM, Abraham Elmahrek <ab...@cloudera.com>
> wrote:
>
>> Hey man,
>>
>> Maybe regexp_replace + free-form query?
>>
>>
>> http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_free_form_query_imports
>> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions130.htm
>>
>> -Abe
>>
>> On Tue, Mar 10, 2015 at 9:34 AM, Glen Hein <gl...@gmail.com> wrote:
>>
>>>
>>> I have an oracle db were some of the records contain \n and \r
>>> characters. I
>>> m trying to use --hive-delims-replacement to convert them to spaces.
>>> I've had mixed results. It seems that when the \n and \r appear at the
>>> beginning of the fields data, then they are converted as expected. But when
>>> they are embedded in the middle of the field data, then they are not being
>>> replaced. Here's a hexdump showing the unconverted \n and \r characters:
>>>
>>> 000000c0 6e 75 6c 6c 01 48 65 6c 6c 6f 2c 0d 0a 20 20 20
>>> |null.Hello,.. |
>>> 000000d0 20 20 20 4d 79 20 6e 61 6d 65 20 69 73 20 4d 61 | My name
>>> is Ma|
>>> 000000e0 72 79 20 48 75 6e 74 2c 20 49 20 61 6d 20 31 39 |ry Hunt, I
>>> am 19|
>>> 000000f0 20 79 65 61 72 73 20 6f 6c 64 20 61 6e 64 20 6c | years old
>>> and l|
>>> 00000100 69 76 65 20 69 6e 20 43 61 6c 69 66 6f 72 6e 69 |ive in
>>> Californi|
>>> 00000110 61 2e 20 49 20 67 72 61 64 75 61 74 65 64 20 68 |a. I
>>> graduated h|
>>> 00000120 69 67 68 20 73 63 68 6f 6f 6c 20 6f 6e 20 32 30 |igh school
>>> on 20|
>>> 00000130 31 31 20 61 6e 64 20 64 65 63 69 64 65 64 20 74 |11 and
>>> decided t|
>>> 00000140 6f 20 74 61 6b 65 20 61 20 79 65 61 72 20 6f 66 |o take a
>>> year of|
>>> 00000150 66 20 74 6f 20 73 65 65 20 69 66 20 77 68 61 74 |f to see if
>>> what|
>>> 00000160 20 49 20 63 68 6f 73 65 20 61 73 20 61 20 66 75 | I chose as
>>> a fu|
>>> 00000170 74 75 72 65 20 63 61 72 65 65 72 20 69 73 20 77 |ture career
>>> is w|
>>> 00000180 68 61 74 20 49 20 72 65 61 6c 79 20 77 61 6e 74 |hat I realy
>>> want|
>>> 00000190 65 64 20 74 6f 20 64 6f 2e 20 4d 79 20 79 65 61 |ed to do.
>>> My yea|
>>> 000001a0 72 20 69 73 20 75 70 20 61 6e 64 20 49 20 73 74 |r is up and
>>> I st|
>>> 000001b0 69 6c 6c 20 77 61 6e 74 20 74 6f 20 62 65 20 61 |ill want to
>>> be a|
>>> 000001c0 20 74 65 61 63 68 65 72 2e 20 49 20 6c 6f 6f 6b | teacher. I
>>> look|
>>> 000001d0 20 66 6f 77 61 72 64 20 74 6f 20 6c 65 61 72 6e | foward to
>>> learn|
>>> 000001e0 69 6e 67 20 61 6c 6c 20 74 68 61 74 20 49 20 63 |ing all
>>> that I c|
>>> 000001f0 61 6e 20 61 6e 64 20 62 65 69 6e 67 20 61 6e 20 |an and
>>> being an |
>>> 00000200 65 6c 65 6d 65 6e 74 72 79 20 73 63 68 6f 6f 6c |elementry
>>> school|
>>> 00000210 20 74 65 61 63 68 65 72 2e 01 6e 75 6c 6c 01 31 |
>>> teacher..null.1|
>>>
>>> You can see in the first row of data there is a "01" that starts the
>>> field, and then a few characters later the 0d and 0a. I'm passing the
>>> following option to sqoop:
>>>
>>> <arg>--hive-delims-replacement</arg>
>>> <arg>ABCDEFG</arg>
>>>
>>> I've tried several different values for the actual replacement string,
>>> but the results are always the same.
>>>
>>> Is there a better way to replace the unwanted characters?
>>>
>>> Thanks,
>>> Glen
>>>
>>>
>>
>
Re: having trouble with --hive-delims-replacement
Posted by Glen Hein <gl...@gmail.com>.
I've narrowed down the problem to CLOB fields. I did figure out that if I
add the option "--map-column-java BIO=String
" (BIO is the name of the CLOB field), then newlines are filtered as
expected.
We do have an older system running Sqoop 1.3.0 that seems to filter the
newlines for the CLOB fields with adding the --map-column-java. Do you know
if 1.4.3 differs from 1.3.0 on how CLOB are handled?
-Glen
On Wed, Mar 11, 2015 at 10:03 AM, Abraham Elmahrek <ab...@cloudera.com> wrote:
> Hey man,
>
> Maybe regexp_replace + free-form query?
>
>
> http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_free_form_query_imports
> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions130.htm
>
> -Abe
>
> On Tue, Mar 10, 2015 at 9:34 AM, Glen Hein <gl...@gmail.com> wrote:
>
>>
>> I have an oracle db were some of the records contain \n and \r
>> characters. I
>> m trying to use --hive-delims-replacement to convert them to spaces. I've
>> had mixed results. It seems that when the \n and \r appear at the beginning
>> of the fields data, then they are converted as expected. But when they are
>> embedded in the middle of the field data, then they are not being replaced.
>> Here's a hexdump showing the unconverted \n and \r characters:
>>
>> 000000c0 6e 75 6c 6c 01 48 65 6c 6c 6f 2c 0d 0a 20 20 20
>> |null.Hello,.. |
>> 000000d0 20 20 20 4d 79 20 6e 61 6d 65 20 69 73 20 4d 61 | My name
>> is Ma|
>> 000000e0 72 79 20 48 75 6e 74 2c 20 49 20 61 6d 20 31 39 |ry Hunt, I
>> am 19|
>> 000000f0 20 79 65 61 72 73 20 6f 6c 64 20 61 6e 64 20 6c | years old
>> and l|
>> 00000100 69 76 65 20 69 6e 20 43 61 6c 69 66 6f 72 6e 69 |ive in
>> Californi|
>> 00000110 61 2e 20 49 20 67 72 61 64 75 61 74 65 64 20 68 |a. I
>> graduated h|
>> 00000120 69 67 68 20 73 63 68 6f 6f 6c 20 6f 6e 20 32 30 |igh school
>> on 20|
>> 00000130 31 31 20 61 6e 64 20 64 65 63 69 64 65 64 20 74 |11 and
>> decided t|
>> 00000140 6f 20 74 61 6b 65 20 61 20 79 65 61 72 20 6f 66 |o take a
>> year of|
>> 00000150 66 20 74 6f 20 73 65 65 20 69 66 20 77 68 61 74 |f to see if
>> what|
>> 00000160 20 49 20 63 68 6f 73 65 20 61 73 20 61 20 66 75 | I chose as
>> a fu|
>> 00000170 74 75 72 65 20 63 61 72 65 65 72 20 69 73 20 77 |ture career
>> is w|
>> 00000180 68 61 74 20 49 20 72 65 61 6c 79 20 77 61 6e 74 |hat I realy
>> want|
>> 00000190 65 64 20 74 6f 20 64 6f 2e 20 4d 79 20 79 65 61 |ed to do. My
>> yea|
>> 000001a0 72 20 69 73 20 75 70 20 61 6e 64 20 49 20 73 74 |r is up and
>> I st|
>> 000001b0 69 6c 6c 20 77 61 6e 74 20 74 6f 20 62 65 20 61 |ill want to
>> be a|
>> 000001c0 20 74 65 61 63 68 65 72 2e 20 49 20 6c 6f 6f 6b | teacher. I
>> look|
>> 000001d0 20 66 6f 77 61 72 64 20 74 6f 20 6c 65 61 72 6e | foward to
>> learn|
>> 000001e0 69 6e 67 20 61 6c 6c 20 74 68 61 74 20 49 20 63 |ing all that
>> I c|
>> 000001f0 61 6e 20 61 6e 64 20 62 65 69 6e 67 20 61 6e 20 |an and being
>> an |
>> 00000200 65 6c 65 6d 65 6e 74 72 79 20 73 63 68 6f 6f 6c |elementry
>> school|
>> 00000210 20 74 65 61 63 68 65 72 2e 01 6e 75 6c 6c 01 31 |
>> teacher..null.1|
>>
>> You can see in the first row of data there is a "01" that starts the
>> field, and then a few characters later the 0d and 0a. I'm passing the
>> following option to sqoop:
>>
>> <arg>--hive-delims-replacement</arg>
>> <arg>ABCDEFG</arg>
>>
>> I've tried several different values for the actual replacement string,
>> but the results are always the same.
>>
>> Is there a better way to replace the unwanted characters?
>>
>> Thanks,
>> Glen
>>
>>
>
Re: having trouble with --hive-delims-replacement
Posted by Abraham Elmahrek <ab...@cloudera.com>.
Hey man,
Maybe regexp_replace + free-form query?
http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_free_form_query_imports
http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions130.htm
-Abe
On Tue, Mar 10, 2015 at 9:34 AM, Glen Hein <gl...@gmail.com> wrote:
>
> I have an oracle db were some of the records contain \n and \r characters.
> I
> m trying to use --hive-delims-replacement to convert them to spaces. I've
> had mixed results. It seems that when the \n and \r appear at the beginning
> of the fields data, then they are converted as expected. But when they are
> embedded in the middle of the field data, then they are not being replaced.
> Here's a hexdump showing the unconverted \n and \r characters:
>
> 000000c0 6e 75 6c 6c 01 48 65 6c 6c 6f 2c 0d 0a 20 20 20
> |null.Hello,.. |
> 000000d0 20 20 20 4d 79 20 6e 61 6d 65 20 69 73 20 4d 61 | My name is
> Ma|
> 000000e0 72 79 20 48 75 6e 74 2c 20 49 20 61 6d 20 31 39 |ry Hunt, I am
> 19|
> 000000f0 20 79 65 61 72 73 20 6f 6c 64 20 61 6e 64 20 6c | years old
> and l|
> 00000100 69 76 65 20 69 6e 20 43 61 6c 69 66 6f 72 6e 69 |ive in
> Californi|
> 00000110 61 2e 20 49 20 67 72 61 64 75 61 74 65 64 20 68 |a. I
> graduated h|
> 00000120 69 67 68 20 73 63 68 6f 6f 6c 20 6f 6e 20 32 30 |igh school on
> 20|
> 00000130 31 31 20 61 6e 64 20 64 65 63 69 64 65 64 20 74 |11 and
> decided t|
> 00000140 6f 20 74 61 6b 65 20 61 20 79 65 61 72 20 6f 66 |o take a year
> of|
> 00000150 66 20 74 6f 20 73 65 65 20 69 66 20 77 68 61 74 |f to see if
> what|
> 00000160 20 49 20 63 68 6f 73 65 20 61 73 20 61 20 66 75 | I chose as a
> fu|
> 00000170 74 75 72 65 20 63 61 72 65 65 72 20 69 73 20 77 |ture career
> is w|
> 00000180 68 61 74 20 49 20 72 65 61 6c 79 20 77 61 6e 74 |hat I realy
> want|
> 00000190 65 64 20 74 6f 20 64 6f 2e 20 4d 79 20 79 65 61 |ed to do. My
> yea|
> 000001a0 72 20 69 73 20 75 70 20 61 6e 64 20 49 20 73 74 |r is up and I
> st|
> 000001b0 69 6c 6c 20 77 61 6e 74 20 74 6f 20 62 65 20 61 |ill want to
> be a|
> 000001c0 20 74 65 61 63 68 65 72 2e 20 49 20 6c 6f 6f 6b | teacher. I
> look|
> 000001d0 20 66 6f 77 61 72 64 20 74 6f 20 6c 65 61 72 6e | foward to
> learn|
> 000001e0 69 6e 67 20 61 6c 6c 20 74 68 61 74 20 49 20 63 |ing all that
> I c|
> 000001f0 61 6e 20 61 6e 64 20 62 65 69 6e 67 20 61 6e 20 |an and being
> an |
> 00000200 65 6c 65 6d 65 6e 74 72 79 20 73 63 68 6f 6f 6c |elementry
> school|
> 00000210 20 74 65 61 63 68 65 72 2e 01 6e 75 6c 6c 01 31 |
> teacher..null.1|
>
> You can see in the first row of data there is a "01" that starts the
> field, and then a few characters later the 0d and 0a. I'm passing the
> following option to sqoop:
>
> <arg>--hive-delims-replacement</arg>
> <arg>ABCDEFG</arg>
>
> I've tried several different values for the actual replacement string, but
> the results are always the same.
>
> Is there a better way to replace the unwanted characters?
>
> Thanks,
> Glen
>
>