You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@sqoop.apache.org by "Vikash Talanki -X (vtalanki - INFOSYS LIMITED at Cisco)" <vt...@cisco.com> on 2014/09/22 20:27:37 UTC

Convert new line chars from oracle to hive using sqoop

Hi All,

We are using '<EOL>' string( --hive-delims-replacement '<EOL>') to convert new lines chars in oracle fields while importing data into hive using sqoop.
According to sqoop documentation - http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_large_objects - above parameter should only replace either \n, \r or \01(^A) characters with '<EOL>'.
But we seeing that some special characters are also getting replaced to '<EOL>'

Our scenario:
Oracle Field

Hive Field

Notepad ++

Word

MEIKICOMPANY,LTD

MEIKI<EOL>COMPANY,LTD

[Screen capture]

MEIKI__COMPANY,LTD

AVENTIS@PHARMA

AVENTIS<EOL>@PHARMA

[Screen capture]

AVENTIS_@PHARMA


But, some character in above sample which is NOT visible in Oracle is being shown up as 'SOH' in notepad++ and as '_' in word which is being converted into <EOL> by sqoop.
Please help us understand this behavior.
What does these chars mean to sqoop/hive?
Is sqoop expected to replace these chars which doesn't fall under either \n, \r or \01(^A) ?
[http://www.cisco.com/web/europe/images/email/signature/logo05.jpg]

Vikash Talanki
Engineer - Software
vtalanki@cisco.com
Phone: +1 (408)838 4078

Cisco Systems Limited
SJ-J 3
255 W Tasman Dr
San Jose
CA - 95134
United States
Cisco.com<http://www.cisco.com/>





[Think before you print.]Think before you print.

This email may contain confidential and privileged material for the sole use of the intended recipient. Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive for the recipient), please contact the sender by reply email and delete all copies of this message.
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html




Re: Convert new line chars from oracle to hive using sqoop

Posted by Abraham Elmahrek <ab...@cloudera.com>.
The special characters in your content are actually 0x1 and 0x15. The 0x15
is not being replaced, but the 0x1 is (as defined). You should be able to
use a free-form query to strip these special characters from your result
set.

-Abe

On Mon, Sep 22, 2014 at 4:00 PM, Vikash Talanki -X (vtalanki - INFOSYS
LIMITED at Cisco) <vt...@cisco.com> wrote:

>  Thanks Abhraham for the reply.
>
>
>
> Here is the hexadump output
>
>
>
> Oracle
>
> NotePad++
>
> Hexadump
>
> [image: Screen capture]
>
> [image: Screen capture]
>
> [image: Screen capture]
>
>
>
>
>
> Thanks,
>
> Vikash Talanki
>
> +1 (408)838-4078
>
>
>
> *From:* Abraham Elmahrek [mailto:abe@cloudera.com]
> *Sent:* Monday, September 22, 2014 2:17 PM
> *To:* user@sqoop.apache.org
> *Cc:* Indresh Padmonkar (ipadmonk); Anushree Shipra -X (ashipra); Phani
> Yadavilli -X (pyadavil)
> *Subject:* Re: Convert new line chars from oracle to hive using sqoop
>
>
>
> Hey there,
>
>
>
> Could you please export a few of these lines to a file and run a 'hexdump'
> on the file if possible? It would be interesting to see what exactly those
> characters are.
>
>
>
> -Abe
>
>
>
> On Mon, Sep 22, 2014 at 11:27 AM, Vikash Talanki -X (vtalanki - INFOSYS
> LIMITED at Cisco) <vt...@cisco.com> wrote:
>
> Hi All,
>
>
>
> We are using *‘<EOL>*’ string( *--hive-delims-replacement ‘<EOL>’*) to
> convert new lines chars in oracle fields while importing data into hive
> using sqoop.
>
> According to sqoop documentation -
> http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_large_objects –
> above parameter should only replace either *\n, \r or \01(^A)* characters
> with ‘<EOL>’.
>
> But we seeing that some special characters are also getting replaced to
> ‘<EOL>’
>
>
>
> Our scenario:
>
> *Oracle Field*
>
> *Hive Field*
>
> *Notepad ++*
>
> *Word*
>
> MEIKI COMPANY,LTD
>
> MEIKI<EOL> COMPANY,LTD
>
> [image: Screen capture]
>
> MEIKI__COMPANY,LTD
>
> AVENTIS@PHARMA
>
> AVENTIS<EOL>@PHARMA
>
> [image: Screen capture]
>
> AVENTIS_@PHARMA
>
>
>
> But, some character in above sample which is *NOT visible* in Oracle is
> being shown up as ‘*SOH*’ in notepad++ and as ‘*_*’ in word which is
> being converted into *<EOL>* by sqoop.
>
> Please help us understand this behavior.
>
> What does these chars mean to sqoop/hive?
>
> Is sqoop expected to replace these chars which doesn’t fall under either *\n,
> \r or \01(^A)* ?
>
> [image: http://www.cisco.com/web/europe/images/email/signature/logo05.jpg]
>
> *Vikash Talanki*
> Engineer - Software
> vtalanki@cisco.com
> Phone: *+1 (408)838 4078 <%2B1%20%28408%29838%204078>*
>
> *Cisco Systems Limited*
> SJ-J 3
> 255 W Tasman Dr
> San Jose
> CA – 95134
> United States
> Cisco.com <http://www.cisco.com/>
>
>
>
> [image: Think before you print.]Think before you print.
>
> This email may contain confidential and privileged material for the sole
> use of the intended recipient. Any review, use, distribution or disclosure
> by others is strictly prohibited. If you are not the intended recipient (or
> authorized to receive for the recipient), please contact the sender by
> reply email and delete all copies of this message.
>
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/index.html
>
>
>
>
>
>
>

RE: Convert new line chars from oracle to hive using sqoop

Posted by "Vikash Talanki -X (vtalanki - INFOSYS LIMITED at Cisco)" <vt...@cisco.com>.
Thanks Abhraham for the reply.

Here is the hexadump output

Oracle

NotePad++

Hexadump

[Screen capture]

[Screen capture]

[Screen capture]



Thanks,
Vikash Talanki
+1 (408)838-4078

From: Abraham Elmahrek [mailto:abe@cloudera.com]
Sent: Monday, September 22, 2014 2:17 PM
To: user@sqoop.apache.org
Cc: Indresh Padmonkar (ipadmonk); Anushree Shipra -X (ashipra); Phani Yadavilli -X (pyadavil)
Subject: Re: Convert new line chars from oracle to hive using sqoop

Hey there,

Could you please export a few of these lines to a file and run a 'hexdump' on the file if possible? It would be interesting to see what exactly those characters are.

-Abe

On Mon, Sep 22, 2014 at 11:27 AM, Vikash Talanki -X (vtalanki - INFOSYS LIMITED at Cisco) <vt...@cisco.com>> wrote:
Hi All,

We are using ‘<EOL>’ string( --hive-delims-replacement ‘<EOL>’) to convert new lines chars in oracle fields while importing data into hive using sqoop.
According to sqoop documentation - http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_large_objects – above parameter should only replace either \n, \r or \01(^A) characters with ‘<EOL>’.
But we seeing that some special characters are also getting replaced to ‘<EOL>’

Our scenario:
Oracle Field

Hive Field

Notepad ++

Word

MEIKI COMPANY,LTD

MEIKI<EOL> COMPANY,LTD

[Screen capture]

MEIKI__COMPANY,LTD

AVENTIS@PHARMA

AVENTIS<EOL>@PHARMA

[Screen capture]

AVENTIS_@PHARMA


But, some character in above sample which is NOT visible in Oracle is being shown up as ‘SOH’ in notepad++ and as ‘_’ in word which is being converted into <EOL> by sqoop.
Please help us understand this behavior.
What does these chars mean to sqoop/hive?
Is sqoop expected to replace these chars which doesn’t fall under either \n, \r or \01(^A) ?
[http://www.cisco.com/web/europe/images/email/signature/logo05.jpg]

Vikash Talanki
Engineer - Software
vtalanki@cisco.com<ma...@cisco.com>
Phone: +1 (408)838 4078<tel:%2B1%20%28408%29838%204078>

Cisco Systems Limited
SJ-J 3
255 W Tasman Dr
San Jose
CA – 95134
United States
Cisco.com<http://www.cisco.com/>





[Think before you print.]Think before you print.

This email may contain confidential and privileged material for the sole use of the intended recipient. Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive for the recipient), please contact the sender by reply email and delete all copies of this message.
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html





Re: Convert new line chars from oracle to hive using sqoop

Posted by Abraham Elmahrek <ab...@cloudera.com>.
Hey there,

Could you please export a few of these lines to a file and run a 'hexdump'
on the file if possible? It would be interesting to see what exactly those
characters are.

-Abe

On Mon, Sep 22, 2014 at 11:27 AM, Vikash Talanki -X (vtalanki - INFOSYS
LIMITED at Cisco) <vt...@cisco.com> wrote:

>  Hi All,
>
>
>
> We are using *‘<EOL>*’ string( *--hive-delims-replacement ‘<EOL>’*) to
> convert new lines chars in oracle fields while importing data into hive
> using sqoop.
>
> According to sqoop documentation -
> http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_large_objects –
> above parameter should only replace either *\n, \r or \01(^A)* characters
> with ‘<EOL>’.
>
> But we seeing that some special characters are also getting replaced to
> ‘<EOL>’
>
>
>
> Our scenario:
>
> *Oracle Field*
>
> *Hive Field*
>
> *Notepad ++*
>
> *Word*
>
> MEIKI COMPANY,LTD
>
> MEIKI<EOL> COMPANY,LTD
>
> [image: Screen capture]
>
> MEIKI__COMPANY,LTD
>
> AVENTIS@PHARMA
>
> AVENTIS<EOL>@PHARMA
>
> [image: Screen capture]
>
> AVENTIS_@PHARMA
>
>
>
> But, some character in above sample which is *NOT visible* in Oracle is
> being shown up as ‘*SOH*’ in notepad++ and as ‘*_*’ in word which is
> being converted into *<EOL>* by sqoop.
>
> Please help us understand this behavior.
>
> What does these chars mean to sqoop/hive?
>
> Is sqoop expected to replace these chars which doesn’t fall under either *\n,
> \r or \01(^A)* ?
>
> [image: http://www.cisco.com/web/europe/images/email/signature/logo05.jpg]
>
> *Vikash Talanki*
> Engineer - Software
> vtalanki@cisco.com
> Phone: *+1 (408)838 4078 <%2B1%20%28408%29838%204078>*
>
> *Cisco Systems Limited*
> SJ-J 3
> 255 W Tasman Dr
> San Jose
> CA – 95134
> United States
> Cisco.com <http://www.cisco.com/>
>
>
>
> [image: Think before you print.]Think before you print.
>
> This email may contain confidential and privileged material for the sole
> use of the intended recipient. Any review, use, distribution or disclosure
> by others is strictly prohibited. If you are not the intended recipient (or
> authorized to receive for the recipient), please contact the sender by
> reply email and delete all copies of this message.
>
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/index.html
>
>
>
>
>