Posted to user@hadoop.apache.org by "Bowling.Steve" <St...@SunTrust.com> on 2019/08/26 13:11:14 UTC

data corruption while loading onto Hadoop

Perhaps someone here can point me in the direction of an answer.

I loaded a 123.5M-row table onto Hadoop using a SAS data step. After completion, Hadoop reports roughly 212M rows. On investigation, the extra ~89M rows come from embedded ASCII 13 (carriage return) characters in the data. If the table is first cleaned of "off-keyboard" characters (ASCII < 32 and > 126), the data step loads and the correct row count is reported.
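
For anyone following along, here is a minimal Python sketch of the mechanism I believe is at work (it is not the actual Hadoop code path): Hadoop's default line reader for delimited text treats CR, LF, and CRLF alike as record terminators, so an ASCII 13 embedded inside a field value splits one logical row into two. The sample row below is made up for illustration.

    # Count records the way a CR/LF/CRLF-delimited text reader would.
    def count_records(raw: bytes) -> int:
        # bytes.splitlines() splits on \r, \n, and \r\n, mirroring that default.
        return len(raw.splitlines())

    # One logical row whose comment field contains an embedded ASCII 13.
    row = b"1001|Steve|note with embedded\rcarriage return|2019-08-26\n"

    print(count_records(row))                        # 2 -- one logical row counted twice
    print(count_records(row.replace(b"\r", b" ")))   # 1 -- after the CR is cleaned

That matches what we see: each embedded ASCII 13 adds one extra reported row.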

We cannot clean hundreds of TB of data. Is there a system parameter on Hadoop that could help?
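
One direction I am wondering about: for plain TextInputFormat jobs, Hadoop's textinputformat.record.delimiter property can be set so the reader splits on an exact byte sequence (e.g. "\n") instead of on CR/LF/CRLF, in which case an embedded ASCII 13 would stay inside the field and not inflate the count. Whether the SAS-to-Hadoop load path (e.g. a Hive text table) honors that property is an assumption I have not verified; the toy comparison below just illustrates the difference between the two splitting rules, and writing into a binary format such as ORC or Parquet would of course sidestep line-delimiter issues entirely.

    # Hypothetical two-row file; the first row has an embedded ASCII 13.
    data = (b"1001|note with embedded\rcarriage return|2019-08-26\n"
            b"1002|clean row|2019-08-26\n")

    default_count = len(data.splitlines())  # CR/LF/CRLF-delimited view -> 3 "rows"
    lf_only_count = data.count(b"\n")       # LF-only view              -> 2 rows

    print(default_count, lf_only_count)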

Thanks so much,
Steve
