Posted to user@hive.apache.org by Raj Hadoop <ha...@yahoo.com> on 2013/07/08 22:52:05 UTC

Special characters in web log file causing issues


Hi,

The log file that I am trying to load through Hive has some special characters.
 
The field is shown below, including the special characters (¿¿).

    Shockwave Flash;Chrome Remote Desktop Viewer;Native Client;Chrome PDF Viewer;Adobe Acrobat;Microsoft Office 2010;Motive Plug-
    in;Motive Management Plug-in;Google Update;Java(TM) Platform SE 7 U21;McAfee SiteAdvisor;McAfee Virtual Technician;Windows     Live¿¿ Photo Gallery;McAfee SecurityCenter;Silverlig

These characters cause the record to be terminated early, and the rest is loaded as another line. How can I avoid this kind of issue and load the data correctly? Any suggestions, please.

Thanks,
Raj
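
Before cleaning anything, it can help to check which bytes are actually in the field. The sketch below is only an illustration: the file name sample.log and the two control bytes (octal 016 and 017) are made-up stand-ins for the real log and for whatever the ¿¿ characters really are.

```shell
#!/bin/sh
# Hypothetical sample: bytes 016 and 017 stand in for the special characters.
printf 'Windows     Live\016\017 Photo Gallery\n' > sample.log

# Delete everything that is printable ASCII, CR, LF or tab; what is left
# are the suspicious bytes. LC_ALL=C makes tr treat the input byte by byte.
suspect=$(LC_ALL=C tr -d '[:print:]\r\n\t' < sample.log | wc -c | tr -d ' ')
echo "suspicious bytes: $suspect"

# Dump those bytes so their values can be read off (od shows them in octal).
LC_ALL=C tr -d '[:print:]\r\n\t' < sample.log | od -An -c
```

If the count is zero, the record is probably being split by something else, for example a stray line feed inside the field.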

Re: Special characters in web log file causing issues

Posted by Nitin Pawar <ni...@gmail.com>.
Yes Raj,

that's a Unix command.


On Tue, Jul 9, 2013 at 6:48 AM, Hadoop Raj <ha...@yahoo.com> wrote:

> Hi Sanjay,
>
> Is that a unix trap command or any other thing? Please let me know.


-- 
Nitin Pawar

Re: Special characters in web log file causing issues

Posted by Hadoop Raj <ha...@yahoo.com>.
Hi Sanjay,

Is that a unix trap command or any other thing? Please let me know.



On Jul 8, 2013, at 7:46 PM, Sanjay Subramanian <Sa...@wizecommerce.com> wrote:

> U may have to remove non-printable chars first, save an intermediate file and then load into Hive
> 
> tr -cd '[:print:]\r\n\t'
> 
> Or if u have strings function that will only output printable chars

Re: Special characters in web log file causing issues

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
You may have to remove the non-printable characters first, save an intermediate file, and then load that file into Hive.

tr -cd '[:print:]\r\n\t'

Or, if you have the strings utility, it outputs only printable characters.
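
The tr command above reads standard input and writes standard output, so it needs redirections to produce the intermediate file. A minimal sketch, assuming hypothetical file names raw_weblog.txt and clean_weblog.txt and a hypothetical Hive table named weblog (none of these come from the thread):

```shell
#!/bin/sh
# Hypothetical sample input: bytes 016 and 017 stand in for the special characters.
printf 'Windows     Live\016\017 Photo Gallery;McAfee SecurityCenter\n' > raw_weblog.txt

# Keep printable characters plus CR, LF and tab; delete every other byte.
# With LC_ALL=C only ASCII counts as printable, so non-ASCII bytes go too.
LC_ALL=C tr -cd '[:print:]\r\n\t' < raw_weblog.txt > clean_weblog.txt

cat clean_weblog.txt

# Then load the cleaned file (the table name is an assumption):
#   hive -e "LOAD DATA LOCAL INPATH 'clean_weblog.txt' INTO TABLE weblog"
```

Three caveats. The command keeps \n, so it will not help if the record is being split by a stray line feed inside the field. It also deletes Ctrl-A (octal 001), which is Hive's default field delimiter, so a table that relies on that default would need \001 added to the kept set. And strings prints each run of printable characters (four or more by default) on its own line, so it can split records rather than repair them.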


From: Raj Hadoop <ha...@yahoo.com>
Reply-To: "user@hive.apache.org" <us...@hive.apache.org>, Raj Hadoop <ha...@yahoo.com>
Date: Monday, July 8, 2013 1:52 PM
To: Hive <us...@hive.apache.org>
Subject: Special characters in web log file causing issues


Hi ,

The log file that I am trying to load through Hive has some special characters

The field is shown below and the special characters ¿¿are also shown.

    Shockwave Flash;Chrome Remote Desktop Viewer;Native Client;Chrome PDF Viewer;Adobe Acrobat;Microsoft Office 2010;Motive Plug-
    in;Motive Management Plug-in;Google Update;Java(TM) Platform SE 7 U21;McAfee SiteAdvisor;McAfee Virtual Technician;Windows     Live¿¿ Photo Gallery;McAfee SecurityCenter;Silverlig


The above is causing the record to be terminated and loading another line.  How can I avoid this type of issues and how to load the proper data ? Any suggestions please.

Thanks,
Raj

CONFIDENTIALITY NOTICE
======================
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.