Posted to mapreduce-user@hadoop.apache.org by Guillermo Ortiz <go...@pragsis.com> on 2014/10/05 15:24:47 UTC

InputFormat for dealing with log files.

I'd like to know whether there is an InputFormat that can deal with log files. My problem is that when I read a Tomcat log, for example, exceptions are sometimes written across several lines, but they should be processed as a single record; that is, all of those lines should reach the map together.
Is something like that already implemented? I've been looking, but I haven't found anything, and I don't want to reinvent the wheel.
CONFIDENTIALITY WARNING.
This message and the information contained in or attached to it are private and confidential and intended exclusively for the addressee. Pragsis informs to whom it may receive it in error that it contains privileged information and its use, copy, reproduction or distribution is prohibited. If you are not an intended recipient of this E-mail, please notify the sender, delete it and do not read, act upon, print, disclose, copy, retain or redistribute any portion of this E-mail.

Re: InputFormat for dealing with log files.

Posted by Guillermo Ortiz <go...@pragsis.com>.
Thank you, I didn't know about it.
I have been looking for benchmarks comparing joni with Java's default regex package. Do you know of any site with results? Anyway, I'll try it myself tomorrow.

----- Original message -----

From: "Ted Yu" <yu...@gmail.com>
To: "common-user@hadoop.apache.org" <us...@hadoop.apache.org>
Sent: Sunday, 5 October 2014 22:32:27
Subject: Re: InputFormat for dealing with log files.

Regex processing is not that slow when you follow best practices.

This project provides better performance than Java's built-in regex support:
https://github.com/jruby/joni 

Cheers 

On Sun, Oct 5, 2014 at 1:18 PM, Guillermo Ortiz < gortiz@pragsis.com > wrote: 



I thought of something like that, but I guess it would need to be a little more complex, because it has to look for a pattern, maybe a date format. One idea: if you know that the first 10 characters are the date, you could take them and try to match them against a date format, or against something more generic such as a regular expression. That seems too expensive in processing time, though, and the operations in an InputFormat should be pretty fast.

Any better idea? 
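
The date-prefix idea above can be sketched in plain Java. This is only an illustration under stated assumptions, not an existing Hadoop class: the name LogRecordGrouper and the yyyy-MM-dd prefix are hypothetical, and a real Tomcat log may use a different timestamp layout. In an actual job the same logic would probably live in a custom RecordReader, which would also have to handle entries that cross input split boundaries; that part is not covered here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LogRecordGrouper {
    // Assumption: each new entry starts with a date such as 2014-10-05.
    // The pattern is compiled once and reused for every line.
    private static final Pattern ENTRY_START = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    // Joins continuation lines (for example stack trace lines) onto the
    // entry that precedes them, so each record holds one complete entry.
    static List<String> group(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            boolean startsEntry = ENTRY_START.matcher(line).lookingAt();
            if (startsEntry && current != null) {
                // A new dated entry begins, so the previous record is complete.
                records.add(current.toString());
                current = null;
            }
            if (current == null) {
                // Either a new dated entry, or leading lines before the first
                // dated entry (kept as their own record so nothing is lost).
                current = new StringBuilder(line);
            } else {
                current.append('\n').append(line);
            }
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        List<String> lines = Arrays.asList(
            "2014-10-05 15:24:47 ERROR Something failed",
            "java.lang.IllegalStateException: boom",
            "\tat com.example.Foo.bar(Foo.java:42)",
            "2014-10-05 15:24:48 INFO Recovered");
        for (String record : group(lines)) {
            System.out.println("--- record ---");
            System.out.println(record);
        }
    }
}
```

With this approach the mapper would receive the exception message and its stack trace as one value instead of three separate lines.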


From: "Ted Yu" < yuzhihong@gmail.com >
To: " common-user@hadoop.apache.org " < user@hadoop.apache.org >
Sent: Sunday, 5 October 2014 16:27:18
Subject: Re: InputFormat for dealing with log files.

Have you read http://blog.rguha.net/?p=293 ? 

Cheers 

On Sun, Oct 5, 2014 at 6:24 AM, Guillermo Ortiz < gortiz@pragsis.com > wrote: 

<blockquote>

I'd like to know whether there is an InputFormat that can deal with log files. My problem is that when I read a Tomcat log, for example, exceptions are sometimes written across several lines, but they should be processed as a single record; that is, all of those lines should reach the map together.
Is something like that already implemented? I've been looking, but I haven't found anything, and I don't want to reinvent the wheel.

</blockquote>






Re: InputFormat for dealing with log files.

Posted by Ted Yu <yu...@gmail.com>.
Regex processing is not that slow when you follow best practices.

This project provides better performance than Java's built-in regex support:
https://github.com/jruby/joni

Cheers
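
The message does not say which best practices are meant. As one possible reading, the sketch below uses the JDK's java.util.regex only (not joni) and shows three commonly recommended habits: compile the Pattern once, reuse a Matcher through reset(), and reject obvious non-matches with a cheap character check before running the regex. The class name DatePrefixCheck and the yyyy-MM-dd prefix are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DatePrefixCheck {
    // Compiled once per JVM instead of once per line.
    private static final Pattern DATE_PREFIX = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    // Reused through reset() to avoid allocating a Matcher for every line.
    // A Matcher is not thread-safe, so keep one instance per thread or task.
    private final Matcher matcher = DATE_PREFIX.matcher("");

    boolean startsWithDate(CharSequence line) {
        // Cheap rejection before the regex runs: a date prefix needs at
        // least 10 characters and must begin with an ASCII digit.
        if (line.length() < 10) {
            return false;
        }
        char first = line.charAt(0);
        if (first < '0' || first > '9') {
            return false;
        }
        // lookingAt() anchors the match at the start of the line without
        // requiring the whole line to match.
        return matcher.reset(line).lookingAt();
    }

    public static void main(String[] args) {
        DatePrefixCheck check = new DatePrefixCheck();
        System.out.println(check.startsWithDate("2014-10-05 15:24:47 ERROR Something failed"));
        System.out.println(check.startsWithDate("\tat com.example.Foo.bar(Foo.java:42)"));
    }
}
```

Whether this is fast enough for a given job is something to measure rather than assume; the pre-check only helps when most continuation lines start with a non-digit, as stack trace lines usually do.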

On Sun, Oct 5, 2014 at 1:18 PM, Guillermo Ortiz <go...@pragsis.com> wrote:

> I thought of something like that, but I guess it would need to be a
> little more complex, because it has to look for a pattern, maybe a date
> format. One idea: if you know that the first 10 characters are the date,
> you could take them and try to match them against a date format, or
> against something more generic such as a regular expression. That seems
> too expensive in processing time, though, and the operations in an
> InputFormat should be pretty fast.
>
> Any better idea?
>
> ------------------------------
> *From: *"Ted Yu" <yu...@gmail.com>
> *To: *"common-user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Sent: *Sunday, 5 October 2014 16:27:18
> *Subject: *Re: InputFormat for dealing with log files.
>
> Have you read http://blog.rguha.net/?p=293 ?
>
> Cheers
>
> On Sun, Oct 5, 2014 at 6:24 AM, Guillermo Ortiz <go...@pragsis.com>
> wrote:
>
>>
>> I'd like to know whether there is an InputFormat that can deal with log
>> files. My problem is that when I read a Tomcat log, for example,
>> exceptions are sometimes written across several lines, but they should
>> be processed as a single record; that is, all of those lines should
>> reach the map together.
>> Is something like that already implemented? I've been looking, but I
>> haven't found anything, and I don't want to reinvent the wheel.

Re: InputFormat for dealing with log files.

Posted by Ted Yu <yu...@gmail.com>.
Regex processing is not that slow - when adopting best practices.

This project provides better performance compared to that of Java's:
https://github.com/jruby/joni

Cheers

On Sun, Oct 5, 2014 at 1:18 PM, Guillermo Ortiz <go...@pragsis.com> wrote:

> I thought something like that,, but I guess it should be a little more
> complex because it should look for a pattern, maybe a date format? An idea
> it's if you know that the first 10 digits are the date, you could get them
> and try to match with a date format or something more generic like a RE,
> although it seems too expensive in time process and the operations in the
> InputFormat should be pretty fast.
>
> Any better idea?
>
> ------------------------------
> *De: *"Ted Yu" <yu...@gmail.com>
> *Para: *"common-user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Enviados: *Domingo, 5 de Octubre 2014 16:27:18
> *Asunto: *Re: InputFormat for dealing with log files.
>
> Have you read http://blog.rguha.net/?p=293?
>
> Cheers
>
> On Sun, Oct 5, 2014 at 6:24 AM, Guillermo Ortiz <go...@pragsis.com>
> wrote:
>
>>
>> I'd like to know if there's an InputFormat to be able to deal with log
>> files. The problem that I have it's that if I have to read an Tomcat log
>> for example, sometimes the exceptions are typed on several lines, but they
>> should be processed just like one line, I mean all the lines together to
>> the map.
>> Is there something like that implemented? I've been looking for, but I
>> don't find anything and I don't want to reinvent the wheel.
>> AVISO CONFIDENCIAL\nEste correo y la información contenida o adjunta al
>> mismo es privada y confidencial y va dirigida exclusivamente a su
>> destinatario. Pragsis informa a quien pueda haber recibido este correo por
>> error que contiene información confidencial cuyo uso, copia, reproducción o
>> distribución está expresamente prohibida. Si no es Vd. el destinatario del
>> mismo y recibe este correo por error, le rogamos lo ponga en conocimiento
>> del emisor y proceda a su eliminación sin copiarlo, imprimirlo o utilizarlo
>> de ningún modo.\nCONFIDENTIALITY WARNING.\nThis message and the information
>> contained in or attached to it are private and confidential and intended
>> exclusively for the addressee. Pragsis informs to whom it may receive it in
>> error that it contains privileged information and its use, copy,
>> reproduction or distribution is prohibited. If you are not an intended
>> recipient of this E-mail, please notify the sender, delete it and do not
>> read, act upon, print, disclose, copy, reta
>>  in or redistribute any portion of this E-mail.
>>
>
>
>
> AVISO CONFIDENCIAL
> Este correo y la información contenida o adjunta al mismo es privada y
> confidencial y va dirigida exclusivamente a su destinatario. Pragsis
> informa a quien pueda haber recibido este correo por error que contiene
> información confidencial cuyo uso, copia, reproducción o distribución está
> expresamente prohibida. Si no es Vd. el destinatario del mismo y recibe
> este correo por error, le rogamos lo ponga en conocimiento del emisor y
> proceda a su eliminación sin copiarlo, imprimirlo o utilizarlo de ningún
> modo.
> CONFIDENTIALITY WARNING.
> This message and the information contained in or attached to it are
> private and confidential and intended exclusively for the addressee.
> Pragsis informs to whom it may receive it in error that it contains
> privileged information and its use, copy, reproduction or distribution is
> prohibited. If you are not an intended recipient of this E-mail, please
> notify the sender, delete it and do not read, act upon, print, disclose,
> copy, retain or redistribute any portion of this E-mail.
>

Re: InputFormat for dealing with log files.

Posted by Ted Yu <yu...@gmail.com>.
Regex processing is not that slow - when adopting best practices.

This project provides better performance compared to that of Java's:
https://github.com/jruby/joni

Cheers

On Sun, Oct 5, 2014 at 1:18 PM, Guillermo Ortiz <go...@pragsis.com> wrote:

> I thought something like that,, but I guess it should be a little more
> complex because it should look for a pattern, maybe a date format? An idea
> it's if you know that the first 10 digits are the date, you could get them
> and try to match with a date format or something more generic like a RE,
> although it seems too expensive in time process and the operations in the
> InputFormat should be pretty fast.
>
> Any better idea?
>
> ------------------------------
> *De: *"Ted Yu" <yu...@gmail.com>
> *Para: *"common-user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Enviados: *Domingo, 5 de Octubre 2014 16:27:18
> *Asunto: *Re: InputFormat for dealing with log files.
>
> Have you read http://blog.rguha.net/?p=293?
>
> Cheers
>
> On Sun, Oct 5, 2014 at 6:24 AM, Guillermo Ortiz <go...@pragsis.com>
> wrote:
>
>>
>> I'd like to know if there's an InputFormat to be able to deal with log
>> files. The problem that I have it's that if I have to read an Tomcat log
>> for example, sometimes the exceptions are typed on several lines, but they
>> should be processed just like one line, I mean all the lines together to
>> the map.
>> Is there something like that implemented? I've been looking for, but I
>> don't find anything and I don't want to reinvent the wheel.
>>
>

Re: InputFormat for dealing with log files.

Posted by Ted Yu <yu...@gmail.com>.
Regex processing is not that slow when you follow best practices.

This project performs better than Java's built-in regex engine:
https://github.com/jruby/joni

Cheers
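
The message does not spell out which best practices are meant. One common one, shown here as a minimal sketch with the standard java.util.regex package (not joni), is to compile the pattern once and reuse a single Matcher for every line instead of recompiling per line. The class name RecordStartMatcher and the yyyy-MM-dd date layout are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecordStartMatcher {
    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    // Assumed layout: a log record starts with a date such as 2014-10-05.
    private static final Pattern RECORD_START =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2}");

    // Matcher is not thread-safe, so each instance owns one and resets it per line.
    private final Matcher matcher = RECORD_START.matcher("");

    /** Returns true if the line begins with a date, i.e. starts a new log record. */
    public boolean startsRecord(CharSequence line) {
        return matcher.reset(line).lookingAt();
    }

    public static void main(String[] args) {
        RecordStartMatcher m = new RecordStartMatcher();
        // A timestamped line starts a record; a stack-trace frame does not.
        System.out.println(m.startsRecord("2014-10-05 15:24:47 SEVERE Servlet threw exception"));
        System.out.println(m.startsRecord("\tat org.example.Foo.bar(Foo.java:42)"));
    }
}
```

Anchoring the pattern at the start of the line and using lookingAt() means only the first few characters of each line are examined, so the per-line cost stays small regardless of line length.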

On Sun, Oct 5, 2014 at 1:18 PM, Guillermo Ortiz <go...@pragsis.com> wrote:

> I was thinking of something like that, but I guess it should be a little
> more complex because it has to look for a pattern, maybe a date format? One
> idea: if you know that the first 10 characters are the date, you could
> extract them and try to match them against a date format, or against
> something more generic like a regular expression, although that seems too
> expensive in processing time, and the operations in the InputFormat should
> be pretty fast.
>
> Any better idea?
>
> ------------------------------
> *From: *"Ted Yu" <yu...@gmail.com>
> *To: *"common-user@hadoop.apache.org" <us...@hadoop.apache.org>
> *Sent: *Sunday, October 5, 2014 16:27:18
> *Subject: *Re: InputFormat for dealing with log files.
>
> Have you read http://blog.rguha.net/?p=293?
>
> Cheers
>
> On Sun, Oct 5, 2014 at 6:24 AM, Guillermo Ortiz <go...@pragsis.com>
> wrote:
>
>>
>> I'd like to know whether there is an InputFormat that can deal with log
>> files. The problem is that when I read a Tomcat log, for example,
>> exceptions are sometimes written across several lines, but they should be
>> processed as a single line; I mean all of those lines should be passed
>> together to the map.
>> Is something like that already implemented? I've been looking, but I
>> haven't found anything and I don't want to reinvent the wheel.
>>
>

Re: InputFormat for dealing with log files.

Posted by Guillermo Ortiz <go...@pragsis.com>.
I was thinking of something like that, but I guess it should be a little more complex because it has to look for a pattern, maybe a date format? One idea: if you know that the first 10 characters are the date, you could extract them and try to match them against a date format, or against something more generic like a regular expression, although that seems too expensive in processing time, and the operations in the InputFormat should be pretty fast.

Any better idea? 
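
The date-prefix idea above can be sketched without any regex at all. The following is only an illustration of the grouping logic, not a Hadoop InputFormat: it assumes every record starts with a 10-character date such as 2014-10-05, and the class name LogRecordGrouper and its methods are hypothetical. In a real job this logic would presumably sit inside a custom RecordReader, which would also have to deal with records that straddle a split boundary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LogRecordGrouper {
    // Assumed layout: a record starts with a date like 2014-10-05 (yyyy-MM-dd).
    // Plain character checks keep the per-line cost to at most 10 comparisons.
    static boolean startsWithDate(String line) {
        if (line.length() < 10) return false;
        for (int i = 0; i < 10; i++) {
            char c = line.charAt(i);
            if (i == 4 || i == 7) {
                if (c != '-') return false;
            } else if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    /** Appends continuation lines (e.g. stack-trace frames) to the record they follow. */
    static List<String> group(Iterable<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            // A dated line opens a new record; so does the very first line,
            // even if it has no date, so that no input is dropped.
            if (startsWithDate(line) || current == null) {
                if (current != null) records.add(current.toString());
                current = new StringBuilder(line);
            } else {
                current.append('\n').append(line);
            }
        }
        if (current != null) records.add(current.toString());
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2014-10-05 15:24:47 SEVERE Servlet threw exception",
                "java.lang.NullPointerException",
                "\tat org.example.Foo.bar(Foo.java:42)",
                "2014-10-05 15:24:48 INFO Request served");
        // The exception and its stack frame are folded into the first record.
        System.out.println(group(lines).size()); // prints 2
    }
}
```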

----- Original message -----

From: "Ted Yu" <yu...@gmail.com>
To: "common-user@hadoop.apache.org" <us...@hadoop.apache.org>
Sent: Sunday, October 5, 2014 16:27:18
Subject: Re: InputFormat for dealing with log files.

Have you read http://blog.rguha.net/?p=293 ? 

Cheers 

On Sun, Oct 5, 2014 at 6:24 AM, Guillermo Ortiz < gortiz@pragsis.com > wrote: 



I'd like to know whether there is an InputFormat that can deal with log files. The problem is that when I read a Tomcat log, for example, exceptions are sometimes written across several lines, but they should be processed as a single line; I mean all of those lines should be passed together to the map.
Is something like that already implemented? I've been looking, but I haven't found anything and I don't want to reinvent the wheel.


Re: InputFormat for dealing with log files.

Posted by Ted Yu <yu...@gmail.com>.
Have you read http://blog.rguha.net/?p=293 ?

Cheers

On Sun, Oct 5, 2014 at 6:24 AM, Guillermo Ortiz <go...@pragsis.com> wrote:

>
> I'd like to know whether there is an InputFormat that can deal with log
> files. The problem is that when I read a Tomcat log, for example,
> exceptions are sometimes written across several lines, but they should be
> processed as a single line; I mean all of those lines should be passed
> together to the map.
> Is something like that already implemented? I've been looking, but I
> haven't found anything and I don't want to reinvent the wheel.
>
