Posted to dev@drill.apache.org by Charles Givre <cg...@gmail.com> on 2018/02/07 10:31:41 UTC

Test cases for Drill-6104: Added Logfile Reader

Hello all, 
I submitted this PR for a logfile parser for Drill (https://github.com/apache/drill/pull/1114). I need to write unit tests for it, but I really have no idea how to do so. Could someone point me to an example or something so that the PR will pass the CI tests?
TIA,
- C




Re: Test cases for Drill-6104: Added Logfile Reader

Posted by Charles Givre <cg...@gmail.com>.
HAHAHA!
I totally agree. The log parser I wrote does support some cool stuff. In addition to breaking up the fields, you can specify a data type for each field, as well as date and time formats. The way I set it up in the configuration is not elegant, and I would welcome input on how to do it better.
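To illustrate what such a configuration has to express, here is a minimal Java sketch. It is not the PR's code or configuration format. It assumes a regex whose capture groups map, in order, to a list of column specs, each with a name, a type, and (for timestamps) a java.time DateTimeFormatter pattern. FieldType, FieldSpec, TypedLineParser, the regex, and the sample line are all hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical column types; not the plugin's actual configuration vocabulary. */
enum FieldType { VARCHAR, INT, DOUBLE, TIMESTAMP }

/** Hypothetical column definition: name, type, and a date/time pattern for TIMESTAMP. */
final class FieldSpec {
    final String name;
    final FieldType type;
    final DateTimeFormatter format; // null unless type == TIMESTAMP

    FieldSpec(String name, FieldType type, String format) {
        this.name = name;
        this.type = type;
        this.format = format == null ? null : DateTimeFormatter.ofPattern(format);
    }

    /** Converts the raw captured text into a typed Java value. */
    Object convert(String raw) {
        switch (type) {
            case INT:       return Integer.parseInt(raw);
            case DOUBLE:    return Double.parseDouble(raw);
            case TIMESTAMP: return LocalDateTime.parse(raw, format);
            default:        return raw; // VARCHAR: keep the text as-is
        }
    }
}

/** Applies a regex and converts capture group i + 1 according to fields.get(i). */
final class TypedLineParser {
    private final Pattern regex;
    private final List<FieldSpec> fields;

    TypedLineParser(String regex, List<FieldSpec> fields) {
        this.regex = Pattern.compile(regex);
        this.fields = fields;
        // Reject configurations where groups and columns do not line up.
        int groups = this.regex.matcher("").groupCount();
        if (groups != fields.size()) {
            throw new IllegalArgumentException(
                    groups + " capture groups but " + fields.size() + " fields");
        }
    }

    /** Returns column name -> typed value, or null if the line does not match. */
    Map<String, Object> parse(String line) {
        Matcher m = regex.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < fields.size(); i++) {
            FieldSpec f = fields.get(i);
            row.put(f.name, f.convert(m.group(i + 1)));
        }
        return row;
    }

    public static void main(String[] args) {
        List<FieldSpec> fields = new ArrayList<>();
        fields.add(new FieldSpec("eventTime", FieldType.TIMESTAMP, "yyyy-MM-dd HH:mm:ss"));
        fields.add(new FieldSpec("pid", FieldType.INT, null));
        fields.add(new FieldSpec("level", FieldType.VARCHAR, null));
        fields.add(new FieldSpec("message", FieldType.VARCHAR, null));
        TypedLineParser parser = new TypedLineParser("(\\S+ \\S+) (\\d+) (\\w+) (.*)", fields);
        System.out.println(parser.parse("2018-02-07 21:28:00 4711 ERROR Disk full"));
    }
}
```

Checking the group count against the column list up front is one simple way to catch configuration mistakes early. A malformed number or timestamp in an otherwise matching line would still throw a parse exception here; a real reader would probably want to count and skip such lines instead.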

> On Feb 8, 2018, at 01:00, Ted Dunning <te...@gmail.com> wrote:
> 
> Awesome.
> 
> I personally think that the only practical solution for multiline logging
> is mandatory sentencing guidelines at the federal level.


Re: Test cases for Drill-6104: Added Logfile Reader

Posted by Ted Dunning <te...@gmail.com>.
Awesome.

I personally think that the only practical solution for multiline logging
is mandatory sentencing guidelines at the federal level.



On Wed, Feb 7, 2018 at 4:08 PM, Charles Givre <cg...@gmail.com> wrote:

> Hi Kunal,
> As implemented, it doesn’t handle multiline log files. I wrote this for a
> specific client a while ago and it has proven VERY useful, so I thought I’d
> contribute it.
> I would like to get this merged first and then add multiline support.
> — C

Re: Test cases for Drill-6104: Added Logfile Reader

Posted by Charles Givre <cg...@gmail.com>.
Hi Kunal, 
As implemented, it doesn’t handle multiline log files. I wrote this for a specific client a while ago and it has proven VERY useful, so I thought I’d contribute it.
I would like to get this merged first and then add multiline support.
— C



> On Feb 7, 2018, at 21:28, Kunal Khatua <kk...@mapr.com> wrote:
> 
> I think I'm jumping the gun, because I haven’t yet tried out your PR.
> 
> I mentioned Logstash because the primary challenge (IMO) in creating a log file reader is that log formats vary wildly and there is no standard. So what is needed is a good mechanism for consuming logs with the right regex support. Logstash comes with a Grok parser that does (IMHO) a fantastic job of parsing and tokenizing logs.
> 
> The logback XML that I have for Drill defines this format:
> <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
>   <encoder>
>      <pattern>%date{ISO8601} %property{HOSTNAME} [%thread] %-5level %logger{36} - %msg%n</pattern>
>   </encoder>
> </appender>
> 
> The default configuration that ships with Drill uses
>  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
>    <encoder>
>      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
>    </encoder>
>  </appender>
> And 
>    <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
>      <encoder>
>        <pattern>%date{ISO8601} [%thread] %-5level %logger{36} - %msg%n</pattern>
>      </encoder>
>    </appender>
> 
> Notice how all three patterns are different. 
> 
> A quick glance at the PR suggests that it can process a fairly limited range of log files (though I could be wrong).
> 
> A good way to test the log reader would be to open the web UI's http://<hostname>:8047/logs page and pick out those logs for processing/parsing.
> 
> I stitched together something using ELK (Elasticsearch + Logstash + Kibana) to process Drill logs, but that was back in 2015. If we could get something like that into a storage plugin for Drill, it would probably go much further. I could share what I did back then and work out how to reuse that approach and its libraries here.


RE: Test cases for Drill-6104: Added Logfile Reader

Posted by Kunal Khatua <kk...@mapr.com>.
I think I'm jumping the gun, because I haven’t yet tried out your PR.

I mentioned Logstash because the primary challenge (IMO) in creating a log file reader is that log formats vary wildly and there is no standard. So what is needed is a good mechanism for consuming logs with the right regex support. Logstash comes with a Grok parser that does (IMHO) a fantastic job of parsing and tokenizing logs.

The logback XML that I have for Drill defines this format:
<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
   <encoder>
      <pattern>%date{ISO8601} %property{HOSTNAME} [%thread] %-5level %logger{36} - %msg%n</pattern>
   </encoder>
</appender>

The default configuration that ships with Drill uses
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
And 
    <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
      <encoder>
        <pattern>%date{ISO8601} [%thread] %-5level %logger{36} - %msg%n</pattern>
      </encoder>
    </appender>

Notice how all three patterns are different. 
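A minimal Java sketch of that point, with one regular expression per layout above. This is hypothetical code, not taken from the PR or from Grok. It assumes %date{ISO8601} renders as yyyy-MM-dd HH:mm:ss,SSS, that thread names contain no ']', and that logger names contain no spaces; the class LogbackLineParser, its Layout enum, and the sample lines are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: one regex per logback layout quoted in the thread. */
final class LogbackLineParser {
    // %date{ISO8601}, assumed to render as yyyy-MM-dd HH:mm:ss,SSS
    private static final String ISO8601 =
            "(?<date>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})";
    // %d{HH:mm:ss.SSS}
    private static final String CLOCK = "(?<date>\\d{2}:\\d{2}:\\d{2}\\.\\d{3})";
    // [%thread] %-5level %logger{36} - %msg  (%-5level pads short levels with trailing spaces)
    private static final String REST =
            "\\[(?<thread>[^\\]]*)\\] (?<level>[A-Z]+)\\s+(?<logger>\\S+) - (?<msg>.*)";

    enum Layout {
        // Tried in declaration order; the first full match wins.
        WITH_HOSTNAME(ISO8601 + " (?<host>\\S+) " + REST, true), // %date{ISO8601} %property{HOSTNAME} ...
        DEFAULT_FILE(ISO8601 + " " + REST, false),                // %date{ISO8601} [%thread] ...
        DEFAULT_STDOUT(CLOCK + " " + REST, false);                // %d{HH:mm:ss.SSS} [%thread] ...

        final Pattern pattern;
        final boolean hasHost;

        Layout(String regex, boolean hasHost) {
            this.pattern = Pattern.compile(regex);
            this.hasHost = hasHost;
        }
    }

    /** Returns the named fields of the first matching layout, or empty if none match. */
    static Optional<Map<String, String>> parse(String line) {
        for (Layout layout : Layout.values()) {
            Matcher m = layout.pattern.matcher(line);
            if (!m.matches()) {
                continue;
            }
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("layout", layout.name());
            fields.put("date", m.group("date"));
            if (layout.hasHost) {
                fields.put("host", m.group("host")); // this group exists only in WITH_HOSTNAME
            }
            fields.put("thread", m.group("thread"));
            fields.put("level", m.group("level"));
            fields.put("logger", m.group("logger"));
            fields.put("msg", m.group("msg"));
            return Optional.of(fields);
        }
        return Optional.empty(); // e.g. a stack-trace continuation line
    }

    public static void main(String[] args) {
        String[] samples = {
            "2018-02-07 21:28:00,123 node1 [main] INFO  o.a.d.e.s.Drillbit - Startup completed",
            "2018-02-07 21:28:00,123 [main] ERROR o.a.d.e.s.Drillbit - Query failed",
            "21:28:00.123 [main] WARN  o.a.d.e.s.Drillbit - Low memory",
            "\tat org.example.Foo.bar(Foo.java:42)"
        };
        for (String s : samples) {
            System.out.println(parse(s));
        }
    }
}
```

Trying the layouts in a fixed order, most specific first, is one simple way to support several formats; Grok takes a different route by composing regexes from named, reusable sub-patterns. The last sample, a stack-trace continuation line, matches none of the layouts.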

A quick glance at the PR suggests that it can process a fairly limited range of log files (though I could be wrong).

A good way to test the log reader would be to open the web UI's http://<hostname>:8047/logs page and pick out those logs for processing/parsing.

I stitched together something using ELK (Elasticsearch + Logstash + Kibana) to process Drill logs, but that was back in 2015. If we could get something like that into a storage plugin for Drill, it would probably go much further. I could share what I did back then and work out how to reuse that approach and its libraries here.

-----Original Message-----
From: Charles Givre [mailto:cgivre@gmail.com] 
Sent: Wednesday, February 07, 2018 1:08 PM
To: dev@drill.apache.org
Subject: Re: Test cases for Drill-6104: Added Logfile Reader

Hi Kunal, 
I just don’t know how to craft one with all the Drill internals. Is there an example that you can point me to?



Re: Test cases for Drill-6104: Added Logfile Reader

Posted by Charles Givre <cg...@gmail.com>.
Hi Kunal, 
I just don’t know how to craft one with all the Drill internals. Is there an example that you can point me to?

> On Feb 7, 2018, at 18:38, Kunal Khatua <kk...@mapr.com> wrote:
> 
> How about using the Drill logs as a use case?
> 
> You have drillbit.out and drillbit_hostname.log to consume. It would be interesting to see how multiline log entries are handled.
> 
> Logstash does an excellent job IMO, but that's more for parsing.


RE: Test cases for Drill-6104: Added Logfile Reader

Posted by Kunal Khatua <kk...@mapr.com>.
How about using the Drill logs as a use case?

You have drillbit.out and drillbit_hostname.log to consume. It would be interesting to see how multiline log entries are handled.

Logstash does an excellent job IMO, but that's more for parsing.
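One common way to handle multiline entries, similar in spirit to Logstash's multiline codec though not taken from it, is to treat a line as the start of a new entry only if it begins with a timestamp, and to append every other line (such as a Java stack trace) to the previous entry. The Java sketch below assumes the ISO8601 timestamp of Drill's default FILE layout; the class name MultilineGrouper and the sample lines are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: folds continuation lines (e.g. stack traces) into the preceding entry. */
final class MultilineGrouper {
    // Assumption: a new entry starts with an ISO8601 logback timestamp such as
    // "2018-02-07 18:38:00,001 "; any other line continues the previous entry.
    static final Pattern ENTRY_START =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3} ");

    static List<String> group(List<String> lines) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            // lookingAt() only requires the pattern to match at the start of the line.
            if (current == null || ENTRY_START.matcher(line).lookingAt()) {
                if (current != null) {
                    entries.add(current.toString());
                }
                // Start a new entry (a headerless first line also starts one).
                current = new StringBuilder(line);
            } else {
                current.append('\n').append(line);
            }
        }
        if (current != null) {
            entries.add(current.toString());
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2018-02-07 18:38:00,001 [main] ERROR o.a.d.e.w.f.Foreman - Query failed",
                "java.lang.RuntimeException: boom",
                "\tat org.example.Foo.bar(Foo.java:42)",
                "2018-02-07 18:38:01,002 [main] INFO  o.a.d.e.s.Drillbit - Recovered");
        for (String entry : group(lines)) {
            System.out.println("--- entry ---");
            System.out.println(entry);
        }
    }
}
```

For the four sample lines, group returns two entries: the ERROR line together with its two stack-trace lines, and the INFO line on its own.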

-----Original Message-----
From: Charles Givre [mailto:cgivre@gmail.com] 
Sent: Wednesday, February 07, 2018 2:32 AM
To: dev@drill.apache.org
Subject: Test cases for Drill-6104: Added Logfile Reader

Hello all, 
I submitted this PR for a logfile parser for Drill (https://github.com/apache/drill/pull/1114). I need to write unit tests for it, but I really have no idea how to do so. Could someone point me to an example or something so that the PR will pass the CI tests?
TIA,
- C