Posted to dev@metron.apache.org by Casey Stella <ce...@gmail.com> on 2016/09/24 22:35:05 UTC

[DISCUSS] Management of grok parser inside of the REPL

Grok-based parsers are an important part of Metron and Grok is an extremely
adaptable, if complex, language for specifying structure in data.

Right now, the flow is generally:

   1. Get some sample data
   2. Use an online utility to test grok statements
   3. Upload the grok statement to HDFS
   4. Restart the parser topology

Given the demo around managing stellar transformations inside of the REPL,
I began to think that it might be a more coherent and integrated experience
to manage grok inside of the REPL.

This would involve stellar functions to

   - test grok statements
   - read grok statements from HDFS
   - push grok statements to HDFS

I made a few JIRAs to outline it here
<https://issues.apache.org/jira/browse/METRON-454>.
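To make "test grok statements" concrete, here is a minimal sketch of what such a check does: expand %{PATTERN:field} references into a regular expression with named groups, then try it against a sample line. This is an illustration in Python only; it is not Metron's Grok implementation and not the proposed Stellar functions. The pattern table and the names grok_to_regex and try_grok are hypothetical.

```python
import re

# Hypothetical, tiny pattern library; a real Grok distribution ships far more.
BASE_PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "NUMBER": r"\d+(?:\.\d+)?",
    "GREEDYDATA": r".*",
}

# Matches %{NAME} and %{NAME:field} references inside a statement.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(statement, patterns=BASE_PATTERNS):
    """Expand %{NAME:field} references into a regex with named groups."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        body = patterns[name]  # a KeyError here means an unknown pattern name
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return re.compile(GROK_REF.sub(expand, statement))


def try_grok(statement, line):
    """Return the extracted fields, or None if the line does not match."""
    m = grok_to_regex(statement).fullmatch(line)
    return m.groupdict() if m else None


fields = try_grok("%{IP:src} %{WORD:method} %{NUMBER:bytes}", "10.0.0.1 GET 512")
```

A REPL function along these lines would let a user iterate on a statement against sample data without leaving the shell.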

Re: [DISCUSS] Management of grok parser inside of the REPL

Posted by Carolyn Duby <cd...@hortonworks.com>.
Having a utility to test ingest of a log format is very helpful: for example, a UI where you can start with a log and then work through the resulting events. A unit testing framework is helpful too, for supporting upgrades and validating new versions of the logs: for example, a repository of log samples plus their known-good results, so you can run the logs through grok and verify that the results are correct.
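The sample-repository idea can be sketched as follows. This is a minimal Python illustration under stated assumptions: a plain regex with named groups stands in for a compiled grok statement, and PARSER, SAMPLES and check_samples are hypothetical names, not part of Metron.

```python
import re

# Stand-in for a compiled grok statement: a plain regex with named groups.
PARSER = re.compile(r"(?P<src>\S+) (?P<method>[A-Z]+) (?P<bytes>\d+)")

# Hypothetical sample repository: each raw line paired with its known-good result.
SAMPLES = [
    ("10.0.0.1 GET 512", {"src": "10.0.0.1", "method": "GET", "bytes": "512"}),
    ("10.0.0.2 POST 0", {"src": "10.0.0.2", "method": "POST", "bytes": "0"}),
]


def check_samples(parser, samples):
    """Return (line, expected, actual) for every sample whose result changed."""
    failures = []
    for line, expected in samples:
        m = parser.fullmatch(line)
        actual = m.groupdict() if m else None
        if actual != expected:
            failures.append((line, expected, actual))
    return failures


failures = check_samples(PARSER, SAMPLES)
```

An empty failure list means the parser still reproduces every known-good result; after an upgrade or a new log version, any entry in the list points at the exact line that regressed.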

Thanks
Carolyn



On 9/25/16, 10:30 PM, "James Sirota" <js...@apache.org> wrote:

>I think having the ability to validate Grok statements from within the REPL is useful.  It would save the user the manual step of pasting Grok expressions and sample logs into Grok validators.  Also, if we can potentially embed grok validators into a map-only job we can instantly see what % of our corpus the expression can parse and can get a dump of every log line it cannot parse.  So I think you are on to something here.
>
>James 

Re: [DISCUSS] Management of grok parser inside of the REPL

Posted by "Tseytlin, Keren" <Ke...@capitalone.com>.
I think it would be a cool feature to have Stellar be able to do Grok Parsing!

Best,
Keren

On 9/25/16, 10:30 PM, "James Sirota" <js...@apache.org> wrote:

    I think having the ability to validate Grok statements from within the REPL is useful.  It would save the user the manual step of pasting Grok expressions and sample logs into Grok validators.  Also, if we can potentially embed grok validators into a map-only job we can instantly see what % of our corpus the expression can parse and can get a dump of every log line it cannot parse.  So I think you are on to something here.
    
    James 
    


Re: [DISCUSS] Management of grok parser inside of the REPL

Posted by James Sirota <js...@apache.org>.
I think the ability to validate Grok statements from within the REPL is useful. It would save the user the manual step of pasting Grok expressions and sample logs into Grok validators. Also, if we can embed Grok validators into a map-only job, we can instantly see what percentage of our corpus the expression can parse and get a dump of every log line it cannot parse. So I think you are on to something here.
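The corpus check described above can be sketched in a single process. This is a Python illustration only, assuming a plain regex as a stand-in for the grok expression; the function name coverage and the sample corpus are hypothetical. A map-only job would apply the same per-line test in each mapper and combine the counts afterwards.

```python
import re


def coverage(parser, lines):
    """Return (percent_parsed, unparsed_lines) for an iterable of raw log lines."""
    total = 0
    unparsed = []
    for line in lines:
        line = line.rstrip("\n")
        total += 1
        if parser.fullmatch(line) is None:
            unparsed.append(line)  # keep the failures for later inspection
    percent = 100.0 * (total - len(unparsed)) / total if total else 0.0
    return percent, unparsed


# Stand-in for a compiled grok expression.
parser = re.compile(r"(?P<src>\S+) (?P<method>[A-Z]+) (?P<bytes>\d+)")
corpus = ["10.0.0.1 GET 512", "malformed entry", "10.0.0.2 POST 0", "10.0.0.3 PUT 7"]
percent, unparsed = coverage(parser, corpus)
```

The unparsed list is the "dump of every log line it cannot parse"; the percentage gives a quick signal of whether the expression is ready for the full corpus.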

James 

25.09.2016, 07:40, "Otto Fowler" <ot...@gmail.com>:
> Casey,
>
> Would things implemented this way be a better way to automate these flows (
> built in error handling and checking as standards ) and not just for a user
> at the terminal?

-------------------
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org

Re: [DISCUSS] Management of grok parser inside of the REPL

Posted by Otto Fowler <ot...@gmail.com>.
Shell service?

On September 25, 2016 at 10:54:47, Casey Stella (cestella@gmail.com) wrote:

> The thought has crossed my mind, in fact I made sure that you can execute
> sets of stellar statements by cat script | $METRON_HOME/bin/stellar -z
> $ZK_QUORUM
>
> In fact that is how I generated the examples in the docs for the
> management functions PR at
>
> https://github.com/cestella/incubator-metron/blob/00a46745f316c21f8ea2c24caeda4d2c2239da97/metron-platform/metron-management/README.md
>
> The challenge at the moment is the startup time to search the class path
> for stellar functions at shell start. I have some ideas to make that go
> away, but until that happens you will have to eat the cost per script run.
>
> In general just like the python command is nice as a repl, it also works
> for executing scripts. We could do something similar with stellar.
>
> Casey

Re: [DISCUSS] Management of grok parser inside of the REPL

Posted by Casey Stella <ce...@gmail.com>.
The thought has crossed my mind. In fact, I made sure that you can execute
sets of Stellar statements with:

    cat script | $METRON_HOME/bin/stellar -z $ZK_QUORUM

In fact that is how I generated the examples in the docs for the management
functions PR at
https://github.com/cestella/incubator-metron/blob/00a46745f316c21f8ea2c24caeda4d2c2239da97/metron-platform/metron-management/README.md

The challenge at the moment is the startup time to search the class path
for stellar functions at shell start. I have some ideas to make that go
away, but until that happens you will have to eat the cost per script run.

In general, just as the python command is nice as a REPL but also works for
executing scripts, we could do something similar with Stellar.

Casey

On Sun, Sep 25, 2016 at 10:40 Otto Fowler <ot...@gmail.com> wrote:

> Casey,
>
> Would things implemented this way be a better way to automate these flows
> ( built in error handling and checking as standards ) and not just for a
> user at the terminal?

Re: [DISCUSS] Management of grok parser inside of the REPL

Posted by Otto Fowler <ot...@gmail.com>.
Casey,

Would implementing things this way also be a better way to automate these
flows (with built-in error handling and checking as standard), and not just
serve a user at the terminal?

On September 24, 2016 at 19:02:59, Casey Stella (cestella@gmail.com) wrote:

Sorry, I meant to ask if anyone had feedback or thoughts.

Re: [DISCUSS] Management of grok parser inside of the REPL

Posted by Casey Stella <ce...@gmail.com>.
Sorry, I meant to ask if anyone had feedback or thoughts.
On Sat, Sep 24, 2016 at 18:35 Casey Stella <ce...@gmail.com> wrote:

> Grok-based parsers are an important part of Metron and Grok is an
> extremely adaptable, if complex, language for specifying structure in data.
>
>
> Right now, the flow is generally:
>
>    1. Get some sample data
>    2. Use an online utility to test grok statements
>    3. upload grok statement to HDFS
>    4. restart parser topology
>
> Given the demo around managing stellar transformations inside of the REPL,
> I began to think that it might be a more coherent and integrated experience
> to manage grok inside of the REPL.
>
> This would involve stellar functions to
>
>    - test grok statements
>    - read grok statements from HDFS
>    - push grok statements to HDFS
>
> I made a few JIRAs to outline it here
> <https://issues.apache.org/jira/browse/METRON-454>.
>