Posted to dev@metron.apache.org by Casey Stella <ce...@gmail.com> on 2017/01/12 20:51:53 UTC

[DISCUSS] Turning off indexing writers feature discussion

As of METRON-652 <https://github.com/apache/incubator-metron/pull/415>, we
will have decoupled the indexing configuration from the enrichment
configuration.  As an immediate follow-up to that, I'd like to provide the
ability to turn off and on writers via the configs.  I'd like to get some
community feedback on how the functionality should work, if y'all are
amenable. :)


As of now, we have 3 possible writers which can be used in the indexing
topology:

   - Solr
   - Elasticsearch
   - HDFS

HDFS is always used; either Elasticsearch or Solr is used, depending on how
you start the indexing topology.

A couple of proposals come to mind immediately:

*Index Filtering*

You would be able to specify a filter per writer as a Stellar statement
(likely a reuse of the StellarFilter that exists in the Parsers), which
would let you indicate on a message-by-message basis whether or not to
write the message.

The semantics of this would be as follows:

   - Default (i.e. unspecified) is to pass everything through (hence
   backwards compatible with the current default config).
   - A message is written by a writer if the Stellar statement associated
   with that writer type evaluates to true for it; otherwise it is not.


Sample indexing config which would write out no messages to HDFS and write
out to Elasticsearch only messages containing a field called "field1":
{
   "index" : "squid"
  ,"batchSize" : 100
  ,"filters" : {
      "HDFS" : "false"
     ,"ES" : "exists(field1)"
                 }
}
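
To make the proposed semantics concrete, here is a minimal sketch in Python.
It is only an illustration under stated assumptions: the Stellar statements
are stood in by plain Python predicates, and the names FILTERS and
should_write are hypothetical, not part of Metron. In particular, the "ES"
predicate only approximates exists(field1) as "the key is present".

```python
from typing import Callable, Dict, Optional

Message = Dict[str, object]
Predicate = Callable[[Message], bool]

# Hypothetical stand-ins for the Stellar statements in the sample config:
#   "HDFS" : "false"           -> never write
#   "ES"   : "exists(field1)"  -> write only if the message has field1
FILTERS: Dict[str, Predicate] = {
    "HDFS": lambda msg: False,
    "ES": lambda msg: "field1" in msg,
}


def should_write(
    writer: str,
    message: Message,
    filters: Optional[Dict[str, Predicate]] = None,
) -> bool:
    """Decide whether `writer` should write `message`."""
    # Default: no filter configured for this writer, so pass everything
    # through (backwards compatible with a config that has no "filters").
    if not filters or writer not in filters:
        return True
    # Otherwise write only when the writer's filter evaluates to true.
    return bool(filters[writer](message))


if __name__ == "__main__":
    msg = {"field1": "value", "source.type": "squid"}
    for name in ("HDFS", "ES", "Solr"):
        print(name, should_write(name, msg, FILTERS))
```

A writer with no entry in the filter map (Solr in this sketch) falls through
to the pass-everything default.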

*Index On/Off Switch*

A simpler solution would be to just provide a list of the writers that
should write messages.  The semantics would be as follows:

   - If the list is unspecified, then the default is to write all messages
   for every writer in the indexing topology
   - If the list is specified, then a writer will write all messages if and
   only if it is named in the list.

Sample indexing config which turns off HDFS and keeps Elasticsearch on:
{
   "index" : "squid"
  ,"batchSize" : 100
  ,"writers" : [ "ES" ]
}
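
The on/off semantics can be sketched the same way. Again, this is only an
illustration: writer_enabled is a hypothetical helper, not a Metron API, and
the sketch assumes an explicitly empty list counts as "specified", which
would turn every writer off.

```python
import json

# The sample config from above, parsed as ordinary JSON.
SAMPLE_CONFIG = """
{
   "index" : "squid"
  ,"batchSize" : 100
  ,"writers" : [ "ES" ]
}
"""


def writer_enabled(writer: str, config: dict) -> bool:
    """Return True if `writer` should write messages under `config`."""
    writers = config.get("writers")
    # List unspecified: every writer in the topology writes all messages.
    if writers is None:
        return True
    # List specified: a writer writes if and only if it is named in it.
    return writer in writers


config = json.loads(SAMPLE_CONFIG)

if __name__ == "__main__":
    for name in ("HDFS", "ES"):
        print(name, writer_enabled(name, config))
```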

Thanks in advance for the feedback!  Also, if you have any other, better
ideas than the ones presented here, let me know too.

Best,

Casey

Re: [DISCUSS] Turning off indexing writers feature discussion

Posted by Otto Fowler <ot...@gmail.com>.
I prefer option 1 with Stellar, although I’m concerned that in a real-world
scenario the number of filters and rules might be large, so some thought
will need to go into how the rule expressions are structured for
maintainability.



Re: [DISCUSS] Turning off indexing writers feature discussion

Posted by Michael Miklavcic <mi...@gmail.com>.
I like the flexibility and expressibility of the first option with Stellar
filters.

M
