Posted to dev@metron.apache.org by Charles Joynt <Ch...@gresearch.co.uk> on 2018/06/01 10:26:45 UTC

Writing enrichment data directly from NiFi with PutHBaseJSON

Hello,

I work as a Dev/Ops Data Engineer within the security team at a company in London where we are in the process of implementing Metron. I have been tasked with implementing feeds of network environment data into HBase so that this data can be used as enrichment sources for our security events. First off, I wanted to pull in DNS data for an internal domain.

I am assuming that I need to write data into HBase in such a way that it exactly matches what I would get from the flatfile_loader.sh script. A colleague of mine has already loaded some DNS data using that script, so I am using that as a reference.

I have implemented a flow in NiFi which takes JSON data from an HTTP listener and routes it to a PutHBaseJSON processor. The flow is working, in the sense that data is successfully written to HBase, but despite (naively) specifying "Row Identifier Encoding Strategy = Binary", the results in HBase don't look correct. Comparing the output from HBase scan commands, I see:

flatfile_loader.sh produced:

ROW:      \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
CELL: column=data:v, timestamp=1516896203840, value={"clientname":"server.domain.local","clientip":"192.168.0.198"}

PutHBaseJSON produced:

ROW:  server.domain.local
CELL: column=dns:v, timestamp=1527778603783, value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}

From source JSON:

{"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
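The k/v shape above is easy to generate upstream; here is a minimal Python sketch (the "k"/"v" field names are just the ones from the example message, not anything NiFi or Metron mandates):

```python
import json

def to_put_hbase_json(record, key_field="name"):
    # Wrap a record as {"k": <row key>, "v": <full record>}, using one of
    # the record's own fields as the intended row key.
    return json.dumps({"k": record[key_field], "v": record}, sort_keys=True)

msg = to_put_hbase_json(
    {"name": "server.domain.local", "type": "A", "data": "192.168.0.198"})
print(msg)
```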

I know that there are some differences in column family / field names, but my worry is the ROW id. Presumably I need to encode my row key, "k" in the JSON data, in a way that matches how the flatfile_loader.sh script did it.

Can anyone explain how I might convert my Id to the correct format?
-or-
Does this matter? Can Metron use the human-readable ROW ids?

Charlie Joynt

--------------
G-RESEARCH believes the information provided herein is reliable. While every care has been taken to ensure accuracy, the information is furnished to the recipients with no warranty as to the completeness and accuracy of its contents and on condition that any errors or omissions shall not be made the basis of any claim, demand or cause of action.
The information in this email is intended only for the named recipient.  If you are not the intended recipient please notify us immediately and do not copy, distribute or take action based on this e-mail.
All messages sent to and from this e-mail address will be logged by G-RESEARCH and are subject to archival storage, monitoring, review and disclosure.
G-RESEARCH is the trading name of Trenchant Limited, 5th Floor, Whittington House, 19-30 Alfred Place, London WC1E 7EA.
Trenchant Limited is a company registered in England with company number 08127121.
--------------

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Carolyn Duby <cd...@hortonworks.com>.
Hi Charles - 

I think your best bet is to create a CSV file and use flatfile_loader.sh. This will be easier, and you won’t have to worry if the format of HBase storage changes:

https://github.com/apache/metron/tree/master/metron-platform/metron-data-management#loading-utilities


The flat file loader is located here:

https://github.com/apache/metron/blob/master/metron-platform/metron-data-management/src/main/scripts/flatfile_loader.sh


Here is an example of an enrichment that maps a userid to a user category.

Here is the csv mapping the userid to a category.  For example tsausner has user category BAD_GUY.

[centos@metron-demo-4 rangeraudit]$ cat user_enrichment.csv 
tsausner,BAD_GUY 
ndhanase,CONTRACTOR 
svelagap,ADMIN 
jprivite,EMPLOYEE 
nolan,EMPLOYEE

Create an extractor config file that maps the columns of the csv file to enrichments.  The indicator_column is the key for the enrichment.   


[centos@metron-demo-4 rangeraudit]$ cat user_extraction.json 
{
  "config" : {
    "columns" : {
         "user_id" : 0
        ,"user_category" : 1 
    }
    ,"indicator_column" : "user_id"
    ,"type" : "user_categorization"
    ,"separator" : ","
  }
  ,"extractor" : "CSV"
}
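As a rough illustration of what the extractor does with each CSV line (a sketch of the mapping, not the actual extractor code):

```python
import csv
import io
import json

# Column mapping and indicator column, copied from user_extraction.json above.
columns = {"user_id": 0, "user_category": 1}
indicator_column = "user_id"

csv_text = "tsausner,BAD_GUY\nndhanase,CONTRACTOR\nsvelagap,ADMIN\n"

# Each CSV row becomes one enrichment entry: the indicator column supplies
# the key, and every mapped column lands in the JSON value.
entries = {}
for row in csv.reader(io.StringIO(csv_text)):
    value = {name: row[index].strip() for name, index in columns.items()}
    entries[value[indicator_column]] = json.dumps(value, sort_keys=True)

print(entries["tsausner"])  # {"user_category": "BAD_GUY", "user_id": "tsausner"}
```

One thing worth noting: trailing spaces in the CSV survive into the stored values (the scan output below shows "BAD_GUY " with a trailing space), which is why this sketch strips whitespace.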

This is an optional step where you can specify, at import time, where the enrichments are used in Metron. You can skip this step if the enrichments are already configured, or add them later.
This config file applies the user_categorization enrichment using the reqUser field as the key.
 
[centos@metron-demo-4 rangeraudit]$ cat rangeradmin_user_category_enrichment.json 
{
  "zkQuorum": "metron-demo-2.field.hortonworks.com:2181,metron-demo-0.field.hortonworks.com:2181,metron-demo-1.field.hortonworks.com:2181",
  "sensorToFieldList": {
    "rangeradmin": {
      "type": "ENRICHMENT",
      "fieldToEnrichmentTypes": {
        "reqUser": [
          "user_categorization"
        ]
      }
    }
  }
}

The command below imports the enrichment mappings into HBase and adds the enrichment to the rangeradmin sensor data. The result is that when a ranger admin event is enriched, Metron will use the reqUser field value as a key into the user_categorization enrichment. If the value of the field is present in the CSV data, the enriched event will have a new field indicating the user category:

[centos@metron-demo-4 rangeraudit]$ /usr/hcp/1.4.0.0-38/metron/bin/flatfile_loader.sh -e user_extraction.json -t enrichment -i user_enrichment.csv -c t -n rangeradmin_user_category_enrichment.json


HBase will look similar to this:

hbase(main):002:0> scan 'enrichment'
ROW                                  COLUMN+CELL                                                                                               
 \x01\x12\x8Bjx@d.\xF3\xBF\xD3\xB2\x column=t:v, timestamp=1518118740456, value={"user_category":"BAD_GUY ","user_id":"tsausner"}              
 81\xEB\xB5\xD2\x00\x13user_categori                                                                                                           
 zation\x00\x08tsausner                                                                                                                        
 /\xA8\xEB\xB1\xE0N\xBE\xCBv?\xCAz9\ column=t:v, timestamp=1518118740540, value={"user_category":"ADMIN ","user_id":"svelagap"}                
 xF6;\xD3\x00\x13user_categorization                                                                                                           
 \x00\x08svelagap                                                                                                                              
 l\xF1F\x83t\xD6x\xF9\xBEwrk3\x00M2\ column=t:v, timestamp=1518118740522, value={"user_category":"CONTRACTOR ","user_id":"ndhanase"}           
 x00\x13user_categorization\x00\x08n                                                                                                           
 dhanase  



After the enrichment data is in HBase, create an event and add it to the rangeradmin topic. For example, if the reqUser field is set to nnolan, the enriched event will have the following fields:

enrichments:hbaseEnrichment:reqUser:user_categorization:user_category
EMPLOYEE

enrichments:hbaseEnrichment:reqUser:user_categorization:user_id
nnolan
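The field naming follows a fixed-looking pattern (inferred from this example output, not taken from the Metron source):

```python
def enriched_field(source_field, enrichment_type, value_key):
    # Pattern: enrichments:hbaseEnrichment:<event field>:<enrichment type>:<value column>
    return ":".join(
        ["enrichments", "hbaseEnrichment", source_field, enrichment_type, value_key])

print(enriched_field("reqUser", "user_categorization", "user_category"))
# enrichments:hbaseEnrichment:reqUser:user_categorization:user_category
```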



Thanks

Carolyn Duby
Solutions Engineer, Northeast
cduby@hortonworks.com
+1.508.965.0584

Join my team!
Enterprise Account Manager – Boston - http://grnh.se/wepchv1
Solutions Engineer – Boston - http://grnh.se/8gbxy41
Need Answers? Try https://community.hortonworks.com <https://community.hortonworks.com/answers/index.html>


Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Otto Fowler <ot...@gmail.com>.
-jiras-


On June 13, 2018 at 10:30:26, Simon Elliston Ball (simon@simonellistonball.com) wrote:

That’s where something like the Nifi solution would come in...

With the PutEnrichment processor and a ProcessHttpRequest processor, you do
have a web service for loading enrichments.

We could probably also create a rest service end point for it, which would
make some sense, but there is a nice multi-source, queuing, and lineage
element to the nifi solution.

Simon

> On 13 Jun 2018, at 15:04, Casey Stella <ce...@gmail.com> wrote:
>
> no, sadly we do not.
>
>> On Wed, Jun 13, 2018 at 10:01 AM Carolyn Duby <cd...@hortonworks.com> wrote:
>>
>> Agreed….Streaming enrichments is the right solution for DNS data.
>>
>> Do we have a web service for writing enrichments?
>>
>> Carolyn Duby
>> Solutions Engineer, Northeast
>> cduby@hortonworks.com
>> +1.508.965.0584
>>
>> On 6/13/18, 6:25 AM, "Charles Joynt" <Ch...@gresearch.co.uk> wrote:
>>
>>> Regarding why I didn't choose to load data with the flatfile loader script...
>>>
>>> I want to be able to SEND enrichment data to Metron rather than have to
>>> set up cron jobs to PULL data. At the moment I'm trying to prove that the
>>> process works with a simple data source. In the future we will want
>>> enrichment data in Metron that comes from systems (e.g. HR databases) that
>>> I won't have access to, hence will need someone to be able to send us the
>>> data.
>>>
>>>> Carolyn: just call the flat file loader from a script processor...
>>>
>>> I didn't believe that would work in my environment. I'm pretty sure the
>>> script has dependencies on various Metron JARs, not least for the row id
>>> hashing algorithm. I suppose this would require at least a partial install
>>> of Metron alongside NiFi, and would introduce additional work on the NiFi
>>> cluster for any Metron upgrade. In some (enterprise) environments there
>>> might be separation of ownership between NiFi and Metron.
>>>
>>> I also prefer not to have a Java app calling a bash script which calls a
>>> new java process, with logs or error output that might just get swallowed
>>> up invisibly. Somewhere down the line this could hold up effective
>>> troubleshooting.
>>>
>>>> Simon: I have actually written a stellar processor, which applies stellar to all FlowFile attributes...
>>>
>>> Gulp.
>>>
>>>> Simon: what didn't you like about the flatfile loader script?
>>>
>>> The flatfile loader script has worked fine for me when prepping
>>> enrichment data in test systems; however, it was a bit of a chore to get the
>>> JSON configuration files set up, especially for "wide" data sources that
>>> may have 15-20 fields, e.g. Active Directory.
>>>
>>> More broadly speaking, I want to embrace the streaming data paradigm and
>>> tried to avoid batch jobs. With the DNS example, you might imagine a future
>>> where the enrichment data is streamed based on DHCP registrations, DNS
>>> update events, etc. In principle this could reduce the window of time where
>>> we might enrich a data source with out-of-date data.
>>>
>>> Charlie
>>>
>>> -----Original Message-----
>>> From: Carolyn Duby [mailto:cduby@hortonworks.com]
>>> Sent: 12 June 2018 20:33
>>> To: dev@metron.apache.org
>>> Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>>>
>>> I like the streaming enrichment solutions but it depends on how you are
>>> getting the data in. If you get the data in a csv file just call the flat
>>> file loader from a script processor. No special Nifi required.
>>>
>>> If the enrichments don’t arrive in bulk, the streaming solution is better.
>>>
>>> Thanks
>>> Carolyn Duby
>>> Solutions Engineer, Northeast
>>> cduby@hortonworks.com
>>> +1.508.965.0584
>>>
>>>
>>>
>>> On 6/12/18, 1:08 PM, "Simon Elliston Ball" <si...@simonellistonball.com> wrote:
>>>
>>>> Good solution. The streaming enrichment writer makes a lot of sense for
>>>> this, especially if you're not using huge enrichment sources that need
>>>> the batch based loaders.
>>>>
>>>> As it happens I have written most of a NiFi processor to handle this
>>>> use case directly - both non-record and Record based, especially for Otto :).
>>>> The one thing we need to figure out now is where to host that, and how
>>>> to handle releases of a nifi-metron-bundle. I'll probably get round to
>>>> putting the code in my github at least in the next few days, while we
>>>> figure out a more permanent home.
>>>>
>>>> Charlie, out of curiosity, what didn't you like about the flatfile
>>>> loader script?
>>>>
>>>> Simon
>>>>
>>>> On 12 June 2018 at 18:00, Charles Joynt <Ch...@gresearch.co.uk> wrote:
>>>>
>>>>> Thanks for the responses. I appreciate the willingness to look at
>>>>> creating a NiFi processor. That would be great!
>>>>>
>>>>> Just to follow up on this (after a week looking after the "ops" side of
>>>>> dev-ops): I really don't want to have to use the flatfile loader
>>>>> script, and I'm not going to be able to write a Metron-style HBase
>>>>> key generator any time soon, but I have had some success with a
>>>>> different approach.
>>>>>
>>>>> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
>>>>> 2. Send this to a HTTP listener in NiFi
>>>>> 3. Write to a kafka topic
>>>>>
>>>>> I then followed your instructions in this blog:
>>>>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
>>>>>
>>>>> 4. Create a new "dns" sensor in Metron
>>>>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig
>>>>> settings to push this into HBase:
>>>>>
>>>>> {
>>>>> "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
>>>>> "writerClassName": "org.apache.metron.enrichment.writer.
>>>>> SimpleHbaseEnrichmentWriter",
>>>>> "sensorTopic": "dns",
>>>>> "parserConfig": {
>>>>> "shew.table": " dns",
>>>>> "shew.cf": "dns",
>>>>> "shew.keyColumns": "name",
>>>>> "shew.enrichmentType": "dns",
>>>>> "columns": {
>>>>> "name": 0,
>>>>> "type": 1,
>>>>> "data": 2
>>>>> }
>>>>> },
>>>>> }
>>>>>
>>>>> And... it seems to be working. At least, I have data in HBase which
>>>>> looks more like the output of the flatfile loader.
>>>>>
>>>>> Charlie
>>>>>
>>>>> -----Original Message-----
>>>>> From: Casey Stella [mailto:cestella@gmail.com]
>>>>> Sent: 05 June 2018 14:56
>>>>> To: dev@metron.apache.org
>>>>> Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>>>>>
>>>>> The problem, as you correctly diagnosed, is the key in HBase. We
>>>>> construct the key very specifically in Metron, so it's unlikely to
>>>>> work out of the box with the NiFi processor unfortunately. The key
>>>>> that we use is formed here in the codebase:
>>>>> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>>>>>
>>>>> To put that in English, consider the following:
>>>>>
>>>>>   - type - the enrichment type
>>>>>   - indicator - the indicator to use
>>>>>   - hash(*) - a Murmur3 128-bit hash function
>>>>>
>>>>> The key is hash(indicator) + type + indicator.
>>>>>
>>>>> This hash prefixing is a standard practice in HBase key design that
>>>>> allows the keys to be uniformly distributed among the regions and
>>>>> prevents hotspotting. Depending on how the PutHBaseJSON processor
>>>>> works, if you can construct the key and pass it in, then you might be
>>>>> able to either construct the key in NiFi or write a processor to construct the key.
>>>>> Ultimately though, what Carolyn said is true: the easiest approach is
>>>>> probably using the flatfile loader.
>>>>> If you do get this working in NiFi, however, do please let us know
>>>>> and/or consider contributing it back to the project as a PR :)
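To make the hash-prefixing idea concrete, here is a Python sketch of a key built as hash(indicator) + type + indicator. It is illustrative only: it uses MD5 (standard library) where Metron uses a Murmur3 128-bit hash, and the length-prefix framing is an assumption, so it will not produce keys byte-compatible with EnrichmentKey.java; for real keys, use Metron's own converter classes.

```python
import hashlib
import struct

def sketch_row_key(enrichment_type: str, indicator: str) -> bytes:
    # Hash prefix: Metron actually uses Murmur3 128-bit (see EnrichmentKey.java);
    # MD5 stands in here only because it is in the standard library and is
    # also a 16-byte digest.
    prefix = hashlib.md5(indicator.encode("utf-8")).digest()
    t = enrichment_type.encode("utf-8")
    i = indicator.encode("utf-8")
    # Length-prefix the type and indicator so the key can be split apart again.
    return prefix + struct.pack(">H", len(t)) + t + struct.pack(">H", len(i)) + i

key = sketch_row_key("whois", "192.168.0.198")
# The fixed-width hash prefix is what spreads keys evenly across regions,
# no matter how similar the indicators themselves are.
assert key[:16] == hashlib.md5(b"192.168.0.198").digest()
```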
>>>> --
>>>> --
>>>> simon elliston ball
>>>> @sireb
>>

>>>>>> completeness and accuracy of its contents and on condition that any
>>>>>> errors or omissions shall not be made the basis of any claim,
>>>>>> demand or cause of
>>>>> action.
>>>>>> The information in this email is intended only for the named
>> recipient.
>>>>>> If you are not the intended recipient please notify us immediately
>>>>>> and do not copy, distribute or take action based on this e-mail.
>>>>>> All messages sent to and from this e-mail address will be logged by
>>>>>> G-RESEARCH and are subject to archival storage, monitoring, review
>>>>>> and disclosure.
>>>>>> G-RESEARCH is the trading name of Trenchant Limited, 5th Floor,
>>>>>> Whittington House, 19-30 Alfred Place, London WC1E 7EA.
>>>>>> Trenchant Limited is a company registered in England with company
>>>>>> number 08127121.
>>>>>> --------------
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> --
>>>> simon elliston ball
>>>> @sireb
>> 

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
Not convinced we should be writing Jiras against the Metron project, or the
NiFi project, if we don't know where it's actually going to end up, to be
honest. In any case, working code:
https://github.com/simonellistonball/metron/tree/nifi/nifi-metron-bundle
which is currently in a Metron fork, for no particular reason. Also, it
needs proper tests, docs and all that jazz, but at PoC grade it works,
scales, and is moderately robust as long as HBase doesn't fall over too
much.

Simon

On 13 June 2018 at 15:24, Otto Fowler <ot...@gmail.com> wrote:

> Do we even have a Jira?  If not, maybe Carolyn et al. can write one up
> that lays out some requirements and context.
>
>
> On June 13, 2018 at 10:04:27, Casey Stella (cestella@gmail.com) wrote:
>
> no, sadly we do not.
>
> On Wed, Jun 13, 2018 at 10:01 AM Carolyn Duby <cd...@hortonworks.com>
> wrote:
>
> > Agreed… streaming enrichments are the right solution for DNS data.
> >
> > Do we have a web service for writing enrichments?
> >
> > Carolyn Duby
> > Solutions Engineer, Northeast
> > cduby@hortonworks.com
> > +1.508.965.0584
> >
> > Join my team!
> > Enterprise Account Manager – Boston - http://grnh.se/wepchv1
> > Solutions Engineer – Boston - http://grnh.se/8gbxy41
> > Need Answers? Try https://community.hortonworks.com <
> > https://community.hortonworks.com/answers/index.html>
> >
> >
> >
> >
> >
> >
> >
> >
> > On 6/13/18, 6:25 AM, "Charles Joynt" <Ch...@gresearch.co.uk>
> > wrote:
> >
> > >Regarding why I didn't choose to load data with the flatfile loader
> > script...
> > >
> > >I want to be able to SEND enrichment data to Metron rather than have to
> > set up cron jobs to PULL data. At the moment I'm trying to prove that the
> > process works with a simple data source. In the future we will want
> > enrichment data in Metron that comes from systems (e.g. HR databases)
> that
> > I won't have access to, hence will need someone to be able to send us the
> > data.
> > >
> > >> Carolyn: just call the flat file loader from a script processor...
> > >
> > >I didn't believe that would work in my environment. I'm pretty sure the
> > script has dependencies on various Metron JARs, not least for the row id
> > hashing algorithm. I suppose this would require at least a partial
> install
> > of Metron alongside NiFi, and would introduce additional work on the NiFi
> > cluster for any Metron upgrade. In some (enterprise) environments there
> > might be separation of ownership between NiFi and Metron.
> > >
> > >I also prefer not to have a Java app calling a bash script which calls a
> > new java process, with logs or error output that might just get swallowed
> > up invisibly. Somewhere down the line this could hold up effective
> > troubleshooting.
> > >
> > >> Simon: I have actually written a stellar processor, which applies
> > stellar to all FlowFile attributes...
> > >
> > >Gulp.
> > >
> > >> Simon: what didn't you like about the flatfile loader script?
> > >
> > >The flatfile loader script has worked fine for me when prepping
> > enrichment data in test systems, however it was a bit of a chore to get
> the
> > JSON configuration files set up, especially for "wide" data sources that
> > may have 15-20 fields, e.g. Active Directory.
> > >
> > >More broadly speaking, I want to embrace the streaming data paradigm and
> > tried to avoid batch jobs. With the DNS example, you might imagine a
> future
> > where the enrichment data is streamed based on DHCP registrations, DNS
> > update events, etc. In principle this could reduce the window of time
> where
> > we might enrich a data source with out-of-date data.
> > >
> > >Charlie
> > >
> > >-----Original Message-----
> > >From: Carolyn Duby [mailto:cduby@hortonworks.com]
> > >Sent: 12 June 2018 20:33
> > >To: dev@metron.apache.org
> > >Subject: Re: Writing enrichment data directly from NiFi with
> PutHBaseJSON
> > >
> > >I like the streaming enrichment solutions but it depends on how you are
> > getting the data in. If you get the data in a CSV file, just call the flat
> > file loader from a script processor. No special NiFi required.
> > >
> > >If the enrichments don’t arrive in bulk, the streaming solution is
> better.
> > >
> > >Thanks
> > >Carolyn Duby
> > >Solutions Engineer, Northeast
> > >cduby@hortonworks.com
> > >+1.508.965.0584
> > >
> > >Join my team!
> > >Enterprise Account Manager – Boston - http://grnh.se/wepchv1 Solutions
> > Engineer – Boston - http://grnh.se/8gbxy41 Need Answers? Try
> > https://community.hortonworks.com <
> > https://community.hortonworks.com/answers/index.html>
> > >
> > >
> > >On 6/12/18, 1:08 PM, "Simon Elliston Ball" <simon@simonellistonball.com
> >
> > wrote:
> > >
> > >>Good solution. The streaming enrichment writer makes a lot of sense for
> > >>this, especially if you're not using huge enrichment sources that need
> > >>the batch based loaders.
> > >>
> > >>As it happens I have written most of a NiFi processor to handle this
> > >>use case directly - both non-record and Record based, especially for
> > Otto :).
> > >>The one thing we need to figure out now is where to host that, and how
> > >>to handle releases of a nifi-metron-bundle. I'll probably get round to
> > >>putting the code in my github at least in the next few days, while we
> > >>figure out a more permanent home.
> > >>
> > >>Charlie, out of curiosity, what didn't you like about the flatfile
> > >>loader script?
> > >>
> > >>Simon
> > >>
> > >>On 12 June 2018 at 18:00, Charles Joynt <Charles.Joynt@gresearch.co.uk
> >
> > >>wrote:
> > >>
> > >>> Thanks for the responses. I appreciate the willingness to look at
> > >>> creating a NiFi processor. That would be great!
> > >>>
> > >>> Just to follow up on this (after a week looking after the "ops" side
> > >>> of
> > >>> dev-ops): I really don't want to have to use the flatfile loader
> > >>> script, and I'm not going to be able to write a Metron-style HBase
> > >>> key generator any time soon, but I have had some success with a
> > different approach.
> > >>>
> > >>> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
> > >>> 2. Send this to an HTTP listener in NiFi
> > >>> 3. Write to a Kafka topic
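As a hedged illustration of step 1 (and, commented out, step 2): the quoted-CSV form shown above can be produced with Python's csv module. The listener host, port, and path in the comment are hypothetical placeholders, not a real NiFi endpoint.

```python
import csv
import io

def dns_record_to_csv(name, rtype, data):
    """Quote one DNS record as in step 1: "server.domain.local","A","192.168.0.198"."""
    buf = io.StringIO()
    # QUOTE_ALL matches the example, which quotes every field.
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerow([name, rtype, data])
    return buf.getvalue().strip()  # drop the trailing \r\n

line = dns_record_to_csv("server.domain.local", "A", "192.168.0.198")
assert line == '"server.domain.local","A","192.168.0.198"'

# Step 2 would then POST the line to the NiFi HTTP listener; the URL below
# is a hypothetical placeholder:
#
#   import urllib.request
#   req = urllib.request.Request("http://nifi-host:9999/contentListener",
#                                data=line.encode("utf-8"),
#                                headers={"Content-Type": "text/csv"})
#   urllib.request.urlopen(req)
```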
> > >>>
> > >>> I then followed your instructions in this blog:
> > >>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
> > >>>
> > >>> 4. Create a new "dns" sensor in Metron
> > >>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, with parserConfig settings to push this into HBase:
> > >>>
> > >>> {
> > >>>   "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
> > >>>   "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
> > >>>   "sensorTopic": "dns",
> > >>>   "parserConfig": {
> > >>>     "shew.table": "dns",
> > >>>     "shew.cf": "dns",
> > >>>     "shew.keyColumns": "name",
> > >>>     "shew.enrichmentType": "dns",
> > >>>     "columns": {
> > >>>       "name": 0,
> > >>>       "type": 1,
> > >>>       "data": 2
> > >>>     }
> > >>>   }
> > >>> }
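As a rough sketch (not the actual SimpleHbaseEnrichmentWriter code), the `columns` and `shew.keyColumns` settings above map each CSV line to an enrichment indicator plus a JSON value roughly like this:

```python
import csv
import io
import json

# Positions taken from the "columns" section of the parserConfig above.
COLUMNS = {"name": 0, "type": 1, "data": 2}
KEY_COLUMN = "name"  # shew.keyColumns: the field used as the indicator

def parse_enrichment(csv_line):
    """Turn one CSV line into (indicator, JSON value), as the writer
    conceptually does before keying the HBase row on the indicator."""
    fields = next(csv.reader(io.StringIO(csv_line)))
    value = {name: fields[pos] for name, pos in COLUMNS.items()}
    return value[KEY_COLUMN], json.dumps(value)

indicator, value = parse_enrichment('"server.domain.local","A","192.168.0.198"')
assert indicator == "server.domain.local"
```

The real writer also builds the hashed row key from the indicator, which is exactly the part PutHBaseJSON was skipping.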
> > >>>
> > >>> And... it seems to be working. At least, I have data in HBase which
> > >>> looks more like the output of the flatfile loader.
> > >>>
> > >>> Charlie
> > >>>
> >
>



-- 
--
simon elliston ball
@sireb

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Otto Fowler <ot...@gmail.com>.
Do we even have a Jira?  If not, maybe Carolyn et al. can write one up
that lays out some requirements and context.


On June 13, 2018 at 10:04:27, Casey Stella (cestella@gmail.com) wrote:

no, sadly we do not.


Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Casey Stella <ce...@gmail.com>.
no, sadly we do not.

On Wed, Jun 13, 2018 at 10:01 AM Carolyn Duby <cd...@hortonworks.com> wrote:

> Agreed… streaming enrichments are the right solution for DNS data.
>
> Do we have a web service for writing enrichments?
>
> Carolyn Duby
> Solutions Engineer, Northeast
> cduby@hortonworks.com
> +1.508.965.0584
>
> Join my team!
> Enterprise Account Manager – Boston - http://grnh.se/wepchv1
> Solutions Engineer – Boston - http://grnh.se/8gbxy41
> Need Answers? Try https://community.hortonworks.com <
> https://community.hortonworks.com/answers/index.html>
>
>
>
>
>
>
>
>
> On 6/13/18, 6:25 AM, "Charles Joynt" <Ch...@gresearch.co.uk>
> wrote:
>
> >Regarding why I didn't choose to load data with the flatfile loader
> script...
> >
> >I want to be able to SEND enrichment data to Metron rather than have to
> set up cron jobs to PULL data. At the moment I'm trying to prove that the
> process works with a simple data source. In the future we will want
> enrichment data in Metron that comes from systems (e.g. HR databases) that
> I won't have access to, hence will need someone to be able to send us the
> data.
> >
> >> Carolyn: just call the flat file loader from a script processor...
> >
> >I didn't believe that would work in my environment. I'm pretty sure the
> script has dependencies on various Metron JARs, not least for the row id
> hashing algorithm. I suppose this would require at least a partial install
> of Metron alongside NiFi, and would introduce additional work on the NiFi
> cluster for any Metron upgrade. In some (enterprise) environments there
> might be separation of ownership between NiFi and Metron.
> >
> >I also prefer not to have a Java app calling a bash script which calls a
> new java process, with logs or error output that might just get swallowed
> up invisibly. Somewhere down the line this could hold up effective
> troubleshooting.
> >
> >> Simon: I have actually written a stellar processor, which applies
> stellar to all FlowFile attributes...
> >
> >Gulp.
> >
> >> Simon: what didn't you like about the flatfile loader script?
> >
> >The flatfile loader script has worked fine for me when prepping
> enrichment data in test systems, however it was a bit of a chore to get the
> JSON configuration files set up, especially for "wide" data sources that
> may have 15-20 fields, e.g. Active Directory.
> >
> >More broadly speaking, I want to embrace the streaming data paradigm and
> tried to avoid batch jobs. With the DNS example, you might imagine a future
> where the enrichment data is streamed based on DHCP registrations, DNS
> update events, etc. In principle this could reduce the window of time where
> we might enrich a data source with out-of-date data.
> >
> >Charlie
> >
> >-----Original Message-----
> >From: Carolyn Duby [mailto:cduby@hortonworks.com]
> >Sent: 12 June 2018 20:33
> >To: dev@metron.apache.org
> >Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
> >
> >I like the streaming enrichment solutions but it depends on how you are
> getting the data in.  If you get the data in a csv file just call the flat
> file loader from a script processor.  No special Nifi required.
> >
> >If the enrichments don’t arrive in bulk, the streaming solution is better.
> >
> >Thanks
> >Carolyn Duby
> >Solutions Engineer, Northeast
> >cduby@hortonworks.com
> >+1.508.965.0584
> >
> >Join my team!
> >Enterprise Account Manager – Boston - http://grnh.se/wepchv1 Solutions
> Engineer – Boston - http://grnh.se/8gbxy41 Need Answers? Try
> https://community.hortonworks.com <
> https://community.hortonworks.com/answers/index.html>
> >
> >
> >On 6/12/18, 1:08 PM, "Simon Elliston Ball" <si...@simonellistonball.com>
> wrote:
> >
> >>Good solution. The streaming enrichment writer makes a lot of sense for
> >>this, especially if you're not using huge enrichment sources that need
> >>the batch based loaders.
> >>
> >>As it happens I have written most of a NiFi processor to handle this
> >>use case directly - both non-record and Record based, especially for
> Otto :).
> >>The one thing we need to figure out now is where to host that, and how
> >>to handle releases of a nifi-metron-bundle. I'll probably get round to
> >>putting the code in my github at least in the next few days, while we
> >>figure out a more permanent home.
> >>
> >>Charlie, out of curiosity, what didn't you like about the flatfile
> >>loader script?
> >>
> >>Simon
> >>
> >>On 12 June 2018 at 18:00, Charles Joynt <Ch...@gresearch.co.uk>
> >>wrote:
> >>
> >>> Thanks for the responses. I appreciate the willingness to look at
> >>> creating a NiFi processor. That would be great!
> >>>
> >>> Just to follow up on this (after a week looking after the "ops" side
> >>> of
> >>> dev-ops): I really don't want to have to use the flatfile loader
> >>> script, and I'm not going to be able to write a Metron-style HBase
> >>> key generator any time soon, but I have had some success with a
> different approach.
> >>>
> >>> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
> >>> 2. Send this to an HTTP listener in NiFi
> >>> 3. Write to a Kafka topic
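[Editor's note: the first step above can be sketched as a small client. The hostname, port, and the /contentListener path are placeholders, not values from this thread; substitute whatever your NiFi ListenHTTP processor is actually configured with.]

```python
import csv
import io
import urllib.request

def build_csv_post(host: str, port: int, row: list) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying one CSV enrichment record.

    The URL components are hypothetical; adjust them to match the
    ListenHTTP processor in your own flow.
    """
    buf = io.StringIO()
    # Quote every field, matching the quoted CSV example in the thread.
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(row)
    return urllib.request.Request(
        url=f"http://{host}:{port}/contentListener",
        data=buf.getvalue().encode("utf-8"),
        headers={"Content-Type": "text/csv"},
        method="POST",
    )

req = build_csv_post("nifi.example.local", 8081,
                     ["server.domain.local", "A", "192.168.0.198"])
# urllib.request.urlopen(req) would actually deliver it to NiFi
```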
> >>>
> >>> I then followed your instructions in this blog:
> >>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
> >>>
> >>> 4. Create a new "dns" sensor in Metron
> >>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, with parserConfig
> >>> settings to push this into HBase:
> >>>
> >>> {
> >>>     "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
> >>>     "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
> >>>     "sensorTopic": "dns",
> >>>     "parserConfig": {
> >>>         "shew.table": "dns",
> >>>         "shew.cf": "dns",
> >>>         "shew.keyColumns": "name",
> >>>         "shew.enrichmentType": "dns",
> >>>         "columns": {
> >>>             "name": 0,
> >>>             "type": 1,
> >>>             "data": 2
> >>>         }
> >>>     }
> >>> }
> >>>
> >>> And... it seems to be working. At least, I have data in HBase which
> >>> looks more like the output of the flatfile loader.
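[Editor's note: the "columns" block in the parserConfig above maps field names to CSV positions. This stand-in illustrates the transformation performed on one record; the real work is done by org.apache.metron.parsers.csv.CSVParser, not this code.]

```python
import csv
import io
import json

# Field-name -> CSV-column-index mapping, as in the parserConfig above.
columns = {"name": 0, "type": 1, "data": 2}

line = '"server.domain.local","A","192.168.0.198"'
row = next(csv.reader(io.StringIO(line)))

# Each CSV record becomes a small JSON document; the "name" field
# (shew.keyColumns) supplies the enrichment indicator.
value = {field: row[position] for field, position in columns.items()}
print(json.dumps(value, sort_keys=True))
```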
> >>>
> >>> Charlie
> >>>
> >>> -----Original Message-----
> >>> From: Casey Stella [mailto:cestella@gmail.com]
> >>> Sent: 05 June 2018 14:56
> >>> To: dev@metron.apache.org
> >>> Subject: Re: Writing enrichment data directly from NiFi with
> >>> PutHBaseJSON
> >>>
> >>> The problem, as you correctly diagnosed, is the key in HBase.  We
> >>> construct the key very specifically in Metron, so it's unlikely to
> >>> work out of the box with the NiFi processor unfortunately.  The key
> >>> that we use is formed here in the codebase:
> >>> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
> >>>
> >>> To put that in English, consider the following:
> >>>
> >>>    - type - the enrichment type
> >>>    - indicator - the indicator to use
> >>>    - hash(*) - a Murmur3 128-bit hash function
> >>>
> >>> The key is hash(indicator) + type + indicator.
> >>>
> >>> This hash prefixing is a standard practice in HBase key design that
> >>> allows the keys to be uniformly distributed among the regions and
> >>> prevents hotspotting.  Depending on how the PutHBaseJSON processor
> >>> works, if you can construct the key and pass it in, then you might be
> >>> able to either construct the key in NiFi or write a processor to
> construct the key.
> >>> Ultimately, though, what Carolyn said is true: the easiest approach is
> >>> probably using the flatfile loader.
> >>> If you do get this working in NiFi, however, do please let us know
> >>> and/or consider contributing it back to the project as a PR :)
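[Editor's note: the hash(indicator) + type + indicator layout Casey describes can be sketched as below. This is illustrative only: MD5 stands in for Metron's 128-bit Murmur3 hash purely because it also yields 16 bytes and ships with the standard library, and the 2-byte big-endian length prefixes are inferred from the \x00\x05whois pattern in the scan output earlier in the thread. Treat the serialization in EnrichmentKey.java as authoritative.]

```python
import hashlib
import struct

def sketch_enrichment_key(enrichment_type: str, indicator: str) -> bytes:
    """Illustrative sketch of Metron's enrichment row key layout.

    Real Metron prefixes the key with a 128-bit Murmur3 hash of the
    indicator; MD5 is a stand-in here (also 16 bytes). Type and indicator
    follow as 2-byte-length-prefixed strings, mimicking the
    \\x00\\x05whois pattern visible in the HBase scan output.
    """
    def with_length(s: str) -> bytes:
        raw = s.encode("utf-8")
        return struct.pack(">H", len(raw)) + raw  # 2-byte big-endian length

    prefix = hashlib.md5(indicator.encode("utf-8")).digest()  # stand-in hash
    return prefix + with_length(enrichment_type) + with_length(indicator)

key = sketch_enrichment_key("whois", "192.168.0.198")
# Layout: 16-byte hash, then \x00\x05whois, then \x00\x0D192.168.0.198
```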
> >>>
> >>>
> >>>
> >>> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <
> >>> Charles.Joynt@gresearch.co.uk>
> >>> wrote:
> >>>
> >>> > Hello,
> >>> >
> >>> > I work as a Dev/Ops Data Engineer within the security team at a
> >>> > company in London where we are in the process of implementing Metron.
> >>> > I have been tasked with implementing feeds of network environment
> >>> > data into HBase so that this data can be used as enrichment sources
> >>> > for our
> >>> security events.
> >>> > First-off I wanted to pull in DNS data for an internal domain.
> >>> >
> >>> > I am assuming that I need to write data into HBase in such a way
> >>> > that it exactly matches what I would get from the
> >>> > flatfile_loader.sh script. A colleague of mine has already loaded
> >>> > some DNS data using that script, so I am using that as a reference.
> >>> >
> >>> > I have implemented a flow in NiFi which takes JSON data from a HTTP
> >>> > listener and routes it to a PutHBaseJSON processor. The flow is
> >>> > working, in the sense that data is successfully written to HBase,
> >>> > but despite (naively) specifying "Row Identifier Encoding Strategy
> >>> > = Binary", the results in HBase don't look correct. Comparing the
> >>> > output from HBase scan commands I
> >>> > see:
> >>> >
> >>> > flatfile_loader.sh produced:
> >>> >
> >>> > ROW:  \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
> >>> > CELL: column=data:v, timestamp=1516896203840,
> >>> >       value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
> >>> >
> >>> > PutHBaseJSON produced:
> >>> >
> >>> > ROW:  server.domain.local
> >>> > CELL: column=dns:v, timestamp=1527778603783,
> >>> >       value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
> >>> >
> >>> > From source JSON:
> >>> >
> >>> > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
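[Editor's note: the source JSON above can be produced with a few lines. The "k"/"v" field names follow the example message in this thread, not any fixed Metron contract; "k" feeds PutHBaseJSON's Row Identifier Field Name and "v" carries the enrichment value.]

```python
import json

def enrichment_payload(record: dict, key_field: str = "name") -> str:
    """Wrap a DNS record in the {"k": ..., "v": ...} shape used in the flow.

    key_field names which record field becomes the row identifier;
    "name" matches the example, but this is a convention of this flow only.
    """
    return json.dumps({"k": record[key_field], "v": record}, sort_keys=True)

payload = enrichment_payload(
    {"name": "server.domain.local", "type": "A", "data": "192.168.0.198"})
```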
> >>> >
> >>> > I know that there are some differences in column family / field
> >>> > names, but my worry is the ROW id. Presumably I need to encode my
> >>> > row key, "k" in the JSON data, in a way that matches how the
> >>> > flatfile_loader.sh
> >>> script did it.
> >>> >
> >>> > Can anyone explain how I might convert my Id to the correct format?
> >>> > -or-
> >>> > Does this matter? Can Metron use the human-readable ROW ids?
> >>> >
> >>> > Charlie Joynt
> >>> >
> >>> > --------------
> >>> > G-RESEARCH believes the information provided herein is reliable.
> >>> > While every care has been taken to ensure accuracy, the information
> >>> > is furnished to the recipients with no warranty as to the
> >>> > completeness and accuracy of its contents and on condition that any
> >>> > errors or omissions shall not be made the basis of any claim,
> >>> > demand or cause of
> >>> action.
> >>> > The information in this email is intended only for the named
> recipient.
> >>> > If you are not the intended recipient please notify us immediately
> >>> > and do not copy, distribute or take action based on this e-mail.
> >>> > All messages sent to and from this e-mail address will be logged by
> >>> > G-RESEARCH and are subject to archival storage, monitoring, review
> >>> > and disclosure.
> >>> > G-RESEARCH is the trading name of Trenchant Limited, 5th Floor,
> >>> > Whittington House, 19-30 Alfred Place, London WC1E 7EA.
> >>> > Trenchant Limited is a company registered in England with company
> >>> > number 08127121.
> >>> > --------------
> >>> >
> >>>
> >>
> >>
> >>
> >>--
> >>--
> >>simon elliston ball
> >>@sireb
>

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Carolyn Duby <cd...@hortonworks.com>.
Agreed. Streaming enrichment is the right solution for DNS data.

Do we have a web service for writing enrichments?

Carolyn Duby
Solutions Engineer, Northeast
cduby@hortonworks.com
+1.508.965.0584

On 6/13/18, 6:25 AM, "Charles Joynt" <Ch...@gresearch.co.uk> wrote:

>Regarding why I didn't choose to load data with the flatfile loader script...
>
>I want to be able to SEND enrichment data to Metron rather than have to set up cron jobs to PULL data. At the moment I'm trying to prove that the process works with a simple data source. In the future we will want enrichment data in Metron that comes from systems (e.g. HR databases) that I won't have access to, hence will need someone to be able to send us the data.
>
>> Carolyn: just call the flat file loader from a script processor...
>
>I didn't believe that would work in my environment. I'm pretty sure the script has dependencies on various Metron JARs, not least for the row id hashing algorithm. I suppose this would require at least a partial install of Metron alongside NiFi, and would introduce additional work on the NiFi cluster for any Metron upgrade. In some (enterprise) environments there might be separation of ownership between NiFi and Metron.
>
>I also prefer not to have a Java app calling a bash script which calls a new java process, with logs or error output that might just get swallowed up invisibly. Somewhere down the line this could hold up effective troubleshooting.
>
>> Simon: I have actually written a stellar processor, which applies stellar to all FlowFile attributes...
>
>Gulp.
>
>> Simon: what didn't you like about the flatfile loader script?
>
>The flatfile loader script has worked fine for me when prepping enrichment data in test systems, however it was a bit of a chore to get the JSON configuration files set up, especially for "wide" data sources that may have 15-20 fields, e.g. Active Directory.
>
>More broadly speaking, I want to embrace the streaming data paradigm and tried to avoid batch jobs. With the DNS example, you might imagine a future where the enrichment data is streamed based on DHCP registrations, DNS update events, etc. In principle this could reduce the window of time where we might enrich a data source with out-of-date data.
>
>Charlie
>

RE: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Charles Joynt <Ch...@gresearch.co.uk>.
Regarding why I didn't choose to load data with the flatfile loader script...

I want to be able to SEND enrichment data to Metron rather than have to set up cron jobs to PULL data. At the moment I'm trying to prove that the process works with a simple data source. In the future we will want enrichment data in Metron that comes from systems (e.g. HR databases) that I won't have access to, hence will need someone to be able to send us the data.

> Carolyn: just call the flat file loader from a script processor...

I didn't believe that would work in my environment. I'm pretty sure the script has dependencies on various Metron JARs, not least for the row id hashing algorithm. I suppose this would require at least a partial install of Metron alongside NiFi, and would introduce additional work on the NiFi cluster for any Metron upgrade. In some (enterprise) environments there might be separation of ownership between NiFi and Metron.

I also prefer not to have a Java app calling a bash script which calls a new java process, with logs or error output that might just get swallowed up invisibly. Somewhere down the line this could hold up effective troubleshooting.

> Simon: I have actually written a stellar processor, which applies stellar to all FlowFile attributes...

Gulp.

> Simon: what didn't you like about the flatfile loader script?

The flatfile loader script has worked fine for me when prepping enrichment data in test systems, however it was a bit of a chore to get the JSON configuration files set up, especially for "wide" data sources that may have 15-20 fields, e.g. Active Directory.

More broadly speaking, I want to embrace the streaming data paradigm and tried to avoid batch jobs. With the DNS example, you might imagine a future where the enrichment data is streamed based on DHCP registrations, DNS update events, etc. In principle this could reduce the window of time where we might enrich a data source with out-of-date data.

Charlie


Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Carolyn Duby <cd...@hortonworks.com>.
I like the streaming enrichment solutions but it depends on how you are getting the data in.  If you get the data in a CSV file, just call the flat file loader from a script processor.  No special NiFi required.

If the enrichments don’t arrive in bulk, the streaming solution is better.

Thanks
Carolyn Duby
Solutions Engineer, Northeast
cduby@hortonworks.com
+1.508.965.0584

>> > that script, so I am using that as a reference.
>> >
>> > I have implemented a flow in NiFi which takes JSON data from a HTTP
>> > listener and routes it to a PutHBaseJSON processor. The flow is
>> > working, in the sense that data is successfully written to HBase, but
>> > despite (naively) specifying "Row Identifier Encoding Strategy =
>> > Binary", the results in HBase don't look correct. Comparing the output
>> > from HBase scan commands I
>> > see:
>> >
>> > flatfile_loader.sh produced:
>> >
>> > ROW:
>> > \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\
>> > x0E192.168.0.198
>> > CELL: column=data:v, timestamp=1516896203840,
>> > value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
>> >
>> > PutHBaseJSON produced:
>> >
>> > ROW:  server.domain.local
>> > CELL: column=dns:v, timestamp=1527778603783,
>> > value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
>> >
>> > From source JSON:
>> >
>> >
>> > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A
>> > ","data":"192.168.0.198"}}
>> >
>> > I know that there are some differences in column family / field names,
>> > but my worry is the ROW id. Presumably I need to encode my row key,
>> > "k" in the JSON data, in a way that matches how the flatfile_loader.sh
>> script did it.
>> >
>> > Can anyone explain how I might convert my Id to the correct format?
>> > -or-
>> > Does this matter-can Metron use the human-readable ROW ids?
>> >
>> > Charlie Joynt
>> >
>> > --------------
>> > G-RESEARCH believes the information provided herein is reliable. While
>> > every care has been taken to ensure accuracy, the information is
>> > furnished to the recipients with no warranty as to the completeness
>> > and accuracy of its contents and on condition that any errors or
>> > omissions shall not be made the basis of any claim, demand or cause of
>> action.
>> > The information in this email is intended only for the named recipient.
>> > If you are not the intended recipient please notify us immediately and
>> > do not copy, distribute or take action based on this e-mail.
>> > All messages sent to and from this e-mail address will be logged by
>> > G-RESEARCH and are subject to archival storage, monitoring, review and
>> > disclosure.
>> > G-RESEARCH is the trading name of Trenchant Limited, 5th Floor,
>> > Whittington House, 19-30 Alfred Place, London WC1E 7EA.
>> > Trenchant Limited is a company registered in England with company
>> > number 08127121.
>> > --------------
>> >
>>
>
>
>
>-- 
>--
>simon elliston ball
>@sireb

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
Good solution. The streaming enrichment writer makes a lot of sense for
this, especially if you're not using huge enrichment sources that need the
batch-based loaders.

As it happens, I have written most of a NiFi processor to handle this use
case directly - both non-record and Record-based, especially for Otto :).
The one thing we need to figure out now is where to host that, and how to
handle releases of a nifi-metron-bundle. I'll probably get round to putting
the code in my GitHub at least in the next few days, while we figure out a
more permanent home.

Charlie, out of curiosity, what didn't you like about the flatfile loader
script?

Simon



-- 
--
simon elliston ball
@sireb

RE: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Charles Joynt <Ch...@gresearch.co.uk>.
Thanks for the responses. I appreciate the willingness to look at creating a NiFi processor. That would be great!

Just to follow up on this (after a week looking after the "ops" side of dev-ops): I really don't want to have to use the flatfile loader script, and I'm not going to be able to write a Metron-style HBase key generator any time soon, but I have had some success with a different approach.

1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
2. Send this to an HTTP listener in NiFi
3. Write to a Kafka topic

I then followed your instructions in this blog:
https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment

4. Create a new "dns" sensor in Metron
5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig settings to push this into HBase:

{
	"parserClassName": "org.apache.metron.parsers.csv.CSVParser",
	"writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
	"sensorTopic": "dns",
	"parserConfig": {
		"shew.table": "dns",
		"shew.cf": "dns",
		"shew.keyColumns": "name",
		"shew.enrichmentType": "dns",
		"columns": {
			"name": 0,
			"type": 1,
			"data": 2
		}
	}
}
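(Editor's note: the "columns" map above simply assigns CSV positions to field names. A minimal sketch of what the parser does with one input line - the field names come from the config above; everything else is illustrative and not Metron's actual CSVParser code:)

```python
import csv
import io

# Position -> field name mapping, as in the parserConfig above.
columns = {"name": 0, "type": 1, "data": 2}

line = '"server.domain.local","A","192.168.0.198"'
row = next(csv.reader(io.StringIO(line)))
record = {field: row[pos] for field, pos in columns.items()}

print(record)
# {'name': 'server.domain.local', 'type': 'A', 'data': '192.168.0.198'}
```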

And... it seems to be working. At least, I have data in HBase which looks more like the output of the flatfile loader.

Charlie

-----Original Message-----
From: Casey Stella [mailto:cestella@gmail.com] 
Sent: 05 June 2018 14:56
To: dev@metron.apache.org
Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON

The problem, as you correctly diagnosed, is the key in HBase.  We construct the key very specifically in Metron, so it's unlikely to work out of the box with the NiFi processor unfortunately.  The key that we use is formed here in the codebase:
https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51

To put that in English, consider the following:

   - type - the enrichment type
   - indicator - the indicator to use
   - hash(*) - a Murmur3 128-bit hash function

The key is hash(indicator) + type + indicator.

This hash prefixing is a standard practice in HBase key design: it allows the keys to be uniformly distributed among the regions and prevents hotspotting. Depending on how the PutHBaseJSON processor works, if you can construct the key and pass it in, then you might be able to either construct the key in NiFi or write a processor to do so.
Ultimately, though, what Carolyn said is true: the easiest approach is probably using the flatfile loader.
If you do get this working in NiFi, however, do please let us know and/or consider contributing it back to the project as a PR :)
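(Editor's note: the key layout can be sketched concretely. This is a hedged illustration, not Metron's implementation: Metron hashes the indicator with Murmur3 128-bit, while this sketch substitutes MD5 - also 16 bytes - purely to show the byte layout. The type and indicator appear to be written with a 2-byte big-endian length prefix, which matches the \x00\x05whois fragment in the scan output quoted in this thread:)

```python
import hashlib
import struct

def enrichment_key(enrichment_type: str, indicator: str) -> bytes:
    """Sketch of the hash(indicator) + type + indicator layout.

    Metron uses a Murmur3 128-bit hash; MD5 (also 16 bytes) stands in
    here purely for illustration, so the 16-byte hash prefix will NOT
    match a real Metron row key -- only the layout is the same.
    """
    t = enrichment_type.encode("utf-8")
    i = indicator.encode("utf-8")
    return (hashlib.md5(i).digest()           # 16-byte hash prefix
            + struct.pack(">H", len(t)) + t   # length-prefixed type
            + struct.pack(">H", len(i)) + i)  # length-prefixed indicator

key = enrichment_key("whois", "192.168.0.198")
# After the 16-byte hash comes \x00\x05 ("whois" is 5 bytes), then the type:
print(key[16:23])  # b'\x00\x05whois'
```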




Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Otto Fowler <ot...@gmail.com>.
PutMetronEnrichmentRecords*  ;)


On June 5, 2018 at 10:32:43, Simon Elliston Ball (
simon@simonellistonball.com) wrote:

Do we, the community, think it would be a good idea to create a
PutMetronEnrichment NiFi processor for this use case? It seems a number of
people want to use NiFi to manage and schedule the loading of enrichments,
for example.

Simon


Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Otto Fowler <ot...@gmail.com>.
Having it in its own repo doesn't make it any less tied to Metron
functionally, but it allows a release with NiFi-only changes to be
produced, or multiple release streams across NiFi versions (1.7.x, 1.8.x).



On June 5, 2018 at 15:14:38, Casey Stella (cestella@gmail.com) wrote:

I agree with Simon here; the benefit of providing NiFi tooling is to enable
NiFi to use our infrastructure (e.g. our parsers, MaaS, Stellar
enrichments, etc.). This would tie it to Metron pretty closely.

On Tue, Jun 5, 2018 at 3:12 PM Otto Fowler <ot...@gmail.com> wrote:

> NiFi releases more often than Metron does; that might be an issue.
>
>
> On June 5, 2018 at 14:07:22, Simon Elliston Ball (
> simon@simonellistonball.com) wrote:
>
> To be honest, I would expect this to be heavily linked to the Metron
> releases, since it's going to use other Metron classes and dependencies
> to ensure compatibility. For example, a Stellar NiFi processor will be
> linked to Metron's stellar-common, and the enrichment loader will depend
> on key construction code from metron-enrichment (and should align to it).
> I was also considering an opinionated PublishMetron linked to the Metron
> Kafka, hiding some of the dances you have to do to make the readMetadata
> functions work (i.e. some sugar around our mild abuse of Kafka keys,
> which prevents people hurting their Kafka by choosing the wrong
> partitioner).
>
> To that extent, I think the releases belong with Metron releases, though
> of course that does increase our release and test burden.
>
> On 5 June 2018 at 10:55, Otto Fowler <ot...@gmail.com> wrote:
>
> > Similar to Bro, we may need to release out of cycle.
> >
> >
> >
> > On June 5, 2018 at 13:17:55, Simon Elliston Ball (
> > simon@simonellistonball.com) wrote:
> >
> > Do you mean in the sense of a separate module, or are you suggesting we
> go
> > as far as a sub-project?
> >
> > On 5 June 2018 at 10:08, Otto Fowler <ot...@gmail.com> wrote:
> >
> > > If we do that, we should have it as a separate component maybe.
> > >
> > >
> > > On June 5, 2018 at 12:42:57, Simon Elliston Ball (
> > > simon@simonellistonball.com) wrote:
> > >
> > > @otto, well, of course we would use the record api... it's great.
> > >
> > > @casey, I have actually written a Stellar processor, which applies
> > > Stellar to all FlowFile attributes, outputting the resulting Stellar
> > > variable space to either attributes or as JSON in the content.
> > >
> > > Is it worth us creating a nifi-metron-bundle? Happy to kick that
> > > off, since I'm halfway there.
> > >
> > > Simon
> > >
> > >
> > >
> > > On 5 June 2018 at 08:41, Otto Fowler <ot...@gmail.com> wrote:
> > >
> > > > We have JIRAs about ‘diverting’ and reading from NiFi flows already
> > > >
> > > >
> > > > On June 5, 2018 at 11:11:45, Casey Stella (cestella@gmail.com)
> wrote:
> > > >
> > > > I'd be in strong support of that, Simon. I think we should have
some
> > > other
> > > > NiFi components in Metron to enable users to interact with our
> > > > infrastructure from NiFi (e.g. being able to transform via stellar,
> > > etc).
> > > >

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
Also, I expect the bundle would be part of the Metron project, so the NiFi release shouldn't matter much now that NiFi can version processors independently.

Simon 


Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Casey Stella <ce...@gmail.com>.
I agree with Simon here: the benefit of providing NiFi tooling is to enable
NiFi to use our infrastructure (e.g. our parsers, MaaS, Stellar
enrichments, etc.). This would tie it to Metron pretty closely.


Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Otto Fowler <ot...@gmail.com>.
NiFi releases more often than Metron does; that might be an issue.



Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
To be honest, I would expect this to be heavily linked to the Metron
releases, since it's going to use other Metron classes and dependencies to
ensure compatibility. For example, a Stellar NiFi processor will be linked
to Metron's stellar-common, and the enrichment loader will depend on key
construction code from metron-enrichment (and should align to it). I was
also considering an opinionated PublishMetron processor linked to the
Metron Kafka, hiding some of the dances you have to do to make the
readMetadata functions work (i.e. some sugar around our mild abuse of
Kafka keys, which prevents people hurting their Kafka by choosing the
wrong partitioner).

To that extent, I think the releases belong with Metron releases, though of
course that does increase our release and test burden.
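[Editor's note: the key construction discussed in this thread (Casey's hash(indicator) + type + indicator layout, with a Murmur3 128-bit hash prefix to avoid region hotspotting) can be sketched as below. This is a minimal illustration of the row-key byte layout, not Metron's actual implementation: md5 stands in for Murmur3 128 here only so the sketch has no third-party dependency (swap in e.g. mmh3.hash_bytes for a faithful key), and the 2-byte big-endian length prefixes are an assumption mirroring Java writeUTF-style serialization, matching the \x00\x05whois pattern visible in the scan output above.]

```python
import hashlib
import struct

def enrichment_key(enrichment_type: str, indicator: str) -> bytes:
    """Sketch of the enrichment row-key layout: hash(indicator) + type + indicator.

    NOTE: Metron uses a Murmur3 128-bit hash for the 16-byte prefix; md5 is
    used here only as a dependency-free stand-in that also yields 16 bytes.
    """
    # 16-byte hash prefix distributes keys uniformly across regions.
    prefix = hashlib.md5(indicator.encode("utf-8")).digest()

    def length_prefixed(s: str) -> bytes:
        # 2-byte big-endian length, then the UTF-8 bytes (writeUTF-style).
        b = s.encode("utf-8")
        return struct.pack(">H", len(b)) + b

    return prefix + length_prefixed(enrichment_type) + length_prefixed(indicator)

key = enrichment_key("whois", "192.168.0.198")
print(len(key))  # 16 (hash) + 2+5 (type) + 2+13 (indicator) = 38
```

A NiFi ExecuteScript processor (or a custom processor) could compute a key like this and feed it to PutHBaseJSON's Row Identifier, provided the hash and serialization exactly match Metron's EnrichmentKey.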

On 5 June 2018 at 10:55, Otto Fowler <ot...@gmail.com> wrote:

> Similar to Bro, we may need to release out of cycle.
>
>
>
> On June 5, 2018 at 13:17:55, Simon Elliston Ball (
> simon@simonellistonball.com) wrote:
>
> Do you mean in the sense of a separate module, or are you suggesting we go
> as far as a sub-project?
>
> On 5 June 2018 at 10:08, Otto Fowler <ot...@gmail.com> wrote:
>
> > If we do that, we should have it as a separate component maybe.
> >
> >
> > On June 5, 2018 at 12:42:57, Simon Elliston Ball (
> > simon@simonellistonball.com) wrote:
> >
> > @otto, well, of course we would use the record api... it's great.
> >
> > @casey, I have actually written a stellar processor, which applies
> stellar
> > to all FlowFile attributes outputting the resulting stellar variable
> space
> > to either attributes or as json in the content.
> >
> > Is it worth us creating an nifi-metron-bundle. Happy to kick that off,
> > since I'm half way there.
> >
> > Simon
> >
> >
> >
> > On 5 June 2018 at 08:41, Otto Fowler <ot...@gmail.com> wrote:
> >
> > > We have jiras about ‘diverting’ and reading from nifi flows already
> > >
> > >
> > > On June 5, 2018 at 11:11:45, Casey Stella (cestella@gmail.com) wrote:
> > >
> > > I'd be in strong support of that, Simon. I think we should have some
> > other
> > > NiFi components in Metron to enable users to interact with our
> > > infrastructure from NiFi (e.g. being able to transform via stellar,
> > etc).
> > >
> > > On Tue, Jun 5, 2018 at 10:32 AM Simon Elliston Ball <
> > > simon@simonellistonball.com> wrote:
> > >
> > > > Do we, the community, think it would be a good idea to create a
> > > > PutMetronEnrichment NiFi processor for this use case? It seems a
> > number
> > > of
> > > > people want to use NiFi to manage and schedule loading of
> enrichments
> > for
> > > > example.
> > > >
> > > > Simon
> > > >
> > > > On 5 June 2018 at 06:56, Casey Stella <ce...@gmail.com> wrote:
> > > >
> > > > > The problem, as you correctly diagnosed, is the key in HBase. We
> > > > construct
> > > > > the key very specifically in Metron, so it's unlikely to work out
> of
> > > the
> > > > > box with the NiFi processor unfortunately. The key that we use is
> > > formed
> > > > > here in the codebase:
> > > > > https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
> > > > >
> > > > > To put that in english, consider the following:
> > > > >
> > > > > - type - The enrichment type
> > > > > - indicator - the indicator to use
> > > > > - hash(*) - A Murmur3 128-bit hash function
> > > > >
> > > > > the key is hash(indicator) + type + indicator
> > > > >
> > > > > This hash prefixing is a standard practice in hbase key design
> that
> > > > allows
> > > > > the keys to be uniformly distributed among the regions and
> prevents
> > > > > hotspotting. Depending on how the PutHBaseJSON processor works, if
> > you
> > > > can
> > > > > construct the key and pass it in, then you might be able to either
> > > > > construct the key in NiFi or write a processor to construct the
> key.
> > > > > Ultimately though, what Carolyn said is true... the easiest approach is
> > > > > probably using the flatfile loader.
> > > > > If you do get this working in NiFi, however, do please let us know
> > > and/or
> > > > > consider contributing it back to the project as a PR :)
> > > > >
> > > > >
> > > > >
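The key layout Casey describes above can be sketched in Python. This is a rough illustration, not Metron's actual code: MD5 stands in for the Murmur3 128-bit hash (both yield 16 bytes, but the prefix will NOT byte-match rows written by flatfile_loader.sh), and the 2-byte big-endian length prefixes are an assumption read off the \x00\x05whois shape in the scan output below:

```python
import hashlib

def enrichment_key(enrichment_type: str, indicator: str) -> bytes:
    # 16-byte hash prefix over the indicator. Metron uses Murmur3_128;
    # MD5 is only a stand-in here because it also yields 16 bytes, so
    # this prefix will not match what flatfile_loader.sh writes.
    prefix = hashlib.md5(indicator.encode("utf-8")).digest()
    t = enrichment_type.encode("utf-8")
    i = indicator.encode("utf-8")
    # Assumed encoding: each string is preceded by a 2-byte big-endian
    # length, mirroring the \x00\x05whois... bytes in the scan output.
    return (prefix
            + len(t).to_bytes(2, "big") + t
            + len(i).to_bytes(2, "big") + i)

key = enrichment_key("whois", "192.168.0.198")
print(len(key))  # 16 + 2 + 5 + 2 + 13 = 38
```

In practice you would compute the real prefix with a Murmur3_128 implementation, or, more simply, call Metron's own EnrichmentKey class from a custom NiFi processor so the bytes are guaranteed to match.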
> > > > > On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <
> > > > > Charles.Joynt@gresearch.co.uk>
> > > > > wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I work as a Dev/Ops Data Engineer within the security team at a
> > > company
> > > > > in
> > > > > > London where we are in the process of implementing Metron. I
> have
> > > been
> > > > > > tasked with implementing feeds of network environment data into
> > HBase
> > > > so
> > > > > > that this data can be used as enrichment sources for our
> security
> > > > events.
> > > > > > First-off I wanted to pull in DNS data for an internal domain.
> > > > > >
> > > > > > I am assuming that I need to write data into HBase in such a way
> > that
> > > > it
> > > > > > exactly matches what I would get from the flatfile_loader.sh
> > script.
> > > A
> > > > > > colleague of mine has already loaded some DNS data using that
> > script,
> > > > so
> > > > > I
> > > > > > am using that as a reference.
> > > > > >
> > > > > > I have implemented a flow in NiFi which takes JSON data from a
> > HTTP
> > > > > > listener and routes it to a PutHBaseJSON processor. The flow is
> > > > working,
> > > > > in
> > > > > > the sense that data is successfully written to HBase, but
> despite
> > > > > (naively)
> > > > > > specifying "Row Identifier Encoding Strategy = Binary", the
> > results
> > > in
> > > > > > HBase don't look correct. Comparing the output from HBase scan
> > > > commands I
> > > > > > see:
> > > > > >
> > > > > > flatfile_loader.sh produced:
> > > > > >
> > > > > > ROW:
> > > > > > \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\
> > > > > x05whois\x00\x0E192.168.0.198
> > > > > > CELL: column=data:v, timestamp=1516896203840,
> > > > > > value={"clientname":"server.domain.local","clientip":"192.
> > > 168.0.198"}
> > > > > >
> > > > > > PutHBaseJSON produced:
> > > > > >
> > > > > > ROW: server.domain.local
> > > > > > CELL: column=dns:v, timestamp=1527778603783,
> > > > > >
> > > value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
>
> > > > > >
> > > > > > From source JSON:
> > > > > >
> > > > > >
> > > > > > {"k":"server.domain.local","v":{"name":"server.domain.local"
> > > > > ,"type":"A","data":"192.168.0.198"}}
> > > > > >
> > > > > > I know that there are some differences in column family / field
> > > names,
> > > > > but
> > > > > > my worry is the ROW id. Presumably I need to encode my row key,
> > "k"
> > > in
> > > > > the
> > > > > > JSON data, in a way that matches how the flatfile_loader.sh
> script
> > > did
> > > > > it.
> > > > > >
> > > > > > Can anyone explain how I might convert my Id to the correct
> > format?
> > > > > > -or-
> > > > > > Does this matter-can Metron use the human-readable ROW ids?
> > > > > >
> > > > > > Charlie Joynt
> > > > > >
> > > > > > --------------
> > > > > > G-RESEARCH believes the information provided herein is reliable.
> > > While
> > > > > > every care has been taken to ensure accuracy, the information is
> > > > > furnished
> > > > > > to the recipients with no warranty as to the completeness and
> > > accuracy
> > > > of
> > > > > > its contents and on condition that any errors or omissions shall
> > not
> > > be
> > > > > > made the basis of any claim, demand or cause of action.
> > > > > > The information in this email is intended only for the named
> > > recipient.
> > > > > > If you are not the intended recipient please notify us
> immediately
> > > and
> > > > do
> > > > > > not copy, distribute or take action based on this e-mail.
> > > > > > All messages sent to and from this e-mail address will be logged
> > by
> > > > > > G-RESEARCH and are subject to archival storage, monitoring,
> review
> > > and
> > > > > > disclosure.
> > > > > > G-RESEARCH is the trading name of Trenchant Limited, 5th Floor,
> > > > > > Whittington House, 19-30 Alfred Place, London WC1E 7EA.
> > > > > > Trenchant Limited is a company registered in England with
> company
> > > > number
> > > > > > 08127121.
> > > > > > --------------
> > > > > >


-- 
--
simon elliston ball
@sireb

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Otto Fowler <ot...@gmail.com>.
Similar to Bro, we may need to release out of cycle.




Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
Do you mean in the sense of a separate module, or are you suggesting we go
as far as a sub-project?



-- 
--
simon elliston ball
@sireb

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Otto Fowler <ot...@gmail.com>.
If we do that, we should have it as a separate component maybe.



Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
@otto, well, of course we would use the record api... it's great.

@casey, I have actually written a Stellar processor, which applies Stellar
to all FlowFile attributes and outputs the resulting Stellar variable space
either to attributes or as JSON in the content.

Is it worth us creating a nifi-metron-bundle? Happy to kick that off,
since I'm half way there.

Simon



> accuracy
> > of
> > > > its contents and on condition that any errors or omissions shall not
> be
> > > > made the basis of any claim, demand or cause of action.
> > > > The information in this email is intended only for the named
> recipient.
> > > > If you are not the intended recipient please notify us immediately
> and
> > do
> > > > not copy, distribute or take action based on this e-mail.
> > > > All messages sent to and from this e-mail address will be logged by
> > > > G-RESEARCH and are subject to archival storage, monitoring, review
> and
> > > > disclosure.
> > > > G-RESEARCH is the trading name of Trenchant Limited, 5th Floor,
> > > > Whittington House, 19-30 Alfred Place, London WC1E 7EA.
> > > > Trenchant Limited is a company registered in England with company
> > number
> > > > 08127121.
> > > > --------------
> > > >
> > >
> >
> >
> >
> > --
> > --
> > simon elliston ball
> > @sireb
> >
>



-- 
--
simon elliston ball
@sireb

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Otto Fowler <ot...@gmail.com>.
We already have JIRAs about diverting and reading from NiFi flows.



Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Casey Stella <ce...@gmail.com>.
I'd be in strong support of that, Simon. I think we should have some other
NiFi components in Metron to enable users to interact with our
infrastructure from NiFi (e.g. being able to transform via Stellar, etc.).


Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
Do we, the community, think it would be a good idea to create a
PutMetronEnrichment NiFi processor for this use case? It seems a number of
people want to use NiFi to manage and schedule the loading of enrichments,
for example.

Simon




-- 
--
simon elliston ball
@sireb

Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Posted by Casey Stella <ce...@gmail.com>.
The problem, as you correctly diagnosed, is the key in HBase. We construct
the key very specifically in Metron, so it's unlikely to work out of the
box with the NiFi processor, unfortunately. The key that we use is formed
here in the codebase:
https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51

To put that in English, consider the following:

   - type - the enrichment type
   - indicator - the indicator to use
   - hash(*) - a Murmur3 128-bit hash function

The key is hash(indicator) + type + indicator.

This hash prefixing is a standard practice in HBase key design that allows
the keys to be uniformly distributed among the regions and prevents
hotspotting. Depending on how the PutHBaseJSON processor works, if you can
construct the key and pass it in, then you might be able to either
construct the key in NiFi or write a processor to construct the key.
Ultimately, though, what Carolyn said is true: the easiest approach is
probably using the flatfile loader.
If you do get this working in NiFi, however, please do let us know and/or
consider contributing it back to the project as a PR :)
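To make the layout concrete, here is a rough sketch of that key structure: a
16-byte Murmur3 128-bit hash of the indicator, followed by the type and the
indicator as length-prefixed UTF-8 strings (the 2-byte big-endian length
prefix matches Java's writeUTF framing, which is consistent with the
\x00\x05whois bytes in the scan output above). This is an illustration, not
Metron's code: the hash_fn here is a dummy stand-in (Python's standard
library has no Murmur3; in practice you would use a Murmur3 128-bit
implementation such as the third-party mmh3 package), and you should verify
the exact byte layout against EnrichmentKey.java for your Metron version.

```python
import struct


def enrichment_key(enrichment_type, indicator, hash_fn):
    """Sketch of Metron's EnrichmentKey row-key layout:
    hash(indicator) + type + indicator, with strings length-prefixed."""

    def write_utf(s):
        # 2-byte big-endian length, then the UTF-8 bytes (writeUTF-style).
        encoded = s.encode("utf-8")
        return struct.pack(">H", len(encoded)) + encoded

    # hash_fn must return the 16-byte Murmur3 128-bit digest of the
    # indicator; a real implementation is assumed, not provided here.
    return hash_fn(indicator) + write_utf(enrichment_type) + write_utf(indicator)


# Example with a dummy all-zero 16-byte "hash" purely for illustration:
key = enrichment_key("whois", "192.168.0.198", lambda s: b"\x00" * 16)
```

In NiFi terms, the output of something like this would become the row id
passed to PutHBaseJSON (with binary row-key encoding), rather than the
human-readable "k" value.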



On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <Ch...@gresearch.co.uk>
wrote:

> Hello,
>
> I work as a Dev/Ops Data Engineer within the security team at a company in
> London where we are in the process of implementing Metron. I have been
> tasked with implementing feeds of network environment data into HBase so
> that this data can be used as enrichment sources for our security events.
> First-off I wanted to pull in DNS data for an internal domain.
>
> I am assuming that I need to write data into HBase in such a way that it
> exactly matches what I would get from the flatfile_loader.sh script. A
> colleague of mine has already loaded some DNS data using that script, so I
> am using that as a reference.
>
> I have implemented a flow in NiFi which takes JSON data from a HTTP
> listener and routes it to a PutHBaseJSON processor. The flow is working, in
> the sense that data is successfully written to HBase, but despite (naively)
> specifying "Row Identifier Encoding Strategy = Binary", the results in
> HBase don't look correct. Comparing the output from HBase scan commands I
> see:
>
> flatfile_loader.sh produced:
>
> ROW:
> \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
> CELL: column=data:v, timestamp=1516896203840,
> value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
>
> PutHBaseJSON produced:
>
> ROW:  server.domain.local
> CELL: column=dns:v, timestamp=1527778603783,
> value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
>
> From source JSON:
>
>
> {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
>
> I know that there are some differences in column family / field names, but
> my worry is the ROW id. Presumably I need to encode my row key, "k" in the
> JSON data, in a way that matches how the flatfile_loader.sh script did it.
>
> Can anyone explain how I might convert my Id to the correct format?
> -or-
> Does this matter-can Metron use the human-readable ROW ids?
>
> Charlie Joynt
>
> --------------
> G-RESEARCH believes the information provided herein is reliable. While
> every care has been taken to ensure accuracy, the information is furnished
> to the recipients with no warranty as to the completeness and accuracy of
> its contents and on condition that any errors or omissions shall not be
> made the basis of any claim, demand or cause of action.
> The information in this email is intended only for the named recipient.
> If you are not the intended recipient please notify us immediately and do
> not copy, distribute or take action based on this e-mail.
> All messages sent to and from this e-mail address will be logged by
> G-RESEARCH and are subject to archival storage, monitoring, review and
> disclosure.
> G-RESEARCH is the trading name of Trenchant Limited, 5th Floor,
> Whittington House, 19-30 Alfred Place, London WC1E 7EA.
> Trenchant Limited is a company registered in England with company number
> 08127121.
> --------------
>