Posted to dev@metron.apache.org by Ali Nazemian <al...@gmail.com> on 2017/05/24 14:08:33 UTC

[Discuss] Cyber Security Asset Management for Metron

Hi all,

We are going to design and develop an asset database for Metron. For this
purpose, I have been thinking of a graph schema model that maps assets as
nodes and their relations as edges. This can be extended to the event level,
so that events are related to assets as well as to other events. Regarding
technology, I was thinking of using the Titan graph database (probably
JanusGraph) with HBase and Elasticsearch/Solr as backends. However, this
decision might cause performance issues if we want to use lots of composite
indices. The problem we would face is that Titan creates a separate column
family for each composite index, and HBase does not handle many column
families well. Basically, it would be better to use Cassandra for this
purpose.
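A minimal sketch of the proposed schema, written in Python purely for illustration: it is a toy in-memory property graph, not Titan/JanusGraph or Metron code, and the vertex labels (`asset`, `event`), edge labels (`connects_to`, `involves`, `precedes`) and the `ip` property are hypothetical. The `lookup` method imitates what a composite index offers: an exact-match lookup on a fixed combination of property keys. Range and full-text queries are what the Elasticsearch/Solr-backed mixed indices would cover.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: int
    label: str  # hypothetical labels: "asset" or "event"
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: int
    dst: int
    label: str  # hypothetical labels: "connects_to", "involves", "precedes"
    props: dict = field(default_factory=dict)


class AssetGraph:
    """Toy in-memory property graph; not Titan/JanusGraph code."""

    def __init__(self, indexed_keys=()):
        self.vertices = {}
        self.out_edges = defaultdict(list)
        # Composite-index-like structure: maps a tuple of values for a
        # fixed combination of property keys to the matching vertex ids.
        self.indexed_keys = tuple(indexed_keys)
        self.index = defaultdict(set)

    def add_vertex(self, label, **props):
        vid = len(self.vertices)
        self.vertices[vid] = Vertex(vid, label, props)
        # Only vertices that carry every indexed key are indexed.
        if self.indexed_keys and all(k in props for k in self.indexed_keys):
            self.index[tuple(props[k] for k in self.indexed_keys)].add(vid)
        return vid

    def add_edge(self, src, dst, label, **props):
        self.out_edges[src].append(Edge(src, dst, label, props))

    def lookup(self, *values):
        """Exact-match lookup only: no ranges, prefixes or full-text."""
        return sorted(self.index.get(tuple(values), set()))

    def neighbors(self, vid, label=None):
        """Targets of outgoing edges, optionally restricted to one edge label."""
        return [e.dst for e in self.out_edges[vid]
                if label is None or e.label == label]


# Assets as vertices, with a relation between them.
g = AssetGraph(indexed_keys=("ip",))
web = g.add_vertex("asset", ip="10.0.0.5", hostname="web01")
db = g.add_vertex("asset", ip="10.0.0.9", hostname="db01")
g.add_edge(web, db, "connects_to")

# Events linked to assets, plus an event-to-event relation.
scan = g.add_vertex("event", kind="port_scan")
login = g.add_vertex("event", kind="failed_login")
g.add_edge(scan, web, "involves")
g.add_edge(login, db, "involves")
g.add_edge(scan, login, "precedes")

print(g.lookup("10.0.0.9"))           # [1]
print(g.neighbors(scan, "involves"))  # [0]
print(g.neighbors(scan))              # [0, 3]
```

Modelling events as vertices rather than as edge properties is what makes event-to-event edges such as `precedes` possible in this sketch.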

I would like to understand what work has already been done on this problem
and what the roadmap is, so I can make sure we follow the same strategy.

Regards,
Ali

Re: [Discuss] Cyber Security Asset Management for Metron

Posted by Casey Stella <ce...@gmail.com>.
I definitely sympathize with the desire to have a graph database as part of
the architecture, but I concur with Ali; scalable graph databases don't have
the best reputation. I have resisted pushing for it so far because of
concerns about the stability of an implementation. I think we should tread
very carefully and really consider whether we need a full graph database and
whether the use cases justify introducing something with largely unknown
stability and performance.


On Wed, May 24, 2017 at 11:05 PM, Ali Nazemian <al...@gmail.com>
wrote:


Re: [Discuss] Cyber Security Asset Management for Metron

Posted by Ali Nazemian <al...@gmail.com>.
Agreed on having a separate discussion/proposal. A graph database that looks
good from a design perspective is one thing; a stable, high-performance
implementation of it is another. I have used different graph databases in
multiple projects so far. They look very good on paper, but we should be
careful about the implementation.

The good thing about using Titan for this purpose is that it comes with a
native TinkerPop implementation, which will be helpful for OLAP directly on
Spark, out of the box. However, there were lots of issues with the stability
of Titan (we worked on making it stable for 8 months!). I am not sure whether
they have been fixed as part of JanusGraph. I know Atlas team members are
involved in JanusGraph development. The fact that they are using HBase as a
backend would also be helpful, so we may want to share this conversation with
them and learn from their experience.

Anyway, I was wondering whether anybody has done anything on this yet, so I
can align with that work and avoid any rework.

Cheers,
Ali

On Thu, May 25, 2017 at 4:21 AM, Otto Fowler <ot...@gmail.com>
wrote:

-- 
A.Nazemian

Re: [Discuss] Cyber Security Asset Management for Metron

Posted by Otto Fowler <ot...@gmail.com>.
We should have a discussion or a proposal on what should go in the graph
vs. what should go in other stores.


On May 24, 2017 at 14:09:59, Zeolla@GMail.com (zeolla@gmail.com) wrote:


Re: [Discuss] Cyber Security Asset Management for Metron

Posted by "Zeolla@GMail.com" <ze...@gmail.com>.
I would be very interested in a graph db that could leverage the
ip_src_addr and ip_dst_addr fields in a broad sense (who is talking to whom,
visualize top talkers, etc.). To be really useful it would need the ability
to apply filters (IPs, ports, connection durations, bytes transferred, etc.)
and to narrow results down to specific time windows. I probably have an
environment where I could test this at semi-scale (a couple billion messages
per day) and flesh out some of the performance concerns if this turns into
something, even if it is very early in development, as I frequently rebuild
that environment from scratch for testing things.
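The who-talks-to-whom use case above can be sketched as a weighted, directed graph built from one time window of messages. This is a minimal in-memory Python sketch, not Metron code: only `ip_src_addr` and `ip_dst_addr` come from the discussion, while `ip_dst_port`, `bytes`, `duration` and `timestamp` (assumed to be epoch milliseconds) are assumed field names, and `talker_graph`/`top_talkers` are hypothetical helpers. At a couple billion messages per day, these filters would have to be pushed down into whichever store is chosen rather than applied in a Python loop.

```python
from collections import Counter, defaultdict


def talker_graph(messages, start_ms, end_ms, dst_ports=None,
                 min_bytes=0, min_duration=0.0):
    """Aggregate messages in [start_ms, end_ms) into src -> dst edges.

    Only ip_src_addr/ip_dst_addr come from the discussion; ip_dst_port,
    bytes, duration and timestamp (epoch ms) are assumed field names.
    """
    edges = defaultdict(lambda: {"count": 0, "bytes": 0})
    for m in messages:
        if not start_ms <= m["timestamp"] < end_ms:
            continue  # outside the time window
        if dst_ports is not None and m.get("ip_dst_port") not in dst_ports:
            continue  # port filter
        if m.get("bytes", 0) < min_bytes or m.get("duration", 0.0) < min_duration:
            continue  # volume / connection-duration filters
        edge = edges[(m["ip_src_addr"], m["ip_dst_addr"])]
        edge["count"] += 1
        edge["bytes"] += m.get("bytes", 0)
    return dict(edges)


def top_talkers(edges, n=3):
    """Rank source IPs by total bytes sent over all of their edges."""
    sent = Counter()
    for (src, _dst), stats in edges.items():
        sent[src] += stats["bytes"]
    return sent.most_common(n)


msgs = [
    {"ip_src_addr": "10.0.0.5", "ip_dst_addr": "10.0.0.9", "ip_dst_port": 5432,
     "bytes": 1200, "duration": 2.0, "timestamp": 1000},
    {"ip_src_addr": "10.0.0.5", "ip_dst_addr": "10.0.0.9", "ip_dst_port": 5432,
     "bytes": 800, "duration": 1.5, "timestamp": 2000},
    {"ip_src_addr": "10.0.0.7", "ip_dst_addr": "8.8.8.8", "ip_dst_port": 53,
     "bytes": 90, "duration": 0.1, "timestamp": 1500},
    {"ip_src_addr": "10.0.0.7", "ip_dst_addr": "10.0.0.9", "ip_dst_port": 5432,
     "bytes": 5000, "duration": 30.0, "timestamp": 9000},
]

window = talker_graph(msgs, start_ms=0, end_ms=5000)
print(top_talkers(window))  # [('10.0.0.5', 2000), ('10.0.0.7', 90)]
print(list(talker_graph(msgs, 0, 10000, min_duration=10.0)))  # [('10.0.0.7', '10.0.0.9')]
```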

Jon

On Wed, May 24, 2017 at 12:46 PM Nick Allen <ni...@nickallen.org> wrote:

-- 

Jon

Re: [Discuss] Cyber Security Asset Management for Metron

Posted by Nick Allen <ni...@nickallen.org>.
I think the addition of a graph capability would be very powerful. I know
many who would love the idea, but I don't know of any implementations so
far.

It might be good to discuss as a community the specific use cases that a
graph database would enable. That might help to flesh out the technical
aspects of it.

On Wed, May 24, 2017 at 10:08 AM, Ali Nazemian <al...@gmail.com>
wrote:
