Posted to dev@metron.apache.org by Ryan Merriman <me...@gmail.com> on 2018/02/01 14:11:21 UTC

[DISCUSS] Persistence store for user profile settings

There is currently a PR up for review that allows a user to configure and
save the list of facet fields that appear in the left column of the Alerts
UI:  https://github.com/apache/metron/pull/853.  The REST layer has ORM
support which means we can store those in a relational database.

However I'm not 100% sure this is the best place to keep this.  As we add
more use cases like this the backing tables in the RDBMS will need to be
managed.  This could make upgrading more tedious and error-prone.  Is there
a better way to store this, assuming we can leverage a component that's
already included in our stack?

Ryan

Re: [DISCUSS] Persistence store for user profile settings

Posted by Michael Miklavcic <mi...@gmail.com>.
I'm also good with HBase.


Re: [DISCUSS] Persistence store for user profile settings

Posted by Nick Allen <ni...@nickallen.org>.
+1 I think going with HBase is a good approach for now.  Thanks for laying
out the pros and cons.


Re: [DISCUSS] Persistence store for user profile settings

Posted by Ryan Merriman <me...@gmail.com>.
I would like to bring this discussion to a conclusion and update the PR
accordingly.  To clarify whether we depend on an RDBMS right now: we do,
but only for authentication, which will probably be replaced at some point.
So the answer is not really.  I personally agree with Simon and think we
should use HBase because this use case fits the data model and it's already
in our stack.  I would add that with HBase we can move the schema evolution
complexity to the application layer and hide it from the user.  This will
make upgrades easier, which is my main concern.  I also agree with Nick
that there may be a place for an RDBMS in the future, but we can always add
it back.

The two choices seem to be either an RDBMS or HBase.  Here is a summary
based on comments in this discussion:

RDBMS
- some are not too worried about schema evolution as the data model will
likely be simple
- avoiding having to alter tables when upgrading would be ideal
- works with ORM tools
- is flexible and could be useful for future use cases

HBase
- might involve boilerplate code if not covered elsewhere in Metron
- key/value is good enough for user profile settings
- data replication for free

Reading over this thread again I get the impression there is a slight
preference for HBase.  I want to give people one more chance to chime in or
argue for the other solution.  Let me know if I missed anything or didn't
include someone's argument.
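
For illustration only, here is a rough sketch of what moving schema
evolution into the application layer could look like for a key/value store
keyed by username.  Nothing here comes from the PR: the class name, the
"facetFields" key, the version numbers and the v1-to-v2 change are all
hypothetical, and a plain HashMap stands in for the HBase table so the
example runs on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: a HashMap stands in for an HBase table keyed by username. */
public class UserSettingsSketch {

    /** Hypothetical version number of the current settings layout. */
    public static final int CURRENT_VERSION = 2;

    /** One stored value: a version tag plus free-form settings fields. */
    public static final class Record {
        public final int version;
        public final Map<String, Object> fields;

        public Record(int version, Map<String, Object> fields) {
            this.version = version;
            this.fields = fields;
        }
    }

    // Stand-in for the table: row key = username, value = settings record.
    private final Map<String, Record> table = new HashMap<>();

    public void put(String username, Record record) {
        table.put(username, record);
    }

    /** Looks a record up by key and upgrades old layouts on read. */
    public Record get(String username) {
        Record stored = table.get(username);
        if (stored == null) {
            return null;
        }
        Record upgraded = upgrade(stored);
        if (upgraded != stored) {
            table.put(username, upgraded); // write the upgraded form back
        }
        return upgraded;
    }

    /**
     * Hypothetical migration: version 1 kept the facet fields as one
     * comma-separated string, version 2 keeps them as a list.
     */
    public static Record upgrade(Record record) {
        if (record.version >= CURRENT_VERSION) {
            return record;
        }
        Map<String, Object> fields = new HashMap<>(record.fields);
        Object old = fields.remove("facetFields");
        List<String> facets = new ArrayList<>();
        if (old instanceof String) {
            for (String field : ((String) old).split(",")) {
                if (!field.trim().isEmpty()) {
                    facets.add(field.trim());
                }
            }
        }
        fields.put("facetFields", facets);
        return new Record(CURRENT_VERSION, fields);
    }

    public static void main(String[] args) {
        UserSettingsSketch store = new UserSettingsSketch();
        Map<String, Object> v1 = new HashMap<>();
        v1.put("facetFields", "source:type, ip_src_addr");
        store.put("ryan", new Record(1, v1));

        Record current = store.get("ryan");
        System.out.println(current.version + " " + current.fields.get("facetFields"));
    }
}
```

The point of the sketch is that an old record is converted when it is read,
so an upgrade would not need to alter any tables.  A real implementation
would still have to decide how values are serialized and how the HBase
client is wired in, which this example does not attempt.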



On Fri, Feb 2, 2018 at 8:24 AM, Nick Allen <ni...@nickallen.org> wrote:

> > Glad you agree with me that this isn’t HBase scale… it’s clearly not. I
> > would never suggest introducing HBase for something like this, but since
> > it’s there.
>
> Ah, gotcha.  Misunderstood your statement.
>
>
>
> On Fri, Feb 2, 2018 at 9:01 AM Simon Elliston Ball <
> simon@simonellistonball.com> wrote:
>
> > Glad you agree with me that this isn’t HBase scale… it’s clearly not. I
> > would never suggest introducing HBase for something like this, but since
> > it’s there.
> >
> > On the idea of using the Ambari RDBMS on the same basis, that it is
> > already there, I see your point. That said, it can be postgres, sql
> > server, mysql, maria, oracle… various. Yes we have an ORM, but those are
> > not nearly as magic as they claim, and upgrade / schema evolution of an
> > RDBMS often involves some sort of platform-dependent SQL migration in my
> > experience. I would suggest that supporting that range of options is not a
> > good idea for us. The Ambari project also pretty much reserves the right
> > to blow away that infrastructure in upgrades (which is fair enough). So
> > relying on there being an RDBMS owned by another component is not
> > something I would necessarily say was a clean choice.
> >
> > Simon
> >
> > > On 2 Feb 2018, at 13:50, Nick Allen <ni...@nickallen.org> wrote:
> > >
> > > I fall marginally on the side of an RDBMS.  There is definitely a case to
> > > be made on both sides, but I'll point out a few things for the RDBMS.
> > >
> > >
> > > (1) Flexibility.  Using an RDBMS is going to provide us with much greater
> > > flexibility going forward.  We really don't know what the specific use
> > > cases will be, but I am willing to bet they are user-focused (preferences,
> > > etc).  The type of use cases that most web applications use an RDBMS for.
> > >
> > >
> > >> If anything I would like to see the current RDBMS dependency come out...
> > >
> > > (2) Don't we already have an RDBMS requirement for Ambari?  That's a
> > > dependency that we do not control.
> > >
> > >
> > >> ... hbase seems a good option (because we already have it there, it would
> > >> be kinda crazy at this scale if we didn’t already have it)
> > >
> > > (3) In this scenario, the RDBMS would not scale proportionally with the
> > > amount of telemetry, it would scale based on usage; primarily the number of
> > > users.  This is not "big data" scale.  I don't think we can make the case
> > > for HBase based on scale here.
> > >
> > >
> > >> We would also end up with, as Mike points out, a whole new disk
> > >> deployment patterns and a bunch of additional DBA ops process requirements
> > >> for every install.
> > >
> > > (4) Most users that need HA/DR (and other 'advanced stuff'), are
> > > enterprises and organizations that are already very familiar with RDBMS
> > > solutions and have the infrastructure in place to manage those.  For users
> > > that don't need HA/DR, just use the DB that gets spun-up with Ambari.
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Feb 2, 2018 at 7:17 AM Simon Elliston Ball <
> > > simon@simonellistonball.com> wrote:
> > >
> > >> Introducing a RDBMS to the stack seems unnecessary for this.
> > >>
> > >> If we consider the data access patterns for user profiles, we are unlikely
> > >> to query into them, or indeed do anything other than look them up, or write
> > >> them out by a username key. To that end, using an ORM to translate a
> > >> nested config object into a load of tables seems to introduce complexity
> > >> and brittleness we then have to take away through relying on relational
> > >> consistency models. We would also end up with, as Mike points out, a whole
> > >> new disk deployment patterns and a bunch of additional DBA ops process
> > >> requirements for every install.
> > >>
> > >> Since the access pattern is almost entirely key => value, hbase seems a
> > >> good option (because we already have it there, it would be kinda crazy at
> > >> this scale if we didn’t already have it) or arguably zookeeper, but that
> > >> might be at the other end of the scale argument. I’d even go as far as to
> > >> suggest files on HDFS to keep it simple.
> > >>
> > >> Simon
> > >>
> > >>> On 1 Feb 2018, at 23:24, Michael Miklavcic <michael.miklavcic@gmail.com>
> > >>> wrote:
> > >>>
> > >>> Personally, I'd be in favor of something like Maria DB as an open source
> > >>> repo. Or any other ansi sql store. On the positive side, it should mesh
> > >>> seamlessly with ORM tools. And the schema for this should be pretty
> > >>> vanilla, I'd imagine. I might even consider skipping ORM for straight JDBC
> > >>> and simple command scripts in Java for something this small. I'm not
> > >>> worried so much about migrations of this sort. Large scale DBs can get
> > >>> involved with major schema changes, but that's usually when the datastore
> > >>> is a massive set of tables with complex relationships, at least in my
> > >>> experience.
> > >>>
> > >>> We could also use hbase, which probably wouldn't be that hard either, but
> > >>> there may be more boilerplate to write for the client as compared to
> > >>> standard SQL. But I'm assuming we could reuse a fair amount of existing
> > >>> code from our enrichments. One additional reason in favor of hbase might
> > >>> be data replication. For a SQL instance we'd probably recommend a RAID
> > >>> store or backup procedure, but we get that pretty easy with hbase too.
> > >>>
> > >>> On Feb 1, 2018 2:45 PM, "Casey Stella" <ce...@gmail.com> wrote:
> > >>>
> > >>>> So, I'll answer your question with some questions:
> > >>>>
> > >>>>  - No matter the data store we use upgrading will take some care, right?
> > >>>>  - Do we currently depend on a RDBMS anywhere?  I want to say that we do
> > >>>>  in the REST layer already, right?
> > >>>>  - If we don't use a RDBMS, what's the other option?  What are the pros
> > >>>>  and cons?
> > >>>>  - Have we considered non-server offline persistent solutions (e.g.
> > >>>>  https://www.html5rocks.com/en/features/storage)?
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman <me...@gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>>> There is currently a PR up for review that allows a user to configure
> > >>>>> and save the list of facet fields that appear in the left column of
> > >>>>> the Alerts UI:  https://github.com/apache/metron/pull/853.  The REST
> > >>>>> layer has ORM support which means we can store those in a relational
> > >>>>> database.
> > >>>>>
> > >>>>> However I'm not 100% sure this is the best place to keep this.  As we
> > >>>>> add more use cases like this the backing tables in the RDBMS will need
> > >>>>> to be managed.  This could make upgrading more tedious and
> > >>>>> error-prone.  Is there a better way to store this, assuming we can
> > >>>>> leverage a component that's already included in our stack?
> > >>>>>
> > >>>>> Ryan
> > >>>>>
> > >>>>
> > >>
> > >>
> >
> >
>

Re: [DISCUSS] Persistence store for user profile settings

Posted by Nick Allen <ni...@nickallen.org>.
> Glad you agree with me that this isn’t HBase scale… it’s clearly not. I
> would never suggest introducing HBase for something like this, but since
> it’s there.

Ah, gotcha.  Misunderstood your statement.



On Fri, Feb 2, 2018 at 9:01 AM Simon Elliston Ball <
simon@simonellistonball.com> wrote:

> Glad you agree with me that this isn’t HBase scale… it’s clearly not. I
> would never suggest introducing HBase for something like this, but since
> it’s there.
>
> On the idea of using the Ambari RDBMS on the same basis, that it is
> already there, I see your point. That said, it can be postgres, sql
> server, mysql, maria, oracle… various. Yes we have an ORM, but those are
> not nearly as magic as they claim, and upgrade / schema evolution of an
> RDBMS often involves some sort of platform-dependent SQL migration in my
> experience. I would suggest that supporting that range of options is not a
> good idea for us. The Ambari project also pretty much reserves the right
> to blow away that infrastructure in upgrades (which is fair enough). So
> relying on there being an RDBMS owned by another component is not
> something I would necessarily say was a clean choice.
>
> Simon
>
> > On 2 Feb 2018, at 13:50, Nick Allen <ni...@nickallen.org> wrote:
> >
> > I fall marginally on the side of an RDBMS.  There is definitely a case to
> > be made on both sides, but I'll point out a few things for the RDBMS.
> >
> >
> > (1) Flexibility.  Using an RDBMS is going to provide us with much greater
> > flexibility going forward.  We really don't know what the specific use
> > cases will be, but I am willing to bet they are user-focused (preferences,
> > etc).  The type of use cases that most web applications use an RDBMS for.
> >
> >
> >> If anything I would like to see the current RDBMS dependency come out...
> >
> > (2) Don't we already have an RDBMS requirement for Ambari?  That's a
> > dependency that we do not control.
> >
> >
> >> ... hbase seems a good option (because we already have it there, it would
> >> be kinda crazy at this scale if we didn’t already have it)
> >
> > (3) In this scenario, the RDBMS would not scale proportionally with the
> > amount of telemetry, it would scale based on usage; primarily the number of
> > users.  This is not "big data" scale.  I don't think we can make the case
> > for HBase based on scale here.
> >
> >
> >> We would also end up with, as Mike points out, a whole new disk
> >> deployment patterns and a bunch of additional DBA ops process requirements
> >> for every install.
> >
> > (4) Most users that need HA/DR (and other 'advanced stuff'), are
> > enterprises and organizations that are already very familiar with RDBMS
> > solutions and have the infrastructure in place to manage those.  For users
> > that don't need HA/DR, just use the DB that gets spun-up with Ambari.
> >
> >
> >
> >
> >
> > On Fri, Feb 2, 2018 at 7:17 AM Simon Elliston Ball <
> > simon@simonellistonball.com> wrote:
> >
> >> Introducing a RDBMS to the stack seems unnecessary for this.
> >>
> >> If we consider the data access patterns for user profiles, we are
> unlikely
> >> to query into them, or indeed do anything other than look them up, or
> write
> >> them out by a username key. To that end, using an ORM to translate a a
> >> nested config object into a load of tables seems to introduce complexity
> >> and brittleness we then have to take away through relying on relational
> >> consistency models. We would also end up with, as Mike points out, a
> whole
> >> new disk deployment patterns and a bunch of additional DBA ops process
> >> requirements for every install.
> >>
> >> Since the access pattern is almost entirely key => value, hbase seems a
> >> good option (because we already have it there, it would be kinda crazy
> at
> >> this scale if we didn’t already have it) or arguably zookeeper, but that
> >> might be at the other end of the scale argument. I’d even go as far as
> to
> >> suggest files on HDFS to keep it simple.
> >>
> >> Simon
> >>
> >>> On 1 Feb 2018, at 23:24, Michael Miklavcic <
> michael.miklavcic@gmail.com>
> >> wrote:
> >>>
> >>> Personally, I'd be in favor of something like Maria DB as an open
> source
> >>> repo. Or any other ansi sql store. On the positive side, it should mesh
> >>> seamlessly with ORM tools. And the schema for this should be pretty
> >>> vanilla, I'd imagine. I might even consider skipping ORM for straight
> >> JDBC
> >>> and simple command scripts in Java for something this small. I'm not
> >>> worried so much about migrations of this sort. Large scale DBs can get
> >>> involved with major schema changes, but thats usually when the
> datastore
> >> is
> >>> a massive set of tables with complex relationships, at least in my
> >>> experience.
> >>>
> >>> We could also use hbase, which probably wouldn't be that hard either,
> but
> >>> there may be more boilerplate to write for the client as compared to
> >>> standard SQL. But I'm assuming we could reuse a fair amount of existing
> >>> code from our enrichments. One additional reason in favor of hbase
> might
> >> be
> >>> data replication. For a SQL instance we'd probably recommend a RAID
> store
> >>> or backup procedure, but we get that pretty easy with hbase too.
> >>>
> >>> On Feb 1, 2018 2:45 PM, "Casey Stella" <ce...@gmail.com> wrote:
> >>>
> >>>> So, I'll answer your question with some questions:
> >>>>
> >>>>  - No matter the data store we use upgrading will take some care,
> >> right?
> >>>>  - Do we currently depend on a RDBMS anywhere?  I want to say that we
> >> do
> >>>>  in the REST layer already, right?
> >>>>  - If we don't use a RDBMs, what's the other option?  What are the
> pros
> >>>>  and cons?
> >>>>  - Have we considered non-server offline persistent solutions (e.g.
> >>>>  https://www.html5rocks.com/en/features/storage)?
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman <me...@gmail.com>
> >> wrote:
> >>>>
> >>>>> There is currently a PR up for review that allows a user to configure
> >> and
> >>>>> save the list of facet fields that appear in the left column of the
> >>>> Alerts
> >>>>> UI:  https://github.com/apache/metron/pull/853.  The REST layer has
> >> ORM
> >>>>> support which means we can store those in a relational database.
> >>>>>
> >>>>> However I'm not 100% sure this is the best place to keep this.  As we
> >> add
> >>>>> more use cases like this the backing tables in the RDBMS will need to
> >> be
> >>>>> managed.  This could make upgrading more tedious and error-prone.  Is
> >>>> there
> >>>>> are a better way to store this, assuming we can leverage a component
> >>>> that's
> >>>>> already included in our stack?
> >>>>>
> >>>>> Ryan
> >>>>>
> >>>>
> >>
> >>
>
>

Re: [DISCUSS] Persistence store for user profile settings

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
Glad you agree with me that this isn’t HBase scale… it’s clearly not. I would never suggest introducing HBase for something like this, but since it’s already there, it makes sense to use it.

On the idea of using the Ambari RDBMS on the same basis, that it is already there, I see your point. That said, it can be PostgreSQL, SQL Server, MySQL, MariaDB, Oracle… various. Yes, we have an ORM, but those are not nearly as magic as they claim, and upgrade / schema evolution of an RDBMS often involves some sort of platform-dependent SQL migration, in my experience. I would suggest that supporting that range of options is not a good idea for us. The Ambari project also pretty much reserves the right to blow away that infrastructure in upgrades (which is fair enough). So relying on an RDBMS owned by another component is not something I would call a clean choice.
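
As a contrast to platform-dependent SQL migrations, here is a minimal sketch of handling schema changes in the application when each user's settings are kept as a single serialized value under a username key. Everything in it is an assumption made for illustration: the `schema_version` and `facet_fields` names, the v1 to v2 change, and the plain dict standing in for the key-value store. It is not Metron code and does not use the HBase client API.

```python
import json

CURRENT_VERSION = 2  # hypothetical version number of the settings layout


def upgrade(settings):
    """Return a copy of a stored settings dict upgraded to CURRENT_VERSION."""
    settings = dict(settings)
    version = settings.get("schema_version", 1)
    if version < 2:
        # Hypothetical v1 -> v2 change: a comma-separated string becomes a list.
        raw = settings.pop("facets", "")
        settings["facet_fields"] = [f for f in raw.split(",") if f]
        version = 2
    settings["schema_version"] = version
    return settings


def load_settings(store, username):
    """Look up one user's settings by key and upgrade them on read."""
    raw = store.get(username)
    if raw is None:
        return {"schema_version": CURRENT_VERSION, "facet_fields": []}
    return upgrade(json.loads(raw))


def save_settings(store, username, settings):
    """Write settings back in the current layout, stamped with its version."""
    stamped = {**settings, "schema_version": CURRENT_VERSION}
    store[username] = json.dumps(stamped)
```

Old values are upgraded lazily when they are read, so no store-side migration script is needed; the trade-off is that the upgrade steps have to stay in the application for as long as old values may exist.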

Simon

> On 2 Feb 2018, at 13:50, Nick Allen <ni...@nickallen.org> wrote:
> 
> I fall marginally on the side of an RDBMS.  There is definitely a case to
> be made on both sides, but I'll point out a few things for the RDBMS.
> 
> 
> (1) Flexibility.  Using an RDBMS is going to provide us with much greater
> flexibility going forward.  We really don't know what the specific use
> cases will be, but I am willing to bet they are user-focused (preferences,
> etc).  The type of use cases that most web applications use an RDBMS for.
> 
> 
>> If anything I would like to see the current RDBMS dependency come out...
> 
> (2) Don't we already have an RDBMS requirement for Ambari?  That's a
> dependency that we do not control.
> 
> 
>> ... hbase seems a good option (because we already have it there, it would
>> be kinda crazy at this scale if we didn’t already have it)
> 
> (3) In this scenario, the RDBMS would not scale proportionally with the
> amount of telemetry, it would scale based on usage; primarily the number of
> users.  This is not "big data" scale.  I don't think we can make the case
> for HBase based on scale here.
> 
> 
>> We would also end up with, as Mike points out, a whole new disk
>> deployment patterns and a bunch of additional DBA ops process requirements
>> for every install.
> 
> (4) Most users that need HA/DR (and other 'advanced stuff'), are
> enterprises and organizations that are already very familiar with RDBMS
> solutions and have the infrastructure in place to manage those.  For users
> that don't need HA/DR, just use the DB that gets spun-up with Ambari.
> 
> 
> 
> 
> 
> On Fri, Feb 2, 2018 at 7:17 AM Simon Elliston Ball <
> simon@simonellistonball.com> wrote:
> 
>> Introducing a RDBMS to the stack seems unnecessary for this.
>> 
>> If we consider the data access patterns for user profiles, we are unlikely
>> to query into them, or indeed do anything other than look them up, or write
>> them out by a username key. To that end, using an ORM to translate a a
>> nested config object into a load of tables seems to introduce complexity
>> and brittleness we then have to take away through relying on relational
>> consistency models. We would also end up with, as Mike points out, a whole
>> new disk deployment patterns and a bunch of additional DBA ops process
>> requirements for every install.
>> 
>> Since the access pattern is almost entirely key => value, hbase seems a
>> good option (because we already have it there, it would be kinda crazy at
>> this scale if we didn’t already have it) or arguably zookeeper, but that
>> might be at the other end of the scale argument. I’d even go as far as to
>> suggest files on HDFS to keep it simple.
>> 
>> Simon
>> 
>>> On 1 Feb 2018, at 23:24, Michael Miklavcic <mi...@gmail.com>
>> wrote:
>>> 
>>> Personally, I'd be in favor of something like Maria DB as an open source
>>> repo. Or any other ansi sql store. On the positive side, it should mesh
>>> seamlessly with ORM tools. And the schema for this should be pretty
>>> vanilla, I'd imagine. I might even consider skipping ORM for straight
>> JDBC
>>> and simple command scripts in Java for something this small. I'm not
>>> worried so much about migrations of this sort. Large scale DBs can get
>>> involved with major schema changes, but thats usually when the datastore
>> is
>>> a massive set of tables with complex relationships, at least in my
>>> experience.
>>> 
>>> We could also use hbase, which probably wouldn't be that hard either, but
>>> there may be more boilerplate to write for the client as compared to
>>> standard SQL. But I'm assuming we could reuse a fair amount of existing
>>> code from our enrichments. One additional reason in favor of hbase might
>> be
>>> data replication. For a SQL instance we'd probably recommend a RAID store
>>> or backup procedure, but we get that pretty easy with hbase too.
>>> 
>>> On Feb 1, 2018 2:45 PM, "Casey Stella" <ce...@gmail.com> wrote:
>>> 
>>>> So, I'll answer your question with some questions:
>>>> 
>>>>  - No matter the data store we use upgrading will take some care,
>> right?
>>>>  - Do we currently depend on a RDBMS anywhere?  I want to say that we
>> do
>>>>  in the REST layer already, right?
>>>>  - If we don't use a RDBMs, what's the other option?  What are the pros
>>>>  and cons?
>>>>  - Have we considered non-server offline persistent solutions (e.g.
>>>>  https://www.html5rocks.com/en/features/storage)?
>>>> 
>>>> 
>>>> 
>>>> On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman <me...@gmail.com>
>> wrote:
>>>> 
>>>>> There is currently a PR up for review that allows a user to configure
>> and
>>>>> save the list of facet fields that appear in the left column of the
>>>> Alerts
>>>>> UI:  https://github.com/apache/metron/pull/853.  The REST layer has
>> ORM
>>>>> support which means we can store those in a relational database.
>>>>> 
>>>>> However I'm not 100% sure this is the best place to keep this.  As we
>> add
>>>>> more use cases like this the backing tables in the RDBMS will need to
>> be
>>>>> managed.  This could make upgrading more tedious and error-prone.  Is
>>>> there
>>>>> are a better way to store this, assuming we can leverage a component
>>>> that's
>>>>> already included in our stack?
>>>>> 
>>>>> Ryan
>>>>> 
>>>> 
>> 
>> 


Re: [DISCUSS] Persistence store for user profile settings

Posted by Nick Allen <ni...@nickallen.org>.
I fall marginally on the side of an RDBMS.  There is definitely a case to
be made on both sides, but I'll point out a few things for the RDBMS.


(1) Flexibility.  Using an RDBMS is going to provide us with much greater
flexibility going forward.  We really don't know what the specific use
cases will be, but I am willing to bet they are user-focused (preferences,
etc.), the type of use cases that most web applications use an RDBMS for.


> If anything I would like to see the current RDBMS dependency come out...

(2) Don't we already have an RDBMS requirement for Ambari?  That's a
dependency that we do not control.


> ... hbase seems a good option (because we already have it there, it would
> be kinda crazy at this scale if we didn’t already have it)

(3) In this scenario, the RDBMS would not scale proportionally with the
amount of telemetry; it would scale based on usage, primarily the number of
users.  This is not "big data" scale.  I don't think we can make the case
for HBase based on scale here.


> We would also end up with, as Mike points out, a whole new disk
> deployment patterns and a bunch of additional DBA ops process requirements
> for every install.

(4) Most users that need HA/DR (and other 'advanced stuff') are
enterprises and organizations that are already very familiar with RDBMS
solutions and have the infrastructure in place to manage those.  Users
that don't need HA/DR can just use the DB that gets spun up with Ambari.





On Fri, Feb 2, 2018 at 7:17 AM Simon Elliston Ball <
simon@simonellistonball.com> wrote:

> Introducing a RDBMS to the stack seems unnecessary for this.
>
> If we consider the data access patterns for user profiles, we are unlikely
> to query into them, or indeed do anything other than look them up, or write
> them out by a username key. To that end, using an ORM to translate a a
> nested config object into a load of tables seems to introduce complexity
> and brittleness we then have to take away through relying on relational
> consistency models. We would also end up with, as Mike points out, a whole
> new disk deployment patterns and a bunch of additional DBA ops process
> requirements for every install.
>
> Since the access pattern is almost entirely key => value, hbase seems a
> good option (because we already have it there, it would be kinda crazy at
> this scale if we didn’t already have it) or arguably zookeeper, but that
> might be at the other end of the scale argument. I’d even go as far as to
> suggest files on HDFS to keep it simple.
>
> Simon
>
> > On 1 Feb 2018, at 23:24, Michael Miklavcic <mi...@gmail.com>
> wrote:
> >
> > Personally, I'd be in favor of something like Maria DB as an open source
> > repo. Or any other ansi sql store. On the positive side, it should mesh
> > seamlessly with ORM tools. And the schema for this should be pretty
> > vanilla, I'd imagine. I might even consider skipping ORM for straight
> JDBC
> > and simple command scripts in Java for something this small. I'm not
> > worried so much about migrations of this sort. Large scale DBs can get
> > involved with major schema changes, but thats usually when the datastore
> is
> > a massive set of tables with complex relationships, at least in my
> > experience.
> >
> > We could also use hbase, which probably wouldn't be that hard either, but
> > there may be more boilerplate to write for the client as compared to
> > standard SQL. But I'm assuming we could reuse a fair amount of existing
> > code from our enrichments. One additional reason in favor of hbase might
> be
> > data replication. For a SQL instance we'd probably recommend a RAID store
> > or backup procedure, but we get that pretty easy with hbase too.
> >
> > On Feb 1, 2018 2:45 PM, "Casey Stella" <ce...@gmail.com> wrote:
> >
> >> So, I'll answer your question with some questions:
> >>
> >>   - No matter the data store we use upgrading will take some care,
> right?
> >>   - Do we currently depend on a RDBMS anywhere?  I want to say that we
> do
> >>   in the REST layer already, right?
> >>   - If we don't use a RDBMs, what's the other option?  What are the pros
> >>   and cons?
> >>   - Have we considered non-server offline persistent solutions (e.g.
> >>   https://www.html5rocks.com/en/features/storage)?
> >>
> >>
> >>
> >> On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman <me...@gmail.com>
> wrote:
> >>
> >>> There is currently a PR up for review that allows a user to configure
> and
> >>> save the list of facet fields that appear in the left column of the
> >> Alerts
> >>> UI:  https://github.com/apache/metron/pull/853.  The REST layer has
> ORM
> >>> support which means we can store those in a relational database.
> >>>
> >>> However I'm not 100% sure this is the best place to keep this.  As we
> add
> >>> more use cases like this the backing tables in the RDBMS will need to
> be
> >>> managed.  This could make upgrading more tedious and error-prone.  Is
> >> there
> >>> are a better way to store this, assuming we can leverage a component
> >> that's
> >>> already included in our stack?
> >>>
> >>> Ryan
> >>>
> >>
>
>

Re: [DISCUSS] Persistence store for user profile settings

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
Couldn’t agree with you more, Otto! On the perms / ACLs / AXOs / groups / users etc. concerns though, there are other Apache projects (such as Ranger) which have already done a lot of the hard thinking and the architecture / data structure / admin UI and persistence pieces for us, so I’d say we lean on them before designing our own approach to IAM.

Simon

> On 2 Feb 2018, at 13:22, Otto Fowler <ot...@gmail.com> wrote:
> 
> Fair enough,  I don’t have a preference.  I think my point is that we need to understand the use cases we can think of more, especially if we are going to be having permissions, grouping and crud around that, and preloading, before just throwing everything in RDBMS -or- HBASE.
> 
> 
> 
> On February 2, 2018 at 08:08:24, Simon Elliston Ball (simon@simonellistonball.com) wrote:
> 
>> True, and that is a requirement I’ve heard a lot (standard views or field sets in shared sets of saved search for example). That would definitely rule out sticking with the current approach (browser local storage, per Casey’s suggestion below). 
>> 
>> That said, I’m not sure that changes my views on RDBMS. There is an argument that a single query from RDBMS could return a set of group prefs with a user overlay, but that’s not that much better than pulling groups and overwriting the maps clientside with user, from the key value store. We’re not talking about huge amounts of preference data here. I could be swayed the other way if we were to use the RDBMS as a canonical store for user and group information (we use it for users right now, in a really not great way) but I would much rather see us plugin to the Hadoop ecosystem and use something like Ranger to sync users, or an LDAP source directly for user and group data, because I suspect no one wants to have to administer a separate user database for Metron and open up the result IAM security hole we currently have (on that, let’s at least stop storing plain text passwords!) /rant. 
>> 
>> If anything I would like to see the current RDBMS dependency come out to reduce the overall complexity, unless we have a use case that genuinely benefits from a normalised data structure, or from SQL access patterns. 
>> 
>> In short, I would still go with LDAP / Ranger for users and groups, and instead of adding an RDBMS, use group prefs and user prefs in the existing KV store (HBase) to reduce the operational maintenance burden on the platform.
>> 
>> Simon
>> 
>>> On 2 Feb 2018, at 12:50, Otto Fowler <ottobackwards@gmail.com> wrote:
>>> 
>>> It is not uncommon to want to have ‘shared’ preferences or setups.   Think of shared dashboards or queries vs. personal version in jira.  Would RDBMS help with that?
>>> 
>>> 
>>> 
>>> On February 2, 2018 at 07:17:04, Simon Elliston Ball (simon@simonellistonball.com) wrote:
>>> 
>>>> Introducing a RDBMS to the stack seems unnecessary for this. 
>>>> 
>>>> If we consider the data access patterns for user profiles, we are unlikely to query into them, or indeed do anything other than look them up, or write them out by a username key. To that end, using an ORM to translate a a nested config object into a load of tables seems to introduce complexity and brittleness we then have to take away through relying on relational consistency models. We would also end up with, as Mike points out, a whole new disk deployment patterns and a bunch of additional DBA ops process requirements for every install. 
>>>> 
>>>> Since the access pattern is almost entirely key => value, hbase seems a good option (because we already have it there, it would be kinda crazy at this scale if we didn’t already have it) or arguably zookeeper, but that might be at the other end of the scale argument. I’d even go as far as to suggest files on HDFS to keep it simple.  
>>>> 
>>>> Simon 
>>>> 
>>>> > On 1 Feb 2018, at 23:24, Michael Miklavcic <michael.miklavcic@gmail.com> wrote:
>>>> >  
>>>> > Personally, I'd be in favor of something like Maria DB as an open source 
>>>> > repo. Or any other ansi sql store. On the positive side, it should mesh 
>>>> > seamlessly with ORM tools. And the schema for this should be pretty 
>>>> > vanilla, I'd imagine. I might even consider skipping ORM for straight JDBC 
>>>> > and simple command scripts in Java for something this small. I'm not 
>>>> > worried so much about migrations of this sort. Large scale DBs can get 
>>>> > involved with major schema changes, but thats usually when the datastore is 
>>>> > a massive set of tables with complex relationships, at least in my 
>>>> > experience. 
>>>> >  
>>>> > We could also use hbase, which probably wouldn't be that hard either, but 
>>>> > there may be more boilerplate to write for the client as compared to 
>>>> > standard SQL. But I'm assuming we could reuse a fair amount of existing 
>>>> > code from our enrichments. One additional reason in favor of hbase might be 
>>>> > data replication. For a SQL instance we'd probably recommend a RAID store 
>>>> > or backup procedure, but we get that pretty easy with hbase too. 
>>>> >  
>>>> > On Feb 1, 2018 2:45 PM, "Casey Stella" <cestella@gmail.com> wrote:
>>>> >  
>>>> >> So, I'll answer your question with some questions: 
>>>> >>  
>>>> >> - No matter the data store we use upgrading will take some care, right? 
>>>> >> - Do we currently depend on a RDBMS anywhere? I want to say that we do 
>>>> >> in the REST layer already, right? 
>>>> >> - If we don't use a RDBMs, what's the other option? What are the pros 
>>>> >> and cons? 
>>>> >> - Have we considered non-server offline persistent solutions (e.g. 
>>>> >>  https://www.html5rocks.com/en/features/storage)?
>>>> >>  
>>>> >>  
>>>> >>  
>>>> >> On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman <merrimanr@gmail.com> wrote:
>>>> >>  
>>>> >>> There is currently a PR up for review that allows a user to configure and 
>>>> >>> save the list of facet fields that appear in the left column of the 
>>>> >> Alerts 
>>>> >>> UI:  https://github.com/apache/metron/pull/853. The REST layer has ORM
>>>> >>> support which means we can store those in a relational database. 
>>>> >>>  
>>>> >>> However I'm not 100% sure this is the best place to keep this. As we add 
>>>> >>> more use cases like this the backing tables in the RDBMS will need to be 
>>>> >>> managed. This could make upgrading more tedious and error-prone. Is 
>>>> >> there 
>>>> >>> are a better way to store this, assuming we can leverage a component 
>>>> >> that's 
>>>> >>> already included in our stack? 
>>>> >>>  
>>>> >>> Ryan 
>>>> >>>  
>>>> >>


Re: [DISCUSS] Persistence store for user profile settings

Posted by Otto Fowler <ot...@gmail.com>.
Fair enough, I don’t have a preference.  I think my point is that we need
to better understand the use cases we can think of, especially if we are
going to have permissions, grouping and CRUD around that, and
preloading, before just throwing everything into an RDBMS -or- HBase.



On February 2, 2018 at 08:08:24, Simon Elliston Ball (
simon@simonellistonball.com) wrote:

True, and that is a requirement I’ve heard a lot (standard views or field
sets in shared sets of saved search for example). That would definitely
rule out sticking with the current approach (browser local storage, per
Casey’s suggestion below).

That said, I’m not sure that changes my views on RDBMS. There is an
argument that a single query from RDBMS could return a set of group prefs
with a user overlay, but that’s not that much better than pulling groups
and overwriting the maps clientside with user, from the key value store.
We’re not talking about huge amounts of preference data here. I could be
swayed the other way if we were to use the RDBMS as a canonical store for
user and group information (we use it for users right now, in a really not
great way) but I would much rather see us plugin to the Hadoop ecosystem
and use something like Ranger to sync users, or an LDAP source directly for
user and group data, because I suspect no one wants to have to administer a
separate user database for Metron and open up the result IAM security hole
we currently have (on that, let’s at least stop storing plain text
passwords!) /rant.

If anything I would like to see the current RDBMS dependency come out to
reduce the overall complexity, unless we have a use case that genuinely
benefits from a normalised data structure, or from SQL access patterns.

In short, I would still go with LDAP / Ranger for users and groups, and
instead of adding an RDBMS, use group prefs and user prefs in the
existing KV store (HBase) to reduce the operational maintenance burden on
the platform.

Simon

On 2 Feb 2018, at 12:50, Otto Fowler <ot...@gmail.com> wrote:

It is not uncommon to want to have ‘shared’ preferences or setups.   Think
of shared dashboards or queries vs. personal version in jira.  Would RDBMS
help with that?



On February 2, 2018 at 07:17:04, Simon Elliston Ball (
simon@simonellistonball.com) wrote:

Introducing a RDBMS to the stack seems unnecessary for this.

If we consider the data access patterns for user profiles, we are unlikely
to query into them, or indeed do anything other than look them up, or write
them out by a username key. To that end, using an ORM to translate a a
nested config object into a load of tables seems to introduce complexity
and brittleness we then have to take away through relying on relational
consistency models. We would also end up with, as Mike points out, a whole
new disk deployment patterns and a bunch of additional DBA ops process
requirements for every install.

Since the access pattern is almost entirely key => value, hbase seems a
good option (because we already have it there, it would be kinda crazy at
this scale if we didn’t already have it) or arguably zookeeper, but that
might be at the other end of the scale argument. I’d even go as far as to
suggest files on HDFS to keep it simple.

Simon

> On 1 Feb 2018, at 23:24, Michael Miklavcic <mi...@gmail.com>
wrote:
>
> Personally, I'd be in favor of something like Maria DB as an open source
> repo. Or any other ansi sql store. On the positive side, it should mesh
> seamlessly with ORM tools. And the schema for this should be pretty
> vanilla, I'd imagine. I might even consider skipping ORM for straight JDBC
> and simple command scripts in Java for something this small. I'm not
> worried so much about migrations of this sort. Large scale DBs can get
> involved with major schema changes, but that's usually when the datastore is
> a massive set of tables with complex relationships, at least in my
> experience.
>
> We could also use hbase, which probably wouldn't be that hard either, but
> there may be more boilerplate to write for the client as compared to
> standard SQL. But I'm assuming we could reuse a fair amount of existing
> code from our enrichments. One additional reason in favor of hbase might
be
> data replication. For a SQL instance we'd probably recommend a RAID store
> or backup procedure, but we get that pretty easy with hbase too.
>
> On Feb 1, 2018 2:45 PM, "Casey Stella" <ce...@gmail.com> wrote:
>
>> So, I'll answer your question with some questions:
>>
>> - No matter the data store we use upgrading will take some care, right?
>> - Do we currently depend on a RDBMS anywhere? I want to say that we do
>> in the REST layer already, right?
>> - If we don't use a RDBMs, what's the other option? What are the pros
>> and cons?
>> - Have we considered non-server offline persistent solutions (e.g.
>>  https://www.html5rocks.com/en/features/storage)?
>>
>>
>>
>> On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman <me...@gmail.com>
wrote:
>>
>>> There is currently a PR up for review that allows a user to configure
and
>>> save the list of facet fields that appear in the left column of the
>> Alerts
>>> UI:  https://github.com/apache/metron/pull/853. The REST layer has ORM
>>> support which means we can store those in a relational database.
>>>
>>> However I'm not 100% sure this is the best place to keep this. As we add
>>> more use cases like this the backing tables in the RDBMS will need to be
>>> managed. This could make upgrading more tedious and error-prone. Is there
>>> a better way to store this, assuming we can leverage a component that's
>>> already included in our stack?
>>>
>>> Ryan
>>>
>>

Re: [DISCUSS] Persistence store for user profile settings

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
True, and that is a requirement I’ve heard a lot (standard views or field sets in shared sets of saved searches, for example). That would definitely rule out sticking with the current approach (browser local storage, per Casey’s suggestion below).

That said, I’m not sure that changes my views on RDBMS. There is an argument that a single query from an RDBMS could return a set of group prefs with a user overlay, but that’s not much better than pulling the group prefs from the key-value store and overwriting the maps client-side with the user prefs. We’re not talking about huge amounts of preference data here. I could be swayed the other way if we were to use the RDBMS as a canonical store for user and group information (we use it for users right now, in a really not great way), but I would much rather see us plug in to the Hadoop ecosystem and use something like Ranger to sync users, or an LDAP source directly for user and group data, because I suspect no one wants to have to administer a separate user database for Metron and open up the resulting IAM security hole we currently have (on that, let’s at least stop storing plain text passwords!) /rant.

If anything I would like to see the current RDBMS dependency come out to reduce the overall complexity, unless we have a use case that genuinely benefits from a normalised data structure, or from SQL access patterns. 

In short, I would still go with LDAP / Ranger for users and groups, and instead of adding an RDBMS, use group prefs and user prefs in the existing KV store (HBase) to reduce the operational maintenance burden on the platform.
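
The group-plus-user overlay described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `group:` and `user:` key prefixes, the preference names, and the plain dict standing in for the key-value store are all hypothetical, and the merge is shallow (a user value replaces the group value for the same key outright).

```python
def effective_prefs(store, username, groups):
    """Merge group preferences in order, then overlay the user's own on top."""
    merged = {}
    for group in groups:
        # Later groups override earlier ones for the same preference key.
        merged.update(store.get("group:" + group, {}))
    # The user's own preferences always win over group defaults.
    merged.update(store.get("user:" + username, {}))
    return merged
```

For example, if a hypothetical `group:soc` entry holds a default page size and a shared facet list, and `user:alice` holds only her own page size, the result keeps the shared facet list and takes her page size. Each lookup is a plain get by key, which matches the key => value access pattern discussed in this thread.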

Simon

> On 2 Feb 2018, at 12:50, Otto Fowler <ot...@gmail.com> wrote:
> 
> It is not uncommon to want to have ‘shared’ preferences or setups.   Think of shared dashboards or queries vs. personal version in jira.  Would RDBMS help with that?
> 
> 
> 
> On February 2, 2018 at 07:17:04, Simon Elliston Ball (simon@simonellistonball.com) wrote:
> 
>> Introducing a RDBMS to the stack seems unnecessary for this. 
>> 
>> If we consider the data access patterns for user profiles, we are unlikely to query into them, or indeed do anything other than look them up, or write them out by a username key. To that end, using an ORM to translate a a nested config object into a load of tables seems to introduce complexity and brittleness we then have to take away through relying on relational consistency models. We would also end up with, as Mike points out, a whole new disk deployment patterns and a bunch of additional DBA ops process requirements for every install. 
>> 
>> Since the access pattern is almost entirely key => value, hbase seems a good option (because we already have it there, it would be kinda crazy at this scale if we didn’t already have it) or arguably zookeeper, but that might be at the other end of the scale argument. I’d even go as far as to suggest files on HDFS to keep it simple.  
>> 
>> Simon 
>> 
>> > On 1 Feb 2018, at 23:24, Michael Miklavcic <michael.miklavcic@gmail.com> wrote:
>> >  
>> > Personally, I'd be in favor of something like Maria DB as an open source 
>> > repo. Or any other ansi sql store. On the positive side, it should mesh 
>> > seamlessly with ORM tools. And the schema for this should be pretty 
>> > vanilla, I'd imagine. I might even consider skipping ORM for straight JDBC 
>> > and simple command scripts in Java for something this small. I'm not 
>> > worried so much about migrations of this sort. Large scale DBs can get 
>> > involved with major schema changes, but thats usually when the datastore is 
>> > a massive set of tables with complex relationships, at least in my 
>> > experience. 
>> >  
>> > We could also use hbase, which probably wouldn't be that hard either, but 
>> > there may be more boilerplate to write for the client as compared to 
>> > standard SQL. But I'm assuming we could reuse a fair amount of existing 
>> > code from our enrichments. One additional reason in favor of hbase might be 
>> > data replication. For a SQL instance we'd probably recommend a RAID store 
>> > or backup procedure, but we get that pretty easy with hbase too. 
>> >  
>> > On Feb 1, 2018 2:45 PM, "Casey Stella" <cestella@gmail.com> wrote:
>> >
>> >> So, I'll answer your question with some questions:
>> >>
>> >> - No matter the data store we use upgrading will take some care, right?
>> >> - Do we currently depend on a RDBMS anywhere? I want to say that we do
>> >>   in the REST layer already, right?
>> >> - If we don't use a RDBMS, what's the other option? What are the pros
>> >>   and cons?
>> >> - Have we considered non-server offline persistent solutions (e.g.
>> >>   https://www.html5rocks.com/en/features/storage)?
>> >>  
>> >>  
>> >>  
>> >> On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman <merrimanr@gmail.com> wrote:
>> >>
>> >>> There is currently a PR up for review that allows a user to configure and
>> >>> save the list of facet fields that appear in the left column of the Alerts
>> >>> UI:  https://github.com/apache/metron/pull/853. The REST layer has ORM
>> >>> support which means we can store those in a relational database.
>> >>>
>> >>> However I'm not 100% sure this is the best place to keep this. As we add
>> >>> more use cases like this the backing tables in the RDBMS will need to be
>> >>> managed. This could make upgrading more tedious and error-prone. Is there
>> >>> a better way to store this, assuming we can leverage a component that's
>> >>> already included in our stack?
>> >>>  
>> >>> Ryan 
>> >>>  
>> >> 


Re: [DISCUSS] Persistence store for user profile settings

Posted by Otto Fowler <ot...@gmail.com>.
It is not uncommon to want to have ‘shared’ preferences or setups.   Think
of shared dashboards or queries vs. personal versions in Jira.  Would an RDBMS
help with that?



On February 2, 2018 at 07:17:04, Simon Elliston Ball (
simon@simonellistonball.com) wrote:

Introducing a RDBMS to the stack seems unnecessary for this.

If we consider the data access patterns for user profiles, we are unlikely
to query into them, or indeed do anything other than look them up, or write
them out by a username key. To that end, using an ORM to translate a
nested config object into a load of tables seems to introduce complexity
and brittleness we then have to take away through relying on relational
consistency models. We would also end up with, as Mike points out, a whole
new set of disk deployment patterns and a bunch of additional DBA ops process
requirements for every install.

Since the access pattern is almost entirely key => value, hbase seems a
good option (because we already have it there, it would be kinda crazy at
this scale if we didn’t already have it) or arguably zookeeper, but that
might be at the other end of the scale argument. I’d even go as far as to
suggest files on HDFS to keep it simple.

Simon

> On 1 Feb 2018, at 23:24, Michael Miklavcic <mi...@gmail.com> wrote:
>
> Personally, I'd be in favor of something like MariaDB as an open source
> repo. Or any other ANSI SQL store. On the positive side, it should mesh
> seamlessly with ORM tools. And the schema for this should be pretty
> vanilla, I'd imagine. I might even consider skipping ORM for straight JDBC
> and simple command scripts in Java for something this small. I'm not
> worried so much about migrations of this sort. Large scale DBs can get
> involved with major schema changes, but that's usually when the datastore is
> a massive set of tables with complex relationships, at least in my
> experience.
>
> We could also use hbase, which probably wouldn't be that hard either, but
> there may be more boilerplate to write for the client as compared to
> standard SQL. But I'm assuming we could reuse a fair amount of existing
> code from our enrichments. One additional reason in favor of hbase might be
> data replication. For a SQL instance we'd probably recommend a RAID store
> or backup procedure, but we get that pretty easy with hbase too.
>
> On Feb 1, 2018 2:45 PM, "Casey Stella" <ce...@gmail.com> wrote:
>
>> So, I'll answer your question with some questions:
>>
>> - No matter the data store we use upgrading will take some care, right?
>> - Do we currently depend on a RDBMS anywhere? I want to say that we do
>> in the REST layer already, right?
>> - If we don't use a RDBMS, what's the other option? What are the pros
>> and cons?
>> - Have we considered non-server offline persistent solutions (e.g.
>> https://www.html5rocks.com/en/features/storage)?
>>
>>
>>
>> On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman <me...@gmail.com> wrote:
>>
>>> There is currently a PR up for review that allows a user to configure and
>>> save the list of facet fields that appear in the left column of the Alerts
>>> UI: https://github.com/apache/metron/pull/853. The REST layer has ORM
>>> support which means we can store those in a relational database.
>>>
>>> However I'm not 100% sure this is the best place to keep this. As we add
>>> more use cases like this the backing tables in the RDBMS will need to be
>>> managed. This could make upgrading more tedious and error-prone. Is there
>>> a better way to store this, assuming we can leverage a component that's
>>> already included in our stack?
>>>
>>> Ryan
>>>
>>

Re: [DISCUSS] Persistence store for user profile settings

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
Introducing a RDBMS to the stack seems unnecessary for this.

If we consider the data access patterns for user profiles, we are unlikely to query into them, or indeed do anything other than look them up, or write them out by a username key. To that end, using an ORM to translate a nested config object into a load of tables seems to introduce complexity and brittleness we then have to take away through relying on relational consistency models. We would also end up with, as Mike points out, a whole new set of disk deployment patterns and a bunch of additional DBA ops process requirements for every install.

Since the access pattern is almost entirely key => value, hbase seems a good option (because we already have it there, it would be kinda crazy at this scale if we didn’t already have it) or arguably zookeeper, but that might be at the other end of the scale argument. I’d even go as far as to suggest files on HDFS to keep it simple. 
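
To illustrate the key => value access pattern described above, here is a minimal Java sketch. It is not taken from the PR: UserSettings, UserSettingsStore, and InMemoryUserSettingsStore are hypothetical names, the facet field values are made-up examples, and the in-memory map only stands in for what one HBase row per username (or one file per user on HDFS) would provide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical settings object: the facet fields a user picked for the Alerts UI.
final class UserSettings {
    private final List<String> facetFields;

    UserSettings(List<String> facetFields) {
        // Defensive copy so a stored value cannot be changed from outside.
        this.facetFields = Collections.unmodifiableList(new ArrayList<>(facetFields));
    }

    List<String> getFacetFields() {
        return facetFields;
    }
}

// Hypothetical store contract: every operation is addressed by the username key.
interface UserSettingsStore {
    Optional<UserSettings> findByUsername(String username);
    void save(String username, UserSettings settings);
    void delete(String username);
}

// In-memory stand-in (an assumption for this sketch) for a key-value table:
// one "row" per username, read and written as a whole.
final class InMemoryUserSettingsStore implements UserSettingsStore {
    private final Map<String, UserSettings> rows = new ConcurrentHashMap<>();

    @Override
    public Optional<UserSettings> findByUsername(String username) {
        return Optional.ofNullable(rows.get(username));
    }

    @Override
    public void save(String username, UserSettings settings) {
        // Whole-object overwrite; no joins or partial updates are needed.
        rows.put(username, settings);
    }

    @Override
    public void delete(String username) {
        rows.remove(username);
    }
}

final class UserSettingsDemo {
    public static void main(String[] args) {
        UserSettingsStore store = new InMemoryUserSettingsStore();
        store.save("ryan", new UserSettings(Arrays.asList("source:type", "ip_src_addr")));
        store.findByUsername("ryan")
             .ifPresent(s -> System.out.println(s.getFacetFields()));
    }
}
```

Since callers would only see the interface, the backing store could in principle be swapped later without touching the code that uses it, and any schema evolution would live in how the settings object is serialized rather than in table migrations.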

Simon

> On 1 Feb 2018, at 23:24, Michael Miklavcic <mi...@gmail.com> wrote:
> 
> Personally, I'd be in favor of something like MariaDB as an open source
> repo. Or any other ANSI SQL store. On the positive side, it should mesh
> seamlessly with ORM tools. And the schema for this should be pretty
> vanilla, I'd imagine. I might even consider skipping ORM for straight JDBC
> and simple command scripts in Java for something this small. I'm not
> worried so much about migrations of this sort. Large scale DBs can get
> involved with major schema changes, but that's usually when the datastore is
> a massive set of tables with complex relationships, at least in my
> experience.
> 
> We could also use hbase, which probably wouldn't be that hard either, but
> there may be more boilerplate to write for the client as compared to
> standard SQL. But I'm assuming we could reuse a fair amount of existing
> code from our enrichments. One additional reason in favor of hbase might be
> data replication. For a SQL instance we'd probably recommend a RAID store
> or backup procedure, but we get that pretty easy with hbase too.
> 
> On Feb 1, 2018 2:45 PM, "Casey Stella" <ce...@gmail.com> wrote:
> 
>> So, I'll answer your question with some questions:
>> 
>>   - No matter the data store we use upgrading will take some care, right?
>>   - Do we currently depend on a RDBMS anywhere?  I want to say that we do
>>   in the REST layer already, right?
>>   - If we don't use a RDBMS, what's the other option?  What are the pros
>>   and cons?
>>   - Have we considered non-server offline persistent solutions (e.g.
>>   https://www.html5rocks.com/en/features/storage)?
>> 
>> 
>> 
>> On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman <me...@gmail.com> wrote:
>> 
>>> There is currently a PR up for review that allows a user to configure and
>>> save the list of facet fields that appear in the left column of the Alerts
>>> UI:  https://github.com/apache/metron/pull/853.  The REST layer has ORM
>>> support which means we can store those in a relational database.
>>> 
>>> However I'm not 100% sure this is the best place to keep this.  As we add
>>> more use cases like this the backing tables in the RDBMS will need to be
>>> managed.  This could make upgrading more tedious and error-prone.  Is there
>>> a better way to store this, assuming we can leverage a component that's
>>> already included in our stack?
>>> 
>>> Ryan
>>> 
>> 


Re: [DISCUSS] Persistence store for user profile settings

Posted by Michael Miklavcic <mi...@gmail.com>.
Personally, I'd be in favor of something like MariaDB as an open source
repo. Or any other ANSI SQL store. On the positive side, it should mesh
seamlessly with ORM tools. And the schema for this should be pretty
vanilla, I'd imagine. I might even consider skipping ORM for straight JDBC
and simple command scripts in Java for something this small. I'm not
worried so much about migrations of this sort. Large scale DBs can get
involved with major schema changes, but that's usually when the datastore is
a massive set of tables with complex relationships, at least in my
experience.

We could also use hbase, which probably wouldn't be that hard either, but
there may be more boilerplate to write for the client as compared to
standard SQL. But I'm assuming we could reuse a fair amount of existing
code from our enrichments. One additional reason in favor of hbase might be
data replication. For a SQL instance we'd probably recommend a RAID store
or backup procedure, but we get that pretty easy with hbase too.

On Feb 1, 2018 2:45 PM, "Casey Stella" <ce...@gmail.com> wrote:

> So, I'll answer your question with some questions:
>
>    - No matter the data store we use upgrading will take some care, right?
>    - Do we currently depend on a RDBMS anywhere?  I want to say that we do
>    in the REST layer already, right?
>    - If we don't use a RDBMS, what's the other option?  What are the pros
>    and cons?
>    - Have we considered non-server offline persistent solutions (e.g.
>    https://www.html5rocks.com/en/features/storage)?
>
>
>
> On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman <me...@gmail.com> wrote:
>
> > There is currently a PR up for review that allows a user to configure and
> > save the list of facet fields that appear in the left column of the Alerts
> > UI:  https://github.com/apache/metron/pull/853.  The REST layer has ORM
> > support which means we can store those in a relational database.
> >
> > However I'm not 100% sure this is the best place to keep this.  As we add
> > more use cases like this the backing tables in the RDBMS will need to be
> > managed.  This could make upgrading more tedious and error-prone.  Is there
> > a better way to store this, assuming we can leverage a component that's
> > already included in our stack?
> >
> > Ryan
> >
>

Re: [DISCUSS] Persistence store for user profile settings

Posted by Casey Stella <ce...@gmail.com>.
So, I'll answer your question with some questions:

   - No matter the data store we use upgrading will take some care, right?
   - Do we currently depend on a RDBMS anywhere?  I want to say that we do
   in the REST layer already, right?
   - If we don't use a RDBMS, what's the other option?  What are the pros
   and cons?
   - Have we considered non-server offline persistent solutions (e.g.
   https://www.html5rocks.com/en/features/storage)?



On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman <me...@gmail.com> wrote:

> There is currently a PR up for review that allows a user to configure and
> save the list of facet fields that appear in the left column of the Alerts
> UI:  https://github.com/apache/metron/pull/853.  The REST layer has ORM
> support which means we can store those in a relational database.
>
> However I'm not 100% sure this is the best place to keep this.  As we add
> more use cases like this the backing tables in the RDBMS will need to be
> managed.  This could make upgrading more tedious and error-prone.  Is there
> a better way to store this, assuming we can leverage a component that's
> already included in our stack?
>
> Ryan
>