You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@accumulo.apache.org by Christopher <ct...@apache.org> on 2017/02/06 16:43:38 UTC

[DISCUSS] Rethinking monitor log receiver

Currently, the monitor has an HA-standby behavior as a side-effect of the
way it handles logging.

The way this works is that the other services read the monitor's lock in ZK
to get the current active monitor's host and port. They use this to set
some system properties, which are referenced in the default example
generic_logger.xml config file. They set these system properties and reset
the log4j system whenever they detect a change in the active monitor. This
has the effect of forcing all the logging to go to a particular socket open
on the monitor, wherever it is currently running.

The ZK lock is currently being used to restrict monitor functionality to a
single monitor instance. But, this isn't really necessary. There isn't any
reason to restrict concurrent monitors. The real purpose of the ZK lock, as
I've described, is to hijack the ZK lock mechanism because it's also a
service-advertisement feature.

This is a bit convoluted and makes a lot of assumptions, and has a lot of
issues. It is also could be impeding some possible avenues of
simplification under ACCUMULO-3005.

1. It locks us in to using Log4J (probably a specific range of versions).
2. It sends logs across the network insecurely.
3. It assumes that you only want a single monitor service running.
4. Code assumes particular configuration with particular variable's
embedded in them.
5. Extra threads needed to track changes
6. It adds code complexity.
7. It assumes the user wishes to use the monitor to watch logs, when other
tools are better suited for log aggregation, monitoring, and alerting.

This whole thing would be simpler if we just eliminated the monitor
log-watching feature entirely. However, we also have some options short of
that. For example, we could (1) use a different service-advertisement
mechanism that doesn't lock. (2). We could stop doing this variable
injection thing, and still leave the socket appender running, so that users
will have to configure their destination in their own configuration files,
if they want to emit logs to that socket. (3) We could use an Accumulo RPC
to send logs, rather than the log4j API. Lots of options.

What do you think?

-- 
Christopher

Re: [DISCUSS] Rethinking monitor log receiver

Posted by Billie Rinaldi <bi...@gmail.com>.
On Tue, Feb 7, 2017 at 8:40 AM, Christopher <ct...@apache.org> wrote:

> On Tue, Feb 7, 2017 at 11:02 AM Billie Rinaldi <bi...@gmail.com>
> wrote:
>
> > I am a fan of the log-forwarding feature. I wouldn't mind if there were
> an
> > option to turn it off for users that don't want it. I also wouldn't mind
> if
> > someone wanted to improve how the log-forwarding mechanism is
> implemented.
> > Option (2) doesn't work for me, though. I need the log-forwarding
> > destination(s) to be discovered automatically rather than requiring it to
> > be known in advance / configured manually by the user.
> >
> >
> Using DNS (CNAME) for the monitor destination might be a better solution
> for auto-routing to the current monitor, rather than the auto-discovery
> happening inside Accumulo. The DNS solution won't work for random ports...
> but I'm not sure what the use is for a monitor that isn't served to a
> well-known location.


Past uses of auto-discovering a random port have been for running on YARN
and for ITs. I don't think this is necessary for YARN/Slider anymore.


> A custom appender which does the logic (rather than
> the standard socket appender, custom properties, and tserver logic) would
> also work, if the services are properly advertised.
>

I think a custom appender would be fine.


>
> What are your thoughts on these?
>
>
> > Seems like improving log forwarding is a separate (but related) issue
> from
> > whether we want to add the feature of supporting multiple monitors. To
> > implement that feature, we'd have to decide whether we'd want to forward
> to
> > all monitors, or just forward to the first monitor, or whatever. If we do
> > implement the multiple monitors feature, I think we should add a
> "Monitors"
> > tab to the monitor, so that users will be able to monitor their monitors.
> > :) (Seriously.)
> >
> >
> I'm leaning toward thinking that we shouldn't try to get into the business
> of supporting the logic of how many, and which ones, to forward to.
>
> From this discussion, I'm thinking:
>
> 1. Make the service discovery specifically for the log4j service, and make
> it optional. Turning off log4j service will also prevent registration in
> ZK. This would also make log4j an optional dependency of the monitor. When
> the feature is disabled, /log in the monitor will document that it is
> disabled.
>
2. Create a custom log4j appender which does the logic of looking up the
> advertised monitors, and decides whether to send to one or them all.
> 3. Disable all these by default with the example configs, but provide blog
> post explaining how to enable.
>

I'd prefer enabled by default. :)


>
> Does that sound reasonable?
>
>
> > On Mon, Feb 6, 2017 at 2:04 PM, Christopher <ct...@apache.org> wrote:
> >
> > > On Mon, Feb 6, 2017 at 12:15 PM Josh Elser <jo...@gmail.com>
> wrote:
> > >
> > > -1 for removal of the log-forwarding which does not include an
> > > out-of-the-box replacement. I'm going to avoid saying any more on this
> > > unless necessary.
> > >
> > >
> > > If we were to opt for removal (perhaps by agreeing that it were out of
> > > scope), I'd personally prefer it be phased out gradually, as users
> gained
> > > more experience integrating with superior tooling for distributed log
> > > aggregation. I think too many people rely on it right now to remove it
> > > abruptly.
> > >
> > >
> > > Most of the other things you bring up seem like you poking holes in how
> > > the log aggregation works presently. I don't think anyone would
> disagree
> > > that this could be made better. Instead of focusing on this, why not
> > > approach your issues by instead suggesting how you'd like to see it
> work
> > > and weigh the pros/cons?
> > >
> > >
> > > Yes, this discussion is about both whether, and how, we might be able
> to
> > > make it better. I haven't proposed a solution, because I don't have one
> > at
> > > this time. Hence, my desire to [DISCUSS] it.
> > >
> > >
> > > Christopher wrote:
> > > > Currently, the monitor has an HA-standby behavior as a side-effect of
> > the
> > > > way it handles logging.
> > > >
> > > > The way this works is that the other services read the monitor's lock
> > in
> > > ZK
> > > > to get the current active monitor's host and port. They use this to
> set
> > > > some system properties, which are referenced in the default example
> > > > generic_logger.xml config file. They set these system properties and
> > > reset
> > > > the log4j system whenever they detect a change in the active monitor.
> > > This
> > > > has the effect of forcing all the logging to go to a particular
> socket
> > > open
> > > > on the monitor, wherever it is currently running.
> > >
> > > If you'd like to change the implementation, great.
> > >
> > >
> > > Right. So, this is a discuss thread about how we might want to change
> > it. I
> > > don't have a good sense about what avenues are going to cause
> headaches.
> > >
> > >
> > > As a point of reference, the Master's ZK lock is also used for service
> > > discovery as well as distributed exclusion (a lock). I'm not sure why
> > > you find an issue with this.
> > >
> > >
> > > I don't find an issue with it. I recognize it does both. I stated
> that. I
> > > also recognize that it must do both for certain services. My point was
> > that
> > > only one of those aspects (service discovery), rather than both, was
> > needed
> > > for the monitor service. The other aspect (distributed exclusion) is an
> > > unnecessary side-effect which cripples redundant monitors because of
> this
> > > (optional) log receiving feature.
> > >
> > >
> > > > The ZK lock is currently being used to restrict monitor functionality
> > to
> > > a
> > > > single monitor instance. But, this isn't really necessary. There
> isn't
> > > any
> > > > reason to restrict concurrent monitors. The real purpose of the ZK
> > lock,
> > > as
> > > > I've described, is to hijack the ZK lock mechanism because it's also
> a
> > > > service-advertisement feature.
> > >
> > > The use of the ZK lock is to prevent users from accessing an instance
> of
> > > the Monitor which is not actually receiving the forwarded logs from the
> > > server. It would be terrible if half of an Ops team saw no errors on
> the
> > > monitor they went to while the other half saw the real errors.
> > >
> > >
> > > Yes. One solution would be to limit the distributed exclusion to just
> the
> > > log feature (along with any relevant messages), and leave the rest of
> the
> > > redundant monitor functional.
> > >
> > >
> > > > This is a bit convoluted and makes a lot of assumptions, and has a
> lot
> > of
> > > > issues. It is also could be impeding some possible avenues of
> > > > simplification under ACCUMULO-3005.
> > > >
> > > > 1. It locks us in to using Log4J (probably a specific range of
> > versions).
> > > > 2. It sends logs across the network insecurely.
> > >
> > > These seem like implementation details to me.
> > >
> > >
> > > Yes, but it's an inherent problem with the asynchronous log appender,
> > which
> > > speaks directly to its suitability for running a custom log receiver.
> > >
> > >
> > >
> > > > 3. It assumes that you only want a single monitor service running.
> > >
> > > Again, an incorrect assumption.
> > >
> > >
> > > It prevents you from running more than one concurrently. The incorrect
> > > assumption that you want this is occurring on the part of the current
> > > implementation, based on its fundamental design.
> > >
> > >
> > > > 4. Code assumes particular configuration with particular variable's
> > > > embedded in them.
> > >
> > > Implementation detail
> > >
> > >
> > > Well, yes... this thread is a discuss thread about changing
> > implementation
> > > in some way... so....
> > >
> > >
> > > > 5. Extra threads needed to track changes
> > >
> > >
> > >
> > > Implementation detail
> > >
> > >
> > > Yep.
> > >
> > >
> > >
> > > > 6. It adds code complexity.
> > >
> > > ??? I don't even know how to address this one.
> > >
> > >
> > > Responding to every point is optional. ;)
> > > But, seriously, my point here is that we often seem to throw the
> kitchen
> > > sink into our code, introducing unnecessary complexity, rather than
> > > encourage integration. This complexity makes integration harder, and
> the
> > > code less maintainable. While I don't yet know what my preferred
> solution
> > > is... I strongly suspect it is one that is both workable and less
> > complex.
> > >
> > >
> > > > 7. It assumes the user wishes to use the monitor to watch logs, when
> > > other
> > > > tools are better suited for log aggregation, monitoring, and
> alerting.
> > >
> > > Your argument assumes that *all* users have such a capability set up.
> > > This feature does not preclude users from doing whatever they'd like.
> > >
> > >
> > > It's not an argument. I'm not championing a position. That point was
> > simply
> > > an observation of the current default behavior, and the recognition
> that
> > > dedicated tools for this exist. I'm not assuming *any* users have any
> > > alternate capabilities set up, never mind "*all*". Whether or not users
> > > have such capabilities set up says nothing about whether we'd prefer to
> > > roll our own or help users integrate our product into alternatives. We
> > need
> > > not know about their capabilities in order to adopt a preferred
> > philosophy
> > > on this matter.
> > >
> > > And, it actually does preclude users from doing whatever they like. It
> > > imposes at least the following limitations on users: It's not possible
> > > (currently) for users to run the monitor without also listing on an
> > extra,
> > > insecure, port, awaiting receipt of arbitrary log data. It's not
> possible
> > > for users to run two monitors concurrently and get functionality out of
> > > both. It's not possible for users to avoid log system resets, which
> could
> > > be expensive (performance) and could cause log messages to be dropped
> > while
> > > it's reloading. It's not possible for users to be able to predict when
> > the
> > > logging configuration will be reloaded (and changes take effect) on the
> > > basis of some new instance of the monitor being started somewhere.
> > >
> > >
> > > > This whole thing would be simpler if we just eliminated the monitor
> > > > log-watching feature entirely. However, we also have some options
> short
> > > of
> > > > that. For example, we could (1) use a different service-advertisement
> > > > mechanism that doesn't lock.
> > >
> > > Are you talking about using a persistent ZNode to advertise the Monitor
> > > location? What happens when there would be multiple monitors?
> > >
> > >
> > > Not necessarily a persistent ZNode. We can advertise as many monitors
> as
> > we
> > > like, just as we currently do with tservers. This would make monitors
> > > available for log receipt, but the decision about who and how many to
> > send
> > > to could be left to the user, perhaps with a custom log appender which
> is
> > > configurable.
> > >
> > >
> > > (2). We could stop doing this variable
> > > > injection thing, and still leave the socket appender running, so that
> > > users
> > > > will have to configure their destination in their own configuration
> > > files,
> > > > if they want to emit logs to that socket. (3) We could use an
> Accumulo
> > > RPC
> > > > to send logs, rather than the log4j API. Lots of options.
> > >
> > > Instead of #3, aren't there other "standard" log collection APIs that
> > > would make more sense than us rolling our own?
> > >
> > >
> > >
> > > Certainly. This is not an exhaustive list of options, nor is it a
> > proposal
> > > to go with any one of them in particular. It's just a discussion. I'm
> not
> > > necessarily a fan of this option. It just occurred to me because I
> > thought
> > > we could provide the feature at the same level of security as the rest
> of
> > > our RPCs (for instance, when using TLS or KRB).
> > >
> > > --
> > > Christopher
> > >
> >
> --
> Christopher
>

Re: [DISCUSS] Rethinking monitor log receiver

Posted by Christopher <ct...@apache.org>.
To wrap this thread up, I'm basically going to do ACCUMULO-4409, and leave
the monitor registration in place for the active monitor.

On Tue, Feb 7, 2017 at 6:09 PM Christopher <ct...@apache.org> wrote:

> On Tue, Feb 7, 2017 at 12:17 PM Josh Elser <jo...@gmail.com> wrote:
>
> Christopher wrote:
> > OnTue, Feb 7, 2017  at 11:02 AM  Billie Rinaldi<billie.rinaldi@gmail.com
> >
> > wrote:
> >
> >> >  I am a fan of the log-forwarding feature. I wouldn't mind if there
> were an
> >> >  option to turn it off for users that don't want it. I also wouldn't
> mind if
> >> >  someone wanted to improve how the log-forwarding mechanism is
> implemented.
> >> >  Option (2) doesn't work for me, though. I need the log-forwarding
> >> >  destination(s) to be discovered automatically rather than requiring
> it to
> >> >  be known in advance / configured manually by the user.
> >> >
> >> >
> > Using DNS (CNAME) for the monitor destination might be a better solution
> > for auto-routing to the current monitor, rather than the auto-discovery
> > happening inside Accumulo. The DNS solution won't work for random
> ports...
> > but I'm not sure what the use is for a monitor that isn't served to a
> > well-known location. A custom appender which does the logic (rather than
> > the standard socket appender, custom properties, and tserver logic) would
> > also work, if the services are properly advertised.
> >
> > What are your thoughts on these?
> >
>
> I don't know much about DNS, but I know enough to say that trying to
> provide a DNS-based approach would be a bigger burden than what we have
> now (security concerns and deployment into "enterprise" environments).
>
>
> If we tried to do it, yes, it would be a bigger burden. But, I'm not sure
> if I agree that would be the case for external services. It's pretty easy
> to update /etc/hosts, to write a name service plugin for nsswitch.conf, or
> to use a dynamic DNS service, etc. Lots of external options for this sort
> of thing. If we were to recommend this option, it'd basically be a
> recommendation for users to look outside of our project for such a solution.
>
>
> Also, Billie can probably speak to this better than I can, but with DNS
> (at least in Java), you'll also have to deal with caching of stale
> mappings.
>
>
> http://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-jvm-ttl.html
>
>
> Yeah, another colleague pointed this out to me, and I was shocked that
> Java did this at all... it seems crazy to me that they wouldn't just defer
> to the OS for all DNS resolution, which would inherently use OS preferences
> for DNS caching (which typically respects TTL). At the very least, it could
> do DNS caching which respects TTL. I have no idea what the JVM developers
> could have been thinking here. (Maybe attempting to achieve cross-platform
> network support in an era prior to widespread OS support for networking?
> Maybe this is all just legacy stuff... still, it seems crazy to me.)
>
> --
> Christopher
>
-- 
Christopher

Re: [DISCUSS] Rethinking monitor log receiver

Posted by Christopher <ct...@apache.org>.
On Tue, Feb 7, 2017 at 12:17 PM Josh Elser <jo...@gmail.com> wrote:

> Christopher wrote:
> > OnTue, Feb 7, 2017  at 11:02 AM  Billie Rinaldi<billie.rinaldi@gmail.com
> >
> > wrote:
> >
> >> >  I am a fan of the log-forwarding feature. I wouldn't mind if there
> were an
> >> >  option to turn it off for users that don't want it. I also wouldn't
> mind if
> >> >  someone wanted to improve how the log-forwarding mechanism is
> implemented.
> >> >  Option (2) doesn't work for me, though. I need the log-forwarding
> >> >  destination(s) to be discovered automatically rather than requiring
> it to
> >> >  be known in advance / configured manually by the user.
> >> >
> >> >
> > Using DNS (CNAME) for the monitor destination might be a better solution
> > for auto-routing to the current monitor, rather than the auto-discovery
> > happening inside Accumulo. The DNS solution won't work for random
> ports...
> > but I'm not sure what the use is for a monitor that isn't served to a
> > well-known location. A custom appender which does the logic (rather than
> > the standard socket appender, custom properties, and tserver logic) would
> > also work, if the services are properly advertised.
> >
> > What are your thoughts on these?
> >
>
> I don't know much about DNS, but I know enough to say that trying to
> provide a DNS-based approach would be a bigger burden than what we have
> now (security concerns and deployment into "enterprise" environments).
>
>
If we tried to do it, yes, it would be a bigger burden. But, I'm not sure
if I agree that would be the case for external services. It's pretty easy
to update /etc/hosts, to write a name service plugin for nsswitch.conf, or
to use a dynamic DNS service, etc. Lots of external options for this sort
of thing. If we were to recommend this option, it'd basically be a
recommendation for users to look outside of our project for such a solution.


> Also, Billie can probably speak to this better than I can, but with DNS
> (at least in Java), you'll also have to deal with caching of stale
> mappings.
>
>
> http://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-jvm-ttl.html


Yeah, another colleague pointed this out to me, and I was shocked that Java
did this at all... it seems crazy to me that they wouldn't just defer to
the OS for all DNS resolution, which would inherently use OS preferences
for DNS caching (which typically respects TTL). At the very least, it could
do DNS caching which respects TTL. I have no idea what the JVM developers
could have been thinking here. (Maybe attempting to achieve cross-platform
network support in an era prior to widespread OS support for networking?
Maybe this is all just legacy stuff... still, it seems crazy to me.)

-- 
Christopher

Re: [DISCUSS] Rethinking monitor log receiver

Posted by Josh Elser <jo...@gmail.com>.
Christopher wrote:
> OnTue, Feb 7, 2017  at 11:02 AM  Billie Rinaldi<bi...@gmail.com>
> wrote:
>
>> >  I am a fan of the log-forwarding feature. I wouldn't mind if there were an
>> >  option to turn it off for users that don't want it. I also wouldn't mind if
>> >  someone wanted to improve how the log-forwarding mechanism is implemented.
>> >  Option (2) doesn't work for me, though. I need the log-forwarding
>> >  destination(s) to be discovered automatically rather than requiring it to
>> >  be known in advance / configured manually by the user.
>> >
>> >
> Using DNS (CNAME) for the monitor destination might be a better solution
> for auto-routing to the current monitor, rather than the auto-discovery
> happening inside Accumulo. The DNS solution won't work for random ports...
> but I'm not sure what the use is for a monitor that isn't served to a
> well-known location. A custom appender which does the logic (rather than
> the standard socket appender, custom properties, and tserver logic) would
> also work, if the services are properly advertised.
>
> What are your thoughts on these?
>

I don't know much about DNS, but I know enough to say that trying to 
provide a DNS-based approach would be a bigger burden than what we have 
now (security concerns and deployment into "enterprise" environments).

Also, Billie can probably speak to this better than I can, but with DNS 
(at least in Java), you'll also have to deal with caching of stale 
mappings.

http://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-jvm-ttl.html

Re: [DISCUSS] Rethinking monitor log receiver

Posted by Christopher <ct...@apache.org>.
On Tue, Feb 7, 2017 at 11:02 AM Billie Rinaldi <bi...@gmail.com>
wrote:

> I am a fan of the log-forwarding feature. I wouldn't mind if there were an
> option to turn it off for users that don't want it. I also wouldn't mind if
> someone wanted to improve how the log-forwarding mechanism is implemented.
> Option (2) doesn't work for me, though. I need the log-forwarding
> destination(s) to be discovered automatically rather than requiring it to
> be known in advance / configured manually by the user.
>
>
Using DNS (CNAME) for the monitor destination might be a better solution
for auto-routing to the current monitor, rather than the auto-discovery
happening inside Accumulo. The DNS solution won't work for random ports...
but I'm not sure what the use is for a monitor that isn't served to a
well-known location. A custom appender which does the logic (rather than
the standard socket appender, custom properties, and tserver logic) would
also work, if the services are properly advertised.

What are your thoughts on these?


> Seems like improving log forwarding is a separate (but related) issue from
> whether we want to add the feature of supporting multiple monitors. To
> implement that feature, we'd have to decide whether we'd want to forward to
> all monitors, or just forward to the first monitor, or whatever. If we do
> implement the multiple monitors feature, I think we should add a "Monitors"
> tab to the monitor, so that users will be able to monitor their monitors.
> :) (Seriously.)
>
>
I'm leaning toward thinking that we shouldn't try to get into the business
of supporting the logic of how many, and which ones, to forward to.

From this discussion, I'm thinking:

1. Make the service discovery specifically for the log4j service, and make
it optional. Turning off log4j service will also prevent registration in
ZK. This would also make log4j an optional dependency of the monitor. When
the feature is disabled, /log in the monitor will document that it is
disabled.
2. Create a custom log4j appender which does the logic of looking up the
advertised monitors, and decides whether to send to one or them all.
3. Disable all these by default with the example configs, but provide blog
post explaining how to enable.

Does that sound reasonable?


> On Mon, Feb 6, 2017 at 2:04 PM, Christopher <ct...@apache.org> wrote:
>
> > On Mon, Feb 6, 2017 at 12:15 PM Josh Elser <jo...@gmail.com> wrote:
> >
> > -1 for removal of the log-forwarding which does not include an
> > out-of-the-box replacement. I'm going to avoid saying any more on this
> > unless necessary.
> >
> >
> > If we were to opt for removal (perhaps by agreeing that it were out of
> > scope), I'd personally prefer it be phased out gradually, as users gained
> > more experience integrating with superior tooling for distributed log
> > aggregation. I think too many people rely on it right now to remove it
> > abruptly.
> >
> >
> > Most of the other things you bring up seem like you poking holes in how
> > the log aggregation works presently. I don't think anyone would disagree
> > that this could be made better. Instead of focusing on this, why not
> > approach your issues by instead suggesting how you'd like to see it work
> > and weigh the pros/cons?
> >
> >
> > Yes, this discussion is about both whether, and how, we might be able to
> > make it better. I haven't proposed a solution, because I don't have one
> at
> > this time. Hence, my desire to [DISCUSS] it.
> >
> >
> > Christopher wrote:
> > > Currently, the monitor has an HA-standby behavior as a side-effect of
> the
> > > way it handles logging.
> > >
> > > The way this works is that the other services read the monitor's lock
> in
> > ZK
> > > to get the current active monitor's host and port. They use this to set
> > > some system properties, which are referenced in the default example
> > > generic_logger.xml config file. They set these system properties and
> > reset
> > > the log4j system whenever they detect a change in the active monitor.
> > This
> > > has the effect of forcing all the logging to go to a particular socket
> > open
> > > on the monitor, wherever it is currently running.
> >
> > If you'd like to change the implementation, great.
> >
> >
> > Right. So, this is a discuss thread about how we might want to change
> it. I
> > don't have a good sense about what avenues are going to cause headaches.
> >
> >
> > As a point of reference, the Master's ZK lock is also used for service
> > discovery as well as distributed exclusion (a lock). I'm not sure why
> > you find an issue with this.
> >
> >
> > I don't find an issue with it. I recognize it does both. I stated that. I
> > also recognize that it must do both for certain services. My point was
> that
> > only one of those aspects (service discovery), rather than both, was
> needed
> > for the monitor service. The other aspect (distributed exclusion) is an
> > unnecessary side-effect which cripples redundant monitors because of this
> > (optional) log receiving feature.
> >
> >
> > > The ZK lock is currently being used to restrict monitor functionality
> to
> > a
> > > single monitor instance. But, this isn't really necessary. There isn't
> > any
> > > reason to restrict concurrent monitors. The real purpose of the ZK
> lock,
> > as
> > > I've described, is to hijack the ZK lock mechanism because it's also a
> > > service-advertisement feature.
> >
> > The use of the ZK lock is to prevent users from accessing an instance of
> > the Monitor which is not actually receiving the forwarded logs from the
> > server. It would be terrible if half of an Ops team saw no errors on the
> > monitor they went to while the other half saw the real errors.
> >
> >
> > Yes. One solution would be to limit the distributed exclusion to just the
> > log feature (along with any relevant messages), and leave the rest of the
> > redundant monitor functional.
> >
> >
> > > This is a bit convoluted and makes a lot of assumptions, and has a lot
> of
> > > issues. It is also could be impeding some possible avenues of
> > > simplification under ACCUMULO-3005.
> > >
> > > 1. It locks us in to using Log4J (probably a specific range of
> versions).
> > > 2. It sends logs across the network insecurely.
> >
> > These seem like implementation details to me.
> >
> >
> > Yes, but it's an inherent problem with the asynchronous log appender,
> which
> > speaks directly to its suitability for running a custom log receiver.
> >
> >
> >
> > > 3. It assumes that you only want a single monitor service running.
> >
> > Again, an incorrect assumption.
> >
> >
> > It prevents you from running more than one concurrently. The incorrect
> > assumption that you want this is occurring on the part of the current
> > implementation, based on its fundamental design.
> >
> >
> > > 4. Code assumes particular configuration with particular variable's
> > > embedded in them.
> >
> > Implementation detail
> >
> >
> > Well, yes... this thread is a discuss thread about changing
> implementation
> > in some way... so....
> >
> >
> > > 5. Extra threads needed to track changes
> >
> >
> >
> > Implementation detail
> >
> >
> > Yep.
> >
> >
> >
> > > 6. It adds code complexity.
> >
> > ??? I don't even know how to address this one.
> >
> >
> > Responding to every point is optional. ;)
> > But, seriously, my point here is that we often seem to throw the kitchen
> > sink into our code, introducing unnecessary complexity, rather than
> > encourage integration. This complexity makes integration harder, and the
> > code less maintainable. While I don't yet know what my preferred solution
> > is... I strongly suspect it is one that is both workable and less
> complex.
> >
> >
> > > 7. It assumes the user wishes to use the monitor to watch logs, when
> > other
> > > tools are better suited for log aggregation, monitoring, and alerting.
> >
> > Your argument assumes that *all* users have such a capability set up.
> > This feature does not preclude users from doing whatever they'd like.
> >
> >
> > It's not an argument. I'm not championing a position. That point was
> simply
> > an observation of the current default behavior, and the recognition that
> > dedicated tools for this exist. I'm not assuming *any* users have any
> > alternate capabilities set up, never mind "*all*". Whether or not users
> > have such capabilities set up says nothing about whether we'd prefer to
> > roll our own or help users integrate our product into alternatives. We
> need
> > not know about their capabilities in order to adopt a preferred
> philosophy
> > on this matter.
> >
> > And, it actually does preclude users from doing whatever they like. It
> > imposes at least the following limitations on users: It's not possible
> > (currently) for users to run the monitor without also listing on an
> extra,
> > insecure, port, awaiting receipt of arbitrary log data. It's not possible
> > for users to run two monitors concurrently and get functionality out of
> > both. It's not possible for users to avoid log system resets, which could
> > be expensive (performance) and could cause log messages to be dropped
> while
> > it's reloading. It's not possible for users to be able to predict when
> the
> > logging configuration will be reloaded (and changes take effect) on the
> > basis of some new instance of the monitor being started somewhere.
> >
> >
> > > This whole thing would be simpler if we just eliminated the monitor
> > > log-watching feature entirely. However, we also have some options short
> > of
> > > that. For example, we could (1) use a different service-advertisement
> > > mechanism that doesn't lock.
> >
> > Are you talking about using a persistent ZNode to advertise the Monitor
> > location? What happens when there would be multiple monitors?
> >
> >
> > Not necessarily a persistent ZNode. We can advertise as many monitors as
> we
> > like, just as we currently do with tservers. This would make monitors
> > available for log receipt, but the decision about who and how many to
> send
> > to could be left to the user, perhaps with a custom log appender which is
> > configurable.
> >
> >
> > (2). We could stop doing this variable
> > > injection thing, and still leave the socket appender running, so that
> > users
> > > will have to configure their destination in their own configuration
> > files,
> > > if they want to emit logs to that socket. (3) We could use an Accumulo
> > RPC
> > > to send logs, rather than the log4j API. Lots of options.
> >
> > Instead of #3, aren't there other "standard" log collection APIs that
> > would make more sense than us rolling our own?
> >
> >
> >
> > Certainly. This is not an exhaustive list of options, nor is it a
> proposal
> > to go with any one of them in particular. It's just a discussion. I'm not
> > necessarily a fan of this option. It just occurred to me because I
> thought
> > we could provide the feature at the same level of security as the rest of
> > our RPCs (for instance, when using TLS or KRB).
> >
> > --
> > Christopher
> >
>
-- 
Christopher

Re: [DISCUSS] Rethinking monitor log receiver

Posted by Billie Rinaldi <bi...@gmail.com>.
I am a fan of the log-forwarding feature. I wouldn't mind if there were an
option to turn it off for users that don't want it. I also wouldn't mind if
someone wanted to improve how the log-forwarding mechanism is implemented.
Option (2) doesn't work for me, though. I need the log-forwarding
destination(s) to be discovered automatically rather than requiring it to
be known in advance / configured manually by the user.

Seems like improving log forwarding is a separate (but related) issue from
whether we want to add the feature of supporting multiple monitors. To
implement that feature, we'd have to decide whether we'd want to forward to
all monitors, or just forward to the first monitor, or whatever. If we do
implement the multiple monitors feature, I think we should add a "Monitors"
tab to the monitor, so that users will be able to monitor their monitors.
:) (Seriously.)

On Mon, Feb 6, 2017 at 2:04 PM, Christopher <ct...@apache.org> wrote:

> On Mon, Feb 6, 2017 at 12:15 PM Josh Elser <jo...@gmail.com> wrote:
>
> -1 for removal of the log-forwarding which does not include an
> out-of-the-box replacement. I'm going to avoid saying any more on this
> unless necessary.
>
>
> If we were to opt for removal (perhaps by agreeing that it were out of
> scope), I'd personally prefer it be phased out gradually, as users gained
> more experience integrating with superior tooling for distributed log
> aggregation. I think too many people rely on it right now to remove it
> abruptly.
>
>
> Most of the other things you bring up seem like you poking holes in how
> the log aggregation works presently. I don't think anyone would disagree
> that this could be made better. Instead of focusing on this, why not
> approach your issues by instead suggesting how you'd like to see it work
> and weigh the pros/cons?
>
>
> Yes, this discussion is about both whether, and how, we might be able to
> make it better. I haven't proposed a solution, because I don't have one at
> this time. Hence, my desire to [DISCUSS] it.
>
>
> Christopher wrote:
> > Currently, the monitor has an HA-standby behavior as a side-effect of the
> > way it handles logging.
> >
> > The way this works is that the other services read the monitor's lock in
> ZK
> > to get the current active monitor's host and port. They use this to set
> > some system properties, which are referenced in the default example
> > generic_logger.xml config file. They set these system properties and
> reset
> > the log4j system whenever they detect a change in the active monitor.
> This
> > has the effect of forcing all the logging to go to a particular socket
> open
> > on the monitor, wherever it is currently running.
>
> If you'd like to change the implementation, great.
>
>
> Right. So, this is a discuss thread about how we might want to change it. I
> don't have a good sense about what avenues are going to cause headaches.
>
>
> As a point of reference, the Master's ZK lock is also used for service
> discovery as well as distributed exclusion (a lock). I'm not sure why
> you find an issue with this.
>
>
> I don't find an issue with it. I recognize it does both. I stated that. I
> also recognize that it must do both for certain services. My point was that
> only one of those aspects (service discovery), rather than both, was needed
> for the monitor service. The other aspect (distributed exclusion) is an
> unnecessary side-effect which cripples redundant monitors because of this
> (optional) log receiving feature.
>
>
> > The ZK lock is currently being used to restrict monitor functionality to
> a
> > single monitor instance. But, this isn't really necessary. There isn't
> any
> > reason to restrict concurrent monitors. The real purpose of the ZK lock,
> as
> > I've described, is to hijack the ZK lock mechanism because it's also a
> > service-advertisement feature.
>
> The use of the ZK lock is to prevent users from accessing an instance of
> the Monitor which is not actually receiving the forwarded logs from the
> server. It would be terrible if half of an Ops team saw no errors on the
> monitor they went to while the other half saw the real errors.
>
>
> Yes. One solution would be to limit the distributed exclusion to just the
> log feature (along with any relevant messages), and leave the rest of the
> redundant monitor functional.
>
>
> > This is a bit convoluted and makes a lot of assumptions, and has a lot of
> > issues. It is also could be impeding some possible avenues of
> > simplification under ACCUMULO-3005.
> >
> > 1. It locks us in to using Log4J (probably a specific range of versions).
> > 2. It sends logs across the network insecurely.
>
> These seem like implementation details to me.
>
>
> Yes, but it's an inherent problem with the asynchronous log appender, which
> speaks directly to its suitability for running a custom log receiver.
>
>
>
> > 3. It assumes that you only want a single monitor service running.
>
> Again, an incorrect assumption.
>
>
> It prevents you from running more than one concurrently. The incorrect
> assumption that you want this is occurring on the part of the current
> implementation, based on its fundamental design.
>
>
> > 4. Code assumes particular configuration with particular variable's
> > embedded in them.
>
> Implementation detail
>
>
> Well, yes... this thread is a discuss thread about changing implementation
> in some way... so....
>
>
> > 5. Extra threads needed to track changes
>
>
>
> Implementation detail
>
>
> Yep.
>
>
>
> > 6. It adds code complexity.
>
> ??? I don't even know how to address this one.
>
>
> Responding to every point is optional. ;)
> But, seriously, my point here is that we often seem to throw the kitchen
> sink into our code, introducing unnecessary complexity, rather than
> encourage integration. This complexity makes integration harder, and the
> code less maintainable. While I don't yet know what my preferred solution
> is... I strongly suspect it is one that is both workable and less complex.
>
>
> > 7. It assumes the user wishes to use the monitor to watch logs, when
> other
> > tools are better suited for log aggregation, monitoring, and alerting.
>
> Your argument assumes that *all* users have such a capability set up.
> This feature does not preclude users from doing whatever they'd like.
>
>
> It's not an argument. I'm not championing a position. That point was simply
> an observation of the current default behavior, and the recognition that
> dedicated tools for this exist. I'm not assuming *any* users have any
> alternate capabilities set up, never mind "*all*". Whether or not users
> have such capabilities set up says nothing about whether we'd prefer to
> roll our own or help users integrate our product into alternatives. We need
> not know about their capabilities in order to adopt a preferred philosophy
> on this matter.
>
> And, it actually does preclude users from doing whatever they like. It
> imposes at least the following limitations on users: It's not possible
> (currently) for users to run the monitor without also listing on an extra,
> insecure, port, awaiting receipt of arbitrary log data. It's not possible
> for users to run two monitors concurrently and get functionality out of
> both. It's not possible for users to avoid log system resets, which could
> be expensive (performance) and could cause log messages to be dropped while
> it's reloading. It's not possible for users to be able to predict when the
> logging configuration will be reloaded (and changes take effect) on the
> basis of some new instance of the monitor being started somewhere.
>
>
> > This whole thing would be simpler if we just eliminated the monitor
> > log-watching feature entirely. However, we also have some options short
> of
> > that. For example, we could (1) use a different service-advertisement
> > mechanism that doesn't lock.
>
> Are you talking about using a persistent ZNode to advertise the Monitor
> location? What happens when there would be multiple monitors?
>
>
> Not necessarily a persistent ZNode. We can advertise as many monitors as we
> like, just as we currently do with tservers. This would make monitors
> available for log receipt, but the decision about who and how many to send
> to could be left to the user, perhaps with a custom log appender which is
> configurable.
>
>
> (2). We could stop doing this variable
> > injection thing, and still leave the socket appender running, so that
> users
> > will have to configure their destination in their own configuration
> files,
> > if they want to emit logs to that socket. (3) We could use an Accumulo
> RPC
> > to send logs, rather than the log4j API. Lots of options.
>
> Instead of #3, aren't there other "standard" log collection APIs that
> would make more sense than us rolling our own?
>
>
>
> Certainly. This is not an exhaustive list of options, nor is it a proposal
> to go with any one of them in particular. It's just a discussion. I'm not
> necessarily a fan of this option. It just occurred to me because I thought
> we could provide the feature at the same level of security as the rest of
> our RPCs (for instance, when using TLS or KRB).
>
> --
> Christopher
>

Re: [DISCUSS] Rethinking monitor log receiver

Posted by Josh Elser <jo...@gmail.com>.
Avoiding trying to respond inline as your last reply didn't format well 
(not sure if my side or your side). So, sorry for a top-post.

I have no strong opinions on what technology is used to improve log 
aggregation. I would encourage whoever wants to take up this work to: do 
the research, present some facts about what exists, and present an plan 
for which software/approach is "best" for what we have to do. Many of 
the points you raised on "things bad in current implementation" can be 
re-stated as "could be better" and I would agreed with them.

No offense, but further discussion at this level just feels fruitless to 
me as I don't think anyone has ever argued *against* not owning 
log-aggregation (but I could be wrong). IMO to move this forward, it 
requires concrete suggestions.

Christopher wrote:
> On Mon, Feb 6, 2017 at 12:15 PM Josh Elser<jo...@gmail.com>  wrote:
>
> -1 for removal of the log-forwarding which does not include an
> out-of-the-box replacement. I'm going to avoid saying any more on this
> unless necessary.
>
>
> If we were to opt for removal (perhaps by agreeing that it were out of
> scope), I'd personally prefer it be phased out gradually, as users gained
> more experience integrating with superior tooling for distributed log
> aggregation. I think too many people rely on it right now to remove it
> abruptly.
>
>
> Most of the other things you bring up seem like you poking holes in how
> the log aggregation works presently. I don't think anyone would disagree
> that this could be made better. Instead of focusing on this, why not
> approach your issues by instead suggesting how you'd like to see it work
> and weigh the pros/cons?
>
>
> Yes, this discussion is about both whether, and how, we might be able to
> make it better. I haven't proposed a solution, because I don't have one at
> this time. Hence, my desire to [DISCUSS] it.
>
>
> Christopher wrote:
>> Currently, the monitor has an HA-standby behavior as a side-effect of the
>> way it handles logging.
>>
>> The way this works is that the other services read the monitor's lock in
> ZK
>> to get the current active monitor's host and port. They use this to set
>> some system properties, which are referenced in the default example
>> generic_logger.xml config file. They set these system properties and reset
>> the log4j system whenever they detect a change in the active monitor. This
>> has the effect of forcing all the logging to go to a particular socket
> open
>> on the monitor, wherever it is currently running.
>
> If you'd like to change the implementation, great.
>
>
> Right. So, this is a discuss thread about how we might want to change it. I
> don't have a good sense about what avenues are going to cause headaches.
>
>
> As a point of reference, the Master's ZK lock is also used for service
> discovery as well as distributed exclusion (a lock). I'm not sure why
> you find an issue with this.
>
>
> I don't find an issue with it. I recognize it does both. I stated that. I
> also recognize that it must do both for certain services. My point was that
> only one of those aspects (service discovery), rather than both, was needed
> for the monitor service. The other aspect (distributed exclusion) is an
> unnecessary side-effect which cripples redundant monitors because of this
> (optional) log receiving feature.
>
>
>> The ZK lock is currently being used to restrict monitor functionality to a
>> single monitor instance. But, this isn't really necessary. There isn't any
>> reason to restrict concurrent monitors. The real purpose of the ZK lock,
> as
>> I've described, is to hijack the ZK lock mechanism because it's also a
>> service-advertisement feature.
>
> The use of the ZK lock is to prevent users from accessing an instance of
> the Monitor which is not actually receiving the forwarded logs from the
> server. It would be terrible if half of an Ops team saw no errors on the
> monitor they went to while the other half saw the real errors.
>
>
> Yes. One solution would be to limit the distributed exclusion to just the
> log feature (along with any relevant messages), and leave the rest of the
> redundant monitor functional.
>
>
>> This is a bit convoluted and makes a lot of assumptions, and has a lot of
>> issues. It is also could be impeding some possible avenues of
>> simplification under ACCUMULO-3005.
>>
>> 1. It locks us in to using Log4J (probably a specific range of versions).
>> 2. It sends logs across the network insecurely.
>
> These seem like implementation details to me.
>
>
> Yes, but it's an inherent problem with the asynchronous log appender, which
> speaks directly to its suitability for running a custom log receiver.
>
>
>
>> 3. It assumes that you only want a single monitor service running.
>
> Again, an incorrect assumption.
>
>
> It prevents you from running more than one concurrently. The incorrect
> assumption that you want this is occurring on the part of the current
> implementation, based on its fundamental design.
>
>
>> 4. Code assumes particular configuration with particular variable's
>> embedded in them.
>
> Implementation detail
>
>
> Well, yes... this thread is a discuss thread about changing implementation
> in some way... so....
>
>
>> 5. Extra threads needed to track changes
>
>
>
> Implementation detail
>
>
> Yep.
>
>
>
>> 6. It adds code complexity.
>
> ??? I don't even know how to address this one.
>
>
> Responding to every point is optional. ;)
> But, seriously, my point here is that we often seem to throw the kitchen
> sink into our code, introducing unnecessary complexity, rather than
> encourage integration. This complexity makes integration harder, and the
> code less maintainable. While I don't yet know what my preferred solution
> is... I strongly suspect it is one that is both workable and less complex.
>
>
>> 7. It assumes the user wishes to use the monitor to watch logs, when other
>> tools are better suited for log aggregation, monitoring, and alerting.
>
> Your argument assumes that *all* users have such a capability set up.
> This feature does not preclude users from doing whatever they'd like.
>
>
> It's not an argument. I'm not championing a position. That point was simply
> an observation of the current default behavior, and the recognition that
> dedicated tools for this exist. I'm not assuming *any* users have any
> alternate capabilities set up, never mind "*all*". Whether or not users
> have such capabilities set up says nothing about whether we'd prefer to
> roll our own or help users integrate our product into alternatives. We need
> not know about their capabilities in order to adopt a preferred philosophy
> on this matter.
>
> And, it actually does preclude users from doing whatever they like. It
> imposes at least the following limitations on users: It's not possible
> (currently) for users to run the monitor without also listing on an extra,
> insecure, port, awaiting receipt of arbitrary log data. It's not possible
> for users to run two monitors concurrently and get functionality out of
> both. It's not possible for users to avoid log system resets, which could
> be expensive (performance) and could cause log messages to be dropped while
> it's reloading. It's not possible for users to be able to predict when the
> logging configuration will be reloaded (and changes take effect) on the
> basis of some new instance of the monitor being started somewhere.
>
>
>> This whole thing would be simpler if we just eliminated the monitor
>> log-watching feature entirely. However, we also have some options short of
>> that. For example, we could (1) use a different service-advertisement
>> mechanism that doesn't lock.
>
> Are you talking about using a persistent ZNode to advertise the Monitor
> location? What happens when there would be multiple monitors?
>
>
> Not necessarily a persistent ZNode. We can advertise as many monitors as we
> like, just as we currently do with tservers. This would make monitors
> available for log receipt, but the decision about who and how many to send
> to could be left to the user, perhaps with a custom log appender which is
> configurable.
>
>
> (2). We could stop doing this variable
>> injection thing, and still leave the socket appender running, so that
> users
>> will have to configure their destination in their own configuration files,
>> if they want to emit logs to that socket. (3) We could use an Accumulo RPC
>> to send logs, rather than the log4j API. Lots of options.
>
> Instead of #3, aren't there other "standard" log collection APIs that
> would make more sense than us rolling our own?
>
>
>
> Certainly. This is not an exhaustive list of options, nor is it a proposal
> to go with any one of them in particular. It's just a discussion. I'm not
> necessarily a fan of this option. It just occurred to me because I thought
> we could provide the feature at the same level of security as the rest of
> our RPCs (for instance, when using TLS or KRB).
>

Re: [DISCUSS] Rethinking monitor log receiver

Posted by Christopher <ct...@apache.org>.
On Mon, Feb 6, 2017 at 12:15 PM Josh Elser <jo...@gmail.com> wrote:

-1 for removal of the log-forwarding which does not include an
out-of-the-box replacement. I'm going to avoid saying any more on this
unless necessary.


If we were to opt for removal (perhaps by agreeing that it were out of
scope), I'd personally prefer it be phased out gradually, as users gained
more experience integrating with superior tooling for distributed log
aggregation. I think too many people rely on it right now to remove it
abruptly.


Most of the other things you bring up seem like you poking holes in how
the log aggregation works presently. I don't think anyone would disagree
that this could be made better. Instead of focusing on this, why not
approach your issues by instead suggesting how you'd like to see it work
and weigh the pros/cons?


Yes, this discussion is about both whether, and how, we might be able to
make it better. I haven't proposed a solution, because I don't have one at
this time. Hence, my desire to [DISCUSS] it.


Christopher wrote:
> Currently, the monitor has an HA-standby behavior as a side-effect of the
> way it handles logging.
>
> The way this works is that the other services read the monitor's lock in
ZK
> to get the current active monitor's host and port. They use this to set
> some system properties, which are referenced in the default example
> generic_logger.xml config file. They set these system properties and reset
> the log4j system whenever they detect a change in the active monitor. This
> has the effect of forcing all the logging to go to a particular socket
open
> on the monitor, wherever it is currently running.

If you'd like to change the implementation, great.


Right. So, this is a discuss thread about how we might want to change it. I
don't have a good sense about what avenues are going to cause headaches.


As a point of reference, the Master's ZK lock is also used for service
discovery as well as distributed exclusion (a lock). I'm not sure why
you find an issue with this.


I don't find an issue with it. I recognize it does both. I stated that. I
also recognize that it must do both for certain services. My point was that
only one of those aspects (service discovery), rather than both, was needed
for the monitor service. The other aspect (distributed exclusion) is an
unnecessary side-effect which cripples redundant monitors because of this
(optional) log receiving feature.


> The ZK lock is currently being used to restrict monitor functionality to a
> single monitor instance. But, this isn't really necessary. There isn't any
> reason to restrict concurrent monitors. The real purpose of the ZK lock,
as
> I've described, is to hijack the ZK lock mechanism because it's also a
> service-advertisement feature.

The use of the ZK lock is to prevent users from accessing an instance of
the Monitor which is not actually receiving the forwarded logs from the
server. It would be terrible if half of an Ops team saw no errors on the
monitor they went to while the other half saw the real errors.


Yes. One solution would be to limit the distributed exclusion to just the
log feature (along with any relevant messages), and leave the rest of the
redundant monitor functional.


> This is a bit convoluted and makes a lot of assumptions, and has a lot of
> issues. It is also could be impeding some possible avenues of
> simplification under ACCUMULO-3005.
>
> 1. It locks us in to using Log4J (probably a specific range of versions).
> 2. It sends logs across the network insecurely.

These seem like implementation details to me.


Yes, but it's an inherent problem with the asynchronous log appender, which
speaks directly to its suitability for running a custom log receiver.



> 3. It assumes that you only want a single monitor service running.

Again, an incorrect assumption.


It prevents you from running more than one concurrently. The incorrect
assumption that you want this is occurring on the part of the current
implementation, based on its fundamental design.


> 4. Code assumes particular configuration with particular variable's
> embedded in them.

Implementation detail


Well, yes... this thread is a discuss thread about changing implementation
in some way... so....


> 5. Extra threads needed to track changes



Implementation detail


Yep.



> 6. It adds code complexity.

??? I don't even know how to address this one.


Responding to every point is optional. ;)
But, seriously, my point here is that we often seem to throw the kitchen
sink into our code, introducing unnecessary complexity, rather than
encourage integration. This complexity makes integration harder, and the
code less maintainable. While I don't yet know what my preferred solution
is... I strongly suspect it is one that is both workable and less complex.


> 7. It assumes the user wishes to use the monitor to watch logs, when other
> tools are better suited for log aggregation, monitoring, and alerting.

Your argument assumes that *all* users have such a capability set up.
This feature does not preclude users from doing whatever they'd like.


It's not an argument. I'm not championing a position. That point was simply
an observation of the current default behavior, and the recognition that
dedicated tools for this exist. I'm not assuming *any* users have any
alternate capabilities set up, never mind "*all*". Whether or not users
have such capabilities set up says nothing about whether we'd prefer to
roll our own or help users integrate our product into alternatives. We need
not know about their capabilities in order to adopt a preferred philosophy
on this matter.

And, it actually does preclude users from doing whatever they like. It
imposes at least the following limitations on users: It's not possible
(currently) for users to run the monitor without also listing on an extra,
insecure, port, awaiting receipt of arbitrary log data. It's not possible
for users to run two monitors concurrently and get functionality out of
both. It's not possible for users to avoid log system resets, which could
be expensive (performance) and could cause log messages to be dropped while
it's reloading. It's not possible for users to be able to predict when the
logging configuration will be reloaded (and changes take effect) on the
basis of some new instance of the monitor being started somewhere.


> This whole thing would be simpler if we just eliminated the monitor
> log-watching feature entirely. However, we also have some options short of
> that. For example, we could (1) use a different service-advertisement
> mechanism that doesn't lock.

Are you talking about using a persistent ZNode to advertise the Monitor
location? What happens when there would be multiple monitors?


Not necessarily a persistent ZNode. We can advertise as many monitors as we
like, just as we currently do with tservers. This would make monitors
available for log receipt, but the decision about who and how many to send
to could be left to the user, perhaps with a custom log appender which is
configurable.


(2). We could stop doing this variable
> injection thing, and still leave the socket appender running, so that
users
> will have to configure their destination in their own configuration files,
> if they want to emit logs to that socket. (3) We could use an Accumulo RPC
> to send logs, rather than the log4j API. Lots of options.

Instead of #3, aren't there other "standard" log collection APIs that
would make more sense than us rolling our own?



Certainly. This is not an exhaustive list of options, nor is it a proposal
to go with any one of them in particular. It's just a discussion. I'm not
necessarily a fan of this option. It just occurred to me because I thought
we could provide the feature at the same level of security as the rest of
our RPCs (for instance, when using TLS or KRB).

-- 
Christopher

Re: [DISCUSS] Rethinking monitor log receiver

Posted by Josh Elser <jo...@gmail.com>.
-1 for removal of the log-forwarding which does not include an 
out-of-the-box replacement. I'm going to avoid saying any more on this 
unless necessary.

Most of the other things you bring up seem like you poking holes in how 
the log aggregation works presently. I don't think anyone would disagree 
that this could be made better. Instead of focusing on this, why not 
approach your issues by instead suggesting how you'd like to see it work 
and weigh the pros/cons?

Christopher wrote:
> Currently, the monitor has an HA-standby behavior as a side-effect of the
> way it handles logging.
>
> The way this works is that the other services read the monitor's lock in ZK
> to get the current active monitor's host and port. They use this to set
> some system properties, which are referenced in the default example
> generic_logger.xml config file. They set these system properties and reset
> the log4j system whenever they detect a change in the active monitor. This
> has the effect of forcing all the logging to go to a particular socket open
> on the monitor, wherever it is currently running.

If you'd like to change the implementation, great.

As a point of reference, the Master's ZK lock is also used for service 
discovery as well as distributed exclusion (a lock). I'm not sure why 
you find an issue with this.

> The ZK lock is currently being used to restrict monitor functionality to a
> single monitor instance. But, this isn't really necessary. There isn't any
> reason to restrict concurrent monitors. The real purpose of the ZK lock, as
> I've described, is to hijack the ZK lock mechanism because it's also a
> service-advertisement feature.

The use of the ZK lock is to prevent users from accessing an instance of 
the Monitor which is not actually receiving the forwarded logs from the 
server. It would be terrible if half of an Ops team saw no errors on the 
monitor they went to while the other half saw the real errors.

> This is a bit convoluted and makes a lot of assumptions, and has a lot of
> issues. It is also could be impeding some possible avenues of
> simplification under ACCUMULO-3005.
>
> 1. It locks us in to using Log4J (probably a specific range of versions).
> 2. It sends logs across the network insecurely.

These seem like implementation details to me.

> 3. It assumes that you only want a single monitor service running.

Again, an incorrect assumption.

> 4. Code assumes particular configuration with particular variable's
> embedded in them.

Implementation detail

> 5. Extra threads needed to track changes

Implementation detail

> 6. It adds code complexity.

??? I don't even know how to address this one.

> 7. It assumes the user wishes to use the monitor to watch logs, when other
> tools are better suited for log aggregation, monitoring, and alerting.

Your argument assumes that *all* users have such a capability set up. 
This feature does not preclude users from doing whatever they'd like.

> This whole thing would be simpler if we just eliminated the monitor
> log-watching feature entirely. However, we also have some options short of
> that. For example, we could (1) use a different service-advertisement
> mechanism that doesn't lock.

Are you talking about using a persistent ZNode to advertise the Monitor 
location? What happens when there would be multiple monitors?

(2). We could stop doing this variable
> injection thing, and still leave the socket appender running, so that users
> will have to configure their destination in their own configuration files,
> if they want to emit logs to that socket. (3) We could use an Accumulo RPC
> to send logs, rather than the log4j API. Lots of options.

Instead of #3, aren't there other "standard" log collection APIs that 
would make more sense than us rolling our own?