
Posted to dev@accumulo.apache.org by Andrew Hulbert <ah...@ccri.com> on 2020/09/16 00:50:17 UTC

Uniquely identifying Scans

Hi all,

We were looking to uniquely tag a scan with some sort of ID that would 
allow us to map it back to a client...something like a session id or 
something else. The first idea we came up with was to simply set an 
iterator option that could be viewable in listscans in the shell. The basic
idea would be to map it back to more than just the IP address of the
client (e.g., a user name or something else injected on the client side
for a client servicing multiple users).

Wondering if anybody else had done this before or if there were any 
thoughts about having "scan metadata" in the future?

Thanks,

Andrew

Re: Uniquely identifying Scans

Posted by Dave Marion <dm...@gmail.com>.

I believe adding a property to the iterator options to identify a scan was
something that we did on a previous project of mine. IIRC, listscans showed
the information.

On Wed, Sep 16, 2020 at 12:20 AM Christopher <ct...@apache.org> wrote:

> Your solution to use an iterator option is clever, and could work
> well, particularly if you are already setting an iterator on the
> client, and if listscans shows these options (I don't recall if it
> does). If you aren't setting an iterator on the client already, in
> order to add a superfluous option with the info you want to send, you
> could add an identity-mapping iterator (aka an "allow all" filter),
> but the extra iterator on the stack could have a performance impact.

Re: Uniquely identifying Scans

Posted by Christopher <ct...@apache.org>.

Hi Andrew,

We currently have the concept of a "scan session", but it is used
only internally, for continuing a scan after retrieving a batch of
results from a server. It does contain certain information, like a
session id, but might not contain all the information you want... and
in any case it's an internal mechanism, not a user-facing feature.

Your solution to use an iterator option is clever, and could work
well, particularly if you are already setting an iterator on the
client, and if listscans shows these options (I don't recall if it
does). If you aren't already setting an iterator on the client, you
could add an identity-mapping iterator (aka an "allow all" filter)
solely to carry the option with the info you want to send, but the
extra iterator on the stack could have a performance impact.
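
For illustration, a rough Java sketch of the iterator-option approach. Only the tag-building helper is self-contained and runnable; the Accumulo 2.x calls in the trailing comment (IteratorSetting.addOptions, ScannerBase.addScanIterator) only show where the tags would go. The option names "scan.id" and "scan.user", the priority 50, and the AllowAllFilter class are assumptions, not anything Accumulo defines.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

// Hypothetical helper: builds the options a client could attach to a scan so the
// scan can be traced back to an end user. The option names are illustrative,
// not an Accumulo convention.
final class ScanTags {
    private ScanTags() {}

    static Map<String, String> build(String endUser, String scanId) {
        if (endUser == null || endUser.isEmpty()) {
            throw new IllegalArgumentException("endUser must be non-empty");
        }
        Objects.requireNonNull(scanId, "scanId");
        Map<String, String> tags = new LinkedHashMap<>(); // insertion order keeps output readable
        tags.put("scan.id", scanId);
        tags.put("scan.user", endUser);
        return tags;
    }

    // Generates a random id when the caller has no request/session id of its own.
    static Map<String, String> build(String endUser) {
        return build(endUser, UUID.randomUUID().toString());
    }

    public static void main(String[] args) {
        Map<String, String> tags = build("alice", "req-42");
        System.out.println(tags); // prints {scan.id=req-42, scan.user=alice}

        // With the Accumulo client on the classpath, the tags could ride along as
        // iterator options. AllowAllFilter is a hypothetical Filter subclass whose
        // accept() always returns true:
        //   IteratorSetting setting = new IteratorSetting(50, "scanTag", AllowAllFilter.class);
        //   setting.addOptions(tags);
        //   scanner.addScanIterator(setting);
    }
}
```

The filter never reads the tags; their only job is to be visible on the server. As noted above, the pass-through iterator still costs something, so this fits best when the client already sets an iterator that can carry the options.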

Another option, since 2.0.0, is to set an execution hint on a scanner
(see https://accumulo.apache.org/docs/2.x/administration/scan-executors).
However, querying the hints and emitting them to listscans might
require some code modification, as I'm not sure they show up there
right now. If you find this to be a viable option, and it needs
additional code to work, feel free to propose a design to the dev
list, or open a pull request.
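
A minimal sketch of the execution-hint route, assuming the Accumulo 2.x ScannerBase.setExecutionHints(Map<String,String>) method: identifying tags are merged under a prefix so they can't clash with hints a dispatcher acts on. The "client." prefix and the "workload" hint key are hypothetical, and whether listscans would display these hints is the open question above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch: merge identifying tags into the hints a scan already carries, under a
// hypothetical "client." prefix, before the result is handed to
// scanner.setExecutionHints(...). The prefix is meant to keep tags from
// colliding with hint keys a scan dispatcher interprets.
final class HintTags {
    static final String PREFIX = "client.";

    private HintTags() {}

    static Map<String, String> withTags(Map<String, String> dispatchHints,
                                        Map<String, String> tags) {
        Map<String, String> merged = new HashMap<>(dispatchHints);
        for (Map.Entry<String, String> e : tags.entrySet()) {
            String key = PREFIX + e.getKey();
            // Refuse to silently overwrite a hint that is already present.
            if (merged.putIfAbsent(key, e.getValue()) != null) {
                throw new IllegalArgumentException("hint already set: " + key);
            }
        }
        return Collections.unmodifiableMap(merged);
    }

    public static void main(String[] args) {
        Map<String, String> hints = withTags(Map.of("workload", "interactive"),
                                             Map.of("user", "alice", "session", "s-7"));
        // With the Accumulo client available: scanner.setExecutionHints(hints);
        System.out.println(hints.get("client.user")); // prints alice
    }
}
```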

Related: we could also consider automatically populating some scan
hints with information the server side already knows (such as
client IP address, client user name, etc.) in a reserved hint
namespace (accumulo.* or similar), or in a separate, similar store that
dispatchers would have access to in addition to execution hints.
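
To make the reserved-namespace idea concrete, a hypothetical server-side sketch: client-supplied keys under "accumulo." are discarded so a client cannot spoof them, and the server then fills in what it knows about the connection. This only illustrates the proposal; it is not existing Accumulo behavior, and the class, method, and key names are made up.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the proposed reserved hint namespace. All names are
// illustrative; Accumulo does not provide this.
final class ReservedHints {
    static final String RESERVED = "accumulo.";

    private ReservedHints() {}

    static Map<String, String> effectiveHints(Map<String, String> clientHints,
                                              String clientAddress, String principal) {
        Map<String, String> out = new TreeMap<>(); // sorted keys for stable display
        for (Map.Entry<String, String> e : clientHints.entrySet()) {
            // Drop anything the client put in the reserved namespace.
            if (!e.getKey().startsWith(RESERVED)) {
                out.put(e.getKey(), e.getValue());
            }
        }
        // Values only the server can vouch for.
        out.put(RESERVED + "client.address", clientAddress);
        out.put(RESERVED + "client.user", principal);
        return Collections.unmodifiableMap(out);
    }

    public static void main(String[] args) {
        Map<String, String> hints = effectiveHints(
            Map.of("accumulo.client.user", "mallory", "workload", "batch"),
            "10.0.0.5", "alice");
        // prints {accumulo.client.address=10.0.0.5, accumulo.client.user=alice, workload=batch}
        System.out.println(hints);
    }
}
```

Dropping spoofed keys rather than rejecting the scan keeps existing clients working; failing the request outright would be the stricter alternative.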

Christopher
