Posted to user@accumulo.apache.org by buttercream <bu...@gmail.com> on 2015/02/19 02:09:01 UTC

Authorizations for complex user management

I'm working on a system where there are many users, and the users' credentials
and information are stored in a third-party system. I was thinking the best
approach would be to have my default Accumulo user hold the superset of all
permissions and then, when a query is performed, proxy in the specific user
credential that may be a subset. But this seems a bit cumbersome: I'd have to
define all available credentials up front, especially if new authorizations
are added without our knowledge. Any thoughts on an alternative approach?
I'd like to just be able to proxy through credentials and not have to worry
about whether the Accumulo-defined user I'm proxying through already has
them. Is there a way to just let that Accumulo-defined user have max
credentials and not have to specifically call them out? Thanks.



--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/Authorizations-for-complex-user-management-tp13294.html
Sent from the Users mailing list archive at Nabble.com.

Re: Authorizations for complex user management

Posted by Srikanth Viswanathan <sr...@gmail.com>.
Accumulo's authorizations are designed to be a whitelist, so you
cannot define "max credentials" in the authorization layer.

I faced a problem similar to yours, and I went with the custom
Authorizor approach. You can use the custom Authorizor to call out to
your third party service/database to obtain authorizations for end
users like Josh suggested.

On Wed, Feb 18, 2015 at 10:58 PM, Josh Elser <jo...@gmail.com> wrote:
> buttercream wrote:
>>
>> I'm working on a system where there are many users and the users
>> credentials
>> and information are stored in a third party system. I was thinking the
>> best
>> approach would be to have my default Accumulo user have the superset of
>> all
>> permissions and then when a query is performed, proxy in the specific user
>> credential that may be a subset. But, this seems a bit cumbersome to have
>> to
>> up front define all available credentials, especially if new
>> authorizations
>> are added without our knowledge.
>
>
> Yeah, this is the pain point. The approach works, but it pushes a lot of the
> security-testing burden onto your "proxy". You'd have to certify your
> software to get a full picture of the security of the system.
>
>> Any thoughts on an alternative approach?
>> I'd like to just be able to proxy through credentials and not have to
>> worry
>> about whether my Accumulo-defined user that I'm proxying through already
>> has
>> them. Is there a way to just let that Accumulo-defined user have max
>> credentials and not have to specifically call them out? Thanks.
>
>
> Another approach could be writing your own Accumulo Authorizor and
> Authenticator. You could directly contact the third-party system to
> determine if a user can be authenticated with Accumulo. Assuming you can
> extrapolate the Authorizations for each user from that system as well, the
> Authorizor can be done in the same fashion.
>
> http://accumulo.apache.org/1.6/accumulo_user_manual.html#_pluggable_security
>
>

Re: OfflineScanner

Posted by Sean Busbey <bu...@cloudera.com>.
Hi Marc!

Yep, you can do this using the optional "setOfflineTableScan" on
AccumuloInputFormat[1]. It still requires that the table be offline.

There's a good example of programmatically creating an offline clone if you
look at the MR job we use to verify the "Continuous Ingest" integration
test[2]:

----
      Random random = new Random();
      clone = opts.getTableName() + "_"
          + String.format("%016x", random.nextLong() & 0x7fffffffffffffffL);
      conn = opts.getConnector();
      conn.tableOperations().clone(opts.getTableName(), clone, true,
          new HashMap<String,String>(), new HashSet<String>());
      ranges = conn.tableOperations().splitRangeByTablets(opts.getTableName(),
          new Range(), opts.maxMaps);
      conn.tableOperations().offline(clone);
      AccumuloInputFormat.setInputTableName(job, clone);
      AccumuloInputFormat.setOfflineTableScan(job, true);
----

[1]: http://s.apache.org/Pul
[2]: http://s.apache.org/P6Z



On Thu, Feb 19, 2015 at 9:47 AM, Marc Reichman <mreichman@pixelforensics.com> wrote:

> Apologies for hijacking this, but is there any way to use an offline table
> clone with MapReduce and AccumuloInputFormat? That read speed increase
> sounds very appealing..
>
> On Thu, Feb 19, 2015 at 9:27 AM, Josh Elser <jo...@gmail.com> wrote:
>
>> Typically, if you're using the OfflineScanner, you'd clone the table you
>> want to read and then take the clone offline. It's a simple (and fast)
>> solution that doesn't interrupt the availability of the table.
>>
>> Doing the read offline will definitely be faster (maybe 20%, I'm not
>> entirely sure on accurate number and how it scales with nodes). The pain
>> would be the extra work in creating the clone, offline'ing the table, and
>> eventually deleting the clone when you're done with it. A little more work,
>> but manageable.
>>
>>
>> Ara Ebrahimi wrote:
>>
>>> Hi,
>>>
>>> I’m trying to optimize a connector we’ve written for Presto. In some
>>> cases we need to perform full table scans. This happens across all the
>>> nodes but each node is assigned to process only a sharded subset of data.
>>> Each shard is hosted by only 1 RFile. I’m looking at the
>>> AbstractInputFormat and OfflineIterator and it seems like the code is not
>>> that hard to use for this case. Is there any drawback? It seems like if the
>>> table is offline then OfflineIterator is used which apparently reads the
>>> RFiles directly and doesn’t involve any RPC and I think should be
>>> significantly faster. Is it so? Is there any drawback to using this while
>>> the table is not offline but no other app is messing with the table?
>>>
>>> Thanks,
>>> Ara.
>>>
>>>
>>>
>>> ________________________________
>>>
>>> This message is for the designated recipient only and may contain
>>> privileged, proprietary, or otherwise confidential information. If you have
>>> received it in error, please notify the sender immediately and delete the
>>> original. Any other use of the e-mail by you is prohibited. Thank you in
>>> advance for your cooperation.
>>>
>>> ________________________________
>>>
>>
>


-- 
Sean

Re: OfflineScanner

Posted by Marc Reichman <mr...@pixelforensics.com>.
Apologies for hijacking this, but is there any way to use an offline table
clone with MapReduce and AccumuloInputFormat? That read speed increase
sounds very appealing..

On Thu, Feb 19, 2015 at 9:27 AM, Josh Elser <jo...@gmail.com> wrote:

> Typically, if you're using the OfflineScanner, you'd clone the table you
> want to read and then take the clone offline. It's a simple (and fast)
> solution that doesn't interrupt the availability of the table.
>
> Doing the read offline will definitely be faster (maybe 20%, I'm not
> entirely sure on accurate number and how it scales with nodes). The pain
> would be the extra work in creating the clone, offline'ing the table, and
> eventually deleting the clone when you're done with it. A little more work,
> but manageable.
>
>
> Ara Ebrahimi wrote:
>
>> Hi,
>>
>> I’m trying to optimize a connector we’ve written for Presto. In some
>> cases we need to perform full table scans. This happens across all the
>> nodes but each node is assigned to process only a sharded subset of data.
>> Each shard is hosted by only 1 RFile. I’m looking at the
>> AbstractInputFormat and OfflineIterator and it seems like the code is not
>> that hard to use for this case. Is there any drawback? It seems like if the
>> table is offline then OfflineIterator is used which apparently reads the
>> RFiles directly and doesn’t involve any RPC and I think should be
>> significantly faster. Is it so? Is there any drawback to using this while
>> the table is not offline but no other app is messing with the table?
>>
>> Thanks,
>> Ara.
>>
>>
>>
>>
>

Re: OfflineScanner

Posted by Ara Ebrahimi <ar...@argyledata.com>.
Ok so I need to clone it. Doable. Thanks!

Ara.

> On Feb 19, 2015, at 7:28 AM, Josh Elser <jo...@gmail.com> wrote:
>
> Typically, if you're using the OfflineScanner, you'd clone the table you
> want to read and then take the clone offline. It's a simple (and fast)
> solution that doesn't interrupt the availability of the table.
>
> Doing the read offline will definitely be faster (maybe 20%, I'm not
> entirely sure on accurate number and how it scales with nodes). The pain
> would be the extra work in creating the clone, offline'ing the table,
> and eventually deleting the clone when you're done with it. A little
> more work, but manageable.
>
> Ara Ebrahimi wrote:
>> Hi,
>>
>> I’m trying to optimize a connector we’ve written for Presto. In some cases we need to perform full table scans. This happens across all the nodes but each node is assigned to process only a sharded subset of data. Each shard is hosted by only 1 RFile. I’m looking at the AbstractInputFormat and OfflineIterator and it seems like the code is not that hard to use for this case. Is there any drawback? It seems like if the table is offline then OfflineIterator is used which apparently reads the RFiles directly and doesn’t involve any RPC and I think should be significantly faster. Is it so? Is there any drawback to using this while the table is not offline but no other app is messing with the table?
>>
>> Thanks,
>> Ara.
>>
>>
>>




Re: OfflineScanner

Posted by Josh Elser <jo...@gmail.com>.
Typically, if you're using the OfflineScanner, you'd clone the table you 
want to read and then take the clone offline. It's a simple (and fast) 
solution that doesn't interrupt the availability of the table.

Doing the read offline will definitely be faster (maybe 20%, though I'm not 
entirely sure of the exact number or how it scales with nodes). The pain 
would be the extra work of creating the clone, taking the table offline, 
and eventually deleting the clone when you're done with it. A little 
more work, but manageable.
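A minimal sketch of that lifecycle (`TableOps` is a hypothetical stand-in for
a slice of Accumulo's TableOperations -- the real clone() takes flush and
property arguments; the point here is the ordering and the cleanup in
finally, so the clone is deleted even if the read fails):

```java
import java.util.Random;

// Hypothetical stand-in mirroring a few TableOperations method names.
interface TableOps {
  void clone(String srcTable, String newTable);
  void offline(String table);
  void delete(String table);
}

class OfflineReadLifecycle {
  // Clone the table, take the clone offline, run the read, and always
  // delete the clone afterwards.
  static String scanOffline(TableOps ops, String table, Runnable read) {
    // Unique clone name: 16 hex digits from a non-negative random long.
    String clone = table + "_"
        + String.format("%016x", new Random().nextLong() & 0x7fffffffffffffffL);
    ops.clone(table, clone);
    ops.offline(clone);
    try {
      read.run();
    } finally {
      ops.delete(clone);
    }
    return clone;
  }
}
```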

Ara Ebrahimi wrote:
> Hi,
>
> I’m trying to optimize a connector we’ve written for Presto. In some cases we need to perform full table scans. This happens across all the nodes but each node is assigned to process only a sharded subset of data. Each shard is hosted by only 1 RFile. I’m looking at the AbstractInputFormat and OfflineIterator and it seems like the code is not that hard to use for this case. Is there any drawback? It seems like if the table is offline then OfflineIterator is used which apparently reads the RFiles directly and doesn’t involve any RPC and I think should be significantly faster. Is it so? Is there any drawback to using this while the table is not offline but no other app is messing with the table?
>
> Thanks,
> Ara.
>
>
>

Re: OfflineScanner

Posted by Josh Elser <jo...@gmail.com>.
Even so, as it stands, we have no "obligation" to make sure code you 
write against OfflineScanner keeps working across versions. Granted, we're 
not going to break it just to mess with you, but putting it in the public 
API is something we can do to help you sleep better at night.

Ara Ebrahimi wrote:
> Actually I was wrong. OfflineScanner is public, OfflineIterator is package. So it’s good enough :)
>
> Ara.
>
>> On Feb 19, 2015, at 11:46 AM, Josh Elser <jo...@gmail.com> wrote:
>>
>> Want to file a ticket, Ara? I didn't realize it wasn't directly in the
>> public API (only via m/r). I think it would make a nice addition.
>>
>> Ara Ebrahimi wrote:
>>> OfflineScanner is package protected, so I'll need to hack it. If it
>>> proves to be at least 20% faster then it's worth having it in the public
>>> API; perhaps even let the user ask for a specific file to be scanned
>>> rather than directing the scan by carefully defining a range that
>>> touches the intended file.
>>>
>>> Ara.
>>>
>>> On Feb 19, 2015, at 8:15 AM, Keith Turner <keith@deenlo.com> wrote:
>>>
>>>>
>>>> On Thu, Feb 19, 2015 at 12:57 AM, Ara Ebrahimi
>>>> <ar...@argyledata.com> wrote:
>>>>
>>>>     Hi,
>>>>
>>>>     I’m trying to optimize a connector we’ve written for Presto. In
>>>>     some cases we need to perform full table scans. This happens
>>>>     across all the nodes but each node is assigned to process only a
>>>>     sharded subset of data. Each shard is hosted by only 1 RFile. I’m
>>>>     looking at the AbstractInputFormat and OfflineIterator and it
>>>>     seems like the code is not that hard to use for this case. Is
>>>>     there any drawback? It seems like if the table is offline then
>>>>     OfflineIterator is used which apparently reads the RFiles directly
>>>>     and doesn’t involve any RPC and I think should be significantly
>>>>     faster. Is it so? Is there any drawback to using this while the
>>>>     table is not offline but no other app is messing with the table?
>>>>
>>>>
>>>> The code will throw an exception if the table is not offline (intent
>>>> is to ensure the files are stable and not garbage collected). As
>>>> others have stated you can clone.
>>>> Currently offline scanning is only supported in the public API w/ Map
>>>> Reduce. Curious, would you be interested in seeing this in the client
>>>> public API?
>>>>
>>>>
>>>>     Thanks,
>>>>     Ara.
>>>>
>>>>
>>>>

Re: OfflineScanner

Posted by Ara Ebrahimi <ar...@argyledata.com>.
Actually I was wrong. OfflineScanner is public, OfflineIterator is package. So it’s good enough :)

Ara.

> On Feb 19, 2015, at 11:46 AM, Josh Elser <jo...@gmail.com> wrote:
>
> Want to file a ticket, Ara? I didn't realize it wasn't directly in the
> public API (only via m/r). I think it would make a nice addition.
>
> Ara Ebrahimi wrote:
>> OfflineScanner is package protected, so I'll need to hack it. If it
>> proves to be at least 20% faster then it's worth having it in the public
>> API; perhaps even let the user ask for a specific file to be scanned
>> rather than directing the scan by carefully defining a range that
>> touches the intended file.
>>
>> Ara.
>>
>> On Feb 19, 2015, at 8:15 AM, Keith Turner <keith@deenlo.com> wrote:
>>
>>>
>>>
>>> On Thu, Feb 19, 2015 at 12:57 AM, Ara Ebrahimi
>>> <ara.ebrahimi@argyledata.com> wrote:
>>>
>>>    Hi,
>>>
>>>    I’m trying to optimize a connector we’ve written for Presto. In
>>>    some cases we need to perform full table scans. This happens
>>>    across all the nodes but each node is assigned to process only a
>>>    sharded subset of data. Each shard is hosted by only 1 RFile. I’m
>>>    looking at the AbstractInputFormat and OfflineIterator and it
>>>    seems like the code is not that hard to use for this case. Is
>>>    there any drawback? It seems like if the table is offline then
>>>    OfflineIterator is used which apparently reads the RFiles directly
>>>    and doesn’t involve any RPC and I think should be significantly
>>>    faster. Is it so? Is there any drawback to using this while the
>>>    table is not offline but no other app is messing with the table?
>>>
>>>
>>> The code will throw an exception if the table is not offline (intent
>>> is to ensure the files are stable and not garbage collected). As
>>> others have stated you can clone.
>>> Currently offline scanning is only supported in the public API w/ Map
>>> Reduce. Curious, would you be interested in seeing this in the client
>>> public API?
>>>
>>>
>>>    Thanks,
>>>    Ara.
>>>
>>>
>>>

Re: OfflineScanner

Posted by Josh Elser <jo...@gmail.com>.
Want to file a ticket, Ara? I didn't realize it wasn't directly in the 
public API (only via m/r). I think it would make a nice addition.

Ara Ebrahimi wrote:
> OfflineScanner is package protected, so I'll need to hack it. If it
> proves to be at least 20% faster then it's worth having it in the public
> API; perhaps even let the user ask for a specific file to be scanned
> rather than directing the scan by carefully defining a range that
> touches the intended file.
>
> Ara.
>
> On Feb 19, 2015, at 8:15 AM, Keith Turner <keith@deenlo.com> wrote:
>
>>
>>
>> On Thu, Feb 19, 2015 at 12:57 AM, Ara Ebrahimi
>> <ara.ebrahimi@argyledata.com> wrote:
>>
>>     Hi,
>>
>>     I’m trying to optimize a connector we’ve written for Presto. In
>>     some cases we need to perform full table scans. This happens
>>     across all the nodes but each node is assigned to process only a
>>     sharded subset of data. Each shard is hosted by only 1 RFile. I’m
>>     looking at the AbstractInputFormat and OfflineIterator and it
>>     seems like the code is not that hard to use for this case. Is
>>     there any drawback? It seems like if the table is offline then
>>     OfflineIterator is used which apparently reads the RFiles directly
>>     and doesn’t involve any RPC and I think should be significantly
>>     faster. Is it so? Is there any drawback to using this while the
>>     table is not offline but no other app is messing with the table?
>>
>>
>> The code will throw an exception if the table is not offline (intent
>> is to ensure the files are stable and not garbage collected). As
>> others have stated you can clone.
>> Currently offline scanning is only supported in the public API w/ Map
>> Reduce. Curious, would you be interested in seeing this in the client
>> public API?
>>
>>
>>     Thanks,
>>     Ara.
>>
>>
>>

Re: OfflineScanner

Posted by Ara Ebrahimi <ar...@argyledata.com>.
OfflineScanner is package protected, so I'll need to hack it. If it proves to be at least 20% faster then it's worth having it in the public API; perhaps even let the user ask for a specific file to be scanned rather than directing the scan by carefully defining a range that touches the intended file.

Ara.

On Feb 19, 2015, at 8:15 AM, Keith Turner <ke...@deenlo.com> wrote:



On Thu, Feb 19, 2015 at 12:57 AM, Ara Ebrahimi <ar...@argyledata.com> wrote:
Hi,

I'm trying to optimize a connector we've written for Presto. In some cases we need to perform full table scans. This happens across all the nodes but each node is assigned to process only a sharded subset of data. Each shard is hosted by only 1 RFile. I'm looking at the AbstractInputFormat and OfflineIterator and it seems like the code is not that hard to use for this case. Is there any drawback? It seems like if the table is offline then OfflineIterator is used which apparently reads the RFiles directly and doesn't involve any RPC and I think should be significantly faster. Is it so? Is there any drawback to using this while the table is not offline but no other app is messing with the table?

The code will throw an exception if the table is not offline (intent is to ensure the files are stable and not garbage collected). As others have stated you can clone.

Currently offline scanning is only supported in the public API w/ Map Reduce.  Curious, would you be interested in seeing this in the client public API?


Thanks,
Ara.




Re: OfflineScanner

Posted by Keith Turner <ke...@deenlo.com>.
On Thu, Feb 19, 2015 at 12:57 AM, Ara Ebrahimi <ar...@argyledata.com>
wrote:

> Hi,
>
> I’m trying to optimize a connector we’ve written for Presto. In some cases
> we need to perform full table scans. This happens across all the nodes but
> each node is assigned to process only a sharded subset of data. Each shard
> is hosted by only 1 RFile. I’m looking at the AbstractInputFormat and
> OfflineIterator and it seems like the code is not that hard to use for this
> case. Is there any drawback? It seems like if the table is offline then
> OfflineIterator is used which apparently reads the RFiles directly and
> doesn’t involve any RPC and I think should be significantly faster. Is it
> so? Is there any drawback to using this while the table is not offline but
> no other app is messing with the table?
>

The code will throw an exception if the table is not offline (intent is to
ensure the files are stable and not garbage collected). As others have
stated you can clone.

Currently offline scanning is only supported in the public API w/ Map
Reduce.  Curious, would you be interested in seeing this in the client
public API?


> Thanks,
> Ara.
>
>
>

OfflineScanner

Posted by Ara Ebrahimi <ar...@argyledata.com>.
Hi,

I’m trying to optimize a connector we’ve written for Presto. In some cases we need to perform full table scans. This happens across all the nodes but each node is assigned to process only a sharded subset of data. Each shard is hosted by only 1 RFile. I’m looking at the AbstractInputFormat and OfflineIterator and it seems like the code is not that hard to use for this case. Is there any drawback? It seems like if the table is offline then OfflineIterator is used which apparently reads the RFiles directly and doesn’t involve any RPC and I think should be significantly faster. Is it so? Is there any drawback to using this while the table is not offline but no other app is messing with the table?

Thanks,
Ara.




Re: Authorizations for complex user management

Posted by Josh Elser <jo...@gmail.com>.
buttercream wrote:
> I'm working on a system where there are many users and the users credentials
> and information are stored in a third party system. I was thinking the best
> approach would be to have my default Accumulo user have the superset of all
> permissions and then when a query is performed, proxy in the specific user
> credential that may be a subset. But, this seems a bit cumbersome to have to
> up front define all available credentials, especially if new authorizations
> are added without our knowledge.

Yeah, this is the pain point. The approach works, but it pushes a lot of 
the security-testing burden onto your "proxy". You'd have to certify your 
software to get a full picture of the security of the system.

> Any thoughts on an alternative approach?
> I'd like to just be able to proxy through credentials and not have to worry
> about whether my Accumulo-defined user that I'm proxying through already has
> them. Is there a way to just let that Accumulo-defined user have max
> credentials and not have to specifically call them out? Thanks.

Another approach could be writing your own Accumulo Authorizor and 
Authenticator. You could directly contact the third-party system to 
determine if a user can be authenticated with Accumulo. Assuming you can 
extrapolate the Authorizations for each user from that system as well, 
the Authorizor can be done in the same fashion.

http://accumulo.apache.org/1.6/accumulo_user_manual.html#_pluggable_security
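As a rough, self-contained illustration of the Authenticator half (the names
here are hypothetical, not Accumulo's actual interface, which has additional
lifecycle methods): authentication reduces to asking the third-party system
to vouch for the presented credentials.

```java
// Hypothetical stand-in for the third-party system's client.
interface ExternalUserStore {
  boolean checkCredentials(String principal, byte[] token);
}

class DelegatingAuthenticator {
  private final ExternalUserStore store;

  DelegatingAuthenticator(ExternalUserStore store) {
    this.store = store;
  }

  // Accept the user only if the external system vouches for the credentials;
  // no user/password state is kept in Accumulo itself.
  boolean authenticateUser(String principal, byte[] token) {
    return store.checkCredentials(principal, token);
  }
}
```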
