Posted to dev@pirk.apache.org by Tim Ellison <t....@gmail.com> on 2016/09/28 08:53:48 UTC

Thoughts on exponent tables

Presently, Pirk can create exponent tables for a query in two ways:
in memory, via Query#expTable (a map of maps,
element -> <power, element^power mod N^2>), or using map-reduce on HDFS,
via Query#expFileBasedLookup (a map of element hash -> name of a file
containing <power, element^power mod N^2> strings, which is read back
into memory as a Guava cache).
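
To make the in-memory shape concrete, here is a minimal sketch of such a
map of maps. The class and method names are hypothetical and this is not
Pirk's actual code; it only assumes that each entry is computed as
element^power mod N^2, as described above.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not a class from Pirk.
final class InMemoryExpTableSketch {

  // Builds element -> (power -> element^power mod N^2) for powers 0..maxPower.
  static Map<BigInteger, Map<Integer, BigInteger>> build(
      List<BigInteger> elements, int maxPower, BigInteger nSquared) {
    Map<BigInteger, Map<Integer, BigInteger>> table = new HashMap<>();
    for (BigInteger element : elements) {
      Map<Integer, BigInteger> powers = new HashMap<>();
      for (int power = 0; power <= maxPower; power++) {
        powers.put(power, element.modPow(BigInteger.valueOf(power), nSquared));
      }
      table.put(element, powers);
    }
    return table;
  }

  public static void main(String[] args) {
    // Toy modulus, far too small for real use: N = 15, so N^2 = 225.
    BigInteger n = BigInteger.valueOf(15);
    BigInteger nSquared = n.multiply(n);
    BigInteger element = BigInteger.valueOf(7);

    Map<BigInteger, Map<Integer, BigInteger>> table =
        build(Arrays.asList(element), 3, nSquared);

    // 7^3 = 343 and 343 mod 225 = 118
    System.out.println(table.get(element).get(3));
  }
}
```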

I'm inclined to pull these table representations out to a core abstract
type that provides the exponent table calls, and create the concrete
implementations under there.  Then all the table building and lookup
would be in one place, and a Query would just have one expTable
reference to worry about.

This would mean changing the QueryInfo constructors to take a
concrete type of expTable rather than the booleans
useExpLookupTableInput and useHDFSExpLookupTableInput.  That should
scale better if we want to try useRedisExpLookupTable or whatever in
future, and it reduces pirk-core's direct references to HDFS.
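
A rough sketch of the shape I have in mind follows. Every name in it
(ExpTable, ComputingExpTable, CachingExpTable, QueryInfoSketch) is
hypothetical and chosen only for illustration; none of them is an
existing Pirk type, and the real constructors take many more arguments.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction: the one lookup call that users of a table need.
interface ExpTable {
  BigInteger getExp(BigInteger element, int power);
}

// No stored table at all: each value is computed on demand.
final class ComputingExpTable implements ExpTable {
  private final BigInteger nSquared;

  ComputingExpTable(BigInteger nSquared) {
    this.nSquared = nSquared;
  }

  @Override
  public BigInteger getExp(BigInteger element, int power) {
    return element.modPow(BigInteger.valueOf(power), nSquared);
  }
}

// In-memory table that fills itself lazily from another ExpTable.
final class CachingExpTable implements ExpTable {
  private final ExpTable source;
  private final Map<BigInteger, Map<Integer, BigInteger>> cache = new HashMap<>();

  CachingExpTable(ExpTable source) {
    this.source = source;
  }

  @Override
  public BigInteger getExp(BigInteger element, int power) {
    return cache
        .computeIfAbsent(element, e -> new HashMap<>())
        .computeIfAbsent(power, p -> source.getExp(element, p));
  }
}

// Stand-in for QueryInfo: one table reference instead of two booleans.
final class QueryInfoSketch {
  private final ExpTable expTable;

  QueryInfoSketch(ExpTable expTable) {
    this.expTable = expTable;
  }

  ExpTable getExpTable() {
    return expTable;
  }
}
```

An HDFS-backed or Redis-backed table would then be one more
implementation of the same interface, living outside pirk-core.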

WDYT?

Regards,
Tim

Re: Thoughts on exponent tables

Posted by Tim Ellison <t....@gmail.com>.

On 13/10/16 18:50, Ellison Anne Williams wrote:
> The embedded lookup tables are computed Query side as they are specific to
> the (encrypted) query vectors. The 'embedded' part is key here - if you
> compute them in the Responder, then you have to repeat that computation
> each time you would like to run the query instead of just pulling the
> (one-time) pre-computed lookup table from the Query object.

Yet as we can see, the responder may choose to:
- not use a lookup table at all,
- generate one for this query and keep it in memory ("embedded"),
- precompute the table and store it in HDFS,
- get the exp values some other way we have not thought of yet.

So it is odd that the details of how the responder finds the exp
values are decided by the querier.

p.s. it makes me a bit nervous, though without any concrete proof, for
the querier to be giving the responder pre-computed exponent values used
in compiling the encrypted response.  What if the querier chooses to lie
through the expTable (e.g. always answer 1)?  Can't they use that to
figure out details of the underlying data?

> Note that in Spark, there is an option to compute the lookup table in a
> distributed form (i.e. not embedded in the Query).
> 
> Thus, computation of lookup tables can happen on the Responder side (there
> is such an implementation for Spark), but the embedded lookup table is a
> different animal.

Right, I am just thinking about the embedded lookup table right now.

> Make sense?

I agree that having the responder run the same query again is likely to
be a common case, and there should be some way to cache the working data
in case the query is seen again.

I'm not yet convinced that embedding it in the query is the right answer.
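
As one possible alternative, a responder could keep its own tables keyed
by a query identifier and build each one only the first time that query
is seen. This is only a sketch: the class name, the use of a UUID as the
key, and the build counter are all assumptions made for illustration, not
existing Pirk code.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical responder-side cache; not a class from Pirk.
final class ResponderExpCacheSketch {
  // query identifier -> (element -> (power -> element^power mod N^2))
  private final Map<UUID, Map<BigInteger, Map<Integer, BigInteger>>> tablesByQuery =
      new ConcurrentHashMap<>();

  // Counts table builds, only so the caching effect can be observed.
  final AtomicInteger builds = new AtomicInteger();

  // Returns the table for this query, building it on first sight only.
  Map<BigInteger, Map<Integer, BigInteger>> tableFor(
      UUID queryId, List<BigInteger> elements, int maxPower, BigInteger nSquared) {
    return tablesByQuery.computeIfAbsent(queryId, id -> {
      builds.incrementAndGet();
      Map<BigInteger, Map<Integer, BigInteger>> table = new HashMap<>();
      for (BigInteger element : elements) {
        Map<Integer, BigInteger> powers = new HashMap<>();
        for (int power = 0; power <= maxPower; power++) {
          powers.put(power, element.modPow(BigInteger.valueOf(power), nSquared));
        }
        table.put(element, powers);
      }
      return table;
    });
  }
}
```

With this shape the querier sends only the query elements, and the
responder is free to drop, rebuild, or relocate the cached table.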

Regards,
Tim

> On Wed, Oct 12, 2016 at 9:36 AM, Tim Ellison <t....@gmail.com> wrote:
> 
>> On 29/09/16 11:29, Ellison Anne Williams wrote:
>>> In general, I am in favor of an abstract class.
>>>
>>> However, note that in the distributed case, the 'table' is generated in a
>>> distributed fashion and then used as such too ('split' and distributed).
>>>
>>> FWIW - In preliminary testing, the lookup tables ended up not performing
>>> any better at scale than the local caching mechanism that is currently in
>>> place and used by default (in
>>> org.apache.pirk.responder.wideskies.common.ComputeEncryptedRow).
>>
>> I'm trying to figure out why the Query is responsible for maintaining
>> the expTable / expFile* info?  These tables are only used by the
>> responders, so doesn't it make sense to move the logic over there?
>>
>> The responders should decide whether they want to use caches to
>> calculate the response, not the person asking the query.
>>
>> Regards,
>> Tim
>>
>

Re: Thoughts on exponent tables

Posted by Ellison Anne Williams <ea...@apache.org>.

The embedded lookup tables are computed Query side as they are specific to
the (encrypted) query vectors. The 'embedded' part is key here - if you
compute them in the Responder, then you have to repeat that computation
each time you would like to run the query instead of just pulling the
(one-time) pre-computed lookup table from the Query object.

Note that in Spark, there is an option to compute the lookup table in a
distributed form (i.e. not embedded in the Query).

Thus, computation of lookup tables can happen on the Responder side (there
is such an implementation for Spark), but the embedded lookup table is a
different animal.

Make sense?

On Wed, Oct 12, 2016 at 9:36 AM, Tim Ellison <t....@gmail.com> wrote:

> On 29/09/16 11:29, Ellison Anne Williams wrote:
> > In general, I am in favor of an abstract class.
> >
> > However, note that in the distributed case, the 'table' is generated in a
> > distributed fashion and then used as such too ('split' and distributed).
> >
> > FWIW - In preliminary testing, the lookup tables ended up not performing
> > any better at scale than the local caching mechanism that is currently in
> > place and used by default (in
> > org.apache.pirk.responder.wideskies.common.ComputeEncryptedRow).
>
> I'm trying to figure out why the Query is responsible for maintaining
> the expTable / expFile* info?  These tables are only used by the
> responders, so doesn't it make sense to move the logic over there?
>
> The responders should decide whether they want to use caches to
> calculate the response, not the person asking the query.
>
> Regards,
> Tim
>

Re: Thoughts on exponent tables

Posted by Tim Ellison <t....@gmail.com>.

On 29/09/16 11:29, Ellison Anne Williams wrote:
> In general, I am in favor of an abstract class.
> 
> However, note that in the distributed case, the 'table' is generated in a
> distributed fashion and then used as such too ('split' and distributed).
> 
> FWIW - In preliminary testing, the lookup tables ended up not performing
> any better at scale than the local caching mechanism that is currently in
> place and used by default (in
> org.apache.pirk.responder.wideskies.common.ComputeEncryptedRow).

I'm trying to figure out why the Query is responsible for maintaining
the expTable / expFile* info.  These tables are only used by the
responders, so doesn't it make sense to move the logic over there?

The responders should decide whether they want to use caches to
calculate the response, not the person asking the query.

Regards,
Tim

Re: Thoughts on exponent tables

Posted by Tim Ellison <t....@gmail.com>.

On 29/09/16 11:29, Ellison Anne Williams wrote:
> In general, I am in favor of an abstract class.
> 
> However, note that in the distributed case, the 'table' is generated in a
> distributed fashion and then used as such too ('split' and distributed).
> 
> FWIW - In preliminary testing, the lookup tables ended up not performing
> any better at scale than the local caching mechanism that is currently in
> place and used by default (in
> org.apache.pirk.responder.wideskies.common.ComputeEncryptedRow).


Thanks for the comments.  I've created a JIRA so I have somewhere to
hang the bits of code I'm experimenting with now.  It will be on a
"slow-burn" as and when I find a few mins.

Regards,
Tim

> On Wed, Sep 28, 2016 at 4:53 AM, Tim Ellison <t....@gmail.com> wrote:
> 
>> Presently, Pirk has the ability to create exponent tables for a query
>> either in memory directly via the Query#expTable (which ends up being a
>> map of maps element -> <power, element^power mod N^2>), or using
>> map-reduce on HDFS via Query#expFileBasedLookup (which ends up being
>> element hash -> filename containing <power, element^power mod N^2>
>> strings, and back in memory as a Guava cache).
>>
>> I'm inclined to pull these table representations out to a core abstract
>> type that provides the exponent table calls, and create the concrete
>> implementations under there.  Then all the table building and lookup
>> would be in one place, and a Query would just have one expTable
>> reference to worry about.
>>
>> This would then result in changing the QueryInfo constructors to take a
>> concrete type of expTable, rather than the booleans
>> useExpLookupTableInput and useHDFSExpLookupTableInput; which should
>> scale better if we want to try useRedisExpLookupTable or whatever in
>> future, and it reduces the pirk-core's direct references to HDFS.
>>
>> WDYT?
>>
>> Regards,
>> Tim
>>
>

Re: Thoughts on exponent tables

Posted by Ellison Anne Williams <ea...@apache.org>.

In general, I am in favor of an abstract class.

However, note that in the distributed case, the 'table' is generated in a
distributed fashion and then used as such too ('split' and distributed).

FWIW - In preliminary testing, the lookup tables ended up not performing
any better at scale than the local caching mechanism that is currently in
place and used by default (in
org.apache.pirk.responder.wideskies.common.ComputeEncryptedRow).

On Wed, Sep 28, 2016 at 4:53 AM, Tim Ellison <t....@gmail.com> wrote:

> Presently, Pirk has the ability to create exponent tables for a query
> either in memory directly via the Query#expTable (which ends up being a
> map of maps element -> <power, element^power mod N^2>), or using
> map-reduce on HDFS via Query#expFileBasedLookup (which ends up being
> element hash -> filename containing <power, element^power mod N^2>
> strings, and back in memory as a Guava cache).
>
> I'm inclined to pull these table representations out to a core abstract
> type that provides the exponent table calls, and create the concrete
> implementations under there.  Then all the table building and lookup
> would be in one place, and a Query would just have one expTable
> reference to worry about.
>
> This would then result in changing the QueryInfo constructors to take a
> concrete type of expTable, rather than the booleans
> useExpLookupTableInput and useHDFSExpLookupTableInput; which should
> scale better if we want to try useRedisExpLookupTable or whatever in
> future, and it reduces the pirk-core's direct references to HDFS.
>
> WDYT?
>
> Regards,
> Tim
>