Posted to dev@jena.apache.org by Rob Vesse <rv...@dotnetrdf.org> on 2017/10/12 09:03:07 UTC

Obfuscation Support?

Folks

A recurring theme I see on the users list is that we get vague questions about performance where users can’t/won’t share data and queries because of confidentiality or other concerns. This is something we’ve encountered in the past with customers for our commercial products, so internally we developed some obfuscation code using Jena APIs to obfuscate queries and data in our logs, allowing customers to share these without breaching confidentiality.

Would it be valuable to the project if we cleaned this up and made it a part of the core Jena libraries?

It would probably take a bit of time to unpick this from our code and to generalise it, but I think it could be a very useful feature going forward. Let me know what you think.

Rob


Re: Obfuscation Support?

Posted by aj...@apache.org.
I think that having the tooling available would be nothing but good. (Well, except for the hard work that Rob will have 
to do to make it happen. :g:) And I agree with Andy that we want to be careful about how we present it-- managing 
expectations is key. Perhaps we can make a point of providing the tooling in a way that moves users through some 
thinking about MCVE provision and so forth? I'm just imagining a page on the site where you get the tool, with that link 
wrapped in some useful guidance explaining the limitations that Andy discussed, how to be sure you are asking your 
question in a way that will get the best answers, etc.

> Do we perhaps need to consider how we could make clear that there is an ability to purchase support from external vendors? Would it be possible to have a page on the website that provides a list of known support vendors, obviously with the appropriate disclaimers around non-endorsement, neutrality, etc., and the ability for anyone who asks to have their company listed?

+1! I bet we can do this, well within Apache boundaries. For example, there are plenty of pages like:

https://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support


ajs6f

Rob Vesse wrote on 10/12/17 9:21 AM:
> My intention was not for us to start offering a debugging service nor to stop expecting users to provide a minimal complete example.
>
> My thinking is that it provides a way to help users provide a complete example; I was not expecting that they would use it to submit their entire data sets. And clearly obfuscation does have limits, particularly when you consider things like typed literals, where you almost need to leave them alone in order for the obfuscated output to have any semblance of meaning and usefulness.
>
> I totally agree that none of us has the time to dive into detailed debugging of users' problems. Do we perhaps need to consider how we could make clear that there is an ability to purchase support from external vendors? Would it be possible to have a page on the website that provides a list of known support vendors, obviously with the appropriate disclaimers around non-endorsement, neutrality, etc., and the ability for anyone who asks to have their company listed?
>
> Rob

Re: Obfuscation Support?

Posted by Rob Vesse <rv...@dotnetrdf.org>.
My intention was not for us to start offering a debugging service nor to stop expecting users to provide a minimal complete example.

My thinking is that it provides a way to help users provide a complete example; I was not expecting that they would use it to submit their entire data sets. And clearly obfuscation does have limits, particularly when you consider things like typed literals, where you almost need to leave them alone in order for the obfuscated output to have any semblance of meaning and usefulness.
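
To make the typed-literal trade-off concrete, here is a minimal sketch. It is not the obfuscation code mentioned in this thread and it does not use Jena APIs; it is a standalone Python illustration over simplified N-Triples lines. The names `SALT`, `obfuscate_line` and the `http://example.org/obf/` prefix are assumptions made up for the example. IRIs and plain or language-tagged strings are replaced by salted hashes, while typed literals are passed through unchanged so that numeric or date comparisons in a query still behave the same way.

```python
import hashlib
import re

# Hypothetical salt (an assumption for this sketch); it would be kept private.
SALT = "example-salt"

# Simplified N-Triples tokens: an IRI in angle brackets, or a quoted literal
# with an optional language tag or datatype suffix.
TOKEN = re.compile(
    r'<(?P<iri>[^>]*)>'
    r'|"(?P<lex>(?:[^"\\]|\\.)*)"(?P<suffix>@[A-Za-z-]+|\^\^<[^>]*>)?'
)


def _digest(text):
    """Return a short salted hash; equal inputs always map to equal outputs."""
    return hashlib.sha256((SALT + text).encode("utf-8")).hexdigest()[:12]


def _replace(match):
    if match.group("iri") is not None:
        # Replace the IRI with a hashed IRI under a made-up namespace.
        return "<http://example.org/obf/" + _digest(match.group("iri")) + ">"
    suffix = match.group("suffix") or ""
    if suffix.startswith("^^"):
        # Typed literal: leave value and datatype alone.
        return match.group(0)
    # Plain or language-tagged string: hash the text, keep any language tag.
    return '"' + _digest(match.group("lex")) + '"' + suffix


def obfuscate_line(line):
    """Obfuscate one simplified N-Triples line."""
    return TOKEN.sub(_replace, line)


if __name__ == "__main__":
    print(obfuscate_line(
        '<http://a/s> <http://a/p> '
        '"42"^^<http://www.w3.org/2001/XMLSchema#integer> .'
    ))
    print(obfuscate_line('<http://a/s> <http://a/q> "secret"@en .'))
```

Because the same input always hashes to the same output, joins between triples survive obfuscation. The untouched typed literals are also exactly the values that remain readable, which is one reason this kind of obfuscation should not be treated as sufficient for personal data.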

I totally agree that none of us has the time to dive into detailed debugging of users' problems. Do we perhaps need to consider how we could make clear that there is an ability to purchase support from external vendors? Would it be possible to have a page on the website that provides a list of known support vendors, obviously with the appropriate disclaimers around non-endorsement, neutrality, etc., and the ability for anyone who asks to have their company listed?

Rob

On 12/10/2017 12:36, "Andy Seaborne" <an...@apache.org> wrote:

    Good question.
    
    It might be valuable to add to the collection of tools.
    
    I do have some concern about what we are offering here though.
    
    (1) if we offer to look at large datasets and/or large log files, then 
    work is moving from the user to the list.
    
    (2) the obfuscated data is public. We don't want any 
    commitment/liability here that the code is, say, suitable for personal 
    data because sometimes obfuscation is not enough.
    
    
    On the first point:
    
    Part of an MCVE [1] is the user doing some work.  If we make it 
    acceptable to bypass that, the work still exists but it has been 
    transferred.
    
    I simply can't spend 1+ hour setting up a test environment.  Performance 
    can involve load as well and I don't have the infrastructure to look at 
    that.
    
    I'm more willing to spend time if the user is in a university/non-profit 
    or for people, commercial or otherwise, who engage in useful discussion. 
    A good report is a contribution.
    
    But I'm not willing (or even able) to subsidise commercial organisations 
    per se. They can go find and pay for a commercial support contract or 
    contract with someone (a contributor/committer maybe) and have a 
    confidentiality agreement.
    
    It is not always one question in isolation.  Solve one issue and then 
    another arrives.
    
    Sorry if this is grumpy but I can see ways things might turn out not so 
    well without us also having common agreement about how we operate on users@.
    
    	Andy
    
    [1] and point to
    https://stackoverflow.com/help/mcve
    
    PS
    There is also a theme of "ask first" before trying anything, or putting in 
    a few minutes of investigation. Such emails are vague.
    
    
    





Re: Obfuscation Support?

Posted by Andy Seaborne <an...@apache.org>.
Good question.

It might be valuable to add to the collection of tools.

I do have some concern about what we are offering here though.

(1) if we offer to look at large datasets and/or large log files, then 
work is moving from the user to the list.

(2) the obfuscated data is public. We don't want any 
commitment/liability here that the code is, say, suitable for personal 
data because sometimes obfuscation is not enough.


On the first point:

Part of an MCVE [1] is the user doing some work.  If we make it 
acceptable to bypass that, the work still exists but it has been 
transferred.

I simply can't spend 1+ hour setting up a test environment.  Performance 
can involve load as well and I don't have the infrastructure to look at 
that.

I'm more willing to spend time if the user is in a university/non-profit 
or for people, commercial or otherwise, who engage in useful discussion. 
A good report is a contribution.

But I'm not willing (or even able) to subsidise commercial organisations 
per se. They can go find and pay for a commercial support contract or 
contract with someone (a contributor/committer maybe) and have a 
confidentiality agreement.

It is not always one question in isolation.  Solve one issue and then 
another arrives.

Sorry if this is grumpy but I can see ways things might turn out not so 
well without us also having common agreement about how we operate on users@.

	Andy

[1] and point to
https://stackoverflow.com/help/mcve

PS
There is also a theme of "ask first" before trying anything, or putting in 
a few minutes of investigation. Such emails are vague.



On 12/10/17 10:03, Rob Vesse wrote:
> Folks
>
> A recurring theme I see on the users list is that we get vague questions about performance where users can’t/won’t share data and queries because of confidentiality or other concerns. This is something we’ve encountered in the past with customers for our commercial products, so internally we developed some obfuscation code using Jena APIs to obfuscate queries and data in our logs, allowing customers to share these without breaching confidentiality.
>
> Would it be valuable to the project if we cleaned this up and made it a part of the core Jena libraries?
>
> It would probably take a bit of time to unpick this from our code and to generalise it, but I think it could be a very useful feature going forward. Let me know what you think.
>
> Rob
> 
>