Posted to dev@jena.apache.org by Rob Vesse <rv...@dotnetrdf.org> on 2017/10/12 09:03:07 UTC
Obfuscation Support?
Folks
An occasional recurring theme I see on the users list is that we get a vague question about performance details where users can’t/won’t share data and queries because of confidentiality or other concerns. This is something we’ve encountered in the past with customers for our commercial products, so internally we developed some obfuscation code using Jena APIs so that we can obfuscate queries and data in our logs, allowing customers to share these without confidentiality being breached.
Would it be valuable to the project if we cleaned this up and made it a part of core Jena libraries?
It would probably take a bit of time to unpick this from our code and to generalise it, but I think it could be a very useful feature going forward. Let me know what you think.
Rob
Re: Obfuscation Support?
Posted by aj...@apache.org.
I think that having the tooling available would be nothing but good. (Well, except for the hard work that Rob will have
to do to make it happen. :g:) And I agree with Andy that we want to be careful about how we present it-- managing
expectations is key. Perhaps we can make a point of providing the tooling in a way that moves users through some
thinking about MCVE provision and so forth? I'm just imagining a page on the site where you get the tool, with that link
wrapped in some useful guidance explaining the limitations that Andy discussed, how to be sure you are asking your
question in a way that will get the best answers, etc.
> Do we perhaps need to consider how we could make clear that there is an ability to purchase support from external vendors? Would it be possible to have a page on the website that provides a list of known support vendors, obviously with the appropriate disclaimers around non-endorsement, neutrality, etc., and the ability for anyone who asks to have their company listed?
+1! I bet we can do this, well within Apache boundaries. For example, there are plenty of pages like:
https://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support
ajs6f
Re: Obfuscation Support?
Posted by Rob Vesse <rv...@dotnetrdf.org>.
My intention was not for us to start offering a debugging service nor to stop expecting users to provide a minimal complete example.
My thinking is that it provides a way to help users in providing a complete example; I was not expecting that they would use it to submit their entire data sets. And clearly obfuscation does have limits, particularly when you consider things like typed literals, where you almost need to leave them alone in order for the obfuscated outputs to have any semblance of meaning and usefulness.
I totally agree that none of us has the time to dive into detailed debugging of users' problems. Do we perhaps need to consider how we could make clear that there is an ability to purchase support from external vendors? Would it be possible to have a page on the website that provides a list of known support vendors, obviously with the appropriate disclaimers around non-endorsement, neutrality, etc., and the ability for anyone who asks to have their company listed?
Rob
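To illustrate the typed-literal limitation described in the message above, here is a minimal sketch of the general idea. It is not Jena code and not the internal tool mentioned in this thread; it is plain Python working on N-Triples-style lines, and every name in it (obfuscate_line, digest, the salt value, the http://example.org/obf/ prefix) is a hypothetical choice made for illustration. IRIs and plain literals are replaced by salted hashes so that identical terms still map to identical tokens, while typed literals are left untouched so that values such as numbers keep their meaning.

```python
import hashlib
import re

# One N-Triples term: an IRI, a literal (optionally followed by a
# datatype IRI or a language tag), or a blank node label.
TERM = re.compile(
    r'<(?P<iri>[^>]*)>'
    r'|"(?P<lex>(?:[^"\\]|\\.)*)"(?:\^\^<(?P<dt>[^>]*)>|@(?P<lang>[A-Za-z0-9-]+))?'
    r'|(?P<bnode>_:\w+)'
)


def digest(text, salt):
    """Return a short, deterministic token for text (salted SHA-256)."""
    return hashlib.sha256((salt + text).encode("utf-8")).hexdigest()[:12]


def obfuscate_line(line, salt="demo-salt"):
    """Obfuscate IRIs and plain literals in one N-Triples-style line.

    Typed literals and blank node labels are deliberately left alone.
    """
    def replace(match):
        if match.group("iri") is not None:
            # Hypothetical replacement namespace, chosen for this sketch.
            return "<http://example.org/obf/" + digest(match.group("iri"), salt) + ">"
        if match.group("bnode") is not None:
            return match.group(0)
        if match.group("dt") is not None:
            # Typed literal: keep the value and datatype as they are.
            return match.group(0)
        hidden = '"' + digest(match.group("lex"), salt) + '"'
        lang = match.group("lang")
        return hidden + ("@" + lang if lang else "")

    return TERM.sub(replace, line)


if __name__ == "__main__":
    print(obfuscate_line(
        '<http://corp.example/emp/42> <http://corp.example/name> "Alice Smith" .'
    ))
    print(obfuscate_line(
        '<http://corp.example/emp/42> <http://corp.example/age> '
        '"37"^^<http://www.w3.org/2001/XMLSchema#integer> .'
    ))
```

Because the same input term always hashes to the same token, joins and repeated patterns survive obfuscation, which is what makes the output useful for reproducing a query problem. The untouched typed literals are also where information can leak, so a sketch like this should not be treated as suitable for personal data.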
Re: Obfuscation Support?
Posted by Andy Seaborne <an...@apache.org>.
Good question.
It might be valuable to add to the collection of tools.
I do have some concern about what we are offering here though.
(1) if we offer to look at large datasets and/or large log files, then
work is moving from the user to the list.
(2) the obfuscated data is public. We don't want any
commitment/liability here that the code is, say, suitable for personal
data because sometimes obfuscation is not enough.
On the first point:
Part of an MCVE [1] is the user doing some work. If we make it
acceptable to bypass that, the work still exists but it has been
transferred.
I simply can't spend 1+ hour setting up a test environment. Performance
can involve load as well and I don't have the infrastructure to look at
that.
I'm more willing to spend time if the user is in a university/non-profit
or for people, commercial or otherwise, who engage in useful discussion.
A good report is a contribution.
But I'm not willing (or even able) to subsidise commercial organisations
per se. They can go find and pay for a commercial support contract or
contract with someone (a contributor/committer maybe) and have a
confidentiality agreement.
It is not always one question in isolation. Solve one issue and then
another arrives.
Sorry if this is grumpy but I can see ways things might turn out not so
well without us also having common agreement about how we operate on users@.
Andy
[1] and point to
https://stackoverflow.com/help/mcve
PS
There is also a theme of "ask first" before trying anything, or doing
a few minutes' investigation. Such emails are vague.