Posted to solr-user@lucene.apache.org by Arnon Yogev <AR...@il.ibm.com> on 2015/10/12 09:33:22 UTC

Spell Check and Privacy

Hi,

Our system supports many users from different organizations and with
different ACLs.
We are considering adding spell check ("did you mean") functionality using
DirectSolrSpellChecker. However, a privacy concern was raised, as this
might lead to private information being revealed between users via the
suggested terms. Using the FileBasedSpellChecker is another option, but
naturally a static list of terms is not optimal.

Is there a best practice or a suggested method for these kinds of cases?

Thanks,
Arnon

Re: Spell Check and Privacy

Posted by Alessandro Benedetti <be...@gmail.com>.
We had exactly the same issue and solved it as James suggested :)

To answer Susheel, the requirement is to show each user only the
suggestions they are allowed to see.
It may seem a paranoid request, but it can happen that we don't want to
expose any indexed data that belongs to other users.
In enterprise search you are able to see only the documents you are
permitted to see, and the same holds for autocompletion and spellchecking.
Some time ago I was thinking of providing a filter query approach for
spellchecking and autocomplete; maybe I will come back to it later.

Cheers

On 12 October 2015 at 15:36, Susheel Kumar <su...@gmail.com> wrote:

> Hi Arnon,
>
> I couldn't fully understand your use case regarding privacy. Are you
> concerned that SpellCheck may reveal user names as part of suggestions,
> which could belong to different organizations / ACLs, or are you
> concerned that, after being shown suggestions, a user may be able to
> click through and view users from other organizations?
>
> Please provide some details on your privacy concern with the spell checker.
>
> Thanks,
> Susheel



-- 
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: Spell Check and Privacy

Posted by Susheel Kumar <su...@gmail.com>.
Hi Arnon,

I couldn't fully understand your use case regarding privacy. Are you
concerned that SpellCheck may reveal user names as part of suggestions,
which could belong to different organizations / ACLs, or are you
concerned that, after being shown suggestions, a user may be able to
click through and view users from other organizations?

Please provide some details on your privacy concern with the spell checker.

Thanks,
Susheel

On Mon, Oct 12, 2015 at 9:45 AM, Dyer, James <Ja...@ingramcontent.com>
wrote:

> Arnon,
>
> Use "spellcheck.collate=true" with "spellcheck.maxCollationTries" set to a
> non-zero value.  This will give you re-written queries that are guaranteed
> to return hits, given the original query and filters.  If you are using an
> "mm" value other than 100%, you will also want to specify
> "spellcheck.collateParam.mm=100%" (or, if using "q.op=OR", then use
> "spellcheck.collateParam.q.op=AND").
>
> Of course, the first section of the spellcheck result will still show
> every possible suggestion, so your client needs to discard these and not
> divulge them to the user.  If you need to know word-by-word how the
> collations were constructed, then specify
> "spellcheck.collateExtendedResults=true".  Use the extended collation
> results for this information and not the first section of the spellcheck
> results.
>
> This is all fairly well-documented on the old Solr wiki:
> https://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate
>
> James Dyer
> Ingram Content Group

RE: Spell Check and Privacy

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
Arnon,

Use "spellcheck.collate=true" with "spellcheck.maxCollationTries" set to a non-zero value.  This will give you re-written queries that are guaranteed to return hits, given the original query and filters.  If you are using an "mm" value other than 100%, you will also want to specify "spellcheck.collateParam.mm=100%" (or, if using "q.op=OR", then use "spellcheck.collateParam.q.op=AND").
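
As a rough illustration of the parameters above, here is a minimal Python sketch that assembles such a request. The function name build_spellcheck_params, the "acl:org_a" filter, the sample query and the default of 5 tries are assumptions made up for the example and do not come from this thread; only the spellcheck.* parameter names are taken from the advice above.

```python
from urllib.parse import urlencode


def build_spellcheck_params(user_query, acl_filter, max_tries=5):
    """Build Solr request parameters that ask for collations tested
    against the user's own filters (hypothetical helper)."""
    if max_tries < 1:
        # Collations are only tested against the index when this is non-zero.
        raise ValueError("spellcheck.maxCollationTries must be non-zero")
    return {
        "q": user_query,
        "fq": acl_filter,  # assumed ACL filter; use whatever your schema defines
        "wt": "json",
        "spellcheck": "true",
        "spellcheck.collate": "true",
        "spellcheck.maxCollationTries": str(max_tries),
        # For a q.op=OR setup; with an "mm" below 100% you would send
        # "spellcheck.collateParam.mm": "100%" instead.
        "spellcheck.collateParam.q.op": "AND",
    }


params = build_spellcheck_params("confidental report", "acl:org_a")
query_string = urlencode(params)  # append to the request handler URL
```

Because the filter travels in "fq", each candidate collation is tried with the same restriction as the user's original search, which is what keeps terms from other users' documents out of the collations.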

Of course, the first section of the spellcheck result will still show every possible suggestion, so your client needs to discard these and not divulge them to the user.  If you need to know word-by-word how the collations were constructed, then specify "spellcheck.collateExtendedResults=true".  Use the extended collation results for this information and not the first section of the spellcheck results.
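
The client-side filtering described above could be sketched as follows. This is an illustration only: it assumes the JSON layout produced by recent Solr versions with wt=json and the default json.nl setting, where "spellcheck" holds a "suggestions" section and a separate flat "collations" list of alternating names and values. The function name visible_collations and the sample response are invented for the example; check the layout against the response your Solr version actually returns.

```python
def visible_collations(solr_response):
    """Return only the collations, ignoring the raw per-word suggestions."""
    spellcheck = solr_response.get("spellcheck", {})
    raw = spellcheck.get("collations", [])  # "suggestions" is never read
    collations = []
    # Assumed flat layout: ["collation", value, "collation", value, ...]
    for name, value in zip(raw[0::2], raw[1::2]):
        if name != "collation":
            continue
        if isinstance(value, dict):
            # Shape seen with spellcheck.collateExtendedResults=true
            collations.append({
                "query": value.get("collationQuery"),
                "hits": value.get("hits"),
                "corrections": value.get("misspellingsAndCorrections", []),
            })
        else:
            # Without extended results the value is the rewritten query
            collations.append({"query": value, "hits": None, "corrections": []})
    return collations


# Invented sample response, for illustration only
sample = {
    "spellcheck": {
        "suggestions": [
            "confidental",
            {"numFound": 2, "startOffset": 0, "endOffset": 11,
             "suggestion": ["confidential", "confidant"]},
        ],
        "collations": [
            "collation",
            {"collationQuery": "confidential report", "hits": 3,
             "misspellingsAndCorrections": ["confidental", "confidential"]},
        ],
    }
}
result = visible_collations(sample)
```

Only the entries in result would be shown to the user; the word "confidant" from the first section never reaches them because it was not part of a collation that returned hits under the user's filters.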

This is all fairly well-documented on the old Solr wiki:  https://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate

James Dyer
Ingram Content Group
