Posted to dev@lucene.apache.org by Andy Hind <An...@oracle.com> on 2018/10/15 20:38:17 UTC

LSH/MinHash

Hi All

Following on from https://issues.apache.org/jira/browse/LUCENE-6968 (I know it’s been a while…)
I have a QParser plugin that can generate the appropriate banded queries for Jaccard similarity.

It covers the same functionality that was proposed in the original issue but wrapped up as a query parser.
There are two analysis cases and two query cases: hashes are generated either by tokenisation or by pre-analysis, and queries are based either on text or on provided hash values.
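
For readers unfamiliar with banding, the general idea can be sketched as follows. This is a generic illustration of MinHash with LSH bands, not the plugin's code and not how Lucene's filter computes its hashes. The seeded SHA-1 hashing, the function names, and the `h<N>:<value>` query syntax are all assumptions made for the example. A signature of k min-hashes is split into bands of r rows; a document matches if every hash in at least one band matches, so the query is an OR over bands of an AND over the hashes in each band.

```python
import hashlib


def shingles(text, size=5):
    """Return the set of word shingles (runs of `size` tokens) in `text`."""
    tokens = text.lower().split()
    if len(tokens) <= size:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def _hash(seed, item):
    """Hypothetical seeded 64-bit hash built from SHA-1 (illustration only)."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def min_hash_signature(items, num_hashes=16):
    """One minimum per seeded hash function over all items."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def bands(signature, rows_per_band):
    """Split a signature into consecutive bands of `rows_per_band` hashes."""
    return [
        tuple(signature[i:i + rows_per_band])
        for i in range(0, len(signature), rows_per_band)
    ]


def banded_query(signature, rows_per_band):
    """Build an OR-of-ANDs query string; the field syntax is made up."""
    clauses = []
    for band_index, band in enumerate(bands(signature, rows_per_band)):
        terms = " AND ".join(
            f"h{band_index * rows_per_band + row}:{value}"
            for row, value in enumerate(band)
        )
        clauses.append(f"({terms})")
    return " OR ".join(clauses)


def estimated_jaccard(sig_a, sig_b):
    """Fraction of positions where two signatures agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def candidate_probability(similarity, rows_per_band, num_bands):
    """Chance that two sets with this Jaccard similarity share a band."""
    return 1.0 - (1.0 - similarity ** rows_per_band) ** num_bands
```

With 16 hashes and 4 rows per band this gives 4 bands. More rows per band makes the query stricter, while more bands makes it more permissive, which is the trade-off that `candidate_probability` captures.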

If there is interest, I will create the issue and put up the patch.

Regards

Andy 



Re: LSH/MinHash

Posted by Tommaso Teofili <to...@gmail.com>.
Hi Andy,

It would be very nice if you could do that and I'd be very interested
in reviewing and helping out with the patch.
I have been using that filter for a while with my own query bits; a
full-fledged query parser would surely be a very useful contribution.

Regards,
Tommaso
On Mon, 15 Oct 2018 at 22:38, Andy Hind
<An...@oracle.com> wrote:
>
> Hi All
>
> Following on from https://issues.apache.org/jira/browse/LUCENE-6968 (I know it’s been a while…)
> I have a QParser plugin that can generate the appropriate banded queries for Jaccard similarity.
>
> It covers the same functionality that was proposed in the original issue but wrapped up as a query parser.
> There are two analysis cases and two query cases: hashes are generated either by tokenisation or by pre-analysis, and queries are based either on text or on provided hash values.
>
> If there is interest, I will create the issue and put up the patch.
>
> Regards
>
> Andy
>
>
