
Posted to solr-user@lucene.apache.org by Michael Ryan <mr...@moreover.com> on 2014/07/01 15:24:29 UTC

RE: Multiterm analysis in complexphrase query

Thanks. This looks interesting...

-Michael

-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org] 
Sent: Monday, June 30, 2014 8:15 AM
To: solr-user@lucene.apache.org
Subject: RE: Multiterm analysis in complexphrase query

Ahmet, please correct me if I'm wrong, but the ComplexPhraseQueryParser does not perform analysis (as you, Michael, point out).  The SpanQueryParser in LUCENE-5205 does perform analysis and might meet your needs.  Work on it has gone on pause, though, so you'll have to build from the patch or the LUCENE-5205 branch.  Let me know if you have any questions.

LUCENE-5470 and LUCENE-5504 would move multiterm analysis farther down and make it available to all parsers that use QueryParserBase, including the ComplexPhraseQueryParser.

Best,

        Tim

-----Original Message-----
From: Michael Ryan [mailto:mryan@moreover.com] 
Sent: Sunday, June 29, 2014 11:09 AM
To: solr-user@lucene.apache.org
Subject: Multiterm analysis in complexphrase query

I've been using a modified version of the complex phrase query parser patch from https://issues.apache.org/jira/browse/SOLR-1604 in Solr 3.6, and I'm currently upgrading to 4.9, which has this built in.

I'm having trouble with using accents in wildcard queries, support for which was added in https://issues.apache.org/jira/browse/SOLR-2438. In 3.6, I was using a modified version of SolrQueryParser, which simply used ComplexPhraseQueryParser in place of QueryParser. In the version of ComplexPhraseQParserPlugin in 4.9, it just directly uses ComplexPhraseQueryParser, and doesn't go through SolrQueryParser at all. SolrQueryParserBase.analyzeIfMultitermTermText() is where the multiterm analysis magic happens.

So, my problem is that ComplexPhraseQParserPlugin/ComplexPhraseQueryParser doesn't use SolrQueryParserBase, so the wildcard terms never get multiterm analysis. That breaks fun things like this:
{!complexPhrase}"barac* óba*a"
which I'd expect to match "Barack Obama".
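
To make the failure concrete, here is a minimal sketch of why an unanalyzed wildcard fragment misses the indexed term. It is not Solr's or Lucene's code: the class and method names are hypothetical, and java.text.Normalizer stands in for whatever lowercase and accent-folding filters the field's multiterm analyzer would apply. It assumes index-time analysis has already reduced "Obama" to "obama".

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Solr or Lucene.
public class MultitermFoldingSketch {

    // Matches the combining marks that NFD decomposition splits off accented letters.
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");

    /** Lowercases and strips accents from a term fragment; '*' and '?' pass through unchanged. */
    public static String foldFragment(String fragment) {
        String decomposed = Normalizer.normalize(fragment, Normalizer.Form.NFD);
        return COMBINING_MARKS.matcher(decomposed).replaceAll("").toLowerCase(Locale.ROOT);
    }

    /** Tests an indexed term against a wildcard pattern ('*' = any run, '?' = one char). */
    public static boolean wildcardMatches(String pattern, String indexedTerm) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), indexedTerm);
    }

    public static void main(String[] args) {
        String indexed = "obama";       // assumed result of index-time analysis
        String raw = "\u00F3ba*a";      // o-acute followed by "ba*a", as typed in the query

        // Without query-time folding the accented character never equals the indexed 'o'.
        System.out.println(wildcardMatches(raw, indexed));               // prints false
        // With folding applied to the fragment first, the wildcard matches.
        System.out.println(wildcardMatches(foldFragment(raw), indexed)); // prints true
    }
}
```

The same folding could be applied on the client side before the query is sent, as a stopgap, but only if it mirrors what the field's index-time analysis actually does.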

Anyone run into this before, or have a way to get this working?

-Michael

RE: Multiterm analysis in complexphrase query

Posted by "Allison, Timothy B." <ta...@mitre.org>.

Hi Gopal,

I just started a repository on github (https://github.com/tballison/tallison-lucene-addons) to host a standalone version of LUCENE-5205 (with other patches to come).  SOLR-5410 is next (Solr wrapper of the SpanQueryParser), and then I'll try to add LUCENE-5317 (concordance) and LUCENE-5318 (co-occurrence) over the next week or so.

The code in this repository is "standalone" (not a fork of lucene-solr) and is aimed at the most recent stable release of Lucene/Solr.

For "trunk" versions of this code, check out the lucene5205 branch of my lucene-solr fork.

Much more work remains.

-----Original Message-----
From: Gopal Agarwal [mailto:gopal.agarwal3@gmail.com] 
Sent: Monday, July 21, 2014 5:04 PM
To: solr-user@lucene.apache.org
Subject: RE: Multiterm analysis in complexphrase query

That would be really useful.

Can you upload the jar and its requirements?

It also makes it pluggable with different versions of Solr.

RE: Multiterm analysis in complexphrase query

Posted by Gopal Agarwal <go...@gmail.com>.

That would be really useful.

Can you upload the jar and its requirements?

It also makes it pluggable with different versions of Solr.

On Jul 1, 2014 9:01 PM, "Allison, Timothy B." <ta...@mitre.org> wrote:

> If there's enough interest, I might get back into the code and throw a
> standalone src (and jar) of the SpanQueryParser and the Solr wrapper onto
> github.  That would make it more widely available until there's a chance to
> integrate it into Lucene/Solr.  If you'd be interested in this, let me know
> (and/or vote on the issue pages on Jira).
>
> Best,
>
>        Tim

RE: Multiterm analysis in complexphrase query

Posted by "Allison, Timothy B." <ta...@mitre.org>.

If there's enough interest, I might get back into the code and throw a standalone src (and jar) of the SpanQueryParser and the Solr wrapper onto github.  That would make it more widely available until there's a chance to integrate it into Lucene/Solr.  If you'd be interested in this, let me know (and/or vote on the issue pages on Jira).

Best,

       Tim
