Posted to dev@lucene.apache.org by "Mike (Created) (JIRA)" <ji...@apache.org> on 2012/02/06 03:29:59 UTC

[jira] [Created] (SOLR-3099) Add query operator, index structure, and analyzer for "exact match" searching

Add query operator, index structure, and analyzer for "exact match" searching
-----------------------------------------------------------------------------

                 Key: SOLR-3099
                 URL: https://issues.apache.org/jira/browse/SOLR-3099
             Project: Solr
          Issue Type: New Feature
          Components: Schema and Analysis
            Reporter: Mike


A project I'm working on requires *exact match* searching with stemming turned off. The users are accustomed to Sphinx search, and thus expect a query like [ =runs ] to return only documents that contain the exact term, "runs", and not the stemmed word "run".

In SOLR-2866, there is similar work, but I believe it is different because it uses a huge-synonym file rather than storing the original terms directly in the index. 

What I'd like instead is two things:
1. An analyzer that says, "store the original form of all words in the index along with the stemmed variations." If necessary, it's fine if this is simply an unstemmed field, but that seems cumbersome schema-wise and performance-wise.
2. An operator in edismax that allows users to query the exact form of the word. Sphinx uses the equals sign (=), and that makes sense logically to me.

This issue is part of a meta issue, SOLR-3028, that is requesting two other operators in edismax (quorum search and word order).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-3099) Add query operator, index structure, and analyzer for "exact match" searching

Posted by "Jan Høydahl (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-3099:
------------------------------

         Fix Version/s: 4.0
    Remaining Estimate:     (was: 4h)
     Original Estimate:     (was: 4h)

This is wanted. Scheduling for 4.x. I think for this to work we need some better metadata support in analysis? Currently you can tag a token with a TOKENTYPE, so the stemmer could add the stemmed token on same position with tokentype=stem. However, we'd need a way to convey from the query that [=foo] should NOT match the "stem" token types?

Also, could we not simply adopt Google's syntax, i.e. if a single token is quoted, it is searched verbatim, e.g. foo "bar".
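
As a rough illustration of the token-stacking idea above, here is a minimal sketch in Python. It is not Lucene or Solr code: the Token class, toy_stem, analyze and matches are hypothetical names, and the "stemmer" just strips a trailing "s". It only shows how a stem could sit on the same position as the original word, tagged with a type, and how an exact query could skip the "stem" tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    position: int
    type: str  # "word" for the original form, "stem" for the injected stem


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def analyze(text: str) -> list[Token]:
    """Emit each original word, plus its stem stacked on the same position."""
    tokens = []
    for position, word in enumerate(text.lower().split()):
        tokens.append(Token(word, position, "word"))
        stem = toy_stem(word)
        if stem != word:
            tokens.append(Token(stem, position, "stem"))
    return tokens


def matches(tokens: list[Token], term: str, exact: bool) -> bool:
    """Exact queries only see "word" tokens; normal queries compare stems."""
    if exact:
        return any(t.type == "word" and t.text == term for t in tokens)
    stem = toy_stem(term)
    return any(t.text == stem for t in tokens)


doc = analyze("He runs daily")
print(matches(doc, "runs", exact=True))   # the original form is present
print(matches(doc, "run", exact=True))    # "run" exists only as a stem token
print(matches(doc, "run", exact=False))   # the stemmed query matches
```

How the token type would be carried into the index and consulted at query time is exactly the open question raised in the comment; the sketch sidesteps it by keeping the tokens in memory.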
                


[jira] [Commented] (SOLR-3099) Add query operator, index structure, and analyzer for "exact match" searching

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251852#comment-13251852 ] 

Robert Muir commented on SOLR-3099:
-----------------------------------

How will it save space?

                


[jira] [Commented] (SOLR-3099) Add query operator, index structure, and analyzer for "exact match" searching

Posted by "Jan Høydahl (Commented JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251930#comment-13251930 ] 

Jan Høydahl commented on SOLR-3099:
-----------------------------------

The stored part will be duplicated, and to support highlighting for a multiple field solution you need to do extra programming to merge the highlights from each field. It won't give *more* query features, but will work more nicely together with existing features. I'm working towards support for stuff like {{foo ONEAR/10 "bar"}}, a span query between the two terms where "bar" should then be matched literally - spans would not work across words in different fields.

Instead of assuming that we'd *complicate* analysis as you're afraid of, we should work on simplifying and refactoring analysis to make it more flexible and easier to work with, implementing features like this. Other stuff that could be useful in analysis is a graph structure instead of the current linear one to be able to overlay "New York" as a synonym for "NY" on the same position offset even if they have different number of tokens; or to attach metadata to field input e.g. to signify that the input is pre-tokenized.

Also note that at this stage of this issue we're just discussing possible ways forward; any implementation details are still to be decided...
                


[jira] [Updated] (SOLR-3099) Add query operator, index structure, and analyzer for "exact match" searching

Posted by "Mike (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike updated SOLR-3099:
-----------------------

    Issue Type: Sub-task  (was: New Feature)
        Parent: SOLR-3028
    


[jira] [Commented] (SOLR-3099) Add query operator, index structure, and analyzer for "exact match" searching

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251533#comment-13251533 ] 

Robert Muir commented on SOLR-3099:
-----------------------------------

{quote}
Currently you can tag a token with a TOKENTYPE, so the stemmer could add the stemmed token on same position with tokentype=stem.
{quote}

This is not the way to go, for many reasons; it's been brought up many times before.

This feature already works. Just use a separate field. Stacking tokens on top of each other
will be about the same size in the index anyway, since it's an inverted index.

stemmedBody = stemmed field
exactBody = unstemmed field.

Now I have an exact operator, "exactBody:stuff", that works.
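
To make the two-field layout concrete, here is a minimal Python sketch of a toy inverted index. It is only an illustration, not Lucene code: ToyIndex and toy_stem are hypothetical, the "stemmer" just strips a trailing "s", and only the field names stemmedBody and exactBody come from the comment above.

```python
from collections import defaultdict


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


class ToyIndex:
    """Inverted index keyed by (field, term) -> set of document ids."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)

    def add(self, doc_id: int, body: str) -> None:
        # The same source text feeds both fields; only the analysis differs.
        for word in body.lower().split():
            self.postings[("exactBody", word)].add(doc_id)
            self.postings[("stemmedBody", toy_stem(word))].add(doc_id)

    def search(self, field: str, term: str) -> set[int]:
        # Apply the same analysis at query time that the field used at index time.
        term = term.lower()
        if field == "stemmedBody":
            term = toy_stem(term)
        return set(self.postings.get((field, term), set()))


index = ToyIndex()
index.add(1, "She runs every morning")
index.add(2, "They run on weekends")

print(index.search("stemmedBody", "runs"))  # both documents
print(index.search("exactBody", "runs"))    # only document 1
```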

                


[jira] [Commented] (SOLR-3099) Add query operator, index structure, and analyzer for "exact match" searching

Posted by "Mike (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252185#comment-13252185 ] 

Mike commented on SOLR-3099:
----------------------------

From my perspective as the OP, my goals for this feature are simple:
 - I want my users to have an exact match operator
 - I want my users to have it work in an identical fashion to regular terms
 - I want highlighting to work without a bunch of extra coding effort

I just drew out various ways to make up the index structures, and performance-wise I think I agree with Robert that it makes most sense to not change the way we index things, and to instead ask people to just have a stemmed and an unstemmed index that they can query against. The arguments against that are: (1) if we introduce a query operator for exact match (which I think is vital for this request), it'd be awkward to have an operator determine which index is queried; and (2) I can't imagine how highlighting would work with such a configuration.

Also, I *don't* think we can ask users to do queries like [ unstemmedBody:foo ]. Two reasons:
 1. That sucks, and users probably won't use it.
 2. Highlighting will break. 

Anyway, hopefully this is helpful...maybe you guys have already thought through this. But from my perspective, we have two options for implementing this:
 1. Don't change index structures at the risk of having a query operator change which index a query goes against, probably breaking highlighting; or
 2. Change the index structures so they can have the unstemmed tokens at the risk of added complexity in the index and a possible performance impact.

As far as the analyzer goes, I don't have a horse in the race, provided it works. Seems like adding an exact match operator to it shouldn't be terribly hard, but I haven't delved into it myself.
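
For option 1, the operator could in principle be handled as a rewrite step in front of the existing parser. The Python sketch below is only an assumption about how that might look, not edismax code: rewrite is a hypothetical helper, the field names are borrowed from the two-field layout discussed in this thread, and quoting, escaping and other operators are ignored. It does nothing about the highlighting concern.

```python
import re

# Hypothetical field names, borrowed from the two-field layout discussed in this thread.
STEMMED_FIELD = "stemmedBody"
EXACT_FIELD = "exactBody"

_TOKEN = re.compile(r"\S+")


def rewrite(user_query: str) -> str:
    """Turn Sphinx-style '=term' into a field-qualified clause; leave the rest stemmed."""
    clauses = []
    for token in _TOKEN.findall(user_query):
        if token.startswith("=") and len(token) > 1:
            clauses.append(f"{EXACT_FIELD}:{token[1:]}")
        else:
            clauses.append(f"{STEMMED_FIELD}:{token}")
    return " ".join(clauses)


print(rewrite("=runs marathon"))  # exactBody:runs stemmedBody:marathon
```

With this shape the user never types a field name; the operator alone decides which field each term is sent to.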


                


[jira] [Commented] (SOLR-3099) Add query operator, index structure, and analyzer for "exact match" searching

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251933#comment-13251933 ] 

Robert Muir commented on SOLR-3099:
-----------------------------------

{quote}
The stored part will be duplicated, and to support highlighting for a multiple field solution you need to do extra programming to merge the highlights from each field.
{quote}

Wait, why would you duplicate that? Just store it once.

If the highlighter cannot deal with the fact that foo_unstemmed and foo_stemmed have the same stored content only in one field (called whatever, I don't care), then that's a highlighter problem.

It's not something to be worked around by making analyzers more complicated or screwing up scoring by injecting things.

{quote}
Instead of assuming that we'd complicate analysis as you're afraid of, we should work on simplifying and refactoring analysis to make it more flexible and easier to work with, implementing features like this. Other stuff that could be useful in analysis is a graph structure instead of the current linear one to be able to overlay "New York" as a synonym for "NY" on the same position offset even if they have different number of tokens
{quote}

Who is making the assumptions? This has already happened: it's called PositionLengthAttribute and is already in 3.6 and trunk...

                


[jira] [Commented] (SOLR-3099) Add query operator, index structure, and analyzer for "exact match" searching

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251854#comment-13251854 ] 

Robert Muir commented on SOLR-3099:
-----------------------------------

And I totally disagree that this is 'wanted'; you already have this: it's called separate fields in Lucene.

It's inverted, so it's basically the same postings either way, whether you duplicate these inside the same
field or use a different one:

* it *won't* save space
* it *won't* be more performant (it will be slower)
* it *won't* give you more query features (all of lucene's queries support 'field' parameter already)

On the other hand, doing this will only make the analyzers more complicated. Currently I've spent 
the first part of this week tracking down and fixing bugs in the analyzers.

Bottom line: we can't make the analyzers more complicated because people are afraid that using an
extra field costs more than 'injecting' additional terms: it doesn't. The analyzers are already
too complicated.


                


[jira] [Updated] (SOLR-3099) Add query operator, index structure, and analyzer for "exact match" searching

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-3099:
---------------------------

    Fix Version/s:     (was: 4.0)

removing fixVersion=4.0 since there is no evidence that anyone is currently working on this issue.  (this can certainly be revisited if volunteers step forward)


                


[jira] [Commented] (SOLR-3099) Add query operator, index structure, and analyzer for "exact match" searching

Posted by "Jan Høydahl (Commented JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251844#comment-13251844 ] 

Jan Høydahl commented on SOLR-3099:
-----------------------------------

The duplicate field trick is well known. This issue is specifically about *exploring* native support in the original field for this. A native solution *will* save space, be easier to understand/use, be more performant, support more query features (such as spans) etc. This is a community; please let's try to contribute towards improving the design instead of dismissing it as "not the way to go".
                