Posted to dev@lucene.apache.org by "Joshua Edwards (JIRA)" <ji...@apache.org> on 2014/08/22 23:10:12 UTC

[jira] [Updated] (SOLR-6408) Numerical Synonyms

     [ https://issues.apache.org/jira/browse/SOLR-6408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joshua Edwards updated SOLR-6408:
---------------------------------

    Attachment: numerical_synonyms.patch

Attaching a patch that works for whole numbers.  Two filters work in tandem so that a number is treated as synonymous whether it is written in words or in digits.  The filter that converts digits to words can also handle floating-point numbers.
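
To illustrate the digits-to-words direction, here is a minimal standalone sketch.  It is not the code in the attached patch and does not use the Lucene TokenFilter API; the class name NumberToWordsSketch and its methods are hypothetical.  It assumes non-negative whole numbers below one trillion and strips thousands separators before converting, so "10,000" and "10000" produce the same words.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from numerical_synonyms.patch.
final class NumberToWordsSketch {
    private static final String[] ONES = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"};
    private static final String[] TENS = {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};
    private static final String[] SCALES = {"", "thousand", "million", "billion"};

    // Accepts digit strings with optional thousands separators, e.g. "10,000".
    static String toWords(String digits) {
        return toWords(Long.parseLong(digits.replace(",", "")));
    }

    // Converts a whole number in [0, 999,999,999,999] to English words.
    static String toWords(long n) {
        if (n < 0 || n > 999_999_999_999L) {
            throw new IllegalArgumentException("out of range for this sketch: " + n);
        }
        if (n == 0) {
            return "zero";
        }
        List<String> parts = new ArrayList<>();
        int scale = 0;
        // Walk the number in groups of three digits, least significant first.
        while (n > 0) {
            int group = (int) (n % 1000);
            if (group != 0) {
                String words = groupToWords(group);
                if (scale > 0) {
                    words += " " + SCALES[scale];
                }
                parts.add(0, words);
            }
            n /= 1000;
            scale++;
        }
        return String.join(" ", parts);
    }

    // Converts a value in [1, 999] to words, without "and".
    private static String groupToWords(int g) {
        List<String> words = new ArrayList<>();
        if (g >= 100) {
            words.add(ONES[g / 100]);
            words.add("hundred");
            g %= 100;
        }
        if (g >= 20) {
            words.add(TENS[g / 10]);
            g %= 10;
        }
        if (g > 0) {
            words.add(ONES[g]);
        }
        return String.join(" ", words);
    }
}
```

In a real filter, the resulting words would presumably be emitted as synonym tokens alongside the original digit token; how the patch does that is not shown here.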

Currently, "and" is treated as non-numeric.  I.e., if someone types "three hundred and sixty", this will be treated as two distinct numbers.  I'm not sure what the best way of handling that is right now.  Grammatically, "three hundred and sixty" is two numbers, as "and" is only supposed to be used to introduce the decimal part of a number, but that doesn't mean people don't regularly use it this way anyway.
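
The trade-off around "and" can be sketched as follows.  This is a hypothetical standalone example, not the patch's implementation: the class WordsToNumberSketch and the andJoins flag are assumptions made for illustration.  With andJoins set to false, "and" ends the current number, which matches the behaviour described above; with it set to true, an "and" that sits between two number words is skipped.  The sketch assumes well-formed number phrases and does not handle colloquial forms such as "three sixty".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not taken from numerical_synonyms.patch.
final class WordsToNumberSketch {
    private static final Map<String, Long> UNITS = new HashMap<>();
    private static final Map<String, Long> SCALES = new HashMap<>();
    static {
        String[] ones = {"zero", "one", "two", "three", "four", "five", "six", "seven",
            "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
            "sixteen", "seventeen", "eighteen", "nineteen"};
        for (int i = 0; i < ones.length; i++) {
            UNITS.put(ones[i], (long) i);
        }
        String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy",
            "eighty", "ninety"};
        for (int i = 0; i < tens.length; i++) {
            UNITS.put(tens[i], (i + 2) * 10L);
        }
        SCALES.put("thousand", 1_000L);
        SCALES.put("million", 1_000_000L);
        SCALES.put("billion", 1_000_000_000L);
    }

    // Returns every number found in the text, in order of appearance.
    static List<Long> parse(String text, boolean andJoins) {
        List<Long> found = new ArrayList<>();
        long total = 0;      // value of completed thousand/million/billion groups
        long current = 0;    // value of the group being read
        boolean inNumber = false;
        String[] tokens = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i];
            if (UNITS.containsKey(t)) {
                current += UNITS.get(t);
                inNumber = true;
            } else if (inNumber && t.equals("hundred")) {
                current *= 100;
            } else if (inNumber && SCALES.containsKey(t)) {
                total += current * SCALES.get(t);
                current = 0;
            } else if (andJoins && inNumber && t.equals("and")
                    && i + 1 < tokens.length && UNITS.containsKey(tokens[i + 1])) {
                // Treat "and" as a connector inside the number and skip it.
            } else if (inNumber) {
                // Any other token, including "and" when andJoins is false,
                // ends the current number.
                found.add(total + current);
                total = 0;
                current = 0;
                inNumber = false;
            }
        }
        if (inNumber) {
            found.add(total + current);
        }
        return found;
    }
}
```

Here parse("three hundred and sixty", false) yields the two numbers 300 and 60, while parse("three hundred and sixty", true) yields the single number 360.  The connector mode would also merge genuinely separate numbers in text such as "three hundred and sixty more", so neither setting is clearly right in all cases.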

> Numerical Synonyms
> ------------------
>
>                 Key: SOLR-6408
>                 URL: https://issues.apache.org/jira/browse/SOLR-6408
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 4.9
>            Reporter: Joshua Edwards
>         Attachments: numerical_synonyms.patch
>
>
> I would like to be able to have numbers be synonymous with their textual versions so that the system can match these better.  Additionally, when parsing tokens, I would like the multiple tokens that represent a single entity to be combined so that they will match correctly.  For example, if a document has the phrase "three sixty", then I would like that to match if a user types 360, or "three hundred sixty," but not if the user searches for "three" or "sixty."  I would also like different ways of writing a number to be treated synonymously with these.  As an example, I would like to be able to search for "ten thousand," "10,000," or "10000" and get the same results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
