Posted to dev@lucene.apache.org by "Despot Jakimovski (JIRA)" <ji...@apache.org> on 2012/06/25 19:11:45 UTC

[jira] [Created] (SOLR-3574) Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions

Despot Jakimovski created SOLR-3574:
---------------------------------------

             Summary: Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions
                 Key: SOLR-3574
                 URL: https://issues.apache.org/jira/browse/SOLR-3574
             Project: Solr
          Issue Type: New Feature
          Components: SearchComponents - other
    Affects Versions: 3.6
            Reporter: Despot Jakimovski
             Fix For: 3.6


Consider the following use case:
We have two words, "penslot" and "knoppen". The first is a compound word ("pen" + "slot"), while the second is only the plural form of "knop".

When using the compound word filter with the words "pen", "slot" and "knop" in the dictionary, a search for "knoppen" also returns results containing "pen". That shouldn't happen, because "knoppen" is only a plural form (not a compound word).

We need a second dictionary listing the words that are exceptions to the filter (in this case "knoppen"). The filter would then still decompose compound words containing "pen", "slot" and "knop", but would leave "knoppen" undivided and would not search on its parts.
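
The requested behaviour can be illustrated with a minimal, self-contained sketch. It is not Lucene's DictionaryCompoundWordTokenFilter and not the code of any patch on this issue. The class name, the method name and the fixed minimum subword size are assumptions made only for illustration, and all words are assumed to be lowercased already. It needs Java 9 or later because of Set.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of dictionary-based decompounding with an exception list.
final class CompoundExceptionSketch {

    // Assumed minimum length of a subword; shorter substrings are never looked up.
    static final int MIN_SUBWORD_SIZE = 2;

    // Returns the original token followed by every dictionary word found inside it.
    // Tokens listed in `exceptions` are returned unchanged and never divided.
    static List<String> decompose(String token, Set<String> dictionary, Set<String> exceptions) {
        List<String> out = new ArrayList<>();
        out.add(token); // the original token is always kept
        if (exceptions.contains(token)) {
            return out; // exception word: skip decomposition entirely
        }
        // Brute-force scan of every substring of at least MIN_SUBWORD_SIZE characters.
        for (int start = 0; start < token.length(); start++) {
            for (int end = start + MIN_SUBWORD_SIZE; end <= token.length(); end++) {
                String part = token.substring(start, end);
                if (dictionary.contains(part)) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("pen", "slot", "knop");
        Set<String> exceptions = Set.of("knoppen");

        System.out.println(decompose("penslot", dictionary, exceptions));  // [penslot, pen, slot]
        System.out.println(decompose("knoppen", dictionary, exceptions));  // [knoppen]
        System.out.println(decompose("knoppen", dictionary, Set.of()));    // [knoppen, knop, pen]
    }
}
```

The last call shows the current problem: without the exception list, "knoppen" yields "pen" as a subword, so a search for "knoppen" also matches documents containing "pen".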



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-3574) Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions

Posted by "Despot Jakimovski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Despot Jakimovski updated SOLR-3574:
------------------------------------

    Fix Version/s:     (was: 4.0)
                   5.0
    
> Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-3574
>                 URL: https://issues.apache.org/jira/browse/SOLR-3574
>             Project: Solr
>          Issue Type: New Feature
>          Components: SearchComponents - other
>    Affects Versions: 5.0
>            Reporter: Despot Jakimovski
>            Assignee: Despot Jakimovski
>              Labels: compound-word, dictionary, feature, filter, word-exception
>             Fix For: 5.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> More info on the subject: http://stackoverflow.com/questions/11159839/can-we-make-the-compound-word-filter-not-divide-some-words-in-solr



[jira] [Updated] (SOLR-3574) Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions

Posted by "Despot Jakimovski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Despot Jakimovski updated SOLR-3574:
------------------------------------

    Affects Version/s:     (was: 4.1)
                           (was: 4.0-ALPHA)
    
> Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-3574
>                 URL: https://issues.apache.org/jira/browse/SOLR-3574
>             Project: Solr
>          Issue Type: New Feature
>          Components: SearchComponents - other
>    Affects Versions: 5.0
>            Reporter: Despot Jakimovski
>            Assignee: Despot Jakimovski
>              Labels: compound-word, dictionary, feature, filter, word-exception
>             Fix For: 5.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h



[jira] [Assigned] (SOLR-3574) Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions

Posted by "Despot Jakimovski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Despot Jakimovski reassigned SOLR-3574:
---------------------------------------

    Assignee: Despot Jakimovski
    
> Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-3574
>                 URL: https://issues.apache.org/jira/browse/SOLR-3574
>             Project: Solr
>          Issue Type: New Feature
>          Components: SearchComponents - other
>    Affects Versions: 4.0, 4.1, 5.0
>            Reporter: Despot Jakimovski
>            Assignee: Despot Jakimovski
>              Labels: compound-word, dictionary, feature, filter, word-exception
>             Fix For: 4.0, 4.1, 5.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h



[jira] [Updated] (SOLR-3574) Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions

Posted by "Despot Jakimovski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Despot Jakimovski updated SOLR-3574:
------------------------------------

    Attachment: SOLR-3574.patch
    
> Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-3574
>                 URL: https://issues.apache.org/jira/browse/SOLR-3574
>             Project: Solr
>          Issue Type: New Feature
>          Components: SearchComponents - other
>    Affects Versions: 5.0
>            Reporter: Despot Jakimovski
>            Assignee: Despot Jakimovski
>              Labels: compound-word, dictionary, feature, filter, word-exception
>             Fix For: 5.0
>
>         Attachments: SOLR-3574.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h



[jira] [Commented] (SOLR-3574) Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions

Posted by "Despot Jakimovski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447836#comment-13447836 ] 

Despot Jakimovski commented on SOLR-3574:
-----------------------------------------

I just added a patch with the implementation and tests of the new feature described above. (I can't see a Log Work button, though :( )
                
> Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-3574
>                 URL: https://issues.apache.org/jira/browse/SOLR-3574
>             Project: Solr
>          Issue Type: New Feature
>          Components: SearchComponents - other
>    Affects Versions: 5.0
>            Reporter: Despot Jakimovski
>            Assignee: Despot Jakimovski
>              Labels: compound-word, dictionary, feature, filter, word-exception
>             Fix For: 5.0
>
>         Attachments: SOLR-3574.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h



[jira] [Updated] (SOLR-3574) Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions

Posted by "Despot Jakimovski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Despot Jakimovski updated SOLR-3574:
------------------------------------

    Description: 
Consider the following use case:
We have two words, "penslot" and "knoppen". The first is a compound word ("pen" + "slot"), while the second is only the plural form of "knop".

When using the compound word filter with the words "pen", "slot" and "knop" in the dictionary, a search for "knoppen" also returns results containing "pen". That shouldn't happen, because "knoppen" is only a plural form (not a compound word).

We need a second dictionary listing the words that are exceptions to the filter (in this case "knoppen"). The filter would then still decompose compound words containing "pen", "slot" and "knop", but would leave "knoppen" undivided and would not search on its parts.

More info on the subject: http://stackoverflow.com/questions/11159839/can-we-make-the-compound-word-filter-not-divide-some-words-in-solr

  was:
Consider the following use case:
We have two words, "penslot" and "knoppen". The first is a compound word ("pen" + "slot"), while the second is only the plural form of "knop".

When using the compound word filter with the words "pen", "slot" and "knop" in the dictionary, a search for "knoppen" also returns results containing "pen". That shouldn't happen, because "knoppen" is only a plural form (not a compound word).

We need a second dictionary listing the words that are exceptions to the filter (in this case "knoppen"). The filter would then still decompose compound words containing "pen", "slot" and "knop", but would leave "knoppen" undivided and would not search on its parts.



    
> Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-3574
>                 URL: https://issues.apache.org/jira/browse/SOLR-3574
>             Project: Solr
>          Issue Type: New Feature
>          Components: SearchComponents - other
>    Affects Versions: 3.6
>            Reporter: Despot Jakimovski
>              Labels: compound-word, dictionary, feature, filter, word-exception
>             Fix For: 3.6
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h



[jira] [Updated] (SOLR-3574) Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions

Posted by "Despot Jakimovski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Despot Jakimovski updated SOLR-3574:
------------------------------------

        Fix Version/s:     (was: 3.6)
                       5.0
                       4.1
                       4.0
    Affects Version/s:     (was: 3.6)
                       5.0
                       4.1
                       4.0
    
> Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-3574
>                 URL: https://issues.apache.org/jira/browse/SOLR-3574
>             Project: Solr
>          Issue Type: New Feature
>          Components: SearchComponents - other
>    Affects Versions: 4.0, 4.1, 5.0
>            Reporter: Despot Jakimovski
>              Labels: compound-word, dictionary, feature, filter, word-exception
>             Fix For: 4.0, 4.1, 5.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
