Posted to solr-dev@lucene.apache.org by "viobade (JIRA)" <ji...@apache.org> on 2009/05/28 20:13:45 UTC

[jira] Created: (SOLR-1192) solr.NGramFilterFactory stops indexing the content if it finds a token smaller than the minimum ngram size

solr.NGramFilterFactory stops indexing the content if it finds a token smaller than the minimum ngram size
----------------------------------------------------------------------------------------------------------

                 Key: SOLR-1192
                 URL: https://issues.apache.org/jira/browse/SOLR-1192
             Project: Solr
          Issue Type: Bug
          Components: Analysis
    Affects Versions: 1.3
         Environment: any
            Reporter: viobade
             Fix For: 1.3


If a field is split into tokens by a tokenizer and the NGramFilterFactory is then applied to those tokens, indexing works as long as each token's length is greater than or equal to the minimum ngram size (usually 3). As soon as a shorter token is encountered, indexing breaks at that point and the remaining tokens are not indexed. This behaviour can easily be observed with the analysis tool in the Solr admin interface.
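
For readers who want to reproduce this outside Solr, the sketch below is a minimal, hypothetical Java reproduction that drives Lucene's WhitespaceTokenizer and NGramTokenFilter directly. It assumes a later, attribute-based Lucene analysis API (roughly 5.x/6.x, with the three-argument NGramTokenFilter constructor), not the exact 1.3-era jars this report was filed against; the class name and sample input are made up for illustration. With the bug, a token shorter than the minimum gram size ("of" here, with minGramSize=3) ends the stream early; after the LUCENE-1491 fix it simply produces no grams and the remaining tokens are still ngrammed.

{code:java}
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.ngram.NGramTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class NGramShortTokenDemo {
    public static void main(String[] args) throws Exception {
        // "of" is shorter than minGramSize=3; with the reported bug, nothing after it gets indexed.
        WhitespaceTokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader("index of documents"));

        // Same parameters as a typical solr.NGramFilterFactory setup: minGramSize=3, maxGramSize=5.
        TokenStream stream = new NGramTokenFilter(tokenizer, 3, 5);
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);

        stream.reset();
        while (stream.incrementToken()) {
            // After the fix, this prints grams for "index" and "documents" and skips "of" entirely.
            System.out.println(term.toString());
        }
        stream.end();
        stream.close();
    }
}
{code}

The same before/after behaviour can be observed per field type in the Solr admin analysis tool mentioned in the report.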

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (SOLR-1192) solr.NGramFilterFactory stops indexing the content if it finds a token smaller than the minimum ngram size

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic resolved SOLR-1192.
------------------------------------

    Resolution: Fixed

Should be taken care of with the Lucene upgrade now.

> solr.NGramFilterFactory stops indexing the content if it finds a token smaller than the minimum ngram size
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1192
>                 URL: https://issues.apache.org/jira/browse/SOLR-1192
>             Project: Solr
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 1.3
>         Environment: any
>            Reporter: viobade
>            Assignee: Otis Gospodnetic
>             Fix For: 1.4
>
>
> If a field is split into tokens by a tokenizer and the NGramFilterFactory is then applied to those tokens, indexing works as long as each token's length is greater than or equal to the minimum ngram size (usually 3). As soon as a shorter token is encountered, indexing breaks at that point and the remaining tokens are not indexed. This behaviour can easily be observed with the analysis tool in the Solr admin interface.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1192) solr.NGramFilterFactory stops indexing the content if it finds a token smaller than the minimum ngram size

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated SOLR-1192:
-----------------------------------

    Fix Version/s:     (was: 1.3)
                   1.4

> solr.NGramFilterFactory stops indexing the content if it finds a token smaller than the minimum ngram size
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1192
>                 URL: https://issues.apache.org/jira/browse/SOLR-1192
>             Project: Solr
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 1.3
>         Environment: any
>            Reporter: viobade
>             Fix For: 1.4
>
>
> If a field is split into tokens by a tokenizer and the NGramFilterFactory is then applied to those tokens, indexing works as long as each token's length is greater than or equal to the minimum ngram size (usually 3). As soon as a shorter token is encountered, indexing breaks at that point and the remaining tokens are not indexed. This behaviour can easily be observed with the analysis tool in the Solr admin interface.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1192) solr.NGramFilterFactory stops indexing the content if it finds a token smaller than the minimum ngram size

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated SOLR-1192:
-----------------------------------


That stems from Lucene; see LUCENE-1491.


> solr.NGramFilterFactory stops indexing the content if it finds a token smaller than the minimum ngram size
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1192
>                 URL: https://issues.apache.org/jira/browse/SOLR-1192
>             Project: Solr
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 1.3
>         Environment: any
>            Reporter: viobade
>             Fix For: 1.3
>
>
> If a field is split into tokens by a tokenizer and the NGramFilterFactory is then applied to those tokens, indexing works as long as each token's length is greater than or equal to the minimum ngram size (usually 3). As soon as a shorter token is encountered, indexing breaks at that point and the remaining tokens are not indexed. This behaviour can easily be observed with the analysis tool in the Solr admin interface.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1192) solr.NGramFilterFactory stops indexing the content if it finds a token smaller than the minimum ngram size

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731082#action_12731082 ] 

Otis Gospodnetic commented on SOLR-1192:
----------------------------------------

The LUCENE-1491 fix is in the Lucene repository now, so as soon as we pull new Lucene jars into Solr, I'll mark this as fixed. Feel free to test with local copies of the Lucene nightly jars tomorrow and report back.


> solr.NGramFilterFactory stops indexing the content if it finds a token smaller than the minimum ngram size
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1192
>                 URL: https://issues.apache.org/jira/browse/SOLR-1192
>             Project: Solr
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 1.3
>         Environment: any
>            Reporter: viobade
>            Assignee: Otis Gospodnetic
>             Fix For: 1.4
>
>
> If a field is split into tokens by a tokenizer and the NGramFilterFactory is then applied to those tokens, indexing works as long as each token's length is greater than or equal to the minimum ngram size (usually 3). As soon as a shorter token is encountered, indexing breaks at that point and the remaining tokens are not indexed. This behaviour can easily be observed with the analysis tool in the Solr admin interface.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (SOLR-1192) solr.NGramFilterFactory stops indexing the content if it finds a token smaller than the minimum ngram size

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man reassigned SOLR-1192:
------------------------------

    Assignee: Otis Gospodnetic

Depends on a Lucene issue assigned to Otis, so giving this to Otis as well.

(Although unless Otis decides to add a new option to the underlying filter in LUCENE-1491, fixing this issue should be a no-op.)

> solr.NGramFilterFactory stops indexing the content if it finds a token smaller than the minimum ngram size
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1192
>                 URL: https://issues.apache.org/jira/browse/SOLR-1192
>             Project: Solr
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 1.3
>         Environment: any
>            Reporter: viobade
>            Assignee: Otis Gospodnetic
>             Fix For: 1.4
>
>
> If a field is split into tokens by a tokenizer and the NGramFilterFactory is then applied to those tokens, indexing works as long as each token's length is greater than or equal to the minimum ngram size (usually 3). As soon as a shorter token is encountered, indexing breaks at that point and the remaining tokens are not indexed. This behaviour can easily be observed with the analysis tool in the Solr admin interface.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1192) solr.NGramFilterFactory stops indexing the content if it finds a token smaller than the minimum ngram size

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731586#action_12731586 ] 

Shalin Shekhar Mangar commented on SOLR-1192:
---------------------------------------------

Lucene has been upgraded in trunk, so we can mark this as fixed.

> solr.NGramFilterFactory stops indexing the content if it finds a token smaller than the minimum ngram size
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1192
>                 URL: https://issues.apache.org/jira/browse/SOLR-1192
>             Project: Solr
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 1.3
>         Environment: any
>            Reporter: viobade
>            Assignee: Otis Gospodnetic
>             Fix For: 1.4
>
>
> If a field is split into tokens by a tokenizer and the NGramFilterFactory is then applied to those tokens, indexing works as long as each token's length is greater than or equal to the minimum ngram size (usually 3). As soon as a shorter token is encountered, indexing breaks at that point and the remaining tokens are not indexed. This behaviour can easily be observed with the analysis tool in the Solr admin interface.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.