Posted to solr-dev@lucene.apache.org by "viobade (JIRA)" <ji...@apache.org> on 2009/05/28 20:13:45 UTC
[jira] Created: (SOLR-1192) solr.NGramFilterFactory stops to index
the content if it find a token smaller than minim ngram size
solr.NGramFilterFactory stops to index the content if it find a token smaller than minim ngram size
---------------------------------------------------------------------------------------------------
Key: SOLR-1192
URL: https://issues.apache.org/jira/browse/SOLR-1192
Project: Solr
Issue Type: Bug
Components: Analysis
Affects Versions: 1.3
Environment: any
Reporter: viobade
Fix For: 1.3
If a field is split into tokens by a tokenizer and NGramFilterFactory is then applied to those tokens, indexing works as long as the length of each token is greater than or equal to the minimum n-gram size (usually 3). Otherwise, indexing breaks at that point and the remaining tokens are not indexed. This behavior can easily be observed with the analysis tool in the Solr admin interface.
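The reported behavior can be illustrated with a small standalone simulation. This is only a sketch in Python, not the Lucene or Solr code: the function names (ngrams, stopping_filter, skipping_filter) and the sample tokens are hypothetical, and the "skipping" variant is an assumption about what a corrected filter would do (drop the short token and carry on with the rest of the stream).

```python
from typing import Iterable, Iterator, List


def ngrams(token: str, min_gram: int, max_gram: int) -> List[str]:
    """Return every n-gram of `token` with min_gram <= n <= max_gram."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def stopping_filter(tokens: Iterable[str], min_gram: int, max_gram: int) -> Iterator[str]:
    """Mimics the reported bug: the stream ends at the first too-short token."""
    for token in tokens:
        grams = ngrams(token, min_gram, max_gram)
        if not grams:
            return  # every later token is lost
        yield from grams


def skipping_filter(tokens: Iterable[str], min_gram: int, max_gram: int) -> Iterator[str]:
    """Assumed corrected behavior: a too-short token yields nothing, the rest continue."""
    for token in tokens:
        yield from ngrams(token, min_gram, max_gram)


if __name__ == "__main__":
    tokens = ["solr", "is", "fast"]  # "is" is shorter than the minimum gram size of 3
    print(list(stopping_filter(tokens, 3, 3)))  # ['sol', 'olr']
    print(list(skipping_filter(tokens, 3, 3)))  # ['sol', 'olr', 'fas', 'ast']
```

With a minimum gram size of 3, the two-character token "is" produces no grams. In the first variant that ends the stream, so "fast" never reaches the index, which matches what the analysis tool shows.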
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1192) solr.NGramFilterFactory stops to index
the content if it find a token smaller than minim ngram size
Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Otis Gospodnetic resolved SOLR-1192.
------------------------------------
Resolution: Fixed
This should be taken care of by the Lucene upgrade now.
[jira] Updated: (SOLR-1192) solr.NGramFilterFactory stops to index
the content if it find a token smaller than minim ngram size
Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Otis Gospodnetic updated SOLR-1192:
-----------------------------------
Fix Version/s: 1.4 (was: 1.3)
[jira] Updated: (SOLR-1192) solr.NGramFilterFactory stops to index
the content if it find a token smaller than minim ngram size
Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Otis Gospodnetic updated SOLR-1192:
-----------------------------------
This stems from Lucene; see LUCENE-1491.
[jira] Commented: (SOLR-1192) solr.NGramFilterFactory stops to
index the content if it find a token smaller than minim ngram size
Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731082#action_12731082 ]
Otis Gospodnetic commented on SOLR-1192:
----------------------------------------
The LUCENE-1491 fix is in the Lucene repository now, so as soon as we pull the new Lucene jars into Solr, I'll mark this as fixed. Feel free to test with local copies of the Lucene nightly jars tomorrow and report back.
[jira] Assigned: (SOLR-1192) solr.NGramFilterFactory stops to index
the content if it find a token smaller than minim ngram size
Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hoss Man reassigned SOLR-1192:
------------------------------
Assignee: Otis Gospodnetic
This depends on a Lucene issue assigned to Otis, so I'm giving this to Otis as well.
(Although, unless Otis decides to add a new option to the underlying filter in LUCENE-1491, fixing this issue should be a no-op.)
[jira] Commented: (SOLR-1192) solr.NGramFilterFactory stops to
index the content if it find a token smaller than minim ngram size
Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731586#action_12731586 ]
Shalin Shekhar Mangar commented on SOLR-1192:
---------------------------------------------
Lucene has been upgraded in trunk, so we can mark this as fixed.