Posted to solr-user@lucene.apache.org by neosky <ne...@yahoo.com> on 2012/04/07 19:15:06 UTC

Two questions about the NGramTokenizerFactory

I am using Solr 3.5.
1. It seems that NGramTokenizerFactory only tokenizes the first 1024
characters of the input. I searched online and found that somebody had noticed
this bug in 2007, but I can't find a solution.
PS: I have already raised my max field length:
<maxFieldLength>50000</maxFieldLength>
This is critical for me.

2. My second question: if I define
 minGramSize=3
 maxGramSize=8
what happens when I search with a query of length 5? Does it work?
I am considering using copyField to specify the grams from 3 to 8, but I am
not sure that is a solution. I am very worried about indexing speed: it took
more than 6 hours to index just the grams of size 7 and 8 for testing.
Thanks!
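
To make question 2 concrete, here is a minimal plain-Java sketch of what
n-gramming with minGramSize=3 and maxGramSize=8 produces. It is not Lucene's
NGramTokenizer: the class NGramSketch, the method ngrams, and the sample
strings are made up for illustration, and the order in which grams are emitted
may differ from Lucene's. It suggests that a 5-character substring of the
indexed text is itself one of the indexed grams, so whether the search matches
depends mainly on how the query side is analyzed.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a hypothetical helper, not the Lucene implementation.
class NGramSketch {
    // Emit every substring whose length is between minGram and maxGram,
    // at every start offset of the text.
    public static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<String>();
        for (int size = minGram; size <= maxGram; size++) {
            for (int start = 0; start + size <= text.length(); start++) {
                grams.add(text.substring(start, start + size));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Arbitrary sample text standing in for an indexed field value.
        List<String> indexed = ngrams("MKVLAAGIVGLLLA", 3, 8);
        String query = "VLAAG"; // a query of length 5

        // Is the raw 5-character query one of the indexed grams?
        System.out.println(indexed.contains(query));
        // What the query becomes if it is n-grammed with the same settings.
        System.out.println(ngrams(query, 3, 8));
    }
}
```

If the query is run through the same n-gram analysis, the 5-character query is
split into its 3-, 4- and 5-grams (six grams in this sketch), and each of them
also occurs among the indexed grams.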

--
View this message in context: http://lucene.472066.n3.nabble.com/Two-questions-about-the-Ngramtokenizerfactory-tp3893045p3893045.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Two questions about the NGramTokenizerFactory

Posted by neosky <ne...@yahoo.com>.
neosky wrote
> 
> I am using Solr 3.5.
> 1. It seems that NGramTokenizerFactory only tokenizes the first 1024
> characters of the input. I searched online and found that somebody had
> noticed this bug in 2007, but I can't find a solution.
> PS: I have already raised my max field length:
> <maxFieldLength>50000</maxFieldLength>
> This is critical for me.
> 
> As far as I know, it is not fixed. NGramTokenizer contains:
>   char[] chars = new char[1024];
>   input.read(chars);
> However, I don't know the difference between NGramTokenizer and
> NGramTokenFilter.
> If I want to write my own Analyzer, which one should I use?
> 
> 
> 2. My second question: if I define
>  minGramSize=3
>  maxGramSize=8
> what happens when I search with a query of length 5? Does it work?
> I am considering using copyField to specify the grams from 3 to 8, but I
> am not sure that is a solution. I am very worried about indexing speed: it
> took more than 6 hours to index just the grams of size 7 and 8 for testing.
> Thanks!
> 

I still need time to index before I can test this.
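
The two lines quoted from NGramTokenizer suggest why only the first 1024
characters get tokenized: a single read() into a fixed-size buffer returns at
most that many characters. Below is a minimal sketch of that behavior using
only java.io, not the Lucene source. SingleReadSketch, readOnce and readAll
are hypothetical names, and the 5000-character input is arbitrary. readAll
shows the usual loop-until-end-of-stream pattern for comparison.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Illustration only: hypothetical helpers, not code from Lucene or Solr.
class SingleReadSketch {
    // Mirrors the quoted pattern: one read() into a 1024-char buffer.
    // Returns how many characters were actually consumed.
    public static int readOnce(Reader input) {
        try {
            char[] chars = new char[1024];
            int n = input.read(chars);
            return Math.max(n, 0); // read() returns -1 at end of stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Keeps reading until end of stream, so nothing is left behind.
    public static int readAll(Reader input) {
        try {
            char[] chars = new char[1024];
            int total = 0;
            int n;
            while ((n = input.read(chars)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Build an arbitrary 5000-character input.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            sb.append('A');
        }
        String text = sb.toString();

        System.out.println(readOnce(new StringReader(text))); // capped by the buffer size
        System.out.println(readAll(new StringReader(text)));  // the whole input
    }
}
```

With a StringReader, the single read consumes 1024 characters and the loop
consumes all 5000; any text beyond the first buffer is never seen by code that
reads only once.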

--
View this message in context: http://lucene.472066.n3.nabble.com/Two-questions-about-the-Ngramtokenizerfactory-tp3893045p3894851.html