Posted to solr-user@lucene.apache.org by Furkan KAMACI <fu...@gmail.com> on 2013/10/08 16:20:42 UTC

Effect of multiple white space at WhiteSpaceTokenizer

I use Solr 4.5 and I have a WhiteSpaceTokenizer in my schema. What is the
difference (in index size and performance) between these two sentences:

First one: This is a sentence.
Second one: This       is         a                          sentence.

RE: Effect of multiple white space at WhiteSpaceTokenizer

Posted by Markus Jelsma <ma...@openindex.io>.
The resulting tokens are the same, and the performance difference should be negligible unless you're uploading megabytes of white space. Consecutive white space should be collapsed outside of Solr/Lucene anyway, because otherwise it will end up in your stored field. The index will be slightly bigger, but not by much, thanks to compression.
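
For illustration, here is a minimal Python sketch of collapsing white space on the client side before the document is sent to Solr. It is not Solr or Lucene code: the helper names collapse_whitespace and whitespace_tokens are made up for this example, and str.split() is only a rough stand-in for splitting on runs of white space, not the tokenizer's actual implementation.

```python
import re


def collapse_whitespace(text):
    """Replace every run of white space with a single space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def whitespace_tokens(text):
    """Rough stand-in for whitespace tokenization: split on runs of white space."""
    return text.split()


first = "This is a sentence."
second = "This       is         a                          sentence."

# Both inputs yield the same tokens, so the indexed terms would not differ.
print(whitespace_tokens(first) == whitespace_tokens(second))

# The raw strings differ in length, which is what a stored field would keep
# unless the text is collapsed before indexing.
print(len(first), len(second), len(collapse_whitespace(second)))
```

Running the collapse step in the indexing client keeps the extra white space out of the stored field without any change to the schema.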