Posted to solr-user@lucene.apache.org by revas <re...@gmail.com> on 2009/06/09 07:45:47 UTC

spellcheck /too many open files

Hi,

1) Does the spell check component support all languages?


2) I have a scenario where I have about 20 webapps in a single container. We
get "too many open files" errors at index time and while restarting Tomcat.

The mergefactor is at default.

If I reduce the merge factor to 2 and optimize the index, will the open
files be closed automatically, or would I have to reindex to close them?
How do I close the already open files? This is on Linux with Solr 1.3
and Tomcat 5.5.

Regards
Revas

Re: spellcheck /too many open files

Posted by revas <re...@gmail.com>.
Thanks

On Tue, Jun 9, 2009 at 5:14 PM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> On Tue, Jun 9, 2009 at 4:32 PM, revas <re...@gmail.com> wrote:
>
> > Thanks, Shalin. When we use the external file dictionary (if there is
> > one), then spell check should work fine, right? Also, is there any
> > format for this file?
> >
>
> The external file should have one token per line. See
> http://wiki.apache.org/solr/FileBasedSpellChecker
>
> The default analyzer is WhitespaceAnalyzer. So all tokens in the file will
> be split on whitespace and the resulting tokens will be used for giving
> suggestions. If you want to change the analyzer, specify fieldType in the
> spell checker configuration and the component will use the analyzer
> configured for that field type.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

Re: spellcheck /too many open files

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Jun 9, 2009 at 4:32 PM, revas <re...@gmail.com> wrote:

> Thanks, Shalin. When we use the external file dictionary (if there is
> one), then spell check should work fine, right? Also, is there any
> format for this file?
>

The external file should have one token per line. See
http://wiki.apache.org/solr/FileBasedSpellChecker

The default analyzer is WhitespaceAnalyzer. So all tokens in the file will
be split on whitespace and the resulting tokens will be used for giving
suggestions. If you want to change the analyzer, specify fieldType in the
spell checker configuration and the component will use the analyzer
configured for that field type.
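
For reference, a minimal file-based spell checker entry in solrconfig.xml
might look roughly like this (untested sketch; the file name, index directory
and field type name are only placeholders):

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">file</str>
      <str name="classname">solr.FileBasedSpellChecker</str>
      <!-- plain text dictionary, one token per line -->
      <str name="sourceLocation">spellings.txt</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="spellcheckIndexDir">./spellcheckerFile</str>
      <!-- optional: analyze the file with this field type's analyzer instead
           of the default WhitespaceAnalyzer -->
      <str name="fieldType">text_spell</str>
    </lst>
  </searchComponent>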

-- 
Regards,
Shalin Shekhar Mangar.

Re: spellcheck /too many open files

Posted by revas <re...@gmail.com>.
Thanks, Shalin. When we use the external file dictionary (if there is
one), then spell check should work fine, right? Also, is there any
format for this file?

Regards
Sujatha

On Tue, Jun 9, 2009 at 3:03 PM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> On Tue, Jun 9, 2009 at 2:56 PM, revas <re...@gmail.com> wrote:
>
> > But the spell check component uses the n-gram analyzer and hence should
> > work for any language, is this correct? Also, we can refer to an external
> > dictionary for suggestions; could this be in any language?
> >
>
> Yes it does use n-grams but there's an analysis step before the n-grams are
> created. For example, if you are creating your spell check index from a
> Solr
> field, SpellCheckComponent uses that field's index time analyzer. So you
> should create your language-specific fields in such a way that the analysis
> works correctly for that language.
>
>
> > The open files issue is not because of spell check, as we have not
> > implemented that yet. Every time we restart Solr we need to raise the
> > ulimit, otherwise it does not work. Is there any workaround to permanently
> > close these open files? Does optimizing the index close them?
> >
>
> Optimization merges the segments of the index into one big segment. So it
> will reduce the number of files. However, during the merge it may create
> many more files. The old files left over after the merge are cleaned up by
> Lucene after a while (unless you have changed the defaults in the
> IndexDeletionPolicy section in solrconfig.xml).
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

Re: spellcheck /too many open files

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Jun 9, 2009 at 2:56 PM, revas <re...@gmail.com> wrote:

> But the spell check component uses the n-gram analyzer and hence should
> work for any language, is this correct? Also, we can refer to an external
> dictionary for suggestions; could this be in any language?
>

Yes it does use n-grams but there's an analysis step before the n-grams are
created. For example, if you are creating your spell check index from a Solr
field, SpellCheckComponent uses that field's index time analyzer. So you
should create your language-specific fields in such a way that the analysis
works correctly for that language.
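
As a rough sketch (untested; the field name "spell", the directory, and the
build option are only examples), a spell checker built from a Solr field is
configured in solrconfig.xml along these lines:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <!-- suggestions are drawn from the tokens indexed into this field,
           so its index-time analyzer determines the dictionary entries -->
      <str name="field">spell</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
      <!-- rebuild the spelling index on every commit of the main index -->
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

The "spell" field would typically be a copyField target whose field type uses
the language-specific analysis you want.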


> The open files issue is not because of spell check, as we have not
> implemented that yet. Every time we restart Solr we need to raise the
> ulimit, otherwise it does not work. Is there any workaround to permanently
> close these open files? Does optimizing the index close them?
>

Optimization merges the segments of the index into one big segment. So it
will reduce the number of files. However, during the merge it may create
many more files. The old files left over after the merge are cleaned up by
Lucene after a while (unless you have changed the defaults in the
IndexDeletionPolicy section in solrconfig.xml).
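
For what it's worth, that section of the example solrconfig.xml looks roughly
like the sketch below (values shown are approximately the shipped defaults;
keeping more commit points around means old segment files stay on disk
longer):

  <mainIndex>
    <!-- ... other index settings ... -->
    <deletionPolicy class="solr.SolrDeletionPolicy">
      <!-- keep only the most recent commit point -->
      <str name="maxCommitsToKeep">1</str>
      <!-- number of optimized commit points to keep around -->
      <str name="maxOptimizedCommitsToKeep">0</str>
    </deletionPolicy>
  </mainIndex>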

-- 
Regards,
Shalin Shekhar Mangar.

Re: spellcheck /too many open files

Posted by revas <re...@gmail.com>.
But the spell check component uses the n-gram analyzer and hence should
work for any language, is this correct? Also, we can refer to an external
dictionary for suggestions; could this be in any language?

The open files issue is not because of spell check, as we have not
implemented that yet. Every time we restart Solr we need to raise the
ulimit, otherwise it does not work. Is there any workaround to permanently
close these open files? Does optimizing the index close them?

Regards
Sujatha

On Tue, Jun 9, 2009 at 12:53 PM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> On Tue, Jun 9, 2009 at 11:15 AM, revas <re...@gmail.com> wrote:
>
> >
> > 1) Does the spell check component support all languages?
> >
>
> SpellCheckComponent relies on Lucene/Solr analyzers and tokenizers. So if
> you can find an analyzer/tokenizer for your language, spell checker can
> work.
>
>
> > 2) I have a scenario where I have about 20 webapps in a single
> > container. We get "too many open files" errors at index time and while
> > restarting Tomcat.
>
>
> Is that because of SpellCheckComponent?
>
>
> > The mergefactor is at default.
> >
> > If I reduce the merge factor to 2 and optimize the index, will the open
> > files be closed automatically, or would I have to reindex to close them?
> > How do I close the already open files? This is on Linux with Solr 1.3
> > and Tomcat 5.5.
> >
>
> Lucene/Solr does not keep any file open longer than necessary. But
> decreasing the merge factor should help. You can also increase the open file
> limit on your system.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

Re: spellcheck /too many open files

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Jun 9, 2009 at 11:15 AM, revas <re...@gmail.com> wrote:

>
> 1) Does the spell check component support all languages?
>

SpellCheckComponent relies on Lucene/Solr analyzers and tokenizers. So if
you can find an analyzer/tokenizer for your language, spell checker can
work.
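
For example (purely illustrative, untested), a language-specific field type
could be declared in schema.xml along these lines and then used for the field
the spell checker draws its terms from:

  <fieldType name="text_fr_spell" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- swap in whatever tokenizer and filters suit your language -->
      <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    </analyzer>
  </fieldType>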


> 2) I have a scenario where I have about 20 webapps in a single container. We
> get "too many open files" errors at index time and while restarting Tomcat.


Is that because of SpellCheckComponent?


> The mergefactor is at default.
>
> If I reduce the merge factor to 2 and optimize the index, will the open
> files be closed automatically, or would I have to reindex to close them?
> How do I close the already open files? This is on Linux with Solr 1.3
> and Tomcat 5.5.
>

Lucene/Solr does not keep any file open longer than necessary. But
decreasing the merge factor should help. You can also increase the open file
limit on your system.
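
If it helps, the merge factor is set in solrconfig.xml; a sketch (values are
just examples, not recommendations) would be:

  <mainIndex>
    <!-- fewer segments accumulate before a merge, so fewer index files -->
    <mergeFactor>2</mergeFactor>
    <!-- compound file format packs each segment into a single file, at some
         indexing-speed cost; worth considering when file descriptors are tight -->
    <useCompoundFile>true</useCompoundFile>
  </mainIndex>

The file descriptor limit itself is raised at the operating system level
(e.g. ulimit -n in the startup script, or /etc/security/limits.conf), not in
Solr.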

-- 
Regards,
Shalin Shekhar Mangar.