Posted to solr-user@lucene.apache.org by rajini maski <ra...@gmail.com> on 2011/06/06 07:04:45 UTC

Applying synonyms increase the data size from MB to GBs

Applying synonyms increased the data size from 28 MB to 10.3 GB.

Before enabling synonyms on the field, the data size was 28 MB. Now, after
applying synonyms, I see that the data folder size has increased to 10.3 GB.

Here is the schema field type for that field:


<fieldType name="textBODY" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <filter class="solr.SynonymFilterFactory" synonyms="BODYTaxonomy.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="ObsTaxo.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="MTaxonomy.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="MicTaxo.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="SpTaxonomy.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="ParameterTaxonomy.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="STaxo.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

None of the attached synonym files is larger than 200 KB.


What might be the reason for this? Are there any config changes I should make?



Regards

Rajani

Re: Applying synonyms increase the data size from MB to GBs

Posted by Erick Erickson <er...@gmail.com>.
Have you considered query-time expansion rather than index-time expansion?
In general this will lead to more complex queries, but smaller indexes.
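
A minimal sketch of what query-time expansion could look like for the field
type from the original post. The split into type="index" and type="query"
analyzers is standard Solr schema syntax; listing all seven files in one
comma-separated synonyms attribute is an assumption based on the form
discussed elsewhere in this thread. The index analyzer has no synonym filter,
so only the original tokens are stored, and a full reindex would be needed
after a change like this.

```xml
<fieldType name="textBODY" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: no synonym expansion, so the index holds only the original terms -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <!-- Query time: synonyms are expanded in the query instead of in the index -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="BODYTaxonomy.txt,ObsTaxo.txt,MTaxonomy.txt,MicTaxo.txt,SpTaxonomy.txt,ParameterTaxonomy.txt,STaxo.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>
```

One caveat: multi-word synonyms tend to behave poorly at query time, because
the query parser splits on whitespace before analysis runs, so this layout
may not suit every taxonomy file.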

Take a look at the analysis page available from the admin page to see exactly
what happens.

What is the high-level problem you're trying to solve? An expansion this large
in index size is pretty unusual, and I'm wondering if there might be
another approach to the problem...

Best
Erick

On Mon, Jun 6, 2011 at 6:19 AM, Ahmet Arslan <io...@yahoo.com> wrote:
>> Is there a way where in I can apply all those file to same
>> tag with some
>> delimiter separated?
>>
>> like this:
>>         <filter
>> class="solr.SynonymFilterFactory"
>> synonyms="BODYTaxonomy.txt
>> , ClinicalObs.txt, MicTaxo.txt, SPTaxo.txt"
>> ignoreCase="true"
>> expand="true"/>
>
>
> Yes, you can perfectly feed multiple text files separated by comma to synonyms parameter.
>
> synonyms="BODYTaxonomy.txt,ClinicalObs.txt,MicTaxo.txt,SPTaxo.txt"
>

Re: Applying synonyms increase the data size from MB to GBs

Posted by pravesh <su...@yahoo.com>.
Since you are using expand="true", every time a matching synonym entry is
found, the analyzer expands the term into all the synonyms in its set and adds
them to the index. This may cause the index to grow in size.
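
For comparison, a sketch of one of the filters with expand="false". As the
Solr documentation describes this option, every synonym in a group is then
reduced to the first entry on its line instead of being expanded to all of
them, so a matched token yields one term rather than several. Whether that
suits these taxonomy files is an assumption, and the same filter would likely
be needed at query time so that query terms are reduced the same way.

```xml
<!-- expand="false": map each synonym to the first term in its group
     rather than emitting every equivalent term into the index -->
<filter class="solr.SynonymFilterFactory" synonyms="BODYTaxonomy.txt"
        ignoreCase="true" expand="false"/>
```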


Re: Applying synonyms increase the data size from MB to GBs

Posted by Ahmet Arslan <io...@yahoo.com>.
> Is there a way where in I can apply all those file to same
> tag with some
> delimiter separated?
> 
> like this:
>         <filter
> class="solr.SynonymFilterFactory"
> synonyms="BODYTaxonomy.txt
> , ClinicalObs.txt, MicTaxo.txt, SPTaxo.txt"
> ignoreCase="true"
> expand="true"/>


Yes, you can pass multiple text files to the synonyms parameter as a comma-separated list:

synonyms="BODYTaxonomy.txt,ClinicalObs.txt,MicTaxo.txt,SPTaxo.txt"

Re: Applying synonyms increase the data size from MB to GBs

Posted by rajini maski <ra...@gmail.com>.
I have the flat files (synonym text files), each up to 200 KB. Merging all of
them made the resulting text file huge, and I wanted to maintain them
separately. So, in order to apply all those synonyms to the same field type,
I created one filter tag for each synonym text file.

Is that not the right way to do it?

Is there a way to apply all those files in the same tag, separated by some
delimiter?

like this:

<fieldType name="textBODY" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <filter class="solr.SynonymFilterFactory"
            synonyms="BODYTaxonomy.txt, ClinicalObs.txt, MicTaxo.txt, SPTaxo.txt"
            ignoreCase="true" expand="true"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>




Rajani


On Mon, Jun 6, 2011 at 11:01 AM, Gora Mohanty <go...@mimirtech.com> wrote:

> On Mon, Jun 6, 2011 at 10:34 AM, rajini maski <ra...@gmail.com>
> wrote:
> > Applying synonyms increased the data size from 28 mb to 10.3 gb
> >
> >   Before enabling synonyms to the a field , the data size was 28mb.  Now
> ,
> > after applying synonyms I see that data folder size has increased to 10.3
> > gb.
> >
> > Attached is schema field type for that field:
> >
> >
> >  <fieldType name="textBODY" class="solr.TextField"
> > positionIncrementGap="100" >
> >      <analyzer>
> >        <filter class="solr.SynonymFilterFactory"
> > synonyms="BODYTaxonomy.txt" ignoreCase="true" expand="true"/>
> >       <filter class="solr.SynonymFilterFactory" synonyms="ObsTaxo.txt"
> > ignoreCase="true" expand="true"/>
> >       <filter class="solr.SynonymFilterFactory" synonyms="MTaxonomy.txt"
> > ignoreCase="true" expand="true"/>
> [...]
>
> Could you explain what you are trying to do with multiple
> SynonymFilterFactory
> filters applied to the field?
>
> Regards,
> Gora
>

Re: Applying synonyms increase the data size from MB to GBs

Posted by Gora Mohanty <go...@mimirtech.com>.
On Mon, Jun 6, 2011 at 10:34 AM, rajini maski <ra...@gmail.com> wrote:
> Applying synonyms increased the data size from 28 mb to 10.3 gb
>
>   Before enabling synonyms to the a field , the data size was 28mb.  Now ,
> after applying synonyms I see that data folder size has increased to 10.3
> gb.
>
> Attached is schema field type for that field:
>
>
>  <fieldType name="textBODY" class="solr.TextField"
> positionIncrementGap="100" >
>      <analyzer>
>        <filter class="solr.SynonymFilterFactory"
> synonyms="BODYTaxonomy.txt" ignoreCase="true" expand="true"/>
>       <filter class="solr.SynonymFilterFactory" synonyms="ObsTaxo.txt"
> ignoreCase="true" expand="true"/>
>       <filter class="solr.SynonymFilterFactory" synonyms="MTaxonomy.txt"
> ignoreCase="true" expand="true"/>
[...]

Could you explain what you are trying to do with multiple SynonymFilterFactory
filters applied to the field?

Regards,
Gora