Posted to solr-user@lucene.apache.org by Phillip Farber <pf...@umich.edu> on 2009/09/28 16:57:12 UTC

Writing optimized index to different storage?

Is it possible to tell Solr or Lucene, when optimizing, to write the 
files that constitute the optimized index to somewhere other than 
SOLR_HOME/data/index or is there something about the optimize that 
requires the final segment to be created in SOLR_HOME/data/index?

Thanks,

Phil


Re: Writing optimized index to different storage?

Posted by Phillip Farber <pf...@umich.edu>.
Sorry, I should have given more background. We have, at the moment, 3.8
million documents averaging 0.7MB/doc, so we have extremely large
shards.  We build about 400,000 documents to a shard, resulting in about
200GB/shard.  We are also using LVM snapshots to manage a snapshot of
the shard, which we serve while we continue to build.

In order to optimize the building shard of around 200GB, we need 400GB of
disk space to allow for a 2x size increase. Due to the nature of
snapshotting, the volume containing the snapshot has to be as large as 
the build volume, i.e. 400GB.
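
To make the arithmetic concrete, here is a rough sketch of the space budget described above. The 2x factor is the headroom quoted in this message; the function and variable names are illustrative only and are not part of Solr or Lucene.

```python
def optimize_space_budget(shard_gb, optimize_factor=2.0):
    """Return (build_volume_gb, snapshot_volume_gb, headroom_gb).

    Assumes, as described in the message, that an in-place optimize can
    transiently need optimize_factor times the shard size, and that the
    snapshot volume has to be as large as the build volume.
    """
    build_volume_gb = shard_gb * optimize_factor
    snapshot_volume_gb = build_volume_gb
    headroom_gb = build_volume_gb - shard_gb
    return build_volume_gb, snapshot_volume_gb, headroom_gb


build, snapshot, headroom = optimize_space_budget(200)
print(f"build volume: {build:.0f}GB, snapshot volume: {snapshot:.0f}GB, "
      f"headroom needed only during optimize: {headroom:.0f}GB")
```

For a 200GB shard this gives a 400GB build volume, a 400GB snapshot volume, and 200GB of headroom that is only needed while the optimize runs.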

If we could write the optimized build shard elsewhere instead of "in 
place" we could avoid the need for the serving volume to match the size 
of the building volume.

We'd like to avoid the need to have 200GB+ hanging around just to 
optimize.

The responses we got on whether optimize can write "elsewhere" make it
clear that's not a solution.

I posted another question to the list just a bit ago asking whether
mergeFactor=1 would give us a single-segment index that is always
optimized, so that we don't have the 2x overhead.

However, running a build with mergeFactor=1 shows that lots of segments
get created and merged, and that the index grows in size but also shrinks
at intervals to some degree.  It is not clear how big the index is at any
point in time.
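
The grow-then-shrink pattern can be illustrated with a toy model of level-based merging. This is a simplified sketch, not Lucene's actual merge policy: it assumes every flush produces a segment of size 1, that segments are merged whenever `merge_factor` of them share a level, and that the inputs and the output of a merge coexist on disk until the merge finishes. The function name and the size units are hypothetical.

```python
def simulate(flushes, merge_factor):
    """Toy model of level-based merging; not Lucene's real merge policy.

    Each flush adds one level-0 segment of size 1. Whenever merge_factor
    segments share a level, they are merged into one segment at the next
    level. While a merge runs, its inputs and its output are both on disk.
    Returns (segment sizes at the end, peak transient disk usage).
    """
    if merge_factor < 2:
        raise ValueError("this toy model needs merge_factor >= 2")
    levels = []  # levels[i] holds the sizes of the segments at level i
    peak = 0
    for _ in range(flushes):
        if not levels:
            levels.append([])
        levels[0].append(1)
        level = 0
        while len(levels[level]) >= merge_factor:
            merged = sum(levels[level])
            live = sum(sum(sizes) for sizes in levels)
            peak = max(peak, live + merged)  # inputs and output coexist
            levels[level] = []
            if level + 1 == len(levels):
                levels.append([])
            levels[level + 1].append(merged)
            level += 1
        peak = max(peak, sum(sum(sizes) for sizes in levels))
    segments = [size for sizes in levels for size in sizes]
    return segments, peak


for n in (7, 8):
    segments, peak = simulate(n, merge_factor=2)
    print(f"{n} flushes: segments={segments}, peak disk usage={peak}")
```

In this model with `merge_factor=2`, seven flushes leave three segments, while the eighth triggers a cascade that ends in a single segment of size 8 with a transient peak of 16, i.e. twice the final index size. Under these assumptions, a low merge factor keeps the segment count small but does not remove the temporary 2x space requirement.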



Re: Writing optimized index to different storage?

Posted by Chris Hostetter <ho...@fucit.org>.
: Is it possible to tell Solr or Lucene, when optimizing, to write the files
: that constitute the optimized index to somewhere other than
: SOLR_HOME/data/index or is there something about the optimize that requires
: the final segment to be created in SOLR_HOME/data/index?

	For what purpose?

http://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341




-Hoss


Re: Writing optimized index to different storage?

Posted by Phillip Farber <pf...@umich.edu>.
Thanks to all for thinking about this question.  Otis: could you say a
bit more about per-segment readers?  This is new to me.

I gather that there is a way to specify that the number of readers 
should correspond (or automatically correspond) to the number of segments?

I suppose this gives each reader a set of smaller files to process and 
there is some sort of result merge over readers to produce a final result?

So you gain through the use of parallelism in a multi-CPU environment?

Under what conditions can this perform as well as a single reader on an 
optimized index?

Phil



Re: Writing optimized index to different storage?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
That's right.  mergeFactor=1 is an even more extreme case.  However, with the new per-segment readers, having an optimized index is no longer the best index state to go for in some cases.

 Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR





Re: Writing optimized index to different storage?

Posted by Lance Norskog <go...@gmail.com>.
The optimize operation happens in place.

I've been told that if you set "mergeFactor=2" when indexing, it will
be slower but you will always have a "mostly optimized" index.




-- 
Lance Norskog
goksron@gmail.com

Re: Writing optimized index to different storage?

Posted by Jason Rutherglen <ja...@gmail.com>.
Hmm... Interesting question. Not that I know of. The only way
one could do this would be to intercept the newly optimized
files via a FileSwitchDirectory-like implementation that knows
which new files are optimized and should "underneath" go to a
different physical path.
