Posted to solr-user@lucene.apache.org by stockiii <st...@gmail.com> on 2010/11/04 14:22:32 UTC

Optimize Index

Hello.

My index has ~30 million documents and an optimize=true is very heavy; it
takes a looooong time ...

How can I start an optimize by using DIH, but NOT after a delta- or
full-import?

I set my index to use the compound index format.

thx
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Optimize-Index-tp1841499p1841499.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Optimize Index

Posted by Erick Erickson <er...@gmail.com>.
No, you didn't miss anything. The comment at Lucene Revolution was more
along the lines that optimize didn't actually improve much *absent* deletes.

Plus, on a corpus of significant size, the doc frequencies won't change that
much by deleting documents, but that's a case-by-case thing.

Best
Erick

On Thu, Nov 4, 2010 at 4:31 PM, Markus Jelsma <ma...@openindex.io> wrote:

> Huh? That's something new for me. Optimize removes documents that have been
> flagged for deletion. For relevancy it's important those are removed,
> because document frequencies are not updated for deletes.
>
> Did I miss something?
>
> > For what it's worth, the Solr class instructor at the Lucene Revolution
> > conference recommended *against* optimizing, and instead suggested to
> > just let the merge factor do its job.
> >
> > On Thu, Nov 4, 2010 at 2:55 PM, Shawn Heisey <so...@elyograg.org> wrote:
> > > On 11/4/2010 7:22 AM, stockiii wrote:
> > >> How can I start an optimize by using DIH, but NOT after a delta- or
> > >> full-import?
> > >
> > > I'm not aware of a way to do this with DIH, though there might be
> > > something I've missed.  You can do it with an HTTP POST.  Here's
> > > how to do it with curl:
> > >
> > > /usr/bin/curl "http://HOST:PORT/solr/CORE/update" \
> > > -H "Content-Type: text/xml" \
> > > --data-binary '<optimize waitFlush="true" waitSearcher="true"/>'
> > >
> > > Shawn
>

Re: Optimize Index

Posted by Markus Jelsma <ma...@openindex.io>.
Huh? That's something new for me. Optimize removes documents that have been
flagged for deletion. For relevancy it's important those are removed, because
document frequencies are not updated for deletes.

Did I miss something?

> For what it's worth, the Solr class instructor at the Lucene Revolution
> conference recommended *against* optimizing, and instead suggested to just
> let the merge factor do its job.
> 
> On Thu, Nov 4, 2010 at 2:55 PM, Shawn Heisey <so...@elyograg.org> wrote:
> > On 11/4/2010 7:22 AM, stockiii wrote:
> >> How can I start an optimize by using DIH, but NOT after a delta- or
> >> full-import?
> > 
> > I'm not aware of a way to do this with DIH, though there might be
> > something I've missed.  You can do it with an HTTP POST.  Here's
> > how to do it with curl:
> > 
> > /usr/bin/curl "http://HOST:PORT/solr/CORE/update" \
> > -H "Content-Type: text/xml" \
> > --data-binary '<optimize waitFlush="true" waitSearcher="true"/>'
> > 
> > Shawn

Re: Optimize Index

Posted by Rich Cariens <ri...@gmail.com>.
For what it's worth, the Solr class instructor at the Lucene Revolution
conference recommended *against* optimizing, and instead suggested to just
let the merge factor do its job.
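
For reference, the merge factor is set in solrconfig.xml.  A minimal sketch
of the relevant section, using the common defaults of that era rather than
tuned values:

<mainIndex>
  <!-- Roughly this many similar-sized segments are merged into one
       automatically, so the segment count stays bounded without an
       explicit optimize. -->
  <mergeFactor>10</mergeFactor>
  <!-- The compound-file setting the original poster mentions. -->
  <useCompoundFile>false</useCompoundFile>
</mainIndex>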

On Thu, Nov 4, 2010 at 2:55 PM, Shawn Heisey <so...@elyograg.org> wrote:

> On 11/4/2010 7:22 AM, stockiii wrote:
>
>> How can I start an optimize by using DIH, but NOT after a delta- or
>> full-import?
>>
>
> I'm not aware of a way to do this with DIH, though there might be something
> I've missed.  You can do it with an HTTP POST.  Here's how to do it
> with curl:
>
> /usr/bin/curl "http://HOST:PORT/solr/CORE/update" \
> -H "Content-Type: text/xml" \
> --data-binary '<optimize waitFlush="true" waitSearcher="true"/>'
>
> Shawn
>
>

Re: Optimize Index

Posted by Shawn Heisey <so...@elyograg.org>.
On 11/4/2010 7:22 AM, stockiii wrote:
> How can I start an optimize by using DIH, but NOT after a delta- or
> full-import?

I'm not aware of a way to do this with DIH, though there might be
something I've missed.  You can do it with an HTTP POST.  Here's
how to do it with curl:

/usr/bin/curl "http://HOST:PORT/solr/CORE/update" \
-H "Content-Type: text/xml" \
--data-binary '<optimize waitFlush="true" waitSearcher="true"/>'
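
Since this is just an HTTP request, you can also schedule it completely
independently of DIH, e.g. from cron.  A sketch, assuming the curl command
above is saved as /usr/local/bin/solr-optimize.sh (a hypothetical wrapper
script) and you want the optimize to run nightly at 02:30:

# crontab entry for the user that does Solr maintenance
# m  h  dom mon dow  command
30 2 * * *  /usr/local/bin/solr-optimize.sh >> /var/log/solr-optimize.log 2>&1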

Shawn


Re: Optimize Index

Posted by Peter Karich <pe...@yahoo.de>.
What you can try is maxSegments=2 or more as a 'partial' optimize:

"If the index is so large that optimizes are taking longer than desired 
or using more disk space during optimization than you can spare, 
consider adding the maxSegments parameter to the optimize command. In 
the XML message, this would be an attribute; the URL form and SolrJ have 
the corresponding option too. By default this parameter is 1 since an 
optimize results in a single Lucene "segment". By setting it larger than 
1 but less than the mergeFactor, you permit partial optimization to no 
more than this many segments. Of course the index won't be fully 
optimized and therefore searches will be slower. "

from http://wiki.apache.org/solr/PacktBook2009 (I only found that link;
there must be something on the real wiki for the maxSegments parameter ...)
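
Adapting the curl command from earlier in the thread, a partial optimize down
to at most two segments would look something like this (maxSegments=2 is only
an example value; pick something smaller than your mergeFactor):

/usr/bin/curl "http://HOST:PORT/solr/CORE/update" \
-H "Content-Type: text/xml" \
--data-binary '<optimize maxSegments="2" waitFlush="true" waitSearcher="true"/>'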

> Hello.
>
> My index has ~30 million documents and an optimize=true is very heavy; it
> takes a looooong time ...
>
> How can I start an optimize by using DIH, but NOT after a delta- or
> full-import?
>
> I set my index to use the compound index format.
>
> thx


-- 
http://jetwick.com twitter search prototype