Posted to solr-user@lucene.apache.org by CrazyDiamond <Cr...@mail.ru> on 2015/09/23 03:01:01 UTC

is there a way to remove deleted documents from index without optimize

My index is updated frequently, and I need to remove unused documents from
the index after an update/reindex.
Optimization is very expensive, so what should I do?



--
View this message in context: http://lucene.472066.n3.nabble.com/is-there-a-way-to-remove-deleted-documents-from-index-without-optimize-tp4230691.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: is there a way to remove deleted documents from index without optimize

Posted by Harry Yoo <hy...@gmail.com>.
Thanks for the clarification. 

I use 

<config>
	<luceneMatchVersion>${lucene.version}</luceneMatchVersion>

in solrconfig.xml and pass -Dlucene.version when I launch Solr, to keep the versions in step.
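
A minimal sketch of that setup, assuming a fallback is wanted for the case where the property is not passed at startup. Solr's property substitution accepts a default after a colon; the 6.6.1 default below is only an illustrative value.

```xml
<config>
  <!-- Uses the lucene.version system property when it is supplied at startup
       (e.g. -Dlucene.version=6.6.1); the value after the colon is the
       fallback used when the property is absent. -->
  <luceneMatchVersion>${lucene.version:6.6.1}</luceneMatchVersion>
  <!-- rest of solrconfig.xml omitted -->
</config>
```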



> On Oct 12, 2017, at 11:01 PM, Erick Erickson <er...@gmail.com> wrote:
> 
> You can use the IndexUpgrader tool that ships with each version of Solr
> (well, actually Lucene) to, well, upgrade your index. So you can use
> the IndexUpgrader that ships with 5.x to upgrade from 4.x, and the
> one that ships with 6.x to upgrade from 5.x, etc.
> 
> That said, none of that is necessary _if_ you
> - have the Lucene version in solrconfig.xml be the one that corresponds to your current Solr, i.e. a solrconfig for 6.x should have a luceneMatchVersion of 6.something.
> - update your index enough to rewrite all segments before moving to the _next_ version. When Lucene merges a segment, it writes the new segment according to the luceneMatchVersion in solrconfig.xml. So as long as you are on a version long enough for all segments to be merged into new segments, you don't have to worry.
> 
> Best,
> Erick
> 


Re: is there a way to remove deleted documents from index without optimize

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/12/2017 10:01 PM, Erick Erickson wrote:
> You can use the IndexUpgrader tool that ships with each version of Solr
> (well, actually Lucene) to, well, upgrade your index. So you can use
> the IndexUpgrader that ships with 5.x to upgrade from 4.x, and the
> one that ships with 6.x to upgrade from 5.x, etc.
>
> That said, none of that is necessary _if_ you
> - have the Lucene version in solrconfig.xml be the one that corresponds to your current Solr, i.e. a solrconfig for 6.x should have a luceneMatchVersion of 6.something.
> - update your index enough to rewrite all segments before moving to the _next_ version. When Lucene merges a segment, it writes the new segment according to the luceneMatchVersion in solrconfig.xml. So as long as you are on a version long enough for all segments to be merged into new segments, you don't have to worry.

As far as I am aware, luceneMatchVersion in Solr will not change the
segment format, but only how some Lucene components (primarily analysis)
function.  Have I got incorrect information?

Something else that might be worth mentioning:  The IndexUpgrader is a
fairly simple piece of code.  It runs forceMerge (optimize) on the
index, creating a single new segment from the entire existing index. 
That ties into this thread's initial subject and LUCENE-7976.  I wonder
if perhaps the upgrade merge policy should be changed so that it just
rewrites all existing segments instead of fully merging them.

Thanks,
Shawn


Re: is there a way to remove deleted documents from index without optimize

Posted by Erick Erickson <er...@gmail.com>.
You can use the IndexUpgrader tool that ships with each version of Solr
(well, actually Lucene) to, well, upgrade your index. So you can use
the IndexUpgrader that ships with 5.x to upgrade from 4.x, and the
one that ships with 6.x to upgrade from 5.x, etc.

That said, none of that is necessary _if_ you
- have the Lucene version in solrconfig.xml be the one that corresponds to your current Solr, i.e. a solrconfig for 6.x should have a luceneMatchVersion of 6.something.
- update your index enough to rewrite all segments before moving to the _next_ version. When Lucene merges a segment, it writes the new segment according to the luceneMatchVersion in solrconfig.xml. So as long as you are on a version long enough for all segments to be merged into new segments, you don't have to worry.

Best,
Erick

On Thu, Oct 12, 2017 at 8:29 PM, Harry Yoo <hy...@gmail.com> wrote:
> I should have read this. My project started on Apache Solr 4.x, moved to 5.x, and recently migrated to 6.6.1. Do you think Solr will take care of the old-version indexes as well? I want to make sure my indexes are updated to the 6.x Lucene version so that they will be supported when I move to Solr 7.x.
>
> Is there any best practice for managing Solr indexes?
>
> Harry
>

Re: is there a way to remove deleted documents from index without optimize

Posted by Harry Yoo <hy...@gmail.com>.
I should have read this. My project started on Apache Solr 4.x, moved to 5.x, and recently migrated to 6.6.1. Do you think Solr will take care of the old-version indexes as well? I want to make sure my indexes are updated to the 6.x Lucene version so that they will be supported when I move to Solr 7.x.

Is there any best practice for managing Solr indexes?

Harry

> On Sep 22, 2015, at 8:21 PM, Walter Underwood <wu...@wunderwood.org> wrote:
> 
> Don’t do anything. Solr will automatically clean up the deleted documents for you.
> 
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 


Re: is there a way to remove deleted documents from index without optimize

Posted by Walter Underwood <wu...@wunderwood.org>.
Don’t do anything. Solr will automatically clean up the deleted documents for you.
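
If background merging alone does not reclaim space quickly enough, one option lighter than a full optimize is a commit with expungeDeletes. It asks Solr to merge segments that carry a significant share of deleted documents. The fragment below is a minimal sketch of the XML update message, to be posted to the core's /update handler. It can still be I/O heavy on a large index, so treat it as something to test rather than a recommendation made in this thread.

```xml
<!-- Commit that also merges segments holding many deleted documents.
     Unlike optimize, it does not force the index down to a single segment. -->
<commit expungeDeletes="true"/>
```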

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Sep 22, 2015, at 6:01 PM, CrazyDiamond <Cr...@mail.ru> wrote:
> 
> My index is updated frequently, and I need to remove unused documents from
> the index after an update/reindex.
> Optimization is very expensive, so what should I do?


Re: is there a way to remove deleted documents from index without optimize

Posted by Doug Turnbull <dt...@opensourceconnections.com>.
Avoid optimize like the plague.

Instead, focus on tuning the segment merging process. As you commit, new
index segments are created, and they are periodically merged. Merging
removes the remnants of the tombstoned docs.  You can tune this process,
and if you're dealing with a lot of updates, it is something you
definitely want to tune.  See this document and scroll down to the merge
parameters:
https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
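
As a rough illustration of the merge parameters that page describes, the fragment below shows a TieredMergePolicy block for the <indexConfig> section of solrconfig.xml. It assumes the mergePolicyFactory syntax of Solr 6.x (5.x releases used a <mergePolicy> element instead). The values are illustrative, not tuned recommendations. reclaimDeletesWeight in particular is an assumption tied to the Lucene versions of that era, so check it against the release in use.

```xml
<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <!-- How many segments may be merged in one pass (default 10). -->
    <int name="maxMergeAtOnce">10</int>
    <!-- Segments allowed per tier before a merge is triggered (default 10). -->
    <int name="segmentsPerTier">10</int>
    <!-- Assumption: a value above the default of 2.0 makes the policy
         favor merges that reclaim deleted documents. -->
    <double name="reclaimDeletesWeight">3.0</double>
  </mergePolicyFactory>
</indexConfig>
```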

There are other options for dealing with high update speed. You could shard
SolrCloud further and minimize replication. You could put things in Kafka
and work through them as you can, catching up whenever you have a slow
period. You can tune your hard and soft commits to create segments of an
appropriate size, etc.
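
For the commit tuning mentioned above, a minimal sketch of the relevant solrconfig.xml section might look like the following. The intervals are illustrative assumptions, not values from this thread, and should be sized against the actual update rate.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes to stable storage every 60 s without opening
       a new searcher, so it does not affect what searches can see. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: makes recent updates visible to searches every 5 s. -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>
```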

-Doug



On Tue, Sep 22, 2015 at 9:01 PM, CrazyDiamond <Cr...@mail.ru> wrote:

> My index is updated frequently, and I need to remove unused documents from
> the index after an update/reindex.
> Optimization is very expensive, so what should I do?



-- 
Doug Turnbull | Search Relevance Consultant | OpenSource Connections, LLC
<http://opensourceconnections.com> | 240.476.9983
Author: Relevant Search <http://manning.com/turnbull>