Posted to solr-user@lucene.apache.org by Don Bosco Durai <bo...@apache.org> on 2015/12/29 21:54:31 UTC

Facet shows deleted values...

I am purging some of my data on a regular basis, but when I run a facet query, the deleted values are still shown in the facet list.

It seems that a commit with expunge resolves this issue (http://grokbase.com/t/lucene/solr-user/106313v302/deleted-documents-appearing-in-facet-fields ). But it also seems that commit is no longer recommended. Also, I am running Solr 5.2 in SolrCloud mode.

What is the recommendation here?

Thanks

Bosco



Re: Facet shows deleted values...

Posted by Erick Erickson <er...@gmail.com>.
bq:  And I also read somewhere that explicit commit is not recommended
in SolrCloud mode

Not quite. It's just easy to have too many commits happen too
frequently when multiple indexing clients issue them. It's also rare
that the benefits of clients issuing commits outweigh the chance of
getting it wrong. It's not so much that explicit commits are not
recommended as that they are usually not necessary at all and easy to
get wrong.

Best,
Erick




Re: Facet shows deleted values...

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/4/2016 4:11 PM, Don Bosco Durai wrote:
> Erick, I am using SolrCloud with solrconfig.xml configured with autoCommit. And I also read somewhere that explicit commit is not recommended in SolrCloud mode. Regarding auto warm, my server has been running for a while.

Since 4.0, autoCommit with openSearcher set to false is highly
recommended, no matter what your needs are regarding visibility, and
whether or not you're running in cloud mode.  The exact interval to use
is a subject for vigorous debate.  A common maxTime value that you will
see for autoCommit is 15 seconds (15000).  I personally feel this is too
frequent, but many people use that value with no problems.  I use five
minutes (300000) in my own config, but over the course of those five
minutes, there's not much in the way of updates, so the log replay will
take very little time.  Using autoCommit with openSearcher set to false
takes care of transaction log rotation; it doesn't do ANYTHING for
document visibility.

The issue of how to handle document visibility will depend on exactly
how you use your index.  Do not worry about whether the index is
SolrCloud or not for this topic.

One way of handling document visibility is to use autoSoftCommit
(available since 4.0) in your config ... with maxTime set to the longest
possible interval you can stand.  My personal recommendation is to never
set that interval shorter than one minute (60000).  Push back if you are
told that documents must be visible faster than that.  If you use
autoSoftCommit, you won't need explicit commits from your indexing
application.
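
As a rough sketch of the autoCommit and autoSoftCommit settings described
above, the snippet below embeds a solrconfig.xml fragment and reads the
values back as a sanity check. The intervals are the ones mentioned in
this message (five minutes and one minute). The helper name
commit_settings is made up for illustration, and a real updateHandler
section would contain more than this.

```python
import xml.etree.ElementTree as ET

# Illustrative solrconfig.xml fragment. The intervals come from the message
# above; they are examples, not tuned recommendations.
SOLRCONFIG_FRAGMENT = """
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>
"""


def commit_settings(xml_text):
    """Hypothetical helper: pull the commit intervals out of the fragment."""
    root = ET.fromstring(xml_text)
    hard = root.find("autoCommit")
    soft = root.find("autoSoftCommit")
    return {
        "hard_max_time_ms": int(hard.findtext("maxTime")),
        "hard_opens_searcher": hard.findtext("openSearcher").strip() == "true",
        "soft_max_time_ms": int(soft.findtext("maxTime")),
    }


settings = commit_settings(SOLRCONFIG_FRAGMENT)
print(settings)
```

With this layout the hard commit only handles durability and transaction
log rotation, and the soft commit is what makes changes visible.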

Another way to handle document visibility is the commitWithin parameter
on each update request.  This is similar to autoSoftCommit, but gets set
on the update request.  Just like autoSoftCommit, I would not recommend
a value less than one minute, and if this parameter is used on all
updates, you will never need an explicit commit.
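
A minimal sketch of what an update request carrying commitWithin might
look like, using only the Python standard library. The host, the
collection name "mycollection", the sample document, and the helper name
build_update_request are all assumptions for illustration. The request is
only built here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_update_request(base_url, collection, docs, commit_within_ms=60000):
    """Hypothetical helper: build (but do not send) a JSON update request."""
    query = urllib.parse.urlencode({"commitWithin": commit_within_ms})
    url = f"{base_url}/{collection}/update?{query}"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local Solr URL and collection name.
req = build_update_request(
    "http://localhost:8983/solr",
    "mycollection",
    [{"id": "doc1", "title": "example"}],
)
print(req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```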

Using autoSoftCommit or commitWithin is a good option if there are many
clients/threads sending changes to the same index or the indexing
happens in bursts where the update size is wildly different and
completely unpredictable.

The final way to handle document visibility is explicit commits.  When
you want changes to be visible, you send a commit, hard or soft, with
openSearcher set to true (this is the default for this parameter), and a
short time later, all changes sent before that commit will become
visible.  This is how I handle my own index.  This is a good option if
all indexing is coming from a single source and that source has complete
control over all indexing operations.
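
For illustration, an explicit commit can be issued as a request to the
update handler. The sketch below only builds the URL. The host, the
collection name, and the helper name build_commit_url are assumptions,
and openSearcher is spelled out for the hard commit even though true is
its default.

```python
import urllib.parse


def build_commit_url(base_url, collection, soft=False, open_searcher=True):
    """Hypothetical helper: URL for an explicit hard or soft commit."""
    if soft:
        # A soft commit exists to make changes visible, so no openSearcher flag.
        params = {"softCommit": "true"}
    else:
        params = {
            "commit": "true",
            "openSearcher": "true" if open_searcher else "false",
        }
    return f"{base_url}/{collection}/update?" + urllib.parse.urlencode(params)


# Assumed local Solr URL and collection name.
hard_url = build_commit_url("http://localhost:8983/solr", "mycollection")
soft_url = build_commit_url("http://localhost:8983/solr", "mycollection", soft=True)
print(hard_url)
print(soft_url)
```

A single indexing process would call one of these once after finishing a
batch, rather than after every document.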

One of the strong goals with commits is to avoid them happening too
frequently, so they don't overlap, and so the machine is spending less
time handling commits than it spends either idle or handling queries.

Here's a blog post with more detail.  The blog post says "SolrCloud" but
almost all of it is equally applicable to Solr 4.x and 5.x indexes that
are not running in cloud mode:

http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks,
Shawn


Re: Facet shows deleted values...

Posted by Don Bosco Durai <bo...@apache.org>.
Tomás, thanks for the suggestion. facet.mincount will solve my issue.


Erick, I am using SolrCloud with solrconfig.xml configured with autoCommit. And I also read somewhere that explicit commit is not recommended in SolrCloud mode. Regarding auto warm, my server has been running for a while.

I lost my environment during the holidays. I will rebuild it and monitor this further. I will also try an explicit commit() to see if that helps.

Thanks

Bosco







Re: Facet shows deleted values...

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
I believe the problem here is that terms from the deleted docs still appear
in the facets, even with a doc count of 0, is that it? Can you use
facet.mincount=1 or would that not be a good fit for your use case?

https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.mincountParameter
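
A small sketch of the facet.mincount=1 idea, using only the Python
standard library. The host, the collection name, the field name
"category", and both helper names are assumptions for illustration. The
second helper shows the equivalent client-side filter, assuming the facet
counts come back as a flat [term, count, term, count, ...] list.

```python
import urllib.parse


def build_facet_url(base_url, collection, field, mincount=1):
    """Hypothetical helper: facet query that hides terms below mincount."""
    params = [
        ("q", "*:*"),
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", mincount),
    ]
    return f"{base_url}/{collection}/select?" + urllib.parse.urlencode(params)


def drop_zero_counts(flat_facets):
    """Client-side equivalent: keep only terms with a count above zero."""
    pairs = zip(flat_facets[0::2], flat_facets[1::2])
    return {term: count for term, count in pairs if count > 0}


# Assumed local Solr URL, collection name, and field name.
url = build_facet_url("http://localhost:8983/solr", "mycollection", "category")
print(url)
print(drop_zero_counts(["red", 3, "blue", 0, "green", 1]))
```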

Tomás


Re: Facet shows deleted values...

Posted by Erick Erickson <er...@gmail.com>.
Let's be sure we're using terms similarly....

That article is from 2010, so it is unreliable in the 5.2 world; I'd ignore it.

First, facets should always reflect the latest commit, regardless of
expungeDeletes or optimizes/forcemerges.

_commits_ are definitely recommended. Optimize/forcemerge (or
expungeDeletes) are rarely necessary and should _not_ be necessary
for facets to not count omitted documents.

Is it possible that your autowarm period is long and you're still
getting an old searcher when you run your tests?

Assuming that you commit(), then wait a few minutes, do you see
inaccurate facets? If so, what are the exact steps you follow?

Best,
Erick
