Posted to solr-user@lucene.apache.org by "girish.vignesh" <gi...@gmail.com> on 2018/04/12 11:53:38 UTC

Solr still gives old data while faceting from the deleted documents

Solr returns old values in facet results, coming from deleted or updated
documents.

For example, we facet on a name field, and names change frequently in our
application. When we re-index a document after its name changes, we get both
the old name and the new name in the facet results. After digging into this
I learned that a Solr index is composed of write-once segments, and each
segment contains a set of documents. Whenever a hard commit happens these
segments are closed, and if a document is deleted or updated after that, the
old version stays in its segment and is only marked as deleted. These
documents are not cleared immediately. They are not displayed in the search
results, but faceting is still able to see their terms.
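
For what it's worth, the difference is visible in the core's index
statistics. A call along these lines (the host and collection name are
placeholders, not our real setup) reports both numDocs and maxDoc, and the
gap between the two is the deleted documents still sitting in the segments:

curl 'http://localhost:8983/solr/mycollection/admin/luke?numTerms=0'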

Optimizing fixes the issue, but we cannot run an optimize every time a
customer changes data in production. I tried the options below and they did
not work for me.

1) *expungeDeletes*.

Added the configuration below in solrconfig.xml:

<autoCommit>
  <maxTime>30000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>



<autoSoftCommit>
  <maxTime>10000</maxTime>
</autoSoftCommit>

<commit waitSearcher="false" expungeDeletes="true"/>  <!-- This is not working. -->

I do not think I can add an expungeDeletes configuration like this. When I
make the expungeDeletes call using a curl command, it does merge the segments.
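
For reference, this is roughly the curl call I am making (host and collection
name are placeholders):

curl 'http://localhost:8983/solr/mycollection/update?commit=true&expungeDeletes=true'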

2) Using *TieredMergePolicyFactory* might not help me, as the merge threshold
may not always be reached and users will see old data until it is.
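
For context, the kind of solrconfig.xml change I was considering looks
roughly like this; the values are only illustrative, not settings I have
tested:

<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicyFactory>
</indexConfig>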

3) One more way of doing it is calling the *optimize*() method exposed by
SolrJ once a day, but I am not sure what impact this will have on performance.
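
If I go down this path, I imagine a daily scheduled job along these lines
(the base URL and collection name are placeholders):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class NightlyOptimize {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at the real core/collection.
        try (SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycollection").build()) {
            // Merges segments and drops documents marked as deleted.
            client.optimize();
        }
    }
}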

4) Tried manipulating filterCache, documentCache and queryResultCache. I do
not think the issue I am facing is caused by these caches.

The number of documents we index per server will be at most 2M-3M.

Please suggest if there is any solution to this.

Let me know if more data is needed.




Re: Solr still gives old data while faceting from the deleted documents

Posted by Erick Erickson <er...@gmail.com>.
expungeDeletes won't do the trick for you; it only purges deleted documents
from segments with > 10% deleted docs, so you'll still have deleted documents
in the other segments.

I'd push back on "the requirement is to show facets with 0 count as
disabled." Why? What use-case is satisfied here? Effectively this is
saying "For my query, show me possible values that have no hits for
that query". Optimize is a very costly operation and to really get
this behavior you'll need to run it _every_ time the index changes.
You really can't afford to run it for every update, so there'll be a
period of time when you will still get these facets.

Eventually you won't be displaying zero-count facets anyway, assuming
that you have room for, say, only 10 facets and sort by frequency.

If your index changes only periodically (say once a day) that may be
fine. But more often than that and you won't be able to satisfy the
requirement anyway.

My point is that requirements like this are often created without
understanding the consequences and cause a lot of effort to be
expended without a good purpose. See:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

Best,
Erick

On Thu, Apr 12, 2018 at 10:32 PM, girish.vignesh
<gi...@gmail.com> wrote:
> mincount will fix this issue for sure. I have tried that but the requirement
> is to show facets with 0 count as disabled.
>
> I think I am left with only 2 options: either go with expungeDeletes via the
> update URL or run optimize from a scheduler.
>
> Regards,
> Vignesh
>
>
>

Re: Solr still gives old data while faceting from the deleted documents

Posted by "girish.vignesh" <gi...@gmail.com>.
mincount will fix this issue for sure. I have tried that but the requirement
is to show facets with 0 count as disabled. 

I think I am left with only 2 options: either go with expungeDeletes via the
update URL or run optimize from a scheduler.

Regards,
Vignesh




Re: Solr still gives old data while faceting from the deleted documents

Posted by Shawn Heisey <el...@elyograg.org>.
On 4/12/2018 5:53 AM, girish.vignesh wrote:
> Solr returns old values in facet results, coming from deleted or updated
> documents.
>
> For example, we facet on a name field, and names change frequently in our
> application. When we re-index a document after its name changes, we get both
> the old name and the new name in the facet results. After digging into this
> I learned that a Solr index is composed of write-once segments, and each
> segment contains a set of documents. Whenever a hard commit happens these
> segments are closed, and if a document is deleted or updated after that, the
> old version stays in its segment and is only marked as deleted. These
> documents are not cleared immediately. They are not displayed in the search
> results, but faceting is still able to see their terms.

If all documents with that term are deleted, then this will be fixed by 
adding a facet.mincount=1 parameter to your facet URL.  If you are using 
the JSON facet API, then there is a mincount parameter that you can 
place into your JSON request. I've never actually used the JSON facet 
API, but there is documentation:

https://lucene.apache.org/solr/guide/7_2/json-facet-api.html#TermsFacet
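
With the traditional facet parameters the request would look something like
this (the collection and field names here are only examples):

curl 'http://localhost:8983/solr/yourcollection/select?q=*:*&rows=0&facet=true&facet.field=name&facet.mincount=1'

rows=0 returns just the facet counts, and facet.mincount=1 drops values whose
count is zero, which is what you get for terms that only remain in deleted
documents.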

The mincount parameter might make it unnecessary to optimize.  But if
you are updating a LOT of your documents on a regular basis, you might
find that optimizing gives you better performance, so running it once a
day during a time when traffic is low might be useful.

Thanks,
Shawn