Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2020/09/22 16:20:40 UTC

[GitHub] [lucene-solr] ErickErickson commented on pull request #1733: LUCENE-9450 Use BinaryDocValues in the taxonomy writer

ErickErickson commented on pull request #1733:
URL: https://github.com/apache/lucene-solr/pull/1733#issuecomment-696828850


   
   
   > On Sep 22, 2020, at 11:04 AM, Michael McCandless <no...@github.com> wrote:
   > 
   > 
   > So I propose we get rid of the fullPathField altogether.
   > 
   > Wow, +1, this looks like it is (pre-existingly?) double-indexed? Maybe we should do this as a separate precursor PR to this one (switch to StoredField when indexing the fullPathField)?
   > 
   > For maintaining backwards compatibility, we can read facet labels from new BinaryDocValues field, falling back to old StoredField if BinaryDocValues field does not exist or has no value for the docId. The performance penalty of doing so should be acceptable.
   > 
   > Yeah, +1 to trying BinaryDocValues first on a hit-by-hit basis, and then falling back to the StoredField. This is the cost of backwards compatibility ... though, for a fully new (all BinaryDocValues) index, the performance should be fine. Also, note that in Lucene 10.x we can remove that back-compat fallback.
   > 
   > Alternatively we can implement a special merge policy that takes care of moving data from old Stored field to BinaryDocValues field at the time of merge but that might be tricky to implement.
   > 
   > I think this would indeed be tricky.
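
   The per-hit fallback discussed in the quoted text above (try the BinaryDocValues field first, then the old StoredField) might look roughly like the sketch below. It deliberately avoids the real Lucene API: LabelSource, FacetLabelFallback, and labelFor are hypothetical stand-ins invented for illustration, and the sample labels are made up. It only shows the lookup order, not how the taxonomy reader actually does it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative stand-in only: these types are hypothetical and are not Lucene classes. */
public class FacetLabelFallback {

    /** Hypothetical per-document lookup; returns empty when the field has no value for docId. */
    public interface LabelSource {
        Optional<String> label(int docId);
    }

    private final LabelSource binaryDocValues; // new-style field; null if the index has no such field
    private final LabelSource storedField;     // old-style field kept for backwards compatibility

    public FacetLabelFallback(LabelSource binaryDocValues, LabelSource storedField) {
        this.binaryDocValues = binaryDocValues;
        this.storedField = storedField;
    }

    /** Try the doc-values source first; use the stored field only when it yields nothing. */
    public Optional<String> labelFor(int docId) {
        if (binaryDocValues != null) {
            Optional<String> fromDocValues = binaryDocValues.label(docId);
            if (fromDocValues.isPresent()) {
                return fromDocValues;
            }
        }
        return storedField.label(docId);
    }

    public static void main(String[] args) {
        // Doc 0 was indexed before the switch (stored field only); doc 1 has both.
        Map<Integer, String> newStyle = new HashMap<>();
        newStyle.put(1, "Author/Bob");
        Map<Integer, String> oldStyle = new HashMap<>();
        oldStyle.put(0, "Author/Alice");
        oldStyle.put(1, "Author/Bob");

        FacetLabelFallback reader = new FacetLabelFallback(
                id -> Optional.ofNullable(newStyle.get(id)),
                id -> Optional.ofNullable(oldStyle.get(id)));

        System.out.println(reader.labelFor(0).orElse("<none>")); // served by the stored-field fallback
        System.out.println(reader.labelFor(1).orElse("<none>")); // served by the doc-values source
    }
}
```

   In this sketch a fully new index never reaches the stored-field branch, so the extra cost is only paid for documents written before the switch.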
   
   Andrzej and I spent quite a bit of time trying to get something similar to work for adding docValues on the fly using a custom merge policy. We realized that you could create a docValues field from an indexed field for primitive types, since all the information was already in the index. We never could get it working while active indexing was happening, so we resorted to a batch process that rewrote all segments, doing the transformation along the way. That process had to be run on a quiescent index. The client decided that was good enough and didn’t want to spend more time on it.
   
   Our best guess was that there was a race condition that we somehow couldn’t find in the time allowed… Mostly just FYI...
   
   FWIW,
   Erick
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


