Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/07/26 19:30:48 UTC

[GitHub] [lucene] gautamworah96 commented on pull request #220: LUCENE-9450: Use BinaryDocValue fields with a different name in the taxonomy index

gautamworah96 commented on pull request #220:
URL: https://github.com/apache/lucene/pull/220#issuecomment-886968851


   Changes in the new b9cbc4c commit:
   
   1. The reason the `SegmentInfos.readLatestCommit(dir).getMinSegmentLuceneVersion()` call was returning 9 as the version was that the older zip file in the mainline was using the Lucene 8.6 codec while the major version variable was still assigned as 9. This was because the `main` branch in the repo (during the 8.6 release) had already set the major version to 9. I reconstructed the 8.10 taxonomy index from the `branch_8x` branch, and that correctly set the major version to 8 for those older segments.
   2. Use a version-based check to decide whether to store BDV fields or StringFields.
   
   I think the new commit might be slower than the previous `$full_path_binary$` option during indexing because it checks the Lucene version of the last commit every time we add a new category.
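
One possible way to avoid the repeated lookup would be to resolve the choice once and reuse it. The following is only a minimal sketch in plain Java with no Lucene dependency, not what the commit does: `FullPathFieldChooser`, its `Kind` enum, and the cutoff at major version 9 are hypothetical names and assumptions, and the `IntSupplier` merely stands in for reading the minimum segment version from the latest commit.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper: decides once which field type stores the full category path. */
final class FullPathFieldChooser {
    enum Kind { STRING_FIELD, BINARY_DOC_VALUES }

    // Assumption: indexes whose oldest segment predates this major version keep StringFields.
    static final int FIRST_BDV_MAJOR = 9;

    // Stand-in for the expensive "read the latest commit and get its min segment version" call.
    private final IntSupplier minSegmentMajor;

    // Resolved lazily on first use, then reused for every later category.
    private Kind cached;

    FullPathFieldChooser(IntSupplier minSegmentMajor) {
        this.minSegmentMajor = minSegmentMajor;
    }

    Kind kind() {
        if (cached == null) {
            int major = minSegmentMajor.getAsInt(); // performed at most once per chooser
            cached = major < FIRST_BDV_MAJOR ? Kind.STRING_FIELD : Kind.BINARY_DOC_VALUES;
        }
        return cached;
    }
}
```

This sketch is not thread-safe, and it assumes the choice should stay fixed for the lifetime of the writer even if later merges change the minimum segment version.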
    
   Finally, I think there should be a cleaner way of knowing whether the index has at least one commit. I use the `indexWriter.getLiveCommitData().iterator().hasNext()` call, but maybe there is a better way.
   
   Side questions that need more thought:
   1. What is the use of the `LiveIndexWriterConfig.createdVersionMajor` param? I think instead of initializing it to the latest version, maybe we can assign it the min back-compat version of the index (when the `LiveIndexWriterConfig` class is initialized).
   2. Can we fix the `DirectoryTaxonomyWriter.indexEpoch` variable to hold the accurate index epoch of the taxonomy index?
   The current logic for `indexEpoch` assigns 1 even if the index is completely fresh. It also saves 1 as the value when the index has just 1 commit, so the two cases cannot be told apart.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


