Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/02/18 00:21:23 UTC

[GitHub] [lucene-solr] jtibshirani opened a new pull request #2395: LUCENE-9616: Add developer docs on how to update a format.

jtibshirani opened a new pull request #2395:
URL: https://github.com/apache/lucene-solr/pull/2395


   This commit adds simple guidelines on how to make a change to a file format:
   * Document how the 'copy-on-write' approach works with backwards-codecs
   * Clarify that we prefer to copy the format instead of using internal versions
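
A minimal sketch of the class layout the copy-on-write approach leads to, assuming the norms example used in the README. The types below are hypothetical stand-ins, not Lucene's NormsFormat API: every name ending in `Sketch`, the `read`/`write` methods, and the string "encoding" are invented for illustration only.

```java
// Hypothetical stand-in for a format class; NOT the real Lucene NormsFormat API.
abstract class NormsFormatSketch {
    private final String name;

    NormsFormatSketch(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    // Reading stays supported for as long as old indices may be opened.
    String read(String stored) {
        String prefix = name + ":";
        if (!stored.startsWith(prefix)) {
            throw new IllegalArgumentException("Not written by " + name + ": " + stored);
        }
        return stored.substring(prefix.length());
    }

    // Writing is unavailable unless a subclass opts in.
    String write(String value) {
        throw new UnsupportedOperationException(name + " is read-only");
    }
}

// New format in core: starts as a copy of the old one, then receives the change.
final class Lucene90NormsFormatSketch extends NormsFormatSketch {
    Lucene90NormsFormatSketch() {
        super("Lucene90");
    }

    @Override
    String write(String value) {
        return name() + ":" + value;
    }
}

// Old format after its move to backward-codecs, with the write-side logic removed.
class Lucene80NormsFormatSketch extends NormsFormatSketch {
    Lucene80NormsFormatSketch() {
        super("Lucene80");
    }
}

// Test-only subclass that restores writing so unit tests can write then read.
final class Lucene80RWNormsFormatSketch extends Lucene80NormsFormatSketch {
    @Override
    String write(String value) {
        return name() + ":" + value;
    }
}

final class CopyOnWriteDemo {
    public static void main(String[] args) {
        NormsFormatSketch testWriter = new Lucene80RWNormsFormatSketch();
        NormsFormatSketch reader = new Lucene80NormsFormatSketch();
        // The read-only class can read what the test-only writer produced.
        System.out.println(reader.read(testWriter.write("norm")));
    }
}
```

The point of the layout is that the old class is never edited in place once it has shipped: changes go into the new copy, and the old one keeps only what is needed to read existing segments.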


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] jtibshirani commented on pull request #2395: LUCENE-9616: Add developer docs on how to update a format.

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on pull request #2395:
URL: https://github.com/apache/lucene-solr/pull/2395#issuecomment-780942757


   I'm not sure this is the right place to add developer docs... let me know if a wiki page would be more appropriate!




[GitHub] [lucene-solr] msokolov commented on a change in pull request #2395: LUCENE-9616: Add developer docs on how to update a format.

Posted by GitBox <gi...@apache.org>.
msokolov commented on a change in pull request #2395:
URL: https://github.com/apache/lucene-solr/pull/2395#discussion_r578690516



##########
File path: lucene/backward-codecs/README.md
##########
@@ -0,0 +1,54 @@
+# Index backwards compatibility
+
+This README describes the approach to maintaining compatibility with indices
+from previous versions and gives guidelines for making format changes.
+
+## Compatibility strategy
+
+Codecs and file formats are versioned according to the minor version in which
+they were created. For example Lucene87Codec represents the codec used for
+creating Lucene 8.7 indices, and potentially later index versions too. Each
+segment records the codec version that was used to write it.
+
+Lucene supports the ability to read segments created in older versions by
+maintaining old codec classes. These older codecs live in the backwards-codecs
+package along with their file formats. When making a change to a file format,
+we create a fresh copies of the codec and format, and move the existing ones
+into backwards-codecs.
+
+Older codecs are tested in two ways:
+* Through unit tests like TestLucene80NormsFormat, which checks we can write
+then read data using each old format
+* Through TestBackwardsCompatibility, which loads indices created in previous
+versions and checks that we can search them
+
+## Making index format changes
+
+As an example, let's say we're making a change to the norms file format, and
+the current class in core is Lucene80NormsFormat. We'd perform the following
+steps:
+
+1. Create a new format with the target version for the changes, for example
+Lucene90NormsFormat. This includes creating copies of its writer and reader
+classes, as well as any helper classes. Make sure to copy unit tests too, like
+TestLucene80NormsFormat.
+2. Move the old Lucene80NormsFormat, along with its writer, reader, tests, and
+helper classes to the backwards-codecs package. If the format will only be
+used for reading, then delete the write-side logic and move it to a test-only
+class like Lucene80RWNormsFormat to support unit tests. Note that most formats
+only need read logic, but a small set including DocValuesFormat and
+FieldInfosFormat will need to retain write logic since can be used to update

Review comment:
       "since they can be used"

##########
File path: lucene/backward-codecs/README.md
##########
@@ -0,0 +1,54 @@
+# Index backwards compatibility
+
+This README describes the approach to maintaining compatibility with indices
+from previous versions and gives guidelines for making format changes.
+
+## Compatibility strategy
+
+Codecs and file formats are versioned according to the minor version in which
+they were created. For example Lucene87Codec represents the codec used for
+creating Lucene 8.7 indices, and potentially later index versions too. Each
+segment records the codec version that was used to write it.
+
+Lucene supports the ability to read segments created in older versions by
+maintaining old codec classes. These older codecs live in the backwards-codecs
+package along with their file formats. When making a change to a file format,
+we create a fresh copies of the codec and format, and move the existing ones

Review comment:
       "we create fresh copies" (strike "a")






[GitHub] [lucene-solr] jtibshirani commented on a change in pull request #2395: LUCENE-9616: Add developer docs on how to update a format.

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on a change in pull request #2395:
URL: https://github.com/apache/lucene-solr/pull/2395#discussion_r578704687



##########
File path: lucene/backward-codecs/README.md
##########
@@ -0,0 +1,45 @@
+# Index backwards compatibility
+
+This README describes the approach to maintaining compatibility with indices
+from previous versions and gives guidelines for making format changes.
+
+## Compatibility strategy
+
+Lucene supports the ability to read segments created in older versions by
+maintaining old codec classes along with their formats. When making a change
+to a file format, we create a fresh format class and copy the existing one
+into the backwards-codecs package.
+
+These older formats are tested in two ways:
+* Through unit tests like TestLucene80NormsFormat, which checks we can write
+then read data using the old format
+* Through TestBackwardsCompatibility, which loads indices created in previous
+versions and checks that we can search them
+
+## Making index format changes
+
+As an example, let's say we're making a change to the norms file format, and
+the current class in core is Lucene80NormsFormat. We'd perform the following
+steps:
+
+1. Create a new format with the target version for the changes, for example
+Lucene90NormsFormat. This includes creating copies of its writer and reader
+classes, as well as any helper classes. Make sure to copy unit tests too, like
+TestLucene80NormsFormat.
+2. Move the old Lucene80NormsFormat, along with its writer, reader, tests, and
+helper classes to the backwards-codecs package. If the format will only be
+used for reading, then delete the write-side logic and move it to a test-only
+class like Lucene80RWNormsFormat to support unit tests. Note that most formats
+only need read logic, but a small set including DocValuesFormat and
+FieldInfosFormat will need to retain write logic since can be used to update
+old segments.
+3. Make a change to the new format!
+
+## Internal format versions
+
+Each format class maintains an internal version which is written into the

Review comment:
       Thanks for the copy-edits. I wasn't sure if javadocs were the best place as they tend to be user-facing (describing how to use backwards-codecs, for example, as opposed to the development details).






[GitHub] [lucene-solr] jtibshirani merged pull request #2395: LUCENE-9616: Add developer docs on how to update a format.

Posted by GitBox <gi...@apache.org>.
jtibshirani merged pull request #2395:
URL: https://github.com/apache/lucene-solr/pull/2395


   




[GitHub] [lucene-solr] jtibshirani commented on a change in pull request #2395: LUCENE-9616: Add developer docs on how to update a format.

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on a change in pull request #2395:
URL: https://github.com/apache/lucene-solr/pull/2395#discussion_r578036823



##########
File path: lucene/backward-codecs/README.md
##########
@@ -0,0 +1,45 @@
+# Index backwards compatibility
+
+This README describes the approach to maintaining compatibility with indices
+from previous versions and gives guidelines for making format changes.
+
+## Compatibility strategy
+
+Lucene supports the ability to read segments created in older versions by
+maintaining old codec classes along with their formats. When making a change
+to a file format, we create a fresh format class and copy the existing one
+into the backwards-codecs package.
+
+These older formats are tested in two ways:
+* Through unit tests like TestLucene80NormsFormat, which checks we can write
+then read data using the old format
+* Through TestBackwardsCompatibility, which loads indices created in previous
+versions and checks that we can search them
+
+## Making index format changes
+
+As an example, let's say we're making a change to the norms file format, and
+the current class in core is Lucene80NormsFormat. We'd perform the following
+steps:
+
+1. Create a new format with the target version for the changes, for example
+Lucene90NormsFormat. This includes creating copies of its writer and reader
+classes, as well as any helper classes. Make sure to copy unit tests too, like
+TestLucene80NormsFormat.
+2. Move the old Lucene80NormsFormat, along with its writer, reader, tests, and
+helper classes to the backwards-codecs package. If the format will only be
+used for reading, then delete the write-side logic and move it to a test-only
+class like Lucene80RWNormsFormat to support unit tests. Note that most formats
+only need read logic, but a small set including DocValuesFormat and
+FieldInfosFormat will need to retain write logic since can be used to update
+old segments.
+3. Make a change to the new format!
+
+## Internal format versions
+
+Each format class maintains an internal version which is written into the

Review comment:
       This is based on the discussion in https://issues.apache.org/jira/browse/LUCENE-9616.






[GitHub] [lucene-solr] jpountz commented on a change in pull request #2395: LUCENE-9616: Add developer docs on how to update a format.

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #2395:
URL: https://github.com/apache/lucene-solr/pull/2395#discussion_r579817121



##########
File path: lucene/backward-codecs/README.md
##########
@@ -0,0 +1,54 @@
+# Index backwards compatibility
+
+This README describes the approach to maintaining compatibility with indices
+from previous versions and gives guidelines for making format changes.
+
+## Compatibility strategy
+
+Codecs and file formats are versioned according to the minor version in which
+they were created. For example Lucene87Codec represents the codec used for
+creating Lucene 8.7 indices, and potentially later index versions too. Each
+segment records the codec version that was used to write it.

Review comment:
       I think that the following would be more accurate?
   
   ```suggestion
   segment records the codec name that was used to write it.
   ```
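
To illustrate why "name" is the more accurate word, here is a rough sketch of name-based lookup, assuming a simple in-memory registry. `CodecRegistrySketch`, its methods, and the "Lucene90" entry are hypothetical; in Lucene itself the lookup goes through `Codec.forName(String)`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry for illustration; not part of Lucene.
final class CodecRegistrySketch {
    // Maps the codec name recorded in a segment to the module providing the class.
    private final Map<String, String> modulesByCodecName = new HashMap<>();

    void register(String codecName, String module) {
        modulesByCodecName.put(codecName, module);
    }

    // Resolves a recorded codec name; fails if no matching codec is available.
    String resolve(String recordedName) {
        String module = modulesByCodecName.get(recordedName);
        if (module == null) {
            throw new IllegalArgumentException("No codec named '" + recordedName + "' is available");
        }
        return module;
    }
}

final class CodecNameDemo {
    public static void main(String[] args) {
        CodecRegistrySketch registry = new CodecRegistrySketch();
        registry.register("Lucene90", "core");             // illustrative current codec
        registry.register("Lucene87", "backward-codecs");  // older codec kept for reading
        // A segment that recorded "Lucene87" resolves to the older codec.
        System.out.println(registry.resolve("Lucene87"));
    }
}
```

Because a codec such as Lucene87Codec may also write later index versions, the recorded name identifies which classes can read the segment, not the release that wrote it.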






[GitHub] [lucene-solr] msokolov commented on a change in pull request #2395: LUCENE-9616: Add developer docs on how to update a format.

Posted by GitBox <gi...@apache.org>.
msokolov commented on a change in pull request #2395:
URL: https://github.com/apache/lucene-solr/pull/2395#discussion_r578693806



##########
File path: lucene/backward-codecs/README.md
##########
@@ -0,0 +1,45 @@
+# Index backwards compatibility
+
+This README describes the approach to maintaining compatibility with indices
+from previous versions and gives guidelines for making format changes.
+
+## Compatibility strategy
+
+Lucene supports the ability to read segments created in older versions by
+maintaining old codec classes along with their formats. When making a change
+to a file format, we create a fresh format class and copy the existing one
+into the backwards-codecs package.
+
+These older formats are tested in two ways:
+* Through unit tests like TestLucene80NormsFormat, which checks we can write
+then read data using the old format
+* Through TestBackwardsCompatibility, which loads indices created in previous
+versions and checks that we can search them
+
+## Making index format changes
+
+As an example, let's say we're making a change to the norms file format, and
+the current class in core is Lucene80NormsFormat. We'd perform the following
+steps:
+
+1. Create a new format with the target version for the changes, for example
+Lucene90NormsFormat. This includes creating copies of its writer and reader
+classes, as well as any helper classes. Make sure to copy unit tests too, like
+TestLucene80NormsFormat.
+2. Move the old Lucene80NormsFormat, along with its writer, reader, tests, and
+helper classes to the backwards-codecs package. If the format will only be
+used for reading, then delete the write-side logic and move it to a test-only
+class like Lucene80RWNormsFormat to support unit tests. Note that most formats
+only need read logic, but a small set including DocValuesFormat and
+FieldInfosFormat will need to retain write logic since can be used to update
+old segments.
+3. Make a change to the new format!
+
+## Internal format versions
+
+Each format class maintains an internal version which is written into the

Review comment:
       Maybe in package-level javadocs: https://lucene.apache.org/core/8_8_0/backward-codecs/index.html
   
   I would vote for either what you have done here, or javadocs, in preference to wiki



