You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/06/05 21:10:29 UTC

[GitHub] [incubator-pinot] siddharthteotia opened a new pull request #5503: Config for raw index writer version

siddharthteotia opened a new pull request #5503:
URL: https://github.com/apache/incubator-pinot/pull/5503


   ## Description
   In PR https://github.com/apache/incubator-pinot/pull/5285,  the raw index writer format was changed to use 8 byte offset for each chunk in the file header. The writer version was bumped to 3. This was done to support > 2GB indexes. The change was backward compatible to continue the support for reading existing/old segments using 4-byte offsets
   
   While there is no problem with the change, it prevents rollback. So if there is any orthogonal issue while rolling out a release, we can't rollback to older Pinot release since segments already
   generated with 8-byte offsets can't be read by old code.
   
   This PR introduces a config option to set the writer version (2 for using old 4-byte chunk offset and 3 which is latest for 8-byte chunk offset). This config option is **temporary**. To deploy a new version of our internal offline segment creation/push job, we would like this format to be disabled temporarily to have the flexibility of dealing with issues by rolling back the release at will. Once the roll-out is complete, this config option will be removed.
   
   ## Upgrade Notes
   Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
   No
   Does this PR fix a zero-downtime upgrade introduced earlier?
   No
   Does this PR otherwise need attention when creating release notes? Things to consider:
   - New configuration options
   - Deprecation of configurations
   - Signature changes to public methods/interfaces
   - New plugins added or old plugins removed
   * [ ] Yes (Please label this PR as **<code>release-notes</code>** and complete the section on Release Notes)
   ## Release Notes
   
   ## Documentation
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #5503: Config for raw index writer version

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #5503:
URL: https://github.com/apache/incubator-pinot/pull/5503#discussion_r436192934



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/writer/impl/v1/BaseChunkSingleValueWriter.java
##########
@@ -36,6 +37,9 @@
  * Base class for fixed and variable byte writer implementations.
  */
 public abstract class BaseChunkSingleValueWriter implements SingleColumnSingleValueWriter {
+  public static final int DEFAULT_VERSION = 2;

Review comment:
       Maybe add a TODO or create an issue to remove this before 0.5.0 and put a pointer here to the issue?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/writer/impl/v1/BaseChunkSingleValueWriter.java
##########
@@ -60,12 +64,13 @@
    * @param numDocsPerChunk Number of docs per data chunk
    * @param chunkSize Size of chunk
    * @param sizeOfEntry Size of entry (in bytes), max size for variable byte implementation.
-   * @param version Version of file
+   * @param version format version used to determine whether to use 8 or 4 byte chunk offsets

Review comment:
       this comment may not hold in the long term, so just leave it as is

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/writer/impl/v1/FixedByteChunkSingleValueWriter.java
##########
@@ -64,15 +62,15 @@
    * @param compressionType Type of compression to use.
    * @param totalDocs Total number of docs to write.
    * @param numDocsPerChunk Number of documents per chunk.
-   * @param sizeOfEntry Size of entry (in bytes).
+   * @param sizeOfEntry Size of entry (in bytes)
+   * @param writerVersion writer version used to determine whether to use 8 or 4 byte chunk offsets

Review comment:
       same

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/writer/impl/v1/BaseChunkSingleValueWriter.java
##########
@@ -36,6 +37,9 @@
  * Base class for fixed and variable byte writer implementations.
  */
 public abstract class BaseChunkSingleValueWriter implements SingleColumnSingleValueWriter {
+  public static final int DEFAULT_VERSION = 2;

Review comment:
       ```suggestion
     public static final int DEFAULT_WRITER_VERSION = 2;
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #5503: Config for raw index writer version

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #5503:
URL: https://github.com/apache/incubator-pinot/pull/5503#issuecomment-639876788


   > If I understand the intention correctly, this PR want to let production environment to still use the `DEFAULT_VERSION`, but check in the classes of the `CURRENT_VERSION` and the tests for them. If that is the case, can we just add a separate constructor with the version info so that the tests can use that? The default constructor will create the writer with `DEFAULT_VERSION`. We don't really need to make it configurable because in production environment you never want to use `CURRENT_VERSION` before the version is verified. In order to move to the `CURRENT_VERSION`, you can simply change the default constructor to the `CURRENT_VERSION`
   
   - We want to use CURRENT_VERSION in production environment (use cases for text index are going to need the 8-byte chunk offset because of huge raw data). 
   
   - CURRENT_VERSION is verified in production. We already have realtime segments that got completed and written with CURRENT_VERSION as of the currently deployed release in production. Has also been verified on our internal perf cluster.
   
   - However, for pbnj roll-out we need the flexibility temporarily to rollback the release if something **orthogonal to this** goes wrong. So while pbnj is being rolled out internally using implicit DEFAULT_VERSION, tables with text index are going to specifically enable this config since it has already been verified. Once the pbnj roll-out is complete, we can remove this. This config is being introduced temporarily only to give us some flexibility in the roll-out and work on any unrelated issues that might require a release rollback. 
   
   Coming to changing the dotted notation, I haven't yet done the migration from IndexingConfig to FieldConfig. It is only being used for text. Going to migrate soon.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia merged pull request #5503: Config for raw index writer version

Posted by GitBox <gi...@apache.org>.
siddharthteotia merged pull request #5503:
URL: https://github.com/apache/incubator-pinot/pull/5503


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5503: Config for raw index writer version

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5503:
URL: https://github.com/apache/incubator-pinot/pull/5503#discussion_r436197601



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java
##########
@@ -33,15 +33,16 @@
   private final IndexType _indexType;
   private final Map<String, String> _properties;
 
-  public static String BLOOM_FILTER_COLUMN_KEY = "bloom.filter";

Review comment:
       I haven't yet done the migration from IndexingConfig to FieldConfig. It is only being used for text. So this change doesn't affect anyone. Just wanted to clean up names and remove the dotted notation. Going to migrate to FieldConfig soon




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #5503: Config for raw index writer version

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #5503:
URL: https://github.com/apache/incubator-pinot/pull/5503#issuecomment-639876788


   > If I understand the intention correctly, this PR want to let production environment to still use the `DEFAULT_VERSION`, but check in the classes of the `CURRENT_VERSION` and the tests for them. If that is the case, can we just add a separate constructor with the version info so that the tests can use that? The default constructor will create the writer with `DEFAULT_VERSION`. We don't really need to make it configurable because in production environment you never want to use `CURRENT_VERSION` before the version is verified. In order to move to the `CURRENT_VERSION`, you can simply change the default constructor to the `CURRENT_VERSION`
   
   - We want to use CURRENT_VERSION in production environment (use cases for text index are going to need the 8-byte chunk offset because of huge raw data). 
   
   - CURRENT_VERSION is verified in production. We already have realtime segments that got completed and written with CURRENT_VERSION as of the currently deployed release in production. Has also been verified on our internal perf cluster.
   
   - However, for pbnj roll-out we need the flexibility temporarily to rollback the release if something **orthogonal to this** goes wrong. So even though we know, 8-byte format (CURRENT_VERSION) is itself not a problem, something else might go wrong while pbnj is being rolled out internally. In such situations, we should be able to rollback the release. The plan is to use implicit DEFAULT_VERSION while rolling out pbnj and enable CURRENT_VERSION specifically for text index tables. Once the pbnj roll-out is complete, we can remove this. 
   
   Coming to changing the dotted notation, I haven't yet done the migration from IndexingConfig to FieldConfig. It is only being used for text. Going to migrate soon.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on pull request #5503: Config for raw index writer version

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #5503:
URL: https://github.com/apache/incubator-pinot/pull/5503#issuecomment-639876788


   > If I understand the intention correctly, this PR want to let production environment to still use the `DEFAULT_VERSION`, but check in the classes of the `CURRENT_VERSION` and the tests for them. If that is the case, can we just add a separate constructor with the version info so that the tests can use that? The default constructor will create the writer with `DEFAULT_VERSION`. We don't really need to make it configurable because in production environment you never want to use `CURRENT_VERSION` before the version is verified. In order to move to the `CURRENT_VERSION`, you can simply change the default constructor to the `CURRENT_VERSION`
   
   - We want to use CURRENT_VERSION in production environment (use cases for text index are going to need the 8-byte chunk offset because of huge raw data). 
   
   - CURRENT_VERSION is verified in production. We already have realtime segments that got completed and written with CURRENT_VERSION as of the currently deployed release in production. Has also been verified on our internal perf cluster.
   
   - However, for pbnj roll-out we need the flexibility temporarily to rollback the release if something **orthogonal to this** goes wrong. So while pbnj is being rolled out internally using implicit DEFAULT_VERSION, tables with text index are going to specifically enable this config since it has already been verified.
   
   Coming to changing the dotted notation, I haven't yet done the migration from IndexingConfig to FieldConfig. It is only being used for text. Going to migrate soon.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #5503: Config for raw index writer version

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #5503:
URL: https://github.com/apache/incubator-pinot/pull/5503#issuecomment-639876788


   > If I understand the intention correctly, this PR want to let production environment to still use the `DEFAULT_VERSION`, but check in the classes of the `CURRENT_VERSION` and the tests for them. If that is the case, can we just add a separate constructor with the version info so that the tests can use that? The default constructor will create the writer with `DEFAULT_VERSION`. We don't really need to make it configurable because in production environment you never want to use `CURRENT_VERSION` before the version is verified. In order to move to the `CURRENT_VERSION`, you can simply change the default constructor to the `CURRENT_VERSION`
   
   - We want to use CURRENT_VERSION in production environment (use cases for text index are going to need the 8-byte chunk offset because of huge raw data). 
   
   - CURRENT_VERSION is verified in production. We already have realtime segments that got completed and written with CURRENT_VERSION as of the currently deployed release in production. Has also been verified on our internal perf cluster.
   
   - However, for pbnj roll-out we need the flexibility temporarily to rollback the release if something **orthogonal to this** goes wrong. So while pbnj is being rolled out internally using implicit DEFAULT_VERSION, tables with text index are going to specifically enable this config since it has already been verified. Once the pbnj roll-out is complete, we can remove this
   
   Coming to changing the dotted notation, I haven't yet done the migration from IndexingConfig to FieldConfig. It is only being used for text. Going to migrate soon.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5503: Config for raw index writer version

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5503:
URL: https://github.com/apache/incubator-pinot/pull/5503#discussion_r436181978



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java
##########
@@ -33,15 +33,16 @@
   private final IndexType _indexType;
   private final Map<String, String> _properties;
 
-  public static String BLOOM_FILTER_COLUMN_KEY = "bloom.filter";

Review comment:
       Why are we changing this? This is backward-incompatible change right?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5503: Config for raw index writer version

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5503:
URL: https://github.com/apache/incubator-pinot/pull/5503#discussion_r436197601



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java
##########
@@ -33,15 +33,16 @@
   private final IndexType _indexType;
   private final Map<String, String> _properties;
 
-  public static String BLOOM_FILTER_COLUMN_KEY = "bloom.filter";

Review comment:
       I haven't yet done the migration from IndexingConfig to FieldConfig. It is only being used for text. So this change doesn't affect anyone. Going to migrate soon.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5503: Config for raw index writer version

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5503:
URL: https://github.com/apache/incubator-pinot/pull/5503#discussion_r436200719



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/writer/impl/v1/BaseChunkSingleValueWriter.java
##########
@@ -60,12 +64,13 @@
    * @param numDocsPerChunk Number of docs per data chunk
    * @param chunkSize Size of chunk
    * @param sizeOfEntry Size of entry (in bytes), max size for variable byte implementation.
-   * @param version Version of file
+   * @param version format version used to determine whether to use 8 or 4 byte chunk offsets

Review comment:
       done

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/writer/impl/v1/FixedByteChunkSingleValueWriter.java
##########
@@ -64,15 +62,15 @@
    * @param compressionType Type of compression to use.
    * @param totalDocs Total number of docs to write.
    * @param numDocsPerChunk Number of documents per chunk.
-   * @param sizeOfEntry Size of entry (in bytes).
+   * @param sizeOfEntry Size of entry (in bytes)
+   * @param writerVersion writer version used to determine whether to use 8 or 4 byte chunk offsets

Review comment:
       done

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/writer/impl/v1/BaseChunkSingleValueWriter.java
##########
@@ -36,6 +37,9 @@
  * Base class for fixed and variable byte writer implementations.
  */
 public abstract class BaseChunkSingleValueWriter implements SingleColumnSingleValueWriter {
+  public static final int DEFAULT_VERSION = 2;

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5503: Config for raw index writer version

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5503:
URL: https://github.com/apache/incubator-pinot/pull/5503#discussion_r436197601



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java
##########
@@ -33,15 +33,16 @@
   private final IndexType _indexType;
   private final Map<String, String> _properties;
 
-  public static String BLOOM_FILTER_COLUMN_KEY = "bloom.filter";

Review comment:
       I haven't yet done the migration from IndexingConfig to FieldConfig. It is only being used for text. So this change doesn't affect anyone. Just wanted to clean up names and remove the dotted notation. Going to migrate soon.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org