You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by GitBox <gi...@apache.org> on 2020/03/06 08:55:59 UTC

[GitHub] [samza] bkonold opened a new pull request #1305: SAMZA-2479: Add configurable default for min compaction lag ms

bkonold opened a new pull request #1305: SAMZA-2479: Add configurable default for min compaction lag ms
URL: https://github.com/apache/samza/pull/1305
 
 
   **Feature:** Allow changelog min.compaction.lag.ms to be assigned a configurable default value that will apply to all stores if present, and unless overriden by a store-specific config.
    
   **Changes:** Introducing "stores.default.changelog.min.compaction.lag.ms" store config as a way to indicate a default value that will apply to all changelogs unless overriden for a store specifically.
    
   **Tests:** Added a unit test which verifies the config works as expected and honors the correct precedence.
   
   **API Changes:** None
   **Upgrade instructions:** None
   **Usage instructions:** Value of the configuration must be specified in milliseconds.  In addition, it follows the given precedence:  1) no configured default or store specific values present 2) configured default present 3) store specific value present

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] bkonold commented on a change in pull request #1305: SAMZA-2479: Add configurable default for min compaction lag ms

Posted by GitBox <gi...@apache.org>.
bkonold commented on a change in pull request #1305: SAMZA-2479: Add configurable default for min compaction lag ms
URL: https://github.com/apache/samza/pull/1305#discussion_r393323879
 
 

 ##########
 File path: docs/learn/documentation/versioned/jobs/configuration-table.html
 ##########
 @@ -1585,6 +1585,22 @@ <h1>Samza Configuration Reference</h1>
                     </td>
                 </tr>
 
+                <tr>
+                    <td class="property" id="store-default-changelog-min-compaction-lag-ms">stores.default.changelog.min.compaction.lag.ms</td>
+                    <td class="default">14400000</td>
+                    <td class="description">
+                        This property defines the default minimum period that must pass before a changelog message can be compacted.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="store-changelog-min-compaction-lag-ms">stores.<span class="store">store-name</span>.changelog.min.compaction.lag.ms</td>
+                    <td class="default">stores.default.changelog.min.compaction.lag.ms</td>
+                    <td class="description">
+                        This property defines the minimum period that must pass before a message in the store's changelog can be compacted.
 
 Review comment:
   added details

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] bkonold commented on issue #1305: SAMZA-2479: Add configurable default for min compaction lag ms

Posted by GitBox <gi...@apache.org>.
bkonold commented on issue #1305: SAMZA-2479: Add configurable default for min compaction lag ms
URL: https://github.com/apache/samza/pull/1305#issuecomment-597286149
 
 
   > Few questions
   > 
   > 1. What do you think about having this configuration under `JobConfig` hierarchy instead of `StoreConfig`? I see why it is the way it is in your current PR and the reason for it to belong under `StoreConfig`. However, `stores.default....` dilutes the (implicit) stronger contract of what follows `stores` is the store name and makes `default` as a reserved store name.
   > 2. What are the challenges in coming up with code default for it? Can you add details in PR description about the limitations of code default and the need for configurability?
   
   1. I think that if the intention behind the various Config classes is to divide things semantically rather than by namespace, this is a good suggestion since this config will, if set, apply to all changelogs within the job. Is there any problem on StorageConfig depending on JobConfig? I will need to instantiate JobConfig to access to job-wide default to enforce the correct precedence.
   
   2. Will respond here and add to the PR. The need for a different value comes from an evaluation done internally (within LI) that we can have a more relaxed setting than 4hr across all our jobs. The code default was chosen when the feature was implemented and is now part of a released build of samza; this 4hr value has been communicated to OSS customers and since changing this value has an operational impact on customers' kafka deployments (storage space) and each customer may have different thresholds for what is acceptable in terms of cost, making this default configurable empowers them to easily configure their samza/kafka clusters.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] mynameborat commented on a change in pull request #1305: SAMZA-2479: Add configurable default for min compaction lag ms

Posted by GitBox <gi...@apache.org>.
mynameborat commented on a change in pull request #1305: SAMZA-2479: Add configurable default for min compaction lag ms
URL: https://github.com/apache/samza/pull/1305#discussion_r393256390
 
 

 ##########
 File path: docs/learn/documentation/versioned/jobs/configuration-table.html
 ##########
 @@ -1585,6 +1585,22 @@ <h1>Samza Configuration Reference</h1>
                     </td>
                 </tr>
 
+                <tr>
+                    <td class="property" id="store-default-changelog-min-compaction-lag-ms">stores.default.changelog.min.compaction.lag.ms</td>
+                    <td class="default">14400000</td>
+                    <td class="description">
+                        This property defines the default minimum period that must pass before a changelog message can be compacted.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="store-changelog-min-compaction-lag-ms">stores.<span class="store">store-name</span>.changelog.min.compaction.lag.ms</td>
+                    <td class="default">stores.default.changelog.min.compaction.lag.ms</td>
+                    <td class="description">
+                        This property defines the minimum period that must pass before a message in the store's changelog can be compacted.
 
 Review comment:
   Can you shed some light on how this configuration impacts application (if any), on Kafka?
   Are there any gotchas around how to configure this if you have TTL'd enabled on a log compacted store? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] bkonold edited a comment on issue #1305: SAMZA-2479: Add configurable default for min compaction lag ms

Posted by GitBox <gi...@apache.org>.
bkonold edited a comment on issue #1305: SAMZA-2479: Add configurable default for min compaction lag ms
URL: https://github.com/apache/samza/pull/1305#issuecomment-597286149
 
 
   > Few questions
   > 
   > 1. What do you think about having this configuration under `JobConfig` hierarchy instead of `StoreConfig`? I see why it is the way it is in your current PR and the reason for it to belong under `StoreConfig`. However, `stores.default....` dilutes the (implicit) stronger contract of what follows `stores` is the store name and makes `default` as a reserved store name.
   > 2. What are the challenges in coming up with code default for it? Can you add details in PR description about the limitations of code default and the need for configurability?
   
   @mynameborat 
   
   1. I think that if the intention behind the various Config classes is to divide things semantically rather than by namespace, this is a good suggestion since this config will, if set, apply to all changelogs within the job. Is there any problem on StorageConfig depending on JobConfig? I will need to instantiate JobConfig to access to job-wide default to enforce the correct precedence.
   
   2. Will respond here and add to the PR. The need for a different value comes from an evaluation done internally (within LI) that we can have a more relaxed setting than 4hr across all our jobs. The code default was chosen when the feature was implemented and is now part of a released build of samza; this 4hr value has been communicated to OSS customers and since changing this value has an operational impact on customers' kafka deployments (storage space) and each customer may have different thresholds for what is acceptable in terms of cost, making this default configurable empowers them to easily configure their samza/kafka clusters.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] bkonold commented on issue #1305: SAMZA-2479: Add configurable default for min compaction lag ms

Posted by GitBox <gi...@apache.org>.
bkonold commented on issue #1305: SAMZA-2479: Add configurable default for min compaction lag ms
URL: https://github.com/apache/samza/pull/1305#issuecomment-595668676
 
 
   @prateekm @bharathkk can you take a look?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] mynameborat merged pull request #1305: SAMZA-2479: Add configurable default for min compaction lag ms

Posted by GitBox <gi...@apache.org>.
mynameborat merged pull request #1305: SAMZA-2479: Add configurable default for min compaction lag ms
URL: https://github.com/apache/samza/pull/1305
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] mynameborat commented on a change in pull request #1305: SAMZA-2479: Add configurable default for min compaction lag ms

Posted by GitBox <gi...@apache.org>.
mynameborat commented on a change in pull request #1305: SAMZA-2479: Add configurable default for min compaction lag ms
URL: https://github.com/apache/samza/pull/1305#discussion_r393256390
 
 

 ##########
 File path: docs/learn/documentation/versioned/jobs/configuration-table.html
 ##########
 @@ -1585,6 +1585,22 @@ <h1>Samza Configuration Reference</h1>
                     </td>
                 </tr>
 
+                <tr>
+                    <td class="property" id="store-default-changelog-min-compaction-lag-ms">stores.default.changelog.min.compaction.lag.ms</td>
+                    <td class="default">14400000</td>
+                    <td class="description">
+                        This property defines the default minimum period that must pass before a changelog message can be compacted.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="store-changelog-min-compaction-lag-ms">stores.<span class="store">store-name</span>.changelog.min.compaction.lag.ms</td>
+                    <td class="default">stores.default.changelog.min.compaction.lag.ms</td>
+                    <td class="description">
+                        This property defines the minimum period that must pass before a message in the store's changelog can be compacted.
 
 Review comment:
   Can you shed some light on how this configuration impacts application (if any), Kafka (data footprint etc)?
   Are there any gotchas around how to configure this if you have TTL'd enabled on a log compacted store? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [samza] bkonold edited a comment on issue #1305: SAMZA-2479: Add configurable default for min compaction lag ms

Posted by GitBox <gi...@apache.org>.
bkonold edited a comment on issue #1305: SAMZA-2479: Add configurable default for min compaction lag ms
URL: https://github.com/apache/samza/pull/1305#issuecomment-597286149
 
 
   > Few questions
   > 
   > 1. What do you think about having this configuration under `JobConfig` hierarchy instead of `StoreConfig`? I see why it is the way it is in your current PR and the reason for it to belong under `StoreConfig`. However, `stores.default....` dilutes the (implicit) stronger contract of what follows `stores` is the store name and makes `default` as a reserved store name.
   > 2. What are the challenges in coming up with code default for it? Can you add details in PR description about the limitations of code default and the need for configurability?
   
   @mynameborat 
   
   1. This sounds reasonable. So you are suggesting we change this config to `job.default.changelog...`? I am good with that; I was only following what seemed to be an already established precedent in our configs w.r.t. "default" store configs: https://samza.apache.org/learn/documentation/latest/jobs/samza-configurations.html#advanced-storage-configurations. Take a look at replication factor.
   
   2. Will respond here and add to the PR. The need for a different value comes from an evaluation done internally (within LI) that we can have a more relaxed setting than 4hr across all our jobs. The code default was chosen when the feature was implemented and is now part of a released build of samza; this 4hr value has been communicated to OSS customers and since changing this value has an operational impact on customers' kafka deployments (storage space) and each customer may have different thresholds for what is acceptable in terms of cost, making this default configurable empowers them to easily configure their samza/kafka clusters.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services