You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/06/07 02:55:00 UTC

[GitHub] [druid] FrankChen021 opened a new pull request #11333: Eliminate ambiguities of KB/MB/GB in the doc

FrankChen021 opened a new pull request #11333:
URL: https://github.com/apache/druid/pull/11333


   
   ### Background
   
   #10203 introduces a human readable way to configure values of some byte-related properties. In that PR, two different standard unit systems are supported: [decimal format and binary format](https://en.wikipedia.org/wiki/Binary_prefix). 
   
   The former one is a unit system based on powers of 10(also known as SI format), while the latter one is based on powers of 2(also knowns as IEC format). For example, 1KB = 1000 bytes while 1KiB = 1024 bytes.
   
   But in old days, KB/MB/GB are also used to express values based on power of 2, which causes ambiguities. If there's an a word `64MB`, it's hard to tell whether it is `64 * 1000 * 100 * 1000` or `64 * 1024 * 1024 * 1024`.
   
   ### Description
   
   This PR tidies up all KB/MB/GB usages in the doc and javadoc to use the right unit system based on the code. 
   After this PR, **all KB/MB/GB in the doc/javadoc are decimal format**.  And all the doc/javadoc in the future should use the right unit system. 
   
   Generally speaking, for the disk size/file size/segment size, decimal format is used, while for other values, binary format is applied.
   
   This PR has:
   - [X] been self-reviewed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a change in pull request #11333: Eliminate ambiguities of KB/MB/GB in the doc

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on a change in pull request #11333:
URL: https://github.com/apache/druid/pull/11333#discussion_r647427119



##########
File path: core/src/main/java/org/apache/druid/data/input/MaxSizeSplitHintSpec.java
##########
@@ -53,7 +53,7 @@
    * - 'jute.maxbuffer' in ZooKeeper. This system property controls the max size of ZNode. As its default is 500KB,
    *   task allocation can fail if the serialized ingestion spec is larger than this limit.
    * - 'max_allowed_packet' in MySQL. This is the max size of a communication packet sent to a MySQL server.
-   *   The default is either 64MB or 4MB depending on MySQL version. Updating metadata store can fail if the serialized
+   *   The default is either <a href="https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_max_allowed_packet">64MiB</a> or 4MiB depending on MySQL version. Updating metadata store can fail if the serialized

Review comment:
       Fixed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] sthetland merged pull request #11333: Eliminate ambiguities of KB/MB/GB in the doc

Posted by GitBox <gi...@apache.org>.
sthetland merged pull request #11333:
URL: https://github.com/apache/druid/pull/11333


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a change in pull request #11333: Eliminate ambiguities of KB/MB/GB in the doc

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on a change in pull request #11333:
URL: https://github.com/apache/druid/pull/11333#discussion_r646245760



##########
File path: docs/development/extensions-core/druid-lookups.md
##########
@@ -139,7 +139,7 @@ Off heap cache is backed by [MapDB](http://www.mapdb.org/) implementation. MapDB
 
 |Field|Type|Description|Required|default|
 |-----|----|-----------|--------|-------|
-|maxStoreSize|double|maximal size of store in GB, if store is larger entries will start expiring|no |0|

Review comment:
       There's no neither javadoc nor explicit code in druid, but according to the implementation of underlying library(org.mapdb.DB), it's `GiB`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a change in pull request #11333: Eliminate ambiguities of KB/MB/GB in the doc

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on a change in pull request #11333:
URL: https://github.com/apache/druid/pull/11333#discussion_r659147570



##########
File path: docs/tutorials/cluster.md
##########
@@ -361,17 +361,17 @@ The resulting configs after the split:
 New Historical (on 2 Data servers)
 
 ```
- druid.processing.buffer.sizeBytes=500000000
- druid.processing.numMergeBuffers=8
- druid.processing.numThreads=31
+druid.processing.buffer.sizeBytes=500MiB

Review comment:
       All configuration files are using `500MiB` or `100MiB` too.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] techdocsmith commented on a change in pull request #11333: Eliminate ambiguities of KB/MB/GB in the doc

Posted by GitBox <gi...@apache.org>.
techdocsmith commented on a change in pull request #11333:
URL: https://github.com/apache/druid/pull/11333#discussion_r647024996



##########
File path: core/src/main/java/org/apache/druid/data/input/MaxSizeSplitHintSpec.java
##########
@@ -53,7 +53,7 @@
    * - 'jute.maxbuffer' in ZooKeeper. This system property controls the max size of ZNode. As its default is 500KB,
    *   task allocation can fail if the serialized ingestion spec is larger than this limit.
    * - 'max_allowed_packet' in MySQL. This is the max size of a communication packet sent to a MySQL server.
-   *   The default is either 64MB or 4MB depending on MySQL version. Updating metadata store can fail if the serialized
+   *   The default is either <a href="https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_max_allowed_packet">64MiB</a> or 4MiB depending on MySQL version. Updating metadata store can fail if the serialized

Review comment:
       nit: why not use markdown anchor syntax.

##########
File path: website/.spelling
##########
@@ -1210,18 +1210,17 @@ taskId
 taskid
 un
  - ../docs/operations/basic-cluster-tuning.md
-100MB
-128MB
+100MiB

Review comment:
       too bad spelling system doesn't have a better way of handling all these like `KiB, GiB` :/




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a change in pull request #11333: Eliminate ambiguities of KB/MB/GB in the doc

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on a change in pull request #11333:
URL: https://github.com/apache/druid/pull/11333#discussion_r659147101



##########
File path: docs/tutorials/cluster.md
##########
@@ -337,11 +337,11 @@ MiddleManager (Single-server)
 ```
 druid.worker.capacity=8
 druid.indexer.fork.property.druid.processing.numMergeBuffers=2
-druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100000000
+druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100MiB

Review comment:
       The configuration file uses the human-readable form `100MiB` already
   
   https://github.com/apache/druid/blob/af2ab98574582099557932e6a6370bf8dcdf3fab/examples/conf/druid/single-server/small/middleManager/runtime.properties#L35




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on pull request #11333: Eliminate ambiguities of KB/MB/GB in the doc

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on pull request #11333:
URL: https://github.com/apache/druid/pull/11333#issuecomment-868978196


   Hi @sthetland , all values in the example configuration files are using human-readable format, so the changes you mentioned above make the document consistent with those configuration files. I think the previous PR(#10203) forgot to change this document. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a change in pull request #11333: Eliminate ambiguities of KB/MB/GB in the doc

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on a change in pull request #11333:
URL: https://github.com/apache/druid/pull/11333#discussion_r647420109



##########
File path: website/.spelling
##########
@@ -1210,18 +1210,17 @@ taskId
 taskid
 un
  - ../docs/operations/basic-cluster-tuning.md
-100MB
-128MB
+100MiB

Review comment:
       Yes, spelling check is not so smart, it treats `100MiB` as a whole word. Suppression of `MiB` won't apply to it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a change in pull request #11333: Eliminate ambiguities of KB/MB/GB in the doc

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on a change in pull request #11333:
URL: https://github.com/apache/druid/pull/11333#discussion_r659146984



##########
File path: docs/tutorials/cluster.md
##########
@@ -322,12 +322,12 @@ You can copy your existing `coordinator-overlord` configs from the single-server
 
 #### Data
 
-Suppose we are migrating from a single-server deployment that had 32 CPU and 256GB RAM. In the old deployment, the following configurations for Historicals and MiddleManagers were applied:
+Suppose we are migrating from a single-server deployment that had 32 CPU and 256GiB RAM. In the old deployment, the following configurations for Historicals and MiddleManagers were applied:
 
 Historical (Single-server)
 
 ```
-druid.processing.buffer.sizeBytes=500000000
+druid.processing.buffer.sizeBytes=500MiB

Review comment:
       This change makes the document consistent with the runtime.properties file. 
   
   https://github.com/apache/druid/blob/af2ab98574582099557932e6a6370bf8dcdf3fab/examples/conf/druid/single-server/small/historical/runtime.properties#L27




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a change in pull request #11333: Eliminate ambiguities of KB/MB/GB in the doc

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on a change in pull request #11333:
URL: https://github.com/apache/druid/pull/11333#discussion_r646244124



##########
File path: docs/configuration/index.md
##########
@@ -393,7 +393,7 @@ The Druid servers [emit various metrics](../operations/metrics.md) and alerts vi
 |`druid.emitter.http.flushTimeOut`|The timeout after which an event should be sent to the endpoint, even if internal buffers are not filled, in milliseconds.|not specified = no timeout|
 |`druid.emitter.http.batchingStrategy`|The strategy of how the batch is formatted. "ARRAY" means `[event1,event2]`, "NEWLINES" means `event1\nevent2`, ONLY_EVENTS means `event1event2`.|ARRAY|
 |`druid.emitter.http.maxBatchSize`|The maximum batch size, in bytes.|the minimum of (10% of JVM heap size divided by 2) or (5242880 (i. e. 5 MiB))|
-|`druid.emitter.http.batchQueueSizeLimit`|The maximum number of batches in emitter queue, if there are problems with emitting.|the maximum of (2) or (10% of the JVM heap size divided by 5MB)|

Review comment:
       From the code we could see that the defualt value is `5MiB`
   
   https://github.com/apache/druid/blob/0c5d1c9725303111df69198a232166ddf00672d2/core/src/main/java/org/apache/druid/java/util/emitter/core/BaseHttpEmittingConfig.java#L63




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a change in pull request #11333: Eliminate ambiguities of KB/MB/GB in the doc

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on a change in pull request #11333:
URL: https://github.com/apache/druid/pull/11333#discussion_r646245057



##########
File path: docs/development/extensions-core/druid-kerberos.md
##########
@@ -72,7 +72,7 @@ If required, multiple rules can be be joined by newline character and specified
 ### Increasing HTTP Header size for large SPNEGO negotiate header
 In Active Directory environment, SPNEGO token in the Authorization header includes PAC (Privilege Access Certificate) information,
 which includes all security groups for the user. In some cases when the user belongs to many security groups the header to grow beyond what druid can handle by default.
-In such cases, max request header size that druid can handle can be increased by setting `druid.server.http.maxRequestHeaderSize` (default 8Kb) and `druid.router.http.maxRequestBufferSize` (default 8Kb).

Review comment:
       According to the code, its default value is `8KiB`
   
   https://github.com/apache/druid/blob/0c5d1c9725303111df69198a232166ddf00672d2/server/src/main/java/org/apache/druid/server/initialization/ServerConfig.java#L122




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a change in pull request #11333: Eliminate ambiguities of KB/MB/GB in the doc

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on a change in pull request #11333:
URL: https://github.com/apache/druid/pull/11333#discussion_r646244652



##########
File path: docs/configuration/index.md
##########
@@ -1433,7 +1433,7 @@ Druid uses Jetty to serve HTTP requests.
 
 |Property|Description|Default|
 |--------|-----------|-------|
-|`druid.processing.buffer.sizeBytes`|This specifies a buffer size (less than 2GiB) for the storage of intermediate results. The computation engine in the Indexer processes will use a scratch buffer of this size to do all of their intermediate computations off-heap. Larger values allow for more aggregations in a single pass over the data while smaller values can require more passes depending on the query that is being executed. [Human-readable format](human-readable-byte.md) is supported.|auto (max 1GB)|

Review comment:
       max value is `1GiB` according to the code
   
   https://github.com/apache/druid/blob/0c5d1c9725303111df69198a232166ddf00672d2/processing/src/main/java/org/apache/druid/query/DruidProcessingConfig.java#L39




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a change in pull request #11333: Eliminate ambiguities of KB/MB/GB in the doc

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on a change in pull request #11333:
URL: https://github.com/apache/druid/pull/11333#discussion_r646246239



##########
File path: docs/querying/querying.md
##########
@@ -144,5 +144,5 @@ Possible Druid error codes for the `error` field include:
 |`Query timeout`|504|The query timed out.|
 |`Query interrupted`|500|The query was interrupted, possibly due to JVM shutdown.|
 |`Query cancelled`|500|The query was cancelled through the query cancellation API.|
-|`Truncated response context`|500|An intermediate response context for the query exceeded the built-in limit of 7KB.<br/><br/>The response context is an internal data structure that Druid servers use to share out-of-band information when sending query results to each other. It is serialized in an HTTP header with a maximum length of 7KB. This error occurs when an intermediate response context sent from a data server (like a Historical) to the Broker exceeds this limit.<br/><br/>The response context is used for a variety of purposes, but the one most likely to generate a large context is sharing details about segments that move during a query. That means this error can potentially indicate that a very large number of segments moved in between the time a Broker issued a query and the time it was processed on Historicals. This should rarely, if ever, occur during normal operation.|

Review comment:
       According to the code, its `7KiB`
   
   https://github.com/apache/druid/blob/0c5d1c9725303111df69198a232166ddf00672d2/server/src/main/java/org/apache/druid/server/ResponseContextConfig.java#L29




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] sthetland commented on a change in pull request #11333: Eliminate ambiguities of KB/MB/GB in the doc

Posted by GitBox <gi...@apache.org>.
sthetland commented on a change in pull request #11333:
URL: https://github.com/apache/druid/pull/11333#discussion_r659060551



##########
File path: docs/tutorials/cluster.md
##########
@@ -337,11 +337,11 @@ MiddleManager (Single-server)
 ```
 druid.worker.capacity=8
 druid.indexer.fork.property.druid.processing.numMergeBuffers=2
-druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100000000
+druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100MiB

Review comment:
       As in previous comment. Might be better to keep in the original form, just to match the config files as shipped to avoid confusion. 
   
    

##########
File path: docs/tutorials/cluster.md
##########
@@ -361,17 +361,17 @@ The resulting configs after the split:
 New Historical (on 2 Data servers)
 
 ```
- druid.processing.buffer.sizeBytes=500000000
- druid.processing.numMergeBuffers=8
- druid.processing.numThreads=31
+druid.processing.buffer.sizeBytes=500MiB

Review comment:
       Here too, if reverting the change above, along with line 374 below 

##########
File path: docs/tutorials/cluster.md
##########
@@ -322,12 +322,12 @@ You can copy your existing `coordinator-overlord` configs from the single-server
 
 #### Data
 
-Suppose we are migrating from a single-server deployment that had 32 CPU and 256GB RAM. In the old deployment, the following configurations for Historicals and MiddleManagers were applied:
+Suppose we are migrating from a single-server deployment that had 32 CPU and 256GiB RAM. In the old deployment, the following configurations for Historicals and MiddleManagers were applied:
 
 Historical (Single-server)
 
 ```
-druid.processing.buffer.sizeBytes=500000000
+druid.processing.buffer.sizeBytes=500MiB

Review comment:
       This change and the next one are lines from the single server small configuration file, so it might be clearer to leave them in their original form, so that they match the config files.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org