You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2022/01/20 13:59:12 UTC

[GitHub] [drill] jnturton opened a new pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

jnturton opened a new pull request #2432:
URL: https://github.com/apache/drill/pull/2432


   # [DRILL-7994](https://issues.apache.org/jira/browse/DRILL-7994): Dependency version updates for severe vulnerabilities
   
   ## Description
   
   Based on the OWASP dependency check report.
   
   ## Documentation
   N/A
   
   ## Testing
   Unit test suite.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r795057164



##########
File path: pom.xml
##########
@@ -98,8 +98,8 @@
     <javax.validation.api>2.0.1.Final</javax.validation.api>
     <asm.version>9.2</asm.version>
     <excludedGroups />
-    <memoryMb>1800</memoryMb>
-    <directMemoryMb>3000</directMemoryMb>
+    <memoryMb>2000</memoryMb>
+    <directMemoryMb>2500</directMemoryMb>

Review comment:
       @vdiravka do you mean _increase_ slightly?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton edited a comment on pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton edited a comment on pull request #2432:
URL: https://github.com/apache/drill/pull/2432#issuecomment-1023528517


   @tdunning thank you for clarifying.  I speak under correction but the application of the t-digest here will be to help the query planner to estimate relation cardinalities, those being statistics which it will then use to decide e.g. which of two relations should be the "build side" of a hash join.  Optimally, the smaller relation goes on the build side in that example.
   
   To estimate a relation's cardinality after a range predicate has been applied, the planner can teach for quantiles that with luck were precomputed over the relevant column.  I can't see a reason why real-world range predicates would in general be more focused on the tails of distributions, the shapes of which we'd probably like to assume nothing about anyway.
   
   My take is that probably any t-digest scale factor is quite accurate enough for a query planner, but of them all I'd agree that K_0 is probably the best choice under the assumptions here.  We can now either hard code Drill to K_0 or default it to K_0 and add a config option so that users can override.  I'm not convinced the config option will justify its inclusion because it is one that users would need to vary from table to table depending on specifics of the data there.  So I'm inclined to just hard code a good general default.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r794488084



##########
File path: distribution/pom.xml
##########
@@ -199,28 +199,30 @@
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>linux-x86_64</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>linux-x86_64-fedora</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>osx-x86_64</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <!-- bump warning: windows-x86_64 jars apparently stopped being published
+      after 2.0.36, see https://repo1.maven.org/maven2/io/netty/netty-tcnative/ -->
+      <version>2.0.36.Final</version>

Review comment:
       I don't _think_ so, just that Windows users are stuck on 2.0.36.Final.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r795059136



##########
File path: exec/jdbc-all/pom.xml
##########
@@ -33,7 +33,7 @@
        "package.namespace.prefix" equals to "oadd.". It can be overridden if necessary within any profile -->
   <properties>
     <package.namespace.prefix>oadd.</package.namespace.prefix>
-    <jdbc-all-jar.maxsize>46700000</jdbc-all-jar.maxsize>
+    <jdbc-all-jar.maxsize>44100000</jdbc-all-jar.maxsize>

Review comment:
       @vdiravka I did not shrink jdbc-all by that anything like that much in the end.  I did not realise that I had removed some things that are real runtime dependencies.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vdiravka commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
vdiravka commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r795055265



##########
File path: pom.xml
##########
@@ -98,8 +98,8 @@
     <javax.validation.api>2.0.1.Final</javax.validation.api>
     <asm.version>9.2</asm.version>
     <excludedGroups />
-    <memoryMb>1800</memoryMb>
-    <directMemoryMb>3000</directMemoryMb>
+    <memoryMb>2000</memoryMb>
+    <directMemoryMb>2500</directMemoryMb>

Review comment:
       Not sure 2500 is enough. Drill can use more. If you will run several times tests with this configs sucesfully - it is fine. Otherwise try to decrease slightly - 100-200Mb




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vdiravka commented on pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
vdiravka commented on pull request #2432:
URL: https://github.com/apache/drill/pull/2432#issuecomment-1026594592


   The changes look good to me. But need to solve jdk8 build for sure. What do you think what lib affects requiring more memory? `t-diggest`? And looks like the aggregation function require more memory (`TestAggregateFunctions`).
   To understand better the memory usage during execution it is possible to check that test case in debug mode with VisualVM or Intellij Idea profiler tool
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r794296716



##########
File path: contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpScanBatchCreator.java
##########
@@ -161,8 +161,8 @@ public void bind(ManagedScanFramework framework) { }
         }
       } else if (paginatorConfig != null) {
         /*
-        * If the paginator is not null and generated a list of URLs, we create
-        * a new batch reader for each URL.  In the future, this could be parallelized in
+        * If the paginator is not null we create a new batch reader for each
+        * URL that it generates.  In the future, this could be parallelized in

Review comment:
       @vdiravka I didn't realise that two spaces after a period has gone out of fashion, but it seems it started dying shortly after the typewriter :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton merged pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton merged pull request #2432:
URL: https://github.com/apache/drill/pull/2432


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r795056982



##########
File path: distribution/pom.xml
##########
@@ -199,28 +199,30 @@
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>linux-x86_64</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>linux-x86_64-fedora</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>osx-x86_64</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <!-- bump warning: windows-x86_64 jars apparently stopped being published
+      after 2.0.36, see https://repo1.maven.org/maven2/io/netty/netty-tcnative/ -->
+      <version>2.0.36.Final</version>

Review comment:
       @vdiravka I only see binaries for linux and osx in versions after 2.0.36, e.g.
   
   https://repo1.maven.org/maven2/io/netty/netty-tcnative/2.0.48.Final/
   
   Should I look at other repos?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r794335227



##########
File path: pom.xml
##########
@@ -104,7 +104,7 @@
     <license.skip>true</license.skip>
     <docker.repository>apache/drill</docker.repository>
     <antlr.version>4.8-1</antlr.version>
-    <maven.version>3.6.3</maven.version>
+    <maven.version>3.8.4</maven.version>

Review comment:
       @vdiravka, it builds fine for me in Maven 3.8.4 too.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vdiravka commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
vdiravka commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r795054838



##########
File path: exec/jdbc-all/pom.xml
##########
@@ -33,7 +33,7 @@
        "package.namespace.prefix" equals to "oadd.". It can be overridden if necessary within any profile -->
   <properties>
     <package.namespace.prefix>oadd.</package.namespace.prefix>
-    <jdbc-all-jar.maxsize>46700000</jdbc-all-jar.maxsize>
+    <jdbc-all-jar.maxsize>46600000</jdbc-all-jar.maxsize>

Review comment:
       Previous value looked better 😁




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vdiravka commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
vdiravka commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r795446355



##########
File path: contrib/storage-jdbc/src/test/java/org/apache/drill/exec/store/jdbc/TestJdbcWriterWithH2.java
##########
@@ -142,14 +142,14 @@ public void testBasicCTASWithDataTypes() throws Exception {
     DirectRowSet results = queryBuilder().sql(testQuery).rowSet();
 
     TupleMetadata expectedSchema = new SchemaBuilder()
-      .addNullable("int_field", MinorType.INT, 10)
-      .addNullable("bigint_field", MinorType.BIGINT, 19)
-      .addNullable("float4_field", MinorType.VARDECIMAL, 38, 37)
-      .addNullable("float8_field", MinorType.VARDECIMAL, 38, 37)
+      .addNullable("int_field", MinorType.INT, 32)
+      .addNullable("bigint_field", MinorType.BIGINT, 38)
+      .addNullable("float4_field", MinorType.FLOAT4, 38)
+      .addNullable("float8_field", MinorType.FLOAT8, 38)

Review comment:
       Does it mean we started supporting float with new h2 version?

##########
File path: pom.xml
##########
@@ -104,7 +104,7 @@
     <license.skip>true</license.skip>
     <docker.repository>apache/drill</docker.repository>
     <antlr.version>4.8-1</antlr.version>
-    <maven.version>3.6.3</maven.version>
+    <maven.version>3.8.4</maven.version>

Review comment:
       You can update `Envirnment.md` with links to recommended and minimum Maven versions :
   ```
   ...
   To build, you need to have the following software installed on your system to successfully complete a build. 
     * Java 8
     * Maven 3.6.3 or greater
   ```

##########
File path: pom.xml
##########
@@ -92,20 +92,21 @@
     <javassist.version>3.28.0-GA</javassist.version>
     <msgpack.version>0.6.6</msgpack.version>
     <reflections.version>0.9.10</reflections.version>
-    <avro.version>1.9.1</avro.version>
+    <avro.version>1.11.0</avro.version>
     <metrics.version>4.0.2</metrics.version>
-    <jetty.version>9.4.41.v20210516</jetty.version>
+    <jetty.version>9.4.44.v20210927</jetty.version>
     <jersey.version>2.34</jersey.version>
     <javax.validation.api>2.0.1.Final</javax.validation.api>
     <asm.version>9.2</asm.version>
     <excludedGroups />
-    <memoryMb>1800</memoryMb>
-    <directMemoryMb>3000</directMemoryMb>
+    <memoryMb>2000</memoryMb>
+    <directMemoryMb>2700</directMemoryMb>
     <rat.skip>true</rat.skip>
     <license.skip>true</license.skip>
     <docker.repository>apache/drill</docker.repository>
     <antlr.version>4.8-1</antlr.version>
-    <maven.version>3.6.3</maven.version>
+    <maven.version>3.8.4</maven.version>
+    <maven.version.min>3.6.3</maven.version.min>

Review comment:
       What about:
   ```suggestion
       <maven.min.version>3.6.3</maven.min.version>
   ```

##########
File path: distribution/pom.xml
##########
@@ -199,28 +199,30 @@
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>linux-x86_64</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>linux-x86_64-fedora</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>osx-x86_64</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <!-- bump warning: windows-x86_64 jars apparently stopped being published
+      after 2.0.36, see https://repo1.maven.org/maven2/io/netty/netty-tcnative/ -->
+      <version>2.0.36.Final</version>

Review comment:
       you are right, `2.0.48.Final` for windows is missing. But looks like it is present for [2.0.46.Final](https://repo1.maven.org/maven2/io/netty/netty-tcnative-boringssl-static/2.0.46.Final/)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton commented on pull request #2432:
URL: https://github.com/apache/drill/pull/2432#issuecomment-1023204744


   @tdunning when you say "median accuracy" there, do you mean median estimate error across the entire distribution, or do you mean estimate error specifically near the distribution's median?  Do you think that in the case of Drill we'd make no assumptions about user data distributions, and opt for a t-digest scale factor that optimises an _overall_ accuracy statistic rather than one that targets any specific area (e.g. tails, median).  PS I'm waving my hands a bit here, I don't know if what I've said is well defined...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton commented on pull request #2432:
URL: https://github.com/apache/drill/pull/2432#issuecomment-1022065612


   @tdunning, that would be great thanks.  Here the selectivity of a range predicate on an integer column is estimated from an a set of quantile estimates, here termed an equidepth histogram (see [NumericEquiDepthHistogram](https://github.com/apache/drill/blob/4aefcef2b665c5737471664912a26ef6ed9a6cfc/exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/NumericEquiDepthHistogram.java)).  The quantile estimates are recovered from a MergingDigest over the integer column that was created by an earlier `ANALYSE TABLE` command.  If said quantile estimates have shifted then that would explain what we're seeing...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] tdunning commented on pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
tdunning commented on pull request #2432:
URL: https://github.com/apache/drill/pull/2432#issuecomment-1022642104


   If you want equal absolute error across the range then you can set the
   scale function on the t-digest to K_0 so it doesn't prioritize the tails.
   
   This can give roughly 2:1 improvement on median accuracy.
   
   
   
   On Wed, Jan 26, 2022 at 2:27 AM James Turton ***@***.***>
   wrote:
   
   > @tdunning <https://github.com/tdunning>, that would be great thanks. Here
   > the selectivity of a range predicate on an integer column is estimated from
   > an a set of quantile estimates, here termed an equidepth histogram (see
   > NumericEquiDepthHistogram
   > <https://github.com/apache/drill/blob/4aefcef2b665c5737471664912a26ef6ed9a6cfc/exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/NumericEquiDepthHistogram.java>).
   > The quantile estimates are recovered from a MergingDigest over the integer
   > column that was created by an earlier ANALYSE TABLE command. If said
   > quantile estimates have shifted then that would explain what we're seeing...
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/drill/pull/2432#issuecomment-1022065612>, or
   > unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AAB5E6WPTDXIWKPQKL6TAK3UX7EBPANCNFSM5MM2LDFQ>
   > .
   > Triage notifications on the go with GitHub Mobile for iOS
   > <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
   > or Android
   > <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
   >
   > You are receiving this because you were mentioned.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r795532467



##########
File path: contrib/storage-jdbc/src/test/java/org/apache/drill/exec/store/jdbc/TestJdbcWriterWithH2.java
##########
@@ -142,14 +142,14 @@ public void testBasicCTASWithDataTypes() throws Exception {
     DirectRowSet results = queryBuilder().sql(testQuery).rowSet();
 
     TupleMetadata expectedSchema = new SchemaBuilder()
-      .addNullable("int_field", MinorType.INT, 10)
-      .addNullable("bigint_field", MinorType.BIGINT, 19)
-      .addNullable("float4_field", MinorType.VARDECIMAL, 38, 37)
-      .addNullable("float8_field", MinorType.VARDECIMAL, 38, 37)
+      .addNullable("int_field", MinorType.INT, 32)
+      .addNullable("bigint_field", MinorType.BIGINT, 38)
+      .addNullable("float4_field", MinorType.FLOAT4, 38)
+      .addNullable("float8_field", MinorType.FLOAT8, 38)

Review comment:
       @vdiravka no - this change is not specific to H2.  I noticed while debugging H2 test failures that the new JDBC writer code maps Drill's FLOAT4 AND FLOAT8 to the JDBC data type NUMERIC (a vardecimal type).  Apparently the idea was to try to avoid inconsistent float support across databases, but I don't think this is a good idea.  Vardecimal columns have a fixed precision and scale, making them fixed point rather floating point, and introducing the potential for data conversion problems.
   
   For example, in `TestJdbcWriterWithH2#testBasicCTASWithDataTypes` if you just change a test value by one decimal place the unit test fails:
   ```
   --- "CAST(3.0 AS FLOAT) AS float4_field," +
   +++ "CAST(30.0 AS FLOAT) AS float4_field," +
   ```
   because it has tried to convert a FLOAT4 to a DECIMAL(38,37) which can only represents numbers smaller than 10.  Since I found this while trying to upgrade H2, I decided to try to fix it too.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r795058531



##########
File path: exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestAnalyze.java
##########
@@ -492,7 +492,7 @@ public void testHistogramWithDataTypes1() throws Exception {
           .match();
 
       query = "select 1 from dfs.tmp.employee1 where store_id < 15";
-      String[] expectedPlan2 = {"Filter\\(condition.*\\).*rowcount = 676.*,.*",
+      String[] expectedPlan2 = {"Filter\\(condition.*\\).*rowcount = 699.*,.*",
               "Scan.*columns=\\[`store_id`\\].*rowcount = 1128.0.*"};

Review comment:
       @vdiravka yes, please see the comments on this PR from @tdunning.  The t-digest data structure changed a bit in 3.3 and estimates like this rowcount = 676 are expected to be a bit more accurate.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton commented on pull request #2432:
URL: https://github.com/apache/drill/pull/2432#issuecomment-1024924548


   > After exluding libs from jdbc-all it is necessary to check Drill jdbc driver within any jdbc client, such as SQuirrel. There are no unit tests for that
   
   @vdiravka thanks for this pointer. I configured DBeaver to use the JDBC driver built by this branch and was able to query a local Drillbit with it. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] tdunning commented on pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
tdunning commented on pull request #2432:
URL: https://github.com/apache/drill/pull/2432#issuecomment-1023645917


   The best general default is the t-digest default. Even when you don't think
   you care about tail accuracy, it can creep in.
   
   Furthermore, the loss of accuracy near the median is something like 2:1.
   The difference in accuracy near the tails, however, can be 1,000,000:1.
   
   So you don't lose much with the default setting and you gain a ton (in the
   right circumstances).
   
   
   
   On Thu, Jan 27, 2022 at 10:38 AM James Turton ***@***.***>
   wrote:
   
   > @tdunning <https://github.com/tdunning> thank you for clarifying. I speak
   > under correction but the application of the t-digest here will be to help
   > the query planner to estimate relation cardinalities, those being
   > statistics which it will then use to decide e.g. which of two relations
   > should be the "build side" of a hash join. Optimally, the smaller relation
   > goes on the build side in that example.
   >
   > To estimate a relation's cardinality after a range predicate has been
   > applied, the planner can each for quantiles that with luck were precomputed
   > over the relevant column. I can't see a reason why real-world range
   > predicates would be more focused on the tails of distributions, the shapes
   > of which we'd probably like to assume nothing about anyway.
   >
   > My take is that probably any t-digest scale factor is quite accurate
   > enough for a query planner, but of them all I'd agree that K_0 is probably
   > the best choice under the assumptions here. We can now either hard code
   > Drill to K_0 or default it to K_0 and add a config option so that users can
   > override. I'm not convinced the config option will justify its inclusion
   > because it is one that users would need to vary from table to table
   > depending on specifics of the data there. So I'm inclined to just hard code
   > a good general default.
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/drill/pull/2432#issuecomment-1023528517>, or
   > unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AAB5E6VCK4NTTOXTLWM4ADLUYGGK7ANCNFSM5MM2LDFQ>
   > .
   > Triage notifications on the go with GitHub Mobile for iOS
   > <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
   > or Android
   > <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
   >
   > You are receiving this because you were mentioned.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r795533511



##########
File path: contrib/storage-jdbc/src/test/java/org/apache/drill/exec/store/jdbc/TestJdbcWriterWithH2.java
##########
@@ -142,14 +142,14 @@ public void testBasicCTASWithDataTypes() throws Exception {
     DirectRowSet results = queryBuilder().sql(testQuery).rowSet();
 
     TupleMetadata expectedSchema = new SchemaBuilder()
-      .addNullable("int_field", MinorType.INT, 10)
-      .addNullable("bigint_field", MinorType.BIGINT, 19)
-      .addNullable("float4_field", MinorType.VARDECIMAL, 38, 37)
-      .addNullable("float8_field", MinorType.VARDECIMAL, 38, 37)
+      .addNullable("int_field", MinorType.INT, 32)
+      .addNullable("bigint_field", MinorType.BIGINT, 38)
+      .addNullable("float4_field", MinorType.FLOAT4, 38)
+      .addNullable("float8_field", MinorType.FLOAT8, 38)

Review comment:
       @vdiravka I must still go and make the same test change in the other JDBC writer tests...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vdiravka commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
vdiravka commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r795055265



##########
File path: pom.xml
##########
@@ -98,8 +98,8 @@
     <javax.validation.api>2.0.1.Final</javax.validation.api>
     <asm.version>9.2</asm.version>
     <excludedGroups />
-    <memoryMb>1800</memoryMb>
-    <directMemoryMb>3000</directMemoryMb>
+    <memoryMb>2000</memoryMb>
+    <directMemoryMb>2500</directMemoryMb>

Review comment:
       Not sure 2500 is enough. Drill can use more memory. If you will run several times tests with this configs sucesfully - it is fine. Otherwise try to decrease slightly - 100-200Mb




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vdiravka commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
vdiravka commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r795453094



##########
File path: distribution/pom.xml
##########
@@ -199,28 +199,30 @@
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>linux-x86_64</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>linux-x86_64-fedora</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>osx-x86_64</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <!-- bump warning: windows-x86_64 jars apparently stopped being published
+      after 2.0.36, see https://repo1.maven.org/maven2/io/netty/netty-tcnative/ -->
+      <version>2.0.36.Final</version>

Review comment:
       you are right, `2.0.48.Final` for windows is missing. But looks like it is present for [2.0.46.Final](https://repo1.maven.org/maven2/io/netty/netty-tcnative-boringssl-static/2.0.46.Final/)

##########
File path: distribution/pom.xml
##########
@@ -199,28 +199,30 @@
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>linux-x86_64</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>linux-x86_64-fedora</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>osx-x86_64</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <!-- bump warning: windows-x86_64 jars apparently stopped being published
+      after 2.0.36, see https://repo1.maven.org/maven2/io/netty/netty-tcnative/ -->
+      <version>2.0.36.Final</version>

Review comment:
       I see all versions pom file has different sections related to windows, but you are right specific `windows-x86_64` jar is absent (only general, without OS name).
   Let's leave `2.0.36.Final` for now, since it is not very important - `<scope>test</scope>`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r789452648



##########
File path: pom.xml
##########
@@ -104,7 +104,7 @@
     <license.skip>true</license.skip>
     <docker.repository>apache/drill</docker.repository>
     <antlr.version>4.8-1</antlr.version>
-    <maven.version>3.6.3</maven.version>
+    <maven.version>3.8.4</maven.version>

Review comment:
       @vdiravka we have set GitHub ci to use `ubuntu-latest` which at the moments means Ubuntu 20.04 which has a Maven package version of 3.6.3-1.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] tdunning commented on pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
tdunning commented on pull request #2432:
URL: https://github.com/apache/drill/pull/2432#issuecomment-1018676376


   The meaning of the compression parameter became a bit more strict in this
   latest version. As such, the digest should be a bit smaller and tails
   should be a bit more accurate. It is OK to increase the compression
   parameter if you want, but it has to be recognized that the results are
   approximate and a bit randomized.
   
   Can you describe a bit more about what the test is looking at?  I could
   comment more specifically about what to expect if you like.
   
   
   
   On Fri, Jan 21, 2022 at 3:55 AM James Turton ***@***.***>
   wrote:
   
   > Hi @tdunning <https://github.com/tdunning>. This PR includes an update of
   > t-digest from 3.2 to 3.3 and I think that has caused a unit test of
   > metastore statistics to fail with
   >
   > Error:  Failures:
   > Error:    TestAnalyze.testHistogramWithDataTypes1:501 Did not find expected pattern in plan: Filter\(condition.*\).*rowcount = 676.*,.*
   >
   > , where what's changed is the row count estimate for the results of an
   > integer inequality predicate store_id < 15 which has gone from 676 to
   > 699. Have some approximate stats computed by t-digest shifted in this
   > version, e.g. an accuracy improvement? If so, and this is expected, then
   > I'll update the unit test's expectations accordingly...
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/drill/pull/2432#issuecomment-1018440761>, or
   > unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AAB5E6SSUDJZ52QF6OIMLK3UXFCTLANCNFSM5MM2LDFQ>
   > .
   > Triage notifications on the go with GitHub Mobile for iOS
   > <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
   > or Android
   > <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
   >
   > You are receiving this because you were mentioned.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton commented on pull request #2432:
URL: https://github.com/apache/drill/pull/2432#issuecomment-1018440761


   Hi @tdunning.  This PR includes an update of t-digest from 3.2 to 3.3 and I think that has caused a unit test of metastore statistics to fail with
   ```
   Error:  Failures: 
   Error:    TestAnalyze.testHistogramWithDataTypes1:501 Did not find expected pattern in plan: Filter\(condition.*\).*rowcount = 676.*,.*
   ```
   , where what's changed is the row count estimate for the results of an integer inequality predicate `store_id < 15` which has gone from 676 to 699.  Have some approximate stats computed by t-digest shifted in this version, e.g. an accuracy improvement?  If so, and this is expected, then I'll update the unit test's expectations accordingly...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vdiravka commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
vdiravka commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r797394342



##########
File path: pom.xml
##########
@@ -92,20 +92,21 @@
     <javassist.version>3.28.0-GA</javassist.version>
     <msgpack.version>0.6.6</msgpack.version>
     <reflections.version>0.9.10</reflections.version>
-    <avro.version>1.9.1</avro.version>
+    <avro.version>1.11.0</avro.version>
     <metrics.version>4.0.2</metrics.version>
-    <jetty.version>9.4.41.v20210516</jetty.version>
+    <jetty.version>9.4.44.v20210927</jetty.version>
     <jersey.version>2.34</jersey.version>
     <javax.validation.api>2.0.1.Final</javax.validation.api>
     <asm.version>9.2</asm.version>
     <excludedGroups />
-    <memoryMb>1800</memoryMb>
-    <directMemoryMb>3000</directMemoryMb>
+    <memoryMb>2000</memoryMb>
+    <directMemoryMb>2700</directMemoryMb>
     <rat.skip>true</rat.skip>
     <license.skip>true</license.skip>
     <docker.repository>apache/drill</docker.repository>
     <antlr.version>4.8-1</antlr.version>
-    <maven.version>3.6.3</maven.version>
+    <maven.version>3.8.4</maven.version>
+    <maven.version.min>3.6.3</maven.version.min>

Review comment:
       I am ok with either of variants. Just want to confirm this was not resolved accidentally




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton edited a comment on pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton edited a comment on pull request #2432:
URL: https://github.com/apache/drill/pull/2432#issuecomment-1023528517


   @tdunning thank you for clarifying.  I speak under correction but the application of the t-digest here will be to help the query planner to estimate relation cardinalities, those being statistics which it will then use to decide e.g. which of two relations should be the "build side" of a hash join.  Optimally, the smaller relation goes on the build side in that example.
   
   To estimate a relation's cardinality after a range predicate has been applied, the planner can reach for quantiles that with luck were precomputed over the relevant column.  I can't see a reason why real-world range predicates would in general be more focused on the tails of distributions, the shapes of which we'd probably like to assume nothing about anyway.
   
   My take is that probably any t-digest scale factor is quite accurate enough for a query planner, but of them all I'd agree that K_0 is probably the best choice under the assumptions here.  We can now either hard code Drill to K_0 or default it to K_0 and add a config option so that users can override.  I'm not convinced the config option will justify its inclusion because it is one that users would need to vary from table to table depending on specifics of the data there.  So I'm inclined to just hard code a good general default.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r795057164



##########
File path: pom.xml
##########
@@ -98,8 +98,8 @@
     <javax.validation.api>2.0.1.Final</javax.validation.api>
     <asm.version>9.2</asm.version>
     <excludedGroups />
-    <memoryMb>1800</memoryMb>
-    <directMemoryMb>3000</directMemoryMb>
+    <memoryMb>2000</memoryMb>
+    <directMemoryMb>2500</directMemoryMb>

Review comment:
       @vdiravka do you mean _increase_ slightly?  So heap = 2000, direct = 2700?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] cgivre commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
cgivre commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r794482944



##########
File path: distribution/pom.xml
##########
@@ -199,28 +199,30 @@
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>linux-x86_64</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>linux-x86_64-fedora</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>osx-x86_64</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <!-- bump warning: windows-x86_64 jars apparently stopped being published
+      after 2.0.36, see https://repo1.maven.org/maven2/io/netty/netty-tcnative/ -->
+      <version>2.0.36.Final</version>

Review comment:
       Does this mean that Drill will not work on Windows?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vdiravka commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
vdiravka commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r795053250



##########
File path: pom.xml
##########
@@ -1802,7 +1803,7 @@
       <dependency>
         <groupId>io.netty</groupId>
         <artifactId>netty-tcnative</artifactId>
-        <version>2.0.39.Final</version>
+        <version>2.0.47.Final</version>

Review comment:
       48 is available now

##########
File path: pom.xml
##########
@@ -104,7 +104,7 @@
     <license.skip>true</license.skip>
     <docker.repository>apache/drill</docker.repository>
     <antlr.version>4.8-1</antlr.version>
-    <maven.version>3.6.3</maven.version>
+    <maven.version>3.8.4</maven.version>

Review comment:
       Could you update the correspondent maven.md doc?

##########
File path: exec/jdbc-all/pom.xml
##########
@@ -33,7 +33,7 @@
        "package.namespace.prefix" equals to "oadd.". It can be overridden if necessary within any profile -->
   <properties>
     <package.namespace.prefix>oadd.</package.namespace.prefix>
-    <jdbc-all-jar.maxsize>46700000</jdbc-all-jar.maxsize>
+    <jdbc-all-jar.maxsize>44100000</jdbc-all-jar.maxsize>

Review comment:
       gj

##########
File path: distribution/pom.xml
##########
@@ -199,28 +199,30 @@
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>linux-x86_64</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>linux-x86_64-fedora</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <version>2.0.47.Final</version>
       <classifier>osx-x86_64</classifier>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>io.netty</groupId>
       <artifactId>netty-tcnative</artifactId>
-      <version>2.0.1.Final</version>
+      <!-- bump warning: windows-x86_64 jars apparently stopped being published
+      after 2.0.36, see https://repo1.maven.org/maven2/io/netty/netty-tcnative/ -->
+      <version>2.0.36.Final</version>

Review comment:
       What about 2.0.48.Final ? https://mvnrepository.com/artifact/io.netty/netty-tcnative/2.0.48.Final

##########
File path: exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestAnalyze.java
##########
@@ -492,7 +492,7 @@ public void testHistogramWithDataTypes1() throws Exception {
           .match();
 
       query = "select 1 from dfs.tmp.employee1 where store_id < 15";
-      String[] expectedPlan2 = {"Filter\\(condition.*\\).*rowcount = 676.*,.*",
+      String[] expectedPlan2 = {"Filter\\(condition.*\\).*rowcount = 699.*,.*",
               "Scan.*columns=\\[`store_id`\\].*rowcount = 1128.0.*"};

Review comment:
       Do you know why rowcount is changed?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vdiravka commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
vdiravka commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r795054838



##########
File path: exec/jdbc-all/pom.xml
##########
@@ -33,7 +33,7 @@
        "package.namespace.prefix" equals to "oadd.". It can be overridden if necessary within any profile -->
   <properties>
     <package.namespace.prefix>oadd.</package.namespace.prefix>
-    <jdbc-all-jar.maxsize>46700000</jdbc-all-jar.maxsize>
+    <jdbc-all-jar.maxsize>46600000</jdbc-all-jar.maxsize>

Review comment:
       Previous value looked better 😁




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vdiravka commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
vdiravka commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r788851627



##########
File path: pom.xml
##########
@@ -104,7 +104,7 @@
     <license.skip>true</license.skip>
     <docker.repository>apache/drill</docker.repository>
     <antlr.version>4.8-1</antlr.version>
-    <maven.version>3.6.3</maven.version>
+    <maven.version>3.8.4</maven.version>

Review comment:
       Could you try to build the project with this version? Is that used for current CI builds?

##########
File path: contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpScanBatchCreator.java
##########
@@ -161,8 +161,8 @@ public void bind(ManagedScanFramework framework) { }
         }
       } else if (paginatorConfig != null) {
         /*
-        * If the paginator is not null and generated a list of URLs, we create
-        * a new batch reader for each URL.  In the future, this could be parallelized in
+        * If the paginator is not null we create a new batch reader for each
+        * URL that it generates.  In the future, this could be parallelized in

Review comment:
       ```suggestion
           * URL that it generates. In the future, this could be parallelized in
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on a change in pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton commented on a change in pull request #2432:
URL: https://github.com/apache/drill/pull/2432#discussion_r789452648



##########
File path: pom.xml
##########
@@ -104,7 +104,7 @@
     <license.skip>true</license.skip>
     <docker.repository>apache/drill</docker.repository>
     <antlr.version>4.8-1</antlr.version>
-    <maven.version>3.6.3</maven.version>
+    <maven.version>3.8.4</maven.version>

Review comment:
       @vdiravka we have set GitHub ci to use `ubuntu-latest` which at the moment means Ubuntu 20.04 which has a Maven package version of 3.6.3-1.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] tdunning commented on pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
tdunning commented on pull request #2432:
URL: https://github.com/apache/drill/pull/2432#issuecomment-1023413118


   Oops, @James Turton ***@***.***>
   
   Good catch. My language was very ambiguous.
   
   What I meant was estimation of the value of the median of the original data.
   
   The issue is that t-digest is focused on estimation of the value of
   quantiles (which are just generalized percentiles) that are near the tails
   of the distribution. Because of this focus, the algorithm is designed to
   estimate things like the 0.1%-ile or 99.9%-ile very accurately, but to
   estimate the 50th %-ile is lower priority to t-digest so it allocates less
   of its internal information budget to that estimate.
   
   That prioritization can be adjusted by something called the scale function.
   The default is to use a scale function called K_2 which puts a very large
   emphasis on the tails. There is an alternative called K_0 which puts even
   emphasis on all estimates.
   
   The effect of K_2 is that estimates of the 50th %-ile might be ±1 % while
   the estimate of the 99.99th %-ile would be ±0.002 %. Tha makes sense some
   days for things like latencies.
   
   My impression is that K_0 might be more appropriate because we could get
   something like ±0.2% from 0%-ile to 100%-ile.
   
   The specific values here can all be adjusted by setting the compression
   parameter.
   
   I can help with any of this if somebody tells me the purpose of the
   t-digest.
   
   
   
   
   On Thu, Jan 27, 2022 at 5:21 AM James Turton ***@***.***>
   wrote:
   
   > @tdunning <https://github.com/tdunning> when you say "median accuracy"
   > there, do you mean median estimate error across the entire distribution, or
   > do you mean estimate error specifically near the distribution's median? Do
   > you think that in the case of Drill we'd make no assumptions about user
   > data distributions, and opt for a t-digest scale factor that optimises an
   > *overall* accuracy statistic rather than one that targets any specific
   > area (e.g. tails, median). PS I'm waving my hands a bit here, I don't know
   > if what I've said is well defined...
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/drill/pull/2432#issuecomment-1023204744>, or
   > unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AAB5E6TSAIXX4HTKXZAGJPTUYFBGJANCNFSM5MM2LDFQ>
   > .
   > Triage notifications on the go with GitHub Mobile for iOS
   > <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
   > or Android
   > <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
   >
   > You are receiving this because you were mentioned.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton commented on pull request #2432:
URL: https://github.com/apache/drill/pull/2432#issuecomment-1023528517


   @tdunning thank you for clarifying.  I speak under correction but the application of the t-digest here will be to help the query planner to estimate relation cardinalities, those being statistics which it will then use to decide e.g. which of two relations should be the "build side" of a hash join.  Optimally, the smaller relation goes on the build side in that example.
   
   To estimate a relation's cardinality after a range predicate has been applied, the planner can each for quantiles that with luck were precomputed over the relevant column.  I can't see a reason why real-world range predicates would be more focused on the tails of distributions, the shapes of which we'd probably like to assume nothing about anyway.
   
   My take is that probably any t-digest scale factor is quite accurate enough for a query planner, but of them all I'd agree that K_0 is probably the best choice under the assumptions here.  We can now either hard code Drill to K_0 or default it to K_0 and add a config option so that users can override.  I'm not convinced the config option will justify its inclusion because it is one that users would need to vary from table to table depending on specifics of the data there.  So I'm inclined to just hard code a good general default.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton edited a comment on pull request #2432: DRILL-7994: Dependency version updates for severe vulnerabilities

Posted by GitBox <gi...@apache.org>.
jnturton edited a comment on pull request #2432:
URL: https://github.com/apache/drill/pull/2432#issuecomment-1022065612


   @tdunning, that would be great thanks.  Here the selectivity of a range predicate on an integer column is estimated using a set of quantile estimates, here termed an equidepth histogram (see [NumericEquiDepthHistogram](https://github.com/apache/drill/blob/4aefcef2b665c5737471664912a26ef6ed9a6cfc/exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/NumericEquiDepthHistogram.java)).  The quantile estimates are recovered from a MergingDigest over the integer column that was created by an earlier `ANALYSE TABLE` command.  If said quantile estimates have shifted then that would explain what we're seeing...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org