You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/12/21 20:58:34 UTC

[GitHub] [pinot] agavra opened a new pull request, #10021: support SKEW_POP and KURTOSIS_POP aggregates

agavra opened a new pull request, #10021:
URL: https://github.com/apache/pinot/pull/10021

   See commit message 😄 - uses the apache commons implementation as the internal format, if we are worried about performance we can copy the code inline but that would add a lot of error-prone mathematical code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] agavra commented on pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates

Posted by GitBox <gi...@apache.org>.
agavra commented on PR #10021:
URL: https://github.com/apache/pinot/pull/10021#issuecomment-1362237602

   I won't inline - benchmark results from inlining everything:
   ```
   Benchmark                       Mode  Cnt  Score    Error  Units
   FourthMomentBenchmark.apacheFM  avgt    5  0.056 ±  0.001  ms/op
   FourthMomentBenchmark.nativeFM  avgt    5  0.056 ±  0.002  ms/op
   ```
   
   ```
   public class NativeFourthMoment {
   
     private long _n = 0;
     private double _m1 = Double.NaN;
     private double _m2 = Double.NaN;
     private double _m3 = Double.NaN;
     private double _m4 = Double.NaN;
   
     public void increment(double d) {
       if (_n == 0) {
         _m4 = 0d;
         _m3 = 0d;
         _m2 = 0d;
         _m1 = 0d;
       }
   
       _n++;
   
       double n = _n; // prevent overflow
       double dev = d - _m1;
       double nDev = dev / n;
       double nDevSq = nDev * nDev;
   
       _m4 = _m4 - 4.0 * nDev * _m3
           + 6.0 * nDevSq * _m2
           + ((n * n) - 3 * (n - 1)) * (nDevSq * nDevSq * (n - 1) * n);
   
       _m3 = _m3 - 3.0 * nDev * _m2
           + (n - 1) * (n - 2) * nDevSq * dev;
   
       _m2 += (n - 1) * dev * nDev;
   
       _m1 += nDev;
     }
   
     public double kurtosis() {
       double kurtosis = Double.NaN;
       if (_n > 3) {
         double variance = _m2 / (_n - 1);
         if (variance < 10E-20) {
           kurtosis = 0.0;
         } else {
           double n = _n;
           kurtosis =
               (n * (n + 1) * _m4 -
                   3 * _m2 * _m2 * (n - 1)) /
                   ((n - 1) * (n -2) * (n -3) * variance * variance);
         }
       }
       return kurtosis;
     }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] agavra commented on pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates

Posted by GitBox <gi...@apache.org>.
agavra commented on PR #10021:
URL: https://github.com/apache/pinot/pull/10021#issuecomment-1362220879

   @snleee 
   
   > Can we also add SKEWSAMP/KURTOSISSAMP if it's just the matter of dividing by n-1?
   
   I checked whether this is common or not, and it seems that sample skew/sample kurtosis aren't really used (e.g. presto doesn't define those two functions) so I'll go ahead and just rename the functions `SKEW` and `KURTOSIS`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee commented on a diff in pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates

Posted by GitBox <gi...@apache.org>.
snleee commented on code in PR #10021:
URL: https://github.com/apache/pinot/pull/10021#discussion_r1055909204


##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/FourthMomentAggregationFunction.java:
##########
@@ -0,0 +1,167 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+

Review Comment:
   remove line?



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/customobject/PinotFourthMoment.java:
##########
@@ -0,0 +1,125 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.customobject;
+
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+import org.apache.commons.math.stat.descriptive.moment.FourthMoment;
+import org.apache.commons.math.stat.descriptive.moment.Kurtosis;
+import org.apache.commons.math.stat.descriptive.moment.Skewness;
+
+
+/**
+ * A {@link Comparable} implementation of

Review Comment:
   Can you add the descriptions? e.g. which algorithm/library that we are depending on etc



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/customobject/PinotFourthMoment.java:
##########
@@ -0,0 +1,125 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+

Review Comment:
   remove line :) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] agavra commented on pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates

Posted by GitBox <gi...@apache.org>.
agavra commented on PR #10021:
URL: https://github.com/apache/pinot/pull/10021#issuecomment-1363444497

   > @agavra Is Serde failing due to our change?
   
   ah yes... JDK8 doesn't infer properly for anonymous classes 😢 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee commented on pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates

Posted by GitBox <gi...@apache.org>.
snleee commented on PR #10021:
URL: https://github.com/apache/pinot/pull/10021#issuecomment-1363540283

   @agavra Thank you for the quick turnaround on this!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #10021:
URL: https://github.com/apache/pinot/pull/10021#issuecomment-1362168382

   # [Codecov](https://codecov.io/gh/apache/pinot/pull/10021?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#10021](https://codecov.io/gh/apache/pinot/pull/10021?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (49607d7) into [master](https://codecov.io/gh/apache/pinot/commit/c0345035cc96f1d700eff109d69d85fc98f63a5a?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c034503) will **increase** coverage by `0.08%`.
   > The diff coverage is `76.61%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #10021      +/-   ##
   ============================================
   + Coverage     68.67%   68.75%   +0.08%     
   - Complexity     5653     5683      +30     
   ============================================
     Files          1991     1994       +3     
     Lines        107251   107462     +211     
     Branches      16302    16330      +28     
   ============================================
   + Hits          73654    73890     +236     
   + Misses        28382    28350      -32     
   - Partials       5215     5222       +7     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `25.08% <2.41%> (+0.05%)` | :arrow_up: |
   | unittests1 | `67.96% <76.61%> (+0.10%)` | :arrow_up: |
   | unittests2 | `13.59% <0.00%> (-0.04%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/10021?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...tion/function/FourthMomentAggregationFunction.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9Gb3VydGhNb21lbnRBZ2dyZWdhdGlvbkZ1bmN0aW9uLmphdmE=) | `67.27% <67.27%> (ø)` | |
   | [...org/apache/pinot/core/common/ObjectSerDeUtils.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vT2JqZWN0U2VyRGVVdGlscy5qYXZh) | `91.36% <75.00%> (-0.08%)` | :arrow_down: |
   | [.../segment/local/customobject/PinotFourthMoment.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9jdXN0b21vYmplY3QvUGlub3RGb3VydGhNb21lbnQuamF2YQ==) | `84.21% <84.21%> (ø)` | |
   | [...gregation/function/AggregationFunctionFactory.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9BZ2dyZWdhdGlvbkZ1bmN0aW9uRmFjdG9yeS5qYXZh) | `81.81% <100.00%> (+0.25%)` | :arrow_up: |
   | [...che/pinot/segment/spi/AggregationFunctionType.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL0FnZ3JlZ2F0aW9uRnVuY3Rpb25UeXBlLmphdmE=) | `90.62% <100.00%> (+0.19%)` | :arrow_up: |
   | [.../pinot/server/api/resources/TableSizeResource.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvYXBpL3Jlc291cmNlcy9UYWJsZVNpemVSZXNvdXJjZS5qYXZh) | `80.00% <0.00%> (-8.00%)` | :arrow_down: |
   | [.../common/request/context/predicate/EqPredicate.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vcmVxdWVzdC9jb250ZXh0L3ByZWRpY2F0ZS9FcVByZWRpY2F0ZS5qYXZh) | `92.30% <0.00%> (-7.70%)` | :arrow_down: |
   | [...mmon/request/context/predicate/NotEqPredicate.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vcmVxdWVzdC9jb250ZXh0L3ByZWRpY2F0ZS9Ob3RFcVByZWRpY2F0ZS5qYXZh) | `76.92% <0.00%> (-7.70%)` | :arrow_down: |
   | [...tream/kafka20/server/KafkaDataServerStartable.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtcGx1Z2lucy9waW5vdC1zdHJlYW0taW5nZXN0aW9uL3Bpbm90LWthZmthLTIuMC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvcGx1Z2luL3N0cmVhbS9rYWZrYTIwL3NlcnZlci9LYWZrYURhdGFTZXJ2ZXJTdGFydGFibGUuamF2YQ==) | `72.91% <0.00%> (-6.25%)` | :arrow_down: |
   | [...ion/function/DistinctCountAggregationFunction.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9EaXN0aW5jdENvdW50QWdncmVnYXRpb25GdW5jdGlvbi5qYXZh) | `50.22% <0.00%> (-3.14%)` | :arrow_down: |
   | ... and [44 more](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] agavra commented on a diff in pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates

Posted by GitBox <gi...@apache.org>.
agavra commented on code in PR #10021:
URL: https://github.com/apache/pinot/pull/10021#discussion_r1055944895


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/customobject/PinotFourthMoment.java:
##########
@@ -0,0 +1,125 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.customobject;
+
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+import org.apache.commons.math.stat.descriptive.moment.FourthMoment;
+import org.apache.commons.math.stat.descriptive.moment.Kurtosis;
+import org.apache.commons.math.stat.descriptive.moment.Skewness;
+
+
+/**
+ * A {@link Comparable} implementation of

Review Comment:
   oops! I was in the middle of writing it (clearly) and then forgot all about it...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee commented on pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates

Posted by GitBox <gi...@apache.org>.
snleee commented on PR #10021:
URL: https://github.com/apache/pinot/pull/10021#issuecomment-1363411142

   @agavra Is Serde failing due to our change? 
   
   ```
   [INFO] -------------------------------------------------------------
   Error:  /home/runner/work/pinot/pinot/pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:[490,104] error: cannot infer type arguments for ObjectSerDe<T>
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee merged pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates

Posted by GitBox <gi...@apache.org>.
snleee merged PR #10021:
URL: https://github.com/apache/pinot/pull/10021


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org