You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/12/21 20:58:34 UTC
[GitHub] [pinot] agavra opened a new pull request, #10021: support SKEW_POP and KURTOSIS_POP aggregates
agavra opened a new pull request, #10021:
URL: https://github.com/apache/pinot/pull/10021
See commit message 😄 - uses the apache commons implementation as the internal format, if we are worried about performance we can copy the code inline but that would add a lot of error-prone mathematical code.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] agavra commented on pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates
Posted by GitBox <gi...@apache.org>.
agavra commented on PR #10021:
URL: https://github.com/apache/pinot/pull/10021#issuecomment-1362237602
I won't inline - benchmark results from inlining everything:
```
Benchmark Mode Cnt Score Error Units
FourthMomentBenchmark.apacheFM avgt 5 0.056 ± 0.001 ms/op
FourthMomentBenchmark.nativeFM avgt 5 0.056 ± 0.002 ms/op
```
```
public class NativeFourthMoment {
private long _n = 0;
private double _m1 = Double.NaN;
private double _m2 = Double.NaN;
private double _m3 = Double.NaN;
private double _m4 = Double.NaN;
public void increment(double d) {
if (_n == 0) {
_m4 = 0d;
_m3 = 0d;
_m2 = 0d;
_m1 = 0d;
}
_n++;
double n = _n; // prevent overflow
double dev = d - _m1;
double nDev = dev / n;
double nDevSq = nDev * nDev;
_m4 = _m4 - 4.0 * nDev * _m3
+ 6.0 * nDevSq * _m2
+ ((n * n) - 3 * (n - 1)) * (nDevSq * nDevSq * (n - 1) * n);
_m3 = _m3 - 3.0 * nDev * _m2
+ (n - 1) * (n - 2) * nDevSq * dev;
_m2 += (n - 1) * dev * nDev;
_m1 += nDev;
}
public double kurtosis() {
double kurtosis = Double.NaN;
if (_n > 3) {
double variance = _m2 / (_n - 1);
if (variance < 10E-20) {
kurtosis = 0.0;
} else {
double n = _n;
kurtosis =
(n * (n + 1) * _m4 -
3 * _m2 * _m2 * (n - 1)) /
((n - 1) * (n -2) * (n -3) * variance * variance);
}
}
return kurtosis;
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] agavra commented on pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates
Posted by GitBox <gi...@apache.org>.
agavra commented on PR #10021:
URL: https://github.com/apache/pinot/pull/10021#issuecomment-1362220879
@snleee
> Can we also add SKEWSAMP/KURTOSISSAMP if it's just the matter of dividing by n-1?
I checked whether this is common or not, and it seems that sample skew/sample kurtosis aren't really used (e.g. presto doesn't define those two functions) so I'll go ahead and just rename the functions `SKEW` and `KURTOSIS`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] snleee commented on a diff in pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates
Posted by GitBox <gi...@apache.org>.
snleee commented on code in PR #10021:
URL: https://github.com/apache/pinot/pull/10021#discussion_r1055909204
##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/FourthMomentAggregationFunction.java:
##########
@@ -0,0 +1,167 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
Review Comment:
remove line?
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/customobject/PinotFourthMoment.java:
##########
@@ -0,0 +1,125 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.customobject;
+
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+import org.apache.commons.math.stat.descriptive.moment.FourthMoment;
+import org.apache.commons.math.stat.descriptive.moment.Kurtosis;
+import org.apache.commons.math.stat.descriptive.moment.Skewness;
+
+
+/**
+ * A {@link Comparable} implementation of
Review Comment:
Can you add the descriptions? e.g. which algorithm/library that we are depending on etc
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/customobject/PinotFourthMoment.java:
##########
@@ -0,0 +1,125 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
Review Comment:
remove line :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] agavra commented on pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates
Posted by GitBox <gi...@apache.org>.
agavra commented on PR #10021:
URL: https://github.com/apache/pinot/pull/10021#issuecomment-1363444497
> @agavra Is Serde failing due to our change?
ah yes... JDK8 doesn't infer properly for anonymous classes 😢
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] snleee commented on pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates
Posted by GitBox <gi...@apache.org>.
snleee commented on PR #10021:
URL: https://github.com/apache/pinot/pull/10021#issuecomment-1363540283
@agavra Thank you for the quick turnaround on this!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] codecov-commenter commented on pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #10021:
URL: https://github.com/apache/pinot/pull/10021#issuecomment-1362168382
# [Codecov](https://codecov.io/gh/apache/pinot/pull/10021?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#10021](https://codecov.io/gh/apache/pinot/pull/10021?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (49607d7) into [master](https://codecov.io/gh/apache/pinot/commit/c0345035cc96f1d700eff109d69d85fc98f63a5a?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c034503) will **increase** coverage by `0.08%`.
> The diff coverage is `76.61%`.
```diff
@@ Coverage Diff @@
## master #10021 +/- ##
============================================
+ Coverage 68.67% 68.75% +0.08%
- Complexity 5653 5683 +30
============================================
Files 1991 1994 +3
Lines 107251 107462 +211
Branches 16302 16330 +28
============================================
+ Hits 73654 73890 +236
+ Misses 28382 28350 -32
- Partials 5215 5222 +7
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `25.08% <2.41%> (+0.05%)` | :arrow_up: |
| unittests1 | `67.96% <76.61%> (+0.10%)` | :arrow_up: |
| unittests2 | `13.59% <0.00%> (-0.04%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/10021?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...tion/function/FourthMomentAggregationFunction.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9Gb3VydGhNb21lbnRBZ2dyZWdhdGlvbkZ1bmN0aW9uLmphdmE=) | `67.27% <67.27%> (ø)` | |
| [...org/apache/pinot/core/common/ObjectSerDeUtils.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vT2JqZWN0U2VyRGVVdGlscy5qYXZh) | `91.36% <75.00%> (-0.08%)` | :arrow_down: |
| [.../segment/local/customobject/PinotFourthMoment.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9jdXN0b21vYmplY3QvUGlub3RGb3VydGhNb21lbnQuamF2YQ==) | `84.21% <84.21%> (ø)` | |
| [...gregation/function/AggregationFunctionFactory.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9BZ2dyZWdhdGlvbkZ1bmN0aW9uRmFjdG9yeS5qYXZh) | `81.81% <100.00%> (+0.25%)` | :arrow_up: |
| [...che/pinot/segment/spi/AggregationFunctionType.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL0FnZ3JlZ2F0aW9uRnVuY3Rpb25UeXBlLmphdmE=) | `90.62% <100.00%> (+0.19%)` | :arrow_up: |
| [.../pinot/server/api/resources/TableSizeResource.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zZXJ2ZXIvYXBpL3Jlc291cmNlcy9UYWJsZVNpemVSZXNvdXJjZS5qYXZh) | `80.00% <0.00%> (-8.00%)` | :arrow_down: |
| [.../common/request/context/predicate/EqPredicate.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vcmVxdWVzdC9jb250ZXh0L3ByZWRpY2F0ZS9FcVByZWRpY2F0ZS5qYXZh) | `92.30% <0.00%> (-7.70%)` | :arrow_down: |
| [...mmon/request/context/predicate/NotEqPredicate.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vcmVxdWVzdC9jb250ZXh0L3ByZWRpY2F0ZS9Ob3RFcVByZWRpY2F0ZS5qYXZh) | `76.92% <0.00%> (-7.70%)` | :arrow_down: |
| [...tream/kafka20/server/KafkaDataServerStartable.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtcGx1Z2lucy9waW5vdC1zdHJlYW0taW5nZXN0aW9uL3Bpbm90LWthZmthLTIuMC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvcGx1Z2luL3N0cmVhbS9rYWZrYTIwL3NlcnZlci9LYWZrYURhdGFTZXJ2ZXJTdGFydGFibGUuamF2YQ==) | `72.91% <0.00%> (-6.25%)` | :arrow_down: |
| [...ion/function/DistinctCountAggregationFunction.java](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9EaXN0aW5jdENvdW50QWdncmVnYXRpb25GdW5jdGlvbi5qYXZh) | `50.22% <0.00%> (-3.14%)` | :arrow_down: |
| ... and [44 more](https://codecov.io/gh/apache/pinot/pull/10021/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
:mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] agavra commented on a diff in pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates
Posted by GitBox <gi...@apache.org>.
agavra commented on code in PR #10021:
URL: https://github.com/apache/pinot/pull/10021#discussion_r1055944895
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/customobject/PinotFourthMoment.java:
##########
@@ -0,0 +1,125 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.customobject;
+
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+import org.apache.commons.math.stat.descriptive.moment.FourthMoment;
+import org.apache.commons.math.stat.descriptive.moment.Kurtosis;
+import org.apache.commons.math.stat.descriptive.moment.Skewness;
+
+
+/**
+ * A {@link Comparable} implementation of
Review Comment:
oops! I was in the middle of writing it (clearly) and then forgot all about it...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] snleee commented on pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates
Posted by GitBox <gi...@apache.org>.
snleee commented on PR #10021:
URL: https://github.com/apache/pinot/pull/10021#issuecomment-1363411142
@agavra Is Serde failing due to our change?
```
[INFO] -------------------------------------------------------------
Error: /home/runner/work/pinot/pinot/pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:[490,104] error: cannot infer type arguments for ObjectSerDe<T>
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] snleee merged pull request #10021: support SKEW_POP and KURTOSIS_POP aggregates
Posted by GitBox <gi...@apache.org>.
snleee merged PR #10021:
URL: https://github.com/apache/pinot/pull/10021
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org