You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/01/18 01:10:37 UTC

[GitHub] [iceberg] kbendick edited a comment on pull request #2069: API: handle NaN as min/max stats in evaluators

kbendick edited a comment on pull request #2069:
URL: https://github.com/apache/iceberg/pull/2069#issuecomment-761919569


   For ORC metrics, which currently don't populate `nanValueCounts` in the metrics and for which it seems difficult to populate `nanValueCounts`, will this still work when we create `Metrics` using the non-deprecated constructors that require passing `nanValueCounts`? This is in reference to this issue for removing the `Metrics` constructors that don't take in `nanValueCounts` maps: https://github.com/apache/iceberg/issues/2044
   
   I suppose when we go to deprecate the old `Metrics` constructors, we can simply pass `null` into the `Metrics` constructor for `nanValueCounts` with ORC in order to keep the current behavior. Or update the various evaluator code paths to ensure that a missing key from this map does not indicate anything special if we choose to pass in an empty map.
   
   This is all based off of my understanding of ORC's `ColumnStatistics` and the notes in this PR about `upperBound ` and `lowerBound` being `NaN` in ORC if _any_ value in the file is `NaN`, so please correct me if I'm mistaken and somebody sees an easy way to get NaN stats from an ORC file. To me, I fail to see how we can get the count of NaN values in the following code block (though as I mentioned we can always continue to pass in `null` to match the existing behavior, provided the various metrics evaluators are aware that `null` for that Map entry for nullValueCounts doesnt convey much meaning for ORC).
   
   The relevant code for populating ORC metrics is here: https://github.com/apache/iceberg/blob/29915e3522f9a808420a2e16fb1c2cf3ae165a74/orc/src/main/java/org/apache/iceberg/orc/OrcMetrics.java#L100-L172


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org