Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/18 08:06:18 UTC

[GitHub] [iceberg] gaborkaszab commented on pull request #5837: API,Core: Introduce metrics for data files by file format

gaborkaszab commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1319670877

   > the code changes themselves LGTM but I'm still not sure that this is how we'd want to represent **dimensions** in metrics as this doesn't really scale to add a new metric field for each new **dimension** (where the file format is the dimension we use here).
   > 
   > I think it would be interesting to explore whether we could actually represent such dimensions in a more natural and scalable way.
   
   Let me give an update on this: at first I tried to introduce a Map<FileFormat, Counter> for this dimensional metric in ScanMetrics, but to make this work I ended up writing code to convert it into a Map<FileFormat, CounterResult>. On top of that, ScanMetricsResultParser.fromJson() still needs to know the name of each field (including the file format name), which unfortunately isn't generic. This approach seemed overcomplicated and still not generic enough to automatically handle, e.g., adding a new dimension.
   I'll try creating something like a MultiDimensionCounter and see whether it turns out better.
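
To make the idea concrete, here is a rough sketch of what such a counter could look like. It is only an illustration and not code from this PR: the MultiDimensionCounter class, its method names, and the DataFileFormat enum (a local stand-in for Iceberg's FileFormat) are all assumptions. One counter is keyed by an enum dimension, so a new dimension value does not require a new metric field.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for Iceberg's FileFormat enum, so the sketch is self-contained.
enum DataFileFormat {
  PARQUET,
  ORC,
  AVRO
}

// Hypothetical sketch: a single counter that tracks one value per dimension value.
// This is not part of the Iceberg API.
final class MultiDimensionCounter<D extends Enum<D>> {
  private final String name;
  private final ConcurrentHashMap<D, LongAdder> counters = new ConcurrentHashMap<>();

  MultiDimensionCounter(String name) {
    this.name = name;
  }

  String name() {
    return name;
  }

  // Increment the counter for the given dimension value by one.
  void increment(D dimension) {
    increment(dimension, 1L);
  }

  // Increment the counter for the given dimension value by an arbitrary amount.
  void increment(D dimension, long amount) {
    counters.computeIfAbsent(dimension, ignored -> new LongAdder()).add(amount);
  }

  // Current value for one dimension value; 0 if it was never incremented.
  long value(D dimension) {
    LongAdder adder = counters.get(dimension);
    return adder == null ? 0L : adder.sum();
  }

  // Read-only snapshot keyed by the enum constant's name and sorted by that name,
  // so a serializer could iterate the entries without knowing the field names up front.
  Map<String, Long> result() {
    Map<String, Long> snapshot = new TreeMap<>();
    counters.forEach((dimension, adder) -> snapshot.put(dimension.name(), adder.sum()));
    return Collections.unmodifiableMap(snapshot);
  }
}
```

With something along these lines, a scan would call increment(DataFileFormat.PARQUET) for each matching data file, and a parser could walk the entries of result() instead of hard-coding one field per file format. Whether this fits ScanMetrics and ScanMetricsResultParser is exactly what still needs to be tried out.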


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org