Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/09 20:37:24 UTC

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #7692: ARROW-9321: [C++][Dataset] Populate statistics opportunistically

jorisvandenbossche edited a comment on pull request #7692:
URL: https://github.com/apache/arrow/pull/7692#issuecomment-656338363


   > I mean it could be called inside the statistics property accessor so that the returned statistics are never None
   
   Ah, I misunderstood. That might also be nice (it avoids the extra method). The problem with that approach, AFAIU, is that right now `fragment.row_groups` is `None` if no statistics have been loaded yet.
   So the loading would also need to happen when accessing the `fragment.row_groups` property. And since that property can sometimes already return a list of RowGroupInfo objects with only their IDs set, this gets a bit complicated: several properties would first need to ensure the metadata is loaded, namely `row_groups` (if it doesn't exist yet) and then `RowGroupInfo.statistics`/`num_rows` for RowGroupInfo objects that only had the `id` set.
   
   So although automatically ensuring the metadata is loaded ("ensure_metadata") on property access sounds nice, in the end an explicit method that you call yourself might be cleaner, I think (also for downstream users like dask).
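For comparison, a minimal sketch of the explicit alternative, again using hypothetical toy classes rather than pyarrow's actual API. `ensure_metadata()` (the name used above) fills in `row_groups` and each `RowGroupInfo.num_rows`/`statistics` in one place, and the attributes are plain values that stay `None` until it is called:

```python
from typing import Dict, List, Optional


class RowGroupInfo:
    """Plain value object; num_rows/statistics stay None until metadata is loaded."""

    def __init__(self, rg_id: int, num_rows: Optional[int] = None,
                 statistics: Optional[Dict[str, int]] = None):
        self.id = rg_id
        self.num_rows = num_rows
        self.statistics = statistics


class ExplicitFragment:
    def __init__(self, footer: List[Dict[str, int]],
                 row_group_ids: Optional[List[int]] = None):
        self._footer = footer  # hypothetical stand-in for the Parquet footer
        # None until loaded, or ID-only entries if the row groups are known up front.
        self.row_groups: Optional[List[RowGroupInfo]] = (
            None if row_group_ids is None
            else [RowGroupInfo(i) for i in row_group_ids]
        )

    def ensure_metadata(self) -> None:
        """Hypothetical: populate row_groups and their statistics in one place."""
        ids = (range(len(self._footer)) if self.row_groups is None
               else [rg.id for rg in self.row_groups])
        self.row_groups = [
            RowGroupInfo(
                i,
                num_rows=self._footer[i]["num_rows"],
                statistics={k: v for k, v in self._footer[i].items()
                            if k != "num_rows"},
            )
            for i in ids
        ]
```

A downstream consumer such as dask would call `ensure_metadata()` once per fragment and then read plain attributes, with no I/O hidden behind them; the loading logic lives in a single method instead of in every accessor.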


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org