Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/26 00:26:10 UTC

[GitHub] [iceberg] kbendick opened a new issue #2635: Bug in emitted ScanEvent in BaseAllMetadataTableScan

kbendick opened a new issue #2635:
URL: https://github.com/apache/iceberg/issues/2635


   BaseAllMetadataTableScan emits a `ScanEvent` during its `planFiles` phase.
   
   However, the `ScanEvent` uses the `tableName` of the table to which the metadata corresponds, not the name of the metadata table that is actually being scanned.
   
   For example, for a table `foo`, if one does a query on the table `foo.all_data_files`, the `ScanEvent` is going to be emitted with `foo` as the table name.
   
   This is likely causing incorrect data anywhere that `ScanEvent` is consumed, which might include people's custom observability data. I don't see these events used anywhere outside of tests, so I assume they serve somebody's internal observability or something similar.
   
   This will affect anybody who tracks the `ScanEvent`s emitted by most of the `All*MetadataTableScans`.
   
   The line in question is here: https://github.com/apache/iceberg/blob/7a048fc3aba2a5395a1b36b729e3bdf8597fb38b/core/src/main/java/org/apache/iceberg/BaseAllMetadataTableScan.java#L42
   
   Unfortunately, the `table` that is passed in is a reference to the actual table itself, and not the metadata table.
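
To make the mismatch concrete, here is a minimal, self-contained sketch of the behavior described above. The classes `SimpleTable`, `BaseTableStub` and `MetadataTableStub` and the two helper methods are hypothetical stand-ins written for illustration; they are not Iceberg's real classes or API, and the `foo.all_data_files` naming simply follows the example given earlier in this issue.

```java
// Hypothetical, simplified stand-ins for illustration only.
// These are NOT Iceberg's real classes or method signatures.
class ScanEventNameSketch {

  interface SimpleTable {
    String name();
  }

  // Stands in for the actual (base) table, e.g. `foo`.
  static final class BaseTableStub implements SimpleTable {
    private final String name;

    BaseTableStub(String name) {
      this.name = name;
    }

    @Override
    public String name() {
      return name;
    }
  }

  // Stands in for a metadata table layered over a base table,
  // e.g. `foo.all_data_files`.
  static final class MetadataTableStub implements SimpleTable {
    private final SimpleTable base;
    private final String suffix;

    MetadataTableStub(SimpleTable base, String suffix) {
      this.base = base;
      this.suffix = suffix;
    }

    SimpleTable base() {
      return base;
    }

    @Override
    public String name() {
      return base.name() + "." + suffix;
    }
  }

  // Models the reported behavior: the scan only holds a reference to the
  // base table, so the event carries the base table's name.
  static String reportedEventName(MetadataTableStub scanned) {
    SimpleTable table = scanned.base();
    return table.name();
  }

  // Models the expected behavior: the event carries the name of the
  // table that is actually being scanned.
  static String expectedEventName(MetadataTableStub scanned) {
    return scanned.name();
  }

  public static void main(String[] args) {
    MetadataTableStub allDataFiles =
        new MetadataTableStub(new BaseTableStub("foo"), "all_data_files");
    System.out.println(reportedEventName(allDataFiles)); // prints "foo"
    System.out.println(expectedEventName(allDataFiles)); // prints "foo.all_data_files"
  }
}
```

In this sketch the only difference between the two helpers is which object `name()` is called on, which is why exposing the scanned table's name to the scan class would be enough to correct the event.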
   
   I have confirmed this in unit tests while refactoring to provide better error messages in https://github.com/apache/iceberg/pull/2617: the call to `table().name()` does not return the metadata table name.
   
   I have an open PR where I am refactoring to make the name of the table that is actually being scanned available in that class (for this issue https://github.com/apache/iceberg/issues/2599), so we can tackle this after that. 

