You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "RussellSpitzer (via GitHub)" <gi...@apache.org> on 2023/10/17 14:27:55 UTC

[I] Improve `All` Metadata Tables with Snapshot Information [iceberg]

RussellSpitzer opened a new issue, #8856:
URL: https://github.com/apache/iceberg/issues/8856

   ### Feature Request / Improvement
   
   Currently all versions of metadata tables have the exact same schema as their not "all" versions. This is actually not very useful if you are attempting to locate the state of a particular entry at a specific time because the `snapshot_id` always just shows the file's original snapshot.
   
   For example the entries table looks like
   
   ```
   scala> spark.sql("SELECT * FROM db.timezoned.entries").show
   warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation`
   +------+-------------------+---------------+--------------------+--------------------+--------------------+
   |status|        snapshot_id|sequence_number|file_sequence_number|           data_file|    readable_metrics|
   +------+-------------------+---------------+--------------------+--------------------+--------------------+
   |     1|6561920950175488866|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |     1|5535987506380389562|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |     1|2517256618694516958|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |     1|2750236691316126600|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |     1|7179885233531513409|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   ```
   
   And `all_entries` looks like
   
   ```
   scala> spark.sql("SELECT * FROM db.timezoned.all_entries").show
   warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation`
   +------+-------------------+---------------+--------------------+--------------------+--------------------+
   |status|        snapshot_id|sequence_number|file_sequence_number|           data_file|    readable_metrics|
   +------+-------------------+---------------+--------------------+--------------------+--------------------+
   |     1|2517256618694516958|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |     1|6561920950175488866|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |     1|2750236691316126600|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |     1|7179885233531513409|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |     1|5535987506380389562|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |     0|6561920950175488866|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |     0|5535987506380389562|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |     0|2517256618694516958|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |     0|2750236691316126600|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |     0|7179885233531513409|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   +------+-------------------+---------------+--------------------+--------------------+--------------------+
   ````
   
   By looking at all_entries it is impossible for me to determine whether or not all the manifests were rewritten at once, or if they were rewritten in groups. Ideally we would see something like 
   
   ```
   +-------------------+-------------+------+-------------------+---------------+--------------------+--------------------+--------------------+
   |              as_of|         time|status|        snapshot_id|sequence_number|file_sequence_number|           data_file|    readable_metrics|
   +-------------------+-------------+------+-------------------+---------------+--------------------+--------------------+--------------------+
   |7179885233531513409|1697493267302|     1|7179885233531513409|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |2750236691316126600|1697493268363|     1|2750236691316126600|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |2750236691316126600|1697493268363|     1|7179885233531513409|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |2517256618694516958|1697493269568|     1|2517256618694516958|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |2517256618694516958|1697493269568|     1|2750236691316126600|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |2517256618694516958|1697493269568|     1|7179885233531513409|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |5535987506380389562|1697493270419|     1|5535987506380389562|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |5535987506380389562|1697493270419|     1|2517256618694516958|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |5535987506380389562|1697493270419|     1|2750236691316126600|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |5535987506380389562|1697493270419|     1|7179885233531513409|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |6561920950175488866|1697493271193|     1|6561920950175488866|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |6561920950175488866|1697493271193|     1|5535987506380389562|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |6561920950175488866|1697493271193|     1|2517256618694516958|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |6561920950175488866|1697493271193|     1|2750236691316126600|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |6561920950175488866|1697493271193|     1|7179885233531513409|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |2315221993819944328|1697497673027|     1|6561920950175488866|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |2315221993819944328|1697497673027|     1|5535987506380389562|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |2315221993819944328|1697497673027|     1|2517256618694516958|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |2315221993819944328|1697497673027|     1|2750236691316126600|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   |2315221993819944328|1697497673027|     1|7179885233531513409|              0|                   0|{0, /Users/russel...|{{51, 1, 0, null,...|
   +-------------------+-------------+------+-------------------+---------------+--------------------+--------------------+--------------------+
   ```
   
   Apologies that my code is not getting status status correctly but basically the idea being that we add on two columns (names are still up for debate)
   "as_of_snapshot", "as_of_time"
   
   These would allow us to be able to analyze the actual history in all_entires and all_manifests
   
   ### Query engine
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [I] Improve `All` Metadata Tables with Snapshot Information [iceberg]

Posted by "nk1506 (via GitHub)" <gi...@apache.org>.
nk1506 commented on issue #8856:
URL: https://github.com/apache/iceberg/issues/8856#issuecomment-1791915604

   @RussellSpitzer , Is it fine, If I will continue on this ? If so Please assign this to me. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org