Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/21 20:27:11 UTC

[GitHub] [iceberg] jasonhughes248 opened a new pull request #2622: Docs: add catalog and metadata files to metadata structure diagram

jasonhughes248 opened a new pull request #2622:
URL: https://github.com/apache/iceberg/pull/2622


   while working on additional educational materials for iceberg (will be making them public shortly), I found making a few changes to the table format diagram on https://iceberg.apache.org/spec/#overview helped people better tie the higher level architecture diagram to what they were seeing in the filesystem as operations were made.
   
   changes made to the diagram:
   
   - added iceberg catalog because it's a component of the table format and because it shows the starting point for all operations against that table
   - within the catalog, show visually that there can be multiple tables stored in the catalog, so it shows that you don't need a separate discovery service to find each iceberg table in your environment
       - put db1.table1 to show that this is a sample table name and that the other boxes behind it are additional tables in the catalog; I have no strong preference about keeping, changing, or removing the sample table name
       - named the reference to the current metadata file in the iceberg catalog "current metadata pointer" because I'm not aware of an official term for this pointer/reference. if there is one, or if someone wants to suggest a better name, I'm happy to change it
   - added metadata file boxes and labels around the snapshot objects because metadata files were missing as a file type, and because snapshot pointers live inside the metadata files rather than being their own files like the other boxes in the diagram
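
As a rough illustration of the catalog relationship described in the list above, here is a toy Python model. It is only a sketch, not Iceberg's API or on-disk format: the class `ToyCatalog`, its method names, and the file paths are all hypothetical, chosen just to show one catalog holding several tables, each with its own current metadata pointer.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    # Hypothetical structure: table name -> location of that table's
    # current metadata file (the "current metadata pointer" in the diagram).
    pointers: dict = field(default_factory=dict)

    def set_current(self, table, metadata_location):
        """Point the table at a (new) metadata file."""
        self.pointers[table] = metadata_location

    def current_metadata(self, table):
        """Starting point for any operation against the table."""
        return self.pointers[table]

    def list_tables(self):
        """All tables are discoverable from the catalog itself."""
        return sorted(self.pointers)


catalog = ToyCatalog()
# Paths below are made up for the example.
catalog.set_current("db1.table1", "db1/table1/metadata/v1.metadata.json")
catalog.set_current("db1.table2", "db1/table2/metadata/v1.metadata.json")
# A later write to table1 produces a new metadata file; only the pointer moves.
catalog.set_current("db1.table1", "db1/table1/metadata/v2.metadata.json")
```

In this sketch, a reader would call `catalog.current_metadata("db1.table1")` first and then follow that metadata file to the snapshots, which is the path the added boxes in the diagram are meant to show.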
   
   
   I used the png from the site which looks a little more compressed than how I saved it, so there's a little difference between the sharpness of the text and lines of the current content vs the additions. I'm happy to make the changes on the original if you want to share the original chart with me
   
   cc @rymurr 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #2622: Docs: add catalog and metadata files to metadata structure diagram

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #2622:
URL: https://github.com/apache/iceberg/pull/2622


   




[GitHub] [iceberg] jasonhughes248 commented on pull request #2622: Docs: add catalog and metadata files to metadata structure diagram

Posted by GitBox <gi...@apache.org>.
jasonhughes248 commented on pull request #2622:
URL: https://github.com/apache/iceberg/pull/2622#issuecomment-853341685


   @rdblue I just updated the point-in-time spec diagram in the PR: I made the original changes on the google drawing, showed both snapshots in the later metadata file, and added dividing lines between the metadata and data layers
   
   I couldn't find a way to put the contents/purpose of each file type into this intro diagram without doing more harm than good (either the boxes, and therefore the diagram, get way bigger; or some important detail is left off, so people come away thinking they know the full scope of the file type when they don't; or the description is short but redundant with the file type name)
   
   on the over-time/examples walkthrough, I tried the sql + spec diagram illustrating the example change + fs contents and I think it worked well. you can see it in [this webinar deck](https://docs.google.com/presentation/d/1hfYs6YIaw9H_fD7QEgIlSc_0lo-6YPQY5dt2qXfgY2g/edit?usp=sharing). it's covered in slides 14-19. the slides build so it's best to view those in presentation mode for the full over-time effect. you may notice a couple familiar slides later in the deck there 🙂
   
   I'm working on a written version of this content which should be public in a couple weeks that I'm hoping will be useful for folks who prefer reading this kind of content when looking for an architectural intro about something like iceberg. I'll post that here when it's done, it'd be awesome to get your and others' feedback on it too
   
   what do you think?




[GitHub] [iceberg] rdblue commented on pull request #2622: Docs: add catalog and metadata files to metadata structure diagram

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2622:
URL: https://github.com/apache/iceberg/pull/2622#issuecomment-866427375


   The updated version looks great!




[GitHub] [iceberg] rdblue commented on pull request #2622: Docs: add catalog and metadata files to metadata structure diagram

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2622:
URL: https://github.com/apache/iceberg/pull/2622#issuecomment-870110227


   I just deployed the spec with this update. Thanks, @jasonhughes248!




[GitHub] [iceberg] rdblue commented on pull request #2622: Docs: add catalog and metadata files to metadata structure diagram

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2622:
URL: https://github.com/apache/iceberg/pull/2622#issuecomment-846631871


   Thanks, @jasonhughes248! This looks like an improvement. I can share the original drawing with you if you want to edit that instead. There are also a couple of things we may want to change.
   
   First, I just gave a talk with what I think might be a better series of diagrams than the one here. I'd be interested in getting your feedback on that and whether we should replace the doc based on the new ones instead.
   [Iceberg Row-level Updates.pdf](https://github.com/apache/iceberg/files/6528986/Iceberg.Row-level.Updates.pdf)
   
   Second, assuming that we go with the older drawing, I think we also need to show that both snapshots, `s1` and `s2`, exist in the same metadata file. One of the biggest misconceptions is that each metadata file points to a single snapshot, but they actually contain a list of all valid snapshots. Aging off snapshots removes older ones from that list and updates the history, then writes a new metadata file.
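
To make the point about snapshot lists concrete, here is a minimal Python sketch under stated assumptions. It is a toy model, not Iceberg's implementation or file format: the class names, field names, the helper functions `commit_snapshot` and `expire_snapshots`, and the file names are hypothetical, and the history is reduced to a plain tuple of snapshot ids.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: str


@dataclass(frozen=True)
class MetadataFile:
    # Frozen: an existing metadata file is never modified in this model.
    location: str
    snapshots: Tuple[Snapshot, ...]   # every valid snapshot, not just one
    history: Tuple[str, ...]          # simplified history of snapshot ids
    current_snapshot_id: str


def commit_snapshot(prev, snapshot, new_location):
    """Write a new metadata file that lists the old snapshots plus the new one."""
    return MetadataFile(
        location=new_location,
        snapshots=prev.snapshots + (snapshot,),
        history=prev.history + (snapshot.snapshot_id,),
        current_snapshot_id=snapshot.snapshot_id,
    )


def expire_snapshots(prev, expired_ids, new_location):
    """Drop aged-off snapshots from the list and history, then write a new file."""
    if prev.current_snapshot_id in expired_ids:
        raise ValueError("cannot expire the current snapshot")
    return MetadataFile(
        location=new_location,
        snapshots=tuple(s for s in prev.snapshots if s.snapshot_id not in expired_ids),
        history=tuple(i for i in prev.history if i not in expired_ids),
        current_snapshot_id=prev.current_snapshot_id,
    )


# File names are made up for the example.
v1 = MetadataFile("v1.metadata.json", (Snapshot("s1"),), ("s1",), "s1")
v2 = commit_snapshot(v1, Snapshot("s2"), "v2.metadata.json")   # holds s1 and s2
v3 = expire_snapshots(v2, {"s1"}, "v3.metadata.json")          # holds only s2
```

In this model `v2` carries both `s1` and `s2`, which is the detail the diagram should show; expiring `s1` yields a third file rather than editing `v2` in place.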
   




[GitHub] [iceberg] jasonhughes248 commented on pull request #2622: Docs: add catalog and metadata files to metadata structure diagram

Posted by GitBox <gi...@apache.org>.
jasonhughes248 commented on pull request #2622:
URL: https://github.com/apache/iceberg/pull/2622#issuecomment-847465081


   Sounds good, thanks @rdblue!




[GitHub] [iceberg] jasonhughes248 commented on pull request #2622: Docs: add catalog and metadata files to metadata structure diagram

Posted by GitBox <gi...@apache.org>.
jasonhughes248 commented on pull request #2622:
URL: https://github.com/apache/iceberg/pull/2622#issuecomment-870041970


   thanks @rdblue! 
   
   on the discussion above on a different diagram for the over-time state of the table, I put [this content together](https://www.dremio.com/apache-iceberg-an-architectural-look-under-the-covers/#a-look-under-the-covers-when-cruding) that includes an attempt to explain what happens to an iceberg table's structure as different operations are done on it. it'd be great to get your feedback on that and/or any other sections in the post. there are a few remaining tweaks I want to do to it, so I can certainly incorporate any edits you think would help too




[GitHub] [iceberg] rdblue commented on pull request #2622: Docs: add catalog and metadata files to metadata structure diagram

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2622:
URL: https://github.com/apache/iceberg/pull/2622#issuecomment-847363490


   @jasonhughes248, sounds great. I added you to the google drawing as an editor. Feel free to make changes!




[GitHub] [iceberg] rymurr commented on pull request #2622: Docs: add catalog and metadata files to metadata structure diagram

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #2622:
URL: https://github.com/apache/iceberg/pull/2622#issuecomment-846382485


   Looks good to me @jasonhughes248 !
   
   Any thoughts @rdblue @aokolnychyi ?




[GitHub] [iceberg] jasonhughes248 commented on pull request #2622: Docs: add catalog and metadata files to metadata structure diagram

Posted by GitBox <gi...@apache.org>.
jasonhughes248 commented on pull request #2622:
URL: https://github.com/apache/iceberg/pull/2622#issuecomment-847247618


   @rdblue yeah, I'd be happy to make these changes in the original drawing if you want to share that with me. if it's lucidchart which it looks like, my username is jason@dremio.com
   
   I think the diagram in that linked deck has some good additional aspects:
   1. info on the content/purpose for each file type
   2. visually distinguishing catalog vs metadata vs data tiers
   3. the view of how the high-level physical representation of the table changes over time. 
   
   but, I think the lucidchart diagram looks more polished/official, which is probably better for the docs site, so I think it'd be good to try to merge the two - use the current diagram as the base, and add the aspects of the one you linked (plus the change showing both snapshots in the metadata file). the only thing I worry about is the size of the boxes and diagram with the file content/purpose in them, since they'll all be visible at the same time in this one-image usage vs the multi-slide build in the deck, but let's see. I can create a copy of the diagram and try it out. 
   
   on #3, I think a diagram like this can be helpful in two related but different situations - an introductory overview for new folks that gives a high-level understanding of the layout and types of files, and a deeper view of how the layout and underlying structure changes as the dataset changes. 
   
   to me, the point-in-time diagram works really well in its current spot in the docs, since it's a good introduction for new users on that page. then there could be another section covering the changes-over-time content for users who want a more detailed view of how the layout changes as the table changes
   
   I actually did a presentation internally where a subset has that structure ("layout overview and file types, then how it changes over time") and it worked well. I'm also doing it publicly via webinar on thursday and making a written version of it shortly thereafter, so that could help here too. the changes-over-time view in that content currently shows the files and their location in the filesystem with color-coding of file types, but I think this diagram view like you have here is a better way to tell that story (or maybe both). I can try to address point #3 with multiple versions of this new merged diagram in that content and share here to see if it works well.
   
   what do you think? 

