You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/01/12 19:10:56 UTC

[GitHub] [iceberg] holdenk opened a new issue #2079: Rewrite metrics during schema transformation

holdenk opened a new issue #2079:
URL: https://github.com/apache/iceberg/issues/2079


   See discussion @ https://github.com/apache/iceberg/pull/2048#discussion_r555831963


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-766143674


   @holdenk, could you elaborate a bit more on the idea? I am not sure I fully understood.  


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-764913239


   Thanks, @holdenk!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-764908410


   @RussellSpitzer @holdenk, will you have time to work on this one? I think it is quite important since our schema evolution will break if someone renames or drops a column that is referenced by the metrics config. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-780811718


   Oops, I missed the last comment.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] holdenk commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
holdenk commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-764912823


   Yeah I'm happy to do the follow on task :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary edited a comment on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
pvary edited a comment on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-781309976


   What happens if we have missing metrics? Could we still read the data, or we error out?
   
   Anyway, I would suggest to have a different prefix for the new properties, so we do not accidentally try to read column names as column ids or column ids as names.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-764908410






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-780856834


   We will have to be careful with existing tables as they may be accessed by different versions of Iceberg but I feel we can figure that out.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] holdenk commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
holdenk commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-776871936


   So when we resolve we'd transform the table properties to point to column references. This would mean the table properties would sort of change underneath folks which might not be desirable, but it would simplify the logic for handling updates.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] holdenk commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
holdenk commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-780830166


   So what I'm saying is when we write out table properties we resolve the current column name to the column index and write out the column index.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-781309976


   What happens if we have missing metrics? Could we still read the data, or we error out?
   
   Anyway, I would suggest to have a different prefix for the new properties, so we do not accidentally try to read column names as column ids or column names as ids.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] holdenk commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
holdenk commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-765804115


   So I'm thinking that rather than rewrite the metrics during schema transformation, when the schema is loaded we convert the from column names to references. What do folks think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-780818768


   So, @holdenk, are you saying instead of modifying the table properties during schema evolution, try to resolve them correctly during writes?
   
   Let's take an example. Suppose we have a metrics config for `sort_col_1` as `full`. Then we rename `sort_col_1` into `sort_col_2`. What should happen with metrics? I am afraid it will fail now as there is no `sort_col_1` anymore. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-780832994


   Essentially migrate the metrics config to use column ids instead of column names?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-781530064


   > Ah, that's a good point, how about I start with the rewriting for now and we can figure out what we want to do with the translation later?
   
   Seems like old writers would still need us to rewrite, yeah.
   
   > What happens if we have missing metrics? Could we still read the data, or we error out?
   
   I *think* we just use the default value and never fail writes. If that is not the case, we should fix it.
   
   > Anyway, I would suggest to have a different prefix for the new properties,
   
   Maybe we can have some prefix for internal properties we don't want to show to the user? Thoughts, @holdenk @pvary @rdblue @RussellSpitzer?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] holdenk commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
holdenk commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-780860046


   Ah, that's a good point, how about I start with the rewriting for now and we can figure out what we want to do with the translation later?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-780850155


   We have been discussing using column ids in the metrics config for a while. I'd support the migration now.
   
   @RussellSpitzer @rdblue @shardulm94 @danielcweeks @pvary @openinx, thoughts?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] holdenk commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
holdenk commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-764912823


   Yeah I'm happy to do the follow on task :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-780842763


   Let me summarize how I got it.
   
   The idea is to interpret table properties like `write.metadata.metrics.col1` as `write.metadata.metrics.10` (where 10 is the column id of `col1`). That way, we don't have to do anything when a column is renamed as the id stays same. We will also need to add some logic to convert column ids back into column names when showing table properties to the user.
   
    When we drop `col1`, it is alright to keep the old property and it will simply not be shown to the user?
   
   Is that the idea, @holdenk?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] holdenk commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
holdenk commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-780000345


   So @aokolnychyi WDYT?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] holdenk commented on issue #2079: Rewrite metrics during schema transformation

Posted by GitBox <gi...@apache.org>.
holdenk commented on issue #2079:
URL: https://github.com/apache/iceberg/issues/2079#issuecomment-780847481


   That's the idea, however now that I'm looking at the code again that would require changing the (public) API of `MetricsConfig.java` so it's maybe not ideal.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org