Posted to dev@parquet.apache.org by "wgtmac (via GitHub)" <gi...@apache.org> on 2023/02/24 10:05:07 UTC

[GitHub] [parquet-mr] wgtmac opened a new pull request, #1034: PARQUET-2230: Add a new rewrite command powered by ParquetRewriter

wgtmac opened a new pull request, #1034:
URL: https://github.com/apache/parquet-mr/pull/1034

   ### Jira
   
   https://issues.apache.org/jira/PARQUET-2230
   
   ### Tests
   
   Added `RewriteCommandTest`.
   
   ### Commits
   
   Added `RewriteCommand` to parquet-cli to support rewriting one or more Parquet files with prune, mask, and transform-codec.
   
   ### Documentation
   
   Updated README.md to reflect the new command.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-mr] wgtmac commented on pull request #1034: PARQUET-2230: Add a new rewrite command powered by ParquetRewriter

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #1034:
URL: https://github.com/apache/parquet-mr/pull/1034#issuecomment-1445965822

   @gszadovszky Unfortunately, those commands and classes were already released in v1.12.0. I have already marked those classes as deprecated. Do you think it makes sense to deprecate the commands as well?




[GitHub] [parquet-mr] wgtmac commented on pull request #1034: PARQUET-2230: Add a new rewrite command powered by ParquetRewriter

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #1034:
URL: https://github.com/apache/parquet-mr/pull/1034#issuecomment-1445977675

   > @wgtmac, I think it would be nice to inform the CLI users about the deprecation as well. There is no harm in keeping these commands, except that if the related implementation is deprecated, we won't maintain their functionality.
   
   Agreed. I will submit a PR to address this. Thanks!




[GitHub] [parquet-mr] wgtmac commented on pull request #1034: PARQUET-2230: Add a new rewrite command powered by ParquetRewriter

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #1034:
URL: https://github.com/apache/parquet-mr/pull/1034#issuecomment-1443507343

   Please take a look when you get the chance. Thanks!
   @gszadovszky @ggershinsky @shangxinli 




[GitHub] [parquet-mr] wgtmac commented on pull request #1034: PARQUET-2230: Add a new rewrite command powered by ParquetRewriter

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #1034:
URL: https://github.com/apache/parquet-mr/pull/1034#issuecomment-1443560593

   > Are we planning to deprecate the other tools covered by this one? (Or if they were not released yet we might simply remove them?)
   
   Let me take a look. If those tools and classes are not released yet, it would be a good time to remove them.
   
   > Next potential topic around parquet-cli, if you're interested :) There are some implementations around the Hadoop conf in parquet-cli, but I do not fully understand how they work. If it works as is, we should document it; otherwise, we should make it work somehow. It would be great if the different read/write flags could be used in the tools, like setting the zstd compression ratio for rewrite or using a non-default encoding. What do you think?
   
   I am not familiar with it either. I will dig into it to find out how it works and what can be done.
   
   It is Friday now. Will get back to it next week. :)
   




[GitHub] [parquet-mr] gszadovszky merged pull request #1034: PARQUET-2230: Add a new rewrite command powered by ParquetRewriter

Posted by "gszadovszky (via GitHub)" <gi...@apache.org>.
gszadovszky merged PR #1034:
URL: https://github.com/apache/parquet-mr/pull/1034




[GitHub] [parquet-mr] gszadovszky commented on pull request #1034: PARQUET-2230: Add a new rewrite command powered by ParquetRewriter

Posted by "gszadovszky (via GitHub)" <gi...@apache.org>.
gszadovszky commented on PR #1034:
URL: https://github.com/apache/parquet-mr/pull/1034#issuecomment-1445971800

   @wgtmac, I think it would be nice to inform the CLI users about the deprecation as well. There is no harm in keeping these commands, except that if the related implementation is deprecated, we won't maintain their functionality.

