Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/16 15:44:04 UTC

[GitHub] [iceberg] findepi opened a new issue, #6443: Provide Puffin reader API allowing read without decompression

findepi opened a new issue, #6443:
URL: https://github.com/apache/iceberg/issues/6443

   ### Feature Request / Improvement
   
   When a query engine wants to add new stats to a snapshot that already has some stats, it currently needs to merge the existing stats file's blobs with the new ones.  
   
   Currently, the only Puffin reader API for reading blobs decompresses them implicitly.
   The application merging stats probably doesn't know much about the old stats, so it also doesn't know whether they should be compressed, and it should therefore preserve their existing compression. To do that, it has to compress them again after reading.
   
   - This process is wasteful: it performs a redundant decompression and compression
   - It also cannot be implemented in a future-proof manner: an application can preserve compression only for the Puffin codecs it was built with
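
The two points above can be illustrated with a minimal, self-contained sketch. It does not use the Iceberg API: `StoredBlob`, `CODECS`, `merge_with_round_trip`, and `merge_raw` are hypothetical names invented for this example, the blob type names are made up, and `zlib` merely stands in for a real Puffin codec so that the code runs with the standard library alone.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


# Hypothetical stand-in for a blob as stored in a stats file: the payload is
# kept exactly as written, together with the codec name recorded for it.
@dataclass(frozen=True)
class StoredBlob:
    blob_type: str
    codec: Optional[str]  # None means the payload is uncompressed
    raw: bytes            # bytes exactly as they sit in the file


# Codecs this hypothetical application was built with. zlib is an assumption
# made only so the sketch is runnable; it is not a claim about Puffin codecs.
CODECS: Dict[str, Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]] = {
    "zlib": (zlib.compress, zlib.decompress),
}


def merge_with_round_trip(old: List[StoredBlob], new: List[StoredBlob]) -> List[StoredBlob]:
    """Model of the situation described above: old blobs are decompressed on
    read and must be compressed again with the same codec on write."""
    merged = []
    for blob in old:
        if blob.codec is None:
            merged.append(blob)
            continue
        if blob.codec not in CODECS:
            # The application cannot preserve a codec it does not know.
            raise ValueError(f"cannot preserve unknown codec: {blob.codec}")
        compress, decompress = CODECS[blob.codec]
        payload = decompress(blob.raw)  # redundant decompression
        merged.append(StoredBlob(blob.blob_type, blob.codec, compress(payload)))  # redundant compression
    return merged + list(new)


def merge_raw(old: List[StoredBlob], new: List[StoredBlob]) -> List[StoredBlob]:
    """Model of the requested behaviour: stored bytes and codec name are
    copied untouched, so no codec implementation is needed."""
    return list(old) + list(new)
```

In this sketch, `merge_raw` carries over a blob written with a codec the application has never heard of, while `merge_with_round_trip` has to reject it. An actual API would also need a matching way to write already-compressed bytes, which this sketch does not model.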
   
   ### Query engine
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] github-actions[bot] commented on issue #6443: Provide Puffin reader API allowing read without decompression

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #6443:
URL: https://github.com/apache/iceberg/issues/6443#issuecomment-1595902402

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.




[GitHub] [iceberg] ajantha-bhat commented on issue #6443: Provide Puffin reader API allowing read without decompression

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on issue #6443:
URL: https://github.com/apache/iceberg/issues/6443#issuecomment-1357223293

   > When a query engine wants to add new stats to a snapshot that already has some stats, it currently needs to merge the existing stats file's blobs with the new ones.
   
   Can you please explain when a "query engine wants to add new stats to a snapshot that already has some stats"?
   
   Aren't stats computed once for the snapshot (using the ANALYZE command)? Are we planning for each column to be able to generate stats independently?




[GitHub] [iceberg] findepi commented on issue #6443: Provide Puffin reader API allowing read without decompression

Posted by GitBox <gi...@apache.org>.
findepi commented on issue #6443:
URL: https://github.com/apache/iceberg/issues/6443#issuecomment-1357228866

   @ajantha-bhat there may be many different types of stats, and stats can be computed for a subset of columns.




[GitHub] [iceberg] findepi commented on issue #6443: Provide Puffin reader API allowing read without decompression

Posted by "findepi (via GitHub)" <gi...@apache.org>.
findepi commented on issue #6443:
URL: https://github.com/apache/iceberg/issues/6443#issuecomment-1596786148

   It's needed for https://github.com/trinodb/trino/issues/15440.

