Posted to dev@parquet.apache.org by "wgtmac (via GitHub)" <gi...@apache.org> on 2023/02/10 14:54:07 UTC

[GitHub] [parquet-format] wgtmac opened a new pull request, #192: PARQUET-2241: Update wording of BYTE_STREAM_SPLIT encoding

wgtmac opened a new pull request, #192:
URL: https://github.com/apache/parquet-format/pull/192

   This PR proposes to explicitly state that no padding is allowed within a data page. This makes it easier for a BYTE_STREAM_SPLIT decoder to decode a page with nulls: it can simply compute the number of encoded values as `total_length_encoded_stream / K` (K is 4 for float and 8 for double). Otherwise, it has to decode the def/rep levels to get the exact number of non-null values.
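
   As a rough illustration of the point above, here is a minimal Python sketch of how a decoder could derive the value count from the stream length alone once padding is ruled out. The function names (`byte_stream_split_encode`, `count_encoded_values`, `byte_stream_split_decode`) are hypothetical and not taken from any Parquet implementation; the sketch assumes little-endian IEEE floats and a buffer that holds only the encoded values of a single page.

```python
import struct


def byte_stream_split_encode(values, fmt="<f"):
    """Hypothetical helper: scatter byte j of every value into stream j,
    then concatenate the K streams."""
    k = struct.calcsize(fmt)  # K = 4 for "<f" (float), 8 for "<d" (double)
    raw = [struct.pack(fmt, v) for v in values]
    return b"".join(bytes(r[j] for r in raw) for j in range(k))


def count_encoded_values(encoded, k):
    """With no padding allowed, the count is simply len(encoded) / K."""
    if len(encoded) % k != 0:
        # Assumption of this sketch: a length that is not a multiple of K
        # is treated as a malformed (e.g. padded) page.
        raise ValueError("encoded length is not a multiple of K")
    return len(encoded) // k


def byte_stream_split_decode(encoded, fmt="<f"):
    """Hypothetical helper: rebuild each value from the K byte streams."""
    k = struct.calcsize(fmt)
    n = count_encoded_values(encoded, k)
    streams = [encoded[j * n:(j + 1) * n] for j in range(k)]
    return [
        struct.unpack(fmt, bytes(s[i] for s in streams))[0]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Only the non-null values of a page are encoded, so the count derived
    # here is the number of non-null values, without reading def/rep levels.
    page = byte_stream_split_encode([1.5, -2.0, 0.25], "<f")
    print(count_encoded_values(page, 4))
    print(byte_stream_split_decode(page, "<f"))
```

   If padding were permitted, the division would no longer be reliable, and the decoder would have to fall back to counting non-null values from the definition levels.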


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] mapleFU commented on pull request #192: PARQUET-2241: Update wording of BYTE_STREAM_SPLIT encoding

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #192:
URL: https://github.com/apache/parquet-format/pull/192#issuecomment-1426068317

   I think we should check that no implementation adds padding. At least C++, Rust, and parquet-mr do not seem to add padding at the end of the data.




[GitHub] [parquet-format] emkornfield commented on pull request #192: PARQUET-2241: Update wording of BYTE_STREAM_SPLIT encoding

Posted by "emkornfield (via GitHub)" <gi...@apache.org>.
emkornfield commented on PR #192:
URL: https://github.com/apache/parquet-format/pull/192#issuecomment-1426163629

   Seems OK to me.




[GitHub] [parquet-format] shangxinli merged pull request #192: PARQUET-2241: Update wording of BYTE_STREAM_SPLIT encoding

Posted by "shangxinli (via GitHub)" <gi...@apache.org>.
shangxinli merged PR #192:
URL: https://github.com/apache/parquet-format/pull/192




[GitHub] [parquet-format] pitrou commented on pull request #192: PARQUET-2241: Update wording of BYTE_STREAM_SPLIT encoding

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #192:
URL: https://github.com/apache/parquet-format/pull/192#issuecomment-1425935594

   cc @wjones127




[GitHub] [parquet-format] wgtmac commented on pull request #192: PARQUET-2241: Update wording of BYTE_STREAM_SPLIT encoding

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #192:
URL: https://github.com/apache/parquet-format/pull/192#issuecomment-1425920012

   cc @shangxinli @gszadovszky @ggershinsky @pitrou @emkornfield

