Posted to github@arrow.apache.org by "wjones127 (via GitHub)" <gi...@apache.org> on 2023/02/16 22:48:44 UTC

[GitHub] [arrow] wjones127 commented on a diff in pull request #34140: GH-15173: [C++][Parquet] Fixing ByteStreamSplit Standard broken

wjones127 commented on code in PR #34140:
URL: https://github.com/apache/arrow/pull/34140#discussion_r1109079983


##########
cpp/src/parquet/encoding.cc:
##########
@@ -2937,11 +2937,16 @@ ByteStreamSplitDecoder<DType>::ByteStreamSplitDecoder(const ColumnDescriptor* de
 template <typename DType>
 void ByteStreamSplitDecoder<DType>::SetData(int num_values, const uint8_t* data,
                                             int len) {
-  DecoderImpl::SetData(num_values, data, len);
-  if (num_values * static_cast<int64_t>(sizeof(T)) > len) {
-    throw ParquetException("Data size too small for number of values (corrupted file?)");
+  if (num_values * static_cast<int64_t>(sizeof(T)) < len) {
+    throw ParquetException("Data size too large for number of values (padding file?)");

Review Comment:
   It might be nice to be more specific about where the extra bytes come from.
   ```suggestion
       throw ParquetException("Data size too large for number of values (padding in byte stream split data page?)");
   ```
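
For context, the check under discussion compares the bytes required by `num_values` values of type `T` against the `len` bytes passed to `SetData`. A minimal standalone sketch of that comparison follows. The names `SizeCheck` and `CheckByteStreamSplitSize` are hypothetical and not part of Arrow or Parquet, and reporting both the "too small" and "too large" cases is an assumption made for illustration, not necessarily what the PR does.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch; not an Arrow/Parquet type.
enum class SizeCheck { kOk, kTooSmall, kTooLarge };

// Hypothetical helper: compares the bytes needed for `num_values` values of
// width `type_size` against the `len` bytes actually supplied.
inline SizeCheck CheckByteStreamSplitSize(int num_values, int len,
                                          std::size_t type_size) {
  // Widen to 64 bits before multiplying so the product cannot overflow int.
  const int64_t expected =
      static_cast<int64_t>(num_values) * static_cast<int64_t>(type_size);
  if (expected > len) return SizeCheck::kTooSmall;  // buffer is truncated
  if (expected < len) return SizeCheck::kTooLarge;  // buffer has extra bytes
  return SizeCheck::kOk;
}
```

Under these assumptions, four `float` values with a 20-byte buffer would report `kTooLarge`, which is the case the suggested message describes as padding in the data page.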


