You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/21 02:05:30 UTC

[GitHub] [arrow] wjones127 opened a new pull request, #13665: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

wjones127 opened a new pull request, #13665:
URL: https://github.com/apache/arrow/pull/13665

   With these changes I can successfully read the parquet file provides in the original report.
   
   Parquet file: https://www.dropbox.com/s/portxgch3fpovnz/test2.parq?dl=0
   Gist to generate: https://gist.github.com/bivald/f93448eaf25808284c4029c691a58a6a
   Original report: https://lists.apache.org/thread/wtbqozdhj2hwn6f0sps2j70lr07grk06
   
   Based off of changes in ARROW-10353


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #13665:
URL: https://github.com/apache/arrow/pull/13665#discussion_r927995829


##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3943,6 +3943,19 @@ TEST(TestArrowReaderAdHoc, WriteBatchedNestedNullableStringColumn) {
   ::arrow::AssertTablesEqual(*expected, *actual, /*same_chunk_layout=*/false);
 }
 
+TEST(TestArrowReaderAdHoc, OldDataPageV2) {
+  // ARROW-17100
+  const char* c_root = std::getenv("ARROW_TEST_DATA");
+  if (!c_root) {
+    GTEST_SKIP() << "ARROW_TEST_DATA not set.";
+  }

Review Comment:
   I think the feeling was it was Arrow C++ specific because it's not *technically* a valid parquet file but we wanted to be able to read it anyways for backwards compatibility. My understanding of the `parquet-testing` repo was it contains files that all readers should either definitely support or definitely not. But I'm open to moving if I'm misunderstanding the purpose of `parquet-testing`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm merged pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
lidavidm merged PR #13665:
URL: https://github.com/apache/arrow/pull/13665


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13665:
URL: https://github.com/apache/arrow/pull/13665#discussion_r928145460


##########
cpp/src/parquet/metadata.cc:
##########
@@ -58,6 +58,13 @@ const ApplicationVersion& ApplicationVersion::PARQUET_MR_FIXED_STATS_VERSION() {
   return version;
 }
 
+const ApplicationVersion& ApplicationVersion::PARQUET_CPP_10353_FIXED_VERSION() {
+  // Parquet 1.5.1 had this problem, and after that we switched to the

Review Comment:
   Which problem? Neither the comment nor the function name explains it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on a diff in pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
lidavidm commented on code in PR #13665:
URL: https://github.com/apache/arrow/pull/13665#discussion_r929101106


##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3944,7 +3944,10 @@ TEST(TestArrowReaderAdHoc, WriteBatchedNestedNullableStringColumn) {
 }
 
 TEST(TestArrowReaderAdHoc, OldDataPageV2) {
-  // ARROW-17100
+// ARROW-17100

Review Comment:
   nit: fairly sure this will be a lint error (the comment needs to be indented)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13665:
URL: https://github.com/apache/arrow/pull/13665#issuecomment-1190954195

   https://issues.apache.org/jira/browse/ARROW-17100


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #13665:
URL: https://github.com/apache/arrow/pull/13665#issuecomment-1197417078

   Benchmark runs are scheduled for baseline = e10d8e898b52f69ef4c259c9b6fe934319db8f94 and contender = 545b4313d6db2dfcc4ea0aa4ac23785d64450e1d. 545b4313d6db2dfcc4ea0aa4ac23785d64450e1d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:32.14% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c723d557fc1546619227dc83d4c6eae0...a75c77ea8e474504b7cfe6cbafacc4de/)
   [Finished :arrow_down:0.1% :arrow_up:0.03%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/af235fa72fa8460f8d17cbc7b567c6a7...5f429c5c1f5c426b825237d97f2614b5/)
   [Failed :arrow_down:0.0% :arrow_up:1.09%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/7645daf7076b4796a06247dae2aeedd4...dda8b8ba11ac4c02a1f75d9997f07612/)
   [Finished :arrow_down:0.82% :arrow_up:0.04%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/ca8916189b6d47f697ebcd3168000c2c...ee2fe44af3bc4affbc1e415225362ce4/)
   Buildkite builds:
   [Failed] [`545b4313` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1207)
   [Finished] [`545b4313` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1219)
   [Finished] [`545b4313` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1201)
   [Finished] [`545b4313` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1221)
   [Failed] [`e10d8e89` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1206)
   [Finished] [`e10d8e89` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1218)
   [Failed] [`e10d8e89` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1200)
   [Finished] [`e10d8e89` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1220)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #13665:
URL: https://github.com/apache/arrow/pull/13665#discussion_r928000424


##########
cpp/src/parquet/column_reader.h:
##########
@@ -105,10 +105,12 @@ class PARQUET_EXPORT PageReader {
 
   static std::unique_ptr<PageReader> Open(
       std::shared_ptr<ArrowInputStream> stream, int64_t total_num_rows,
-      Compression::type codec, ::arrow::MemoryPool* pool = ::arrow::default_memory_pool(),
+      Compression::type codec, bool compression_always_true,

Review Comment:
   Yeah I went to default this to false and then removed the argument from the calls in tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on a diff in pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
lidavidm commented on code in PR #13665:
URL: https://github.com/apache/arrow/pull/13665#discussion_r928000634


##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3943,6 +3943,19 @@ TEST(TestArrowReaderAdHoc, WriteBatchedNestedNullableStringColumn) {
   ::arrow::AssertTablesEqual(*expected, *actual, /*same_chunk_layout=*/false);
 }
 
+TEST(TestArrowReaderAdHoc, OldDataPageV2) {
+  // ARROW-17100
+  const char* c_root = std::getenv("ARROW_TEST_DATA");
+  if (!c_root) {
+    GTEST_SKIP() << "ARROW_TEST_DATA not set.";
+  }

Review Comment:
   Ah, ok - sounds good then



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #13665:
URL: https://github.com/apache/arrow/pull/13665#issuecomment-1193897460

   Note there's a lint issue, and the new test file uses compression, meaning we should also add a SKIP for when the codec isn't enabled, right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13665:
URL: https://github.com/apache/arrow/pull/13665#discussion_r928145317


##########
cpp/src/parquet/column_reader.h:
##########
@@ -105,11 +105,13 @@ class PARQUET_EXPORT PageReader {
 
   static std::unique_ptr<PageReader> Open(
       std::shared_ptr<ArrowInputStream> stream, int64_t total_num_rows,
-      Compression::type codec, ::arrow::MemoryPool* pool = ::arrow::default_memory_pool(),
+      Compression::type codec, bool compression_always_true = false,

Review Comment:
   Call this `always_compressed` or something?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] emkornfield commented on pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
emkornfield commented on PR #13665:
URL: https://github.com/apache/arrow/pull/13665#issuecomment-1193153130

   Generally looks OK to me, we might want to replace the boolean with a struct that contains the boolean, I can imagine more of these types of issues cropping up, but we can always do the conversion for the next one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on a diff in pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
lidavidm commented on code in PR #13665:
URL: https://github.com/apache/arrow/pull/13665#discussion_r927983874


##########
cpp/src/parquet/column_reader.h:
##########
@@ -105,10 +105,12 @@ class PARQUET_EXPORT PageReader {
 
   static std::unique_ptr<PageReader> Open(
       std::shared_ptr<ArrowInputStream> stream, int64_t total_num_rows,
-      Compression::type codec, ::arrow::MemoryPool* pool = ::arrow::default_memory_pool(),
+      Compression::type codec, bool compression_always_true,

Review Comment:
   nit: default to false?



##########
cpp/src/parquet/column_io_benchmark.cc:
##########
@@ -130,7 +130,8 @@ std::shared_ptr<Int64Reader> BuildReader(std::shared_ptr<Buffer>& buffer,
                                          int64_t num_values, Compression::type codec,
                                          ColumnDescriptor* schema) {
   auto source = std::make_shared<::arrow::io::BufferReader>(buffer);
-  std::unique_ptr<PageReader> page_reader = PageReader::Open(source, num_values, codec);
+  std::unique_ptr<PageReader> page_reader =
+      PageReader::Open(source, num_values, codec, false);

Review Comment:
   nit: add `/*param_name=*/ false` so readers can more easily tell what's going on



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3943,6 +3943,19 @@ TEST(TestArrowReaderAdHoc, WriteBatchedNestedNullableStringColumn) {
   ::arrow::AssertTablesEqual(*expected, *actual, /*same_chunk_layout=*/false);
 }
 
+TEST(TestArrowReaderAdHoc, OldDataPageV2) {
+  // ARROW-17100
+  const char* c_root = std::getenv("ARROW_TEST_DATA");
+  if (!c_root) {
+    GTEST_SKIP() << "ARROW_TEST_DATA not set.";
+  }

Review Comment:
   This also needs a SKIP like the one for Snappy above



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3943,6 +3943,19 @@ TEST(TestArrowReaderAdHoc, WriteBatchedNestedNullableStringColumn) {
   ::arrow::AssertTablesEqual(*expected, *actual, /*same_chunk_layout=*/false);
 }
 
+TEST(TestArrowReaderAdHoc, OldDataPageV2) {
+  // ARROW-17100
+  const char* c_root = std::getenv("ARROW_TEST_DATA");
+  if (!c_root) {
+    GTEST_SKIP() << "ARROW_TEST_DATA not set.";
+  }

Review Comment:
   FWIW, everything else in `parquet` uses PARQUET_TEST_DATA…should it have gone there instead?



##########
cpp/src/parquet/column_writer_test.cc:
##########
@@ -85,7 +85,7 @@ class TestPrimitiveWriter : public PrimitiveTypedTest<TestType> {
     ASSERT_OK_AND_ASSIGN(auto buffer, sink_->Finish());
     auto source = std::make_shared<::arrow::io::BufferReader>(buffer);
     std::unique_ptr<PageReader> page_reader =
-        PageReader::Open(std::move(source), num_rows, compression);
+        PageReader::Open(std::move(source), num_rows, compression, false);

Review Comment:
   ditto the comment above here (though: adding the default would also fix it)



##########
cpp/src/parquet/column_reader.cc:
##########
@@ -449,7 +452,10 @@ std::shared_ptr<Page> SerializedPageReader::NextPage() {
           header.repetition_levels_byte_length < 0) {
         throw ParquetException("Invalid page header (negative levels byte length)");
       }
-      bool is_compressed = header.__isset.is_compressed ? header.is_compressed : false;
+      // Some implementations set is_compressed to false but still compressed.

Review Comment:
   "Some implementations" -> Specifically, Arrow prior to 3.0.0?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13665: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13665:
URL: https://github.com/apache/arrow/pull/13665#issuecomment-1190953898

   <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
   
       http://www.apache.org/licenses/LICENSE-2.0
   
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
   -->
   
   Thanks for opening a pull request!
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/master/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW
   
   Opening JIRAs ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
wjones127 commented on PR #13665:
URL: https://github.com/apache/arrow/pull/13665#issuecomment-1192924818

   @lidavidm Would you be willing to review? Antoine is on holiday for the next week.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #13665:
URL: https://github.com/apache/arrow/pull/13665#discussion_r928317730


##########
cpp/src/parquet/metadata.cc:
##########
@@ -58,6 +58,13 @@ const ApplicationVersion& ApplicationVersion::PARQUET_MR_FIXED_STATS_VERSION() {
   return version;
 }
 
+const ApplicationVersion& ApplicationVersion::PARQUET_CPP_10353_FIXED_VERSION() {
+  // Parquet 1.5.1 had this problem, and after that we switched to the

Review Comment:
   The 10353 was meant to refer to the ARROW-10353 issue. I have added a summary of the issue (or specifically the part that makes use need to workaround) and referred explicitly to "ARROW-10353".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #13665:
URL: https://github.com/apache/arrow/pull/13665#issuecomment-1191164130

   Hmm, so let's put it in `testing/data/parquet` then.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
wjones127 commented on PR #13665:
URL: https://github.com/apache/arrow/pull/13665#issuecomment-1190955662

   @pitrou I can add a test with the offending parquet file, but not sure where to put it. testing/data/parquet? or cpp/submodules/parquet-testing/data? The former seems Arrow-specific and the latter like it's data that all Parquet readers should be expected to read successfully.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #13665:
URL: https://github.com/apache/arrow/pull/13665#issuecomment-1192925841

   CC @emkornfield if you want to take a look as well


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #13665:
URL: https://github.com/apache/arrow/pull/13665#issuecomment-1192925672

   @wjones127 I'll take a look, thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #13665:
URL: https://github.com/apache/arrow/pull/13665#discussion_r929138697


##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3944,7 +3944,10 @@ TEST(TestArrowReaderAdHoc, WriteBatchedNestedNullableStringColumn) {
 }
 
 TEST(TestArrowReaderAdHoc, OldDataPageV2) {
-  // ARROW-17100
+// ARROW-17100

Review Comment:
   My local formatter de-indented it originally (moved with the `#ifdef`), but seems to be ambivalent to the indentation now (will preserve either way). I changed it anyways.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #13665: ARROW-17100: [C++][Parquet] Fix backwards compatibility for ParquetV2 data pages written prior to 3.0.0 per ARROW-10353

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #13665:
URL: https://github.com/apache/arrow/pull/13665#issuecomment-1197417260

   ['Python', 'R'] benchmarks have high level of regressions.
   [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c723d557fc1546619227dc83d4c6eae0...a75c77ea8e474504b7cfe6cbafacc4de/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org