You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "anjakefala (via GitHub)" <gi...@apache.org> on 2023/06/16 23:14:32 UTC

[GitHub] [arrow] anjakefala opened a new pull request, #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

anjakefala opened a new pull request, #36137:
URL: https://github.com/apache/arrow/pull/36137

   Change the default parquet version to 2.6.
   
   Discussed in: 
   * [ML](https://lists.apache.org/thread/027g366yr3m03hwtpst6sr58b3trwhsm)
   * [Issue](https://github.com/apache/arrow/issues/35746)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] anjakefala commented on a diff in pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "anjakefala (via GitHub)" <gi...@apache.org>.
anjakefala commented on code in PR #36137:
URL: https://github.com/apache/arrow/pull/36137#discussion_r1252292753


##########
python/pyarrow/tests/parquet/test_datetime.py:
##########
@@ -378,28 +379,31 @@ def test_parquet_version_timestamp_differences():
     a_us = pa.array(d_us, type=pa.timestamp('us'))
     a_ns = pa.array(d_ns, type=pa.timestamp('ns'))
 
+    all_versions = ['1.0', '2.4', '2.6']
+
     names = ['ts:s', 'ts:ms', 'ts:us', 'ts:ns']
     table = pa.Table.from_arrays([a_s, a_ms, a_us, a_ns], names)
 
-    # Using Parquet version 1.0, seconds should be coerced to milliseconds
+    # Using Parquet version 1.0 and 2.4, seconds should be coerced to milliseconds
     # and nanoseconds should be coerced to microseconds by default
     expected = pa.Table.from_arrays([a_ms, a_ms, a_us, a_us], names)
-    _check_roundtrip(table, expected)
+    _check_roundtrip(table, expected, version='1.0')
+    _check_roundtrip(table, expected, version='2.4')
 
-    # Using Parquet version 2.0, seconds should be coerced to milliseconds
+    # Using Parquet version 2.6, seconds should be coerced to milliseconds
     # and nanoseconds should be retained by default
     expected = pa.Table.from_arrays([a_ms, a_ms, a_us, a_ns], names)
     _check_roundtrip(table, expected, version='2.6')
 
-    # Using Parquet version 1.0, coercing to milliseconds or microseconds
+    # For either Parquet version coercing to milliseconds or microseconds
     # is allowed
     expected = pa.Table.from_arrays([a_ms, a_ms, a_ms, a_ms], names)
-    _check_roundtrip(table, expected, coerce_timestamps='ms')
+    for ver in all_versions:
+        _check_roundtrip(table, expected, coerce_timestamps='ms', version=ver)
 
-    # Using Parquet version 2.0, coercing to milliseconds or microseconds
-    # is allowed
     expected = pa.Table.from_arrays([a_us, a_us, a_us, a_us], names)
-    _check_roundtrip(table, expected, version='2.6', coerce_timestamps='us')
+    for ver in all_versions:
+        _check_roundtrip(table, expected, version=ver, coerce_timestamps='us')
 
     # TODO: after pyarrow allows coerce_timestamps='ns', tests like the
     # following should pass ...

Review Comment:
   Yes, on the C++ side it appears to be implemented!
   
   > Expected 'WriteTable(*input_table, ::arrow::default_memory_pool(), CreateOutputStream(), input_table->num_rows(), properties, arrow_coerce_to_nanos_properties)' to fail with NotImplemented, but got OK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] anjakefala commented on a diff in pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "anjakefala (via GitHub)" <gi...@apache.org>.
anjakefala commented on code in PR #36137:
URL: https://github.com/apache/arrow/pull/36137#discussion_r1252273616


##########
cpp/src/parquet/column_writer_test.cc:
##########
@@ -649,7 +649,7 @@ TYPED_TEST(TestPrimitiveWriter, DictionaryFallbackVersion1_0) {
 }
 
 TYPED_TEST(TestPrimitiveWriter, DictionaryFallbackVersion2_0) {
-  this->TestDictionaryFallbackEncoding(ParquetVersion::PARQUET_2_4);
+  this->TestDictionaryFallbackEncoding(ParquetVersion::PARQUET_2_6);

Review Comment:
   I am working on having the tests cover at least 2.4 and 2.6! I started something locally, and will push once I have it built.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #36137:
URL: https://github.com/apache/arrow/pull/36137#issuecomment-1617710753

   The test in C++ looks ok. But I'm afraid that we might leak some test under 2.4, so maybe we can separate and test 2.4 and 2.6?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36137:
URL: https://github.com/apache/arrow/pull/36137#issuecomment-1595446430

   :warning: GitHub issue #35746 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] anjakefala commented on pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "anjakefala (via GitHub)" <gi...@apache.org>.
anjakefala commented on PR #36137:
URL: https://github.com/apache/arrow/pull/36137#issuecomment-1610019112

   @jorisvandenbossche This was so helpful!
   
   Yeah, with the `_check_roundtrip` calls, it seems that when the version is not explicitly set, it uses the default for `write_table`. And sometimes, the assumption was that this default version was '1.0'. Tested behaviour didn't change for 2.4 and so it didn't matter that the new default version was being used for `_check_roundtrip`. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #36137:
URL: https://github.com/apache/arrow/pull/36137#issuecomment-1623153205

   > `AsofJoinTest` needs to be updated as well.
   
   I think this failure is not related, the AsofJoinTest regularly fails on main as well. See https://github.com/apache/arrow/issues/36482


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on a diff in pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #36137:
URL: https://github.com/apache/arrow/pull/36137#discussion_r1254041300


##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -1973,7 +1983,9 @@ TEST(TestArrowReadWrite, ParquetVersionTimestampDifferences) {
                               field("ts:us", t_us), field("ts:ns", t_ns)});
   auto input_table = Table::Make(input_schema, {a_s, a_ms, a_us, a_ns});
 
-  auto parquet_version_1_properties = ::parquet::default_writer_properties();

Review Comment:
   nice catch!



##########
cpp/src/parquet/column_writer_test.cc:
##########
@@ -649,7 +649,7 @@ TYPED_TEST(TestPrimitiveWriter, DictionaryFallbackVersion1_0) {
 }
 
 TYPED_TEST(TestPrimitiveWriter, DictionaryFallbackVersion2_0) {
-  this->TestDictionaryFallbackEncoding(ParquetVersion::PARQUET_2_4);
+  this->TestDictionaryFallbackEncoding(ParquetVersion::PARQUET_2_6);

Review Comment:
   ```C++
   this->TestDictionaryFallbackEncoding(ParquetVersion::PARQUET_2_4);
   this->TestDictionaryFallbackEncoding(ParquetVersion::PARQUET_2_6);
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #36137:
URL: https://github.com/apache/arrow/pull/36137#issuecomment-1609666970

   Taking a quick look at some of the Python failures:
   
   - `test_write_error_deletes_incomplete_file`: I assume the test relied on parquet.write_table to raise an error because of nanosecods being present, and they would get truncated. So either we need to add something else that errors, or we can add `version="2.4"` to keep it using the older parquet version and thus it still errors
   - `test_timestamp_restore_timezone_nanosecond`: it is creating the expected result with microseconds, but now it comes back as nanoseconds. I think here we want to test explicitly that the timezone is preserved when the resolution changes, so I would also add a `version="2.4"` to keep the test as it was. But you could parametrize the test just above to use different resolutions (ms, us, ns) to cover all of them.
   - `test_parquet_version_timestamp_differences`: also needs to be updated because the `expected` now contains nanoseconds instead of microseconds (but there are some comments there that are a bit outdated it seems ..)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #36137:
URL: https://github.com/apache/arrow/pull/36137#issuecomment-1623655094

   Thanks @anjakefala!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #36137:
URL: https://github.com/apache/arrow/pull/36137#discussion_r1250505457


##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3334,7 +3334,7 @@ TEST(ArrowReadWrite, NestedRequiredOuterOptional) {
     auto arrow_writer_props = ArrowWriterProperties::Builder();
     arrow_writer_props.store_schema();
     if (inner_type->id() == ::arrow::Type::UINT32) {
-      writer_props.version(ParquetVersion::PARQUET_2_4);
+      writer_props.version(ParquetVersion::PARQUET_2_6);

Review Comment:
   The default writer_props created above with `WriterProperties::Builder();` will now use 2_6 by default, so potentially this could be removed alltogether



##########
python/pyarrow/tests/parquet/test_datetime.py:
##########
@@ -378,28 +379,31 @@ def test_parquet_version_timestamp_differences():
     a_us = pa.array(d_us, type=pa.timestamp('us'))
     a_ns = pa.array(d_ns, type=pa.timestamp('ns'))
 
+    all_versions = ['1.0', '2.4', '2.6']
+
     names = ['ts:s', 'ts:ms', 'ts:us', 'ts:ns']
     table = pa.Table.from_arrays([a_s, a_ms, a_us, a_ns], names)
 
-    # Using Parquet version 1.0, seconds should be coerced to milliseconds
+    # Using Parquet version 1.0 and 2.4, seconds should be coerced to milliseconds
     # and nanoseconds should be coerced to microseconds by default
     expected = pa.Table.from_arrays([a_ms, a_ms, a_us, a_us], names)
-    _check_roundtrip(table, expected)
+    _check_roundtrip(table, expected, version='1.0')
+    _check_roundtrip(table, expected, version='2.4')
 
-    # Using Parquet version 2.0, seconds should be coerced to milliseconds
+    # Using Parquet version 2.6, seconds should be coerced to milliseconds
     # and nanoseconds should be retained by default
     expected = pa.Table.from_arrays([a_ms, a_ms, a_us, a_ns], names)
     _check_roundtrip(table, expected, version='2.6')
 
-    # Using Parquet version 1.0, coercing to milliseconds or microseconds
+    # For either Parquet version coercing to milliseconds or microseconds
     # is allowed
     expected = pa.Table.from_arrays([a_ms, a_ms, a_ms, a_ms], names)
-    _check_roundtrip(table, expected, coerce_timestamps='ms')
+    for ver in all_versions:
+        _check_roundtrip(table, expected, coerce_timestamps='ms', version=ver)
 
-    # Using Parquet version 2.0, coercing to milliseconds or microseconds
-    # is allowed
     expected = pa.Table.from_arrays([a_us, a_us, a_us, a_us], names)
-    _check_roundtrip(table, expected, version='2.6', coerce_timestamps='us')
+    for ver in all_versions:
+        _check_roundtrip(table, expected, version=ver, coerce_timestamps='us')
 
     # TODO: after pyarrow allows coerce_timestamps='ns', tests like the
     # following should pass ...

Review Comment:
   Could also be left for a follow-up, but we could now allow this (in `_create_arrow_writer_properties` in _parquet.pyx we would need to update the cython bindings to pass it through, I think on C++ side it's already implemented)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] anjakefala commented on a diff in pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "anjakefala (via GitHub)" <gi...@apache.org>.
anjakefala commented on code in PR #36137:
URL: https://github.com/apache/arrow/pull/36137#discussion_r1252273616


##########
cpp/src/parquet/column_writer_test.cc:
##########
@@ -649,7 +649,7 @@ TYPED_TEST(TestPrimitiveWriter, DictionaryFallbackVersion1_0) {
 }
 
 TYPED_TEST(TestPrimitiveWriter, DictionaryFallbackVersion2_0) {
-  this->TestDictionaryFallbackEncoding(ParquetVersion::PARQUET_2_4);
+  this->TestDictionaryFallbackEncoding(ParquetVersion::PARQUET_2_6);

Review Comment:
   I am working on having the tests cover at least 2.4 and 2.6! I started something locally, and will push once it successfully builds.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] anjakefala commented on a diff in pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "anjakefala (via GitHub)" <gi...@apache.org>.
anjakefala commented on code in PR #36137:
URL: https://github.com/apache/arrow/pull/36137#discussion_r1252292753


##########
python/pyarrow/tests/parquet/test_datetime.py:
##########
@@ -378,28 +379,31 @@ def test_parquet_version_timestamp_differences():
     a_us = pa.array(d_us, type=pa.timestamp('us'))
     a_ns = pa.array(d_ns, type=pa.timestamp('ns'))
 
+    all_versions = ['1.0', '2.4', '2.6']
+
     names = ['ts:s', 'ts:ms', 'ts:us', 'ts:ns']
     table = pa.Table.from_arrays([a_s, a_ms, a_us, a_ns], names)
 
-    # Using Parquet version 1.0, seconds should be coerced to milliseconds
+    # Using Parquet version 1.0 and 2.4, seconds should be coerced to milliseconds
     # and nanoseconds should be coerced to microseconds by default
     expected = pa.Table.from_arrays([a_ms, a_ms, a_us, a_us], names)
-    _check_roundtrip(table, expected)
+    _check_roundtrip(table, expected, version='1.0')
+    _check_roundtrip(table, expected, version='2.4')
 
-    # Using Parquet version 2.0, seconds should be coerced to milliseconds
+    # Using Parquet version 2.6, seconds should be coerced to milliseconds
     # and nanoseconds should be retained by default
     expected = pa.Table.from_arrays([a_ms, a_ms, a_us, a_ns], names)
     _check_roundtrip(table, expected, version='2.6')
 
-    # Using Parquet version 1.0, coercing to milliseconds or microseconds
+    # For either Parquet version coercing to milliseconds or microseconds
     # is allowed
     expected = pa.Table.from_arrays([a_ms, a_ms, a_ms, a_ms], names)
-    _check_roundtrip(table, expected, coerce_timestamps='ms')
+    for ver in all_versions:
+        _check_roundtrip(table, expected, coerce_timestamps='ms', version=ver)
 
-    # Using Parquet version 2.0, coercing to milliseconds or microseconds
-    # is allowed
     expected = pa.Table.from_arrays([a_us, a_us, a_us, a_us], names)
-    _check_roundtrip(table, expected, version='2.6', coerce_timestamps='us')
+    for ver in all_versions:
+        _check_roundtrip(table, expected, version=ver, coerce_timestamps='us')
 
     # TODO: after pyarrow allows coerce_timestamps='ns', tests like the
     # following should pass ...

Review Comment:
   Yes, on the C++ side it appears to be implemented!
   
   ```
   Expected 'WriteTable(*input_table, ::arrow::default_memory_pool(), CreateOutputStream(), input_table->num_rows(), properties, arrow_coerce_to_nanos_properties)' to fail with NotImplemented, but got OK
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on a diff in pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #36137:
URL: https://github.com/apache/arrow/pull/36137#discussion_r1254039562


##########
cpp/src/parquet/column_writer_test.cc:
##########
@@ -649,7 +649,7 @@ TYPED_TEST(TestPrimitiveWriter, DictionaryFallbackVersion1_0) {
 }
 
 TYPED_TEST(TestPrimitiveWriter, DictionaryFallbackVersion2_0) {
-  this->TestDictionaryFallbackEncoding(ParquetVersion::PARQUET_2_4);
+  this->TestDictionaryFallbackEncoding(ParquetVersion::PARQUET_2_6);

Review Comment:
   ```suggestion
     this->TestDictionaryFallbackEncoding(ParquetVersion::PARQUET_2_4);
     this->TestDictionaryFallbackEncoding(ParquetVersion::PARQUET_2_6);
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] anjakefala commented on a diff in pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "anjakefala (via GitHub)" <gi...@apache.org>.
anjakefala commented on code in PR #36137:
URL: https://github.com/apache/arrow/pull/36137#discussion_r1252292753


##########
python/pyarrow/tests/parquet/test_datetime.py:
##########
@@ -378,28 +379,31 @@ def test_parquet_version_timestamp_differences():
     a_us = pa.array(d_us, type=pa.timestamp('us'))
     a_ns = pa.array(d_ns, type=pa.timestamp('ns'))
 
+    all_versions = ['1.0', '2.4', '2.6']
+
     names = ['ts:s', 'ts:ms', 'ts:us', 'ts:ns']
     table = pa.Table.from_arrays([a_s, a_ms, a_us, a_ns], names)
 
-    # Using Parquet version 1.0, seconds should be coerced to milliseconds
+    # Using Parquet version 1.0 and 2.4, seconds should be coerced to milliseconds
     # and nanoseconds should be coerced to microseconds by default
     expected = pa.Table.from_arrays([a_ms, a_ms, a_us, a_us], names)
-    _check_roundtrip(table, expected)
+    _check_roundtrip(table, expected, version='1.0')
+    _check_roundtrip(table, expected, version='2.4')
 
-    # Using Parquet version 2.0, seconds should be coerced to milliseconds
+    # Using Parquet version 2.6, seconds should be coerced to milliseconds
     # and nanoseconds should be retained by default
     expected = pa.Table.from_arrays([a_ms, a_ms, a_us, a_ns], names)
     _check_roundtrip(table, expected, version='2.6')
 
-    # Using Parquet version 1.0, coercing to milliseconds or microseconds
+    # For either Parquet version coercing to milliseconds or microseconds
     # is allowed
     expected = pa.Table.from_arrays([a_ms, a_ms, a_ms, a_ms], names)
-    _check_roundtrip(table, expected, coerce_timestamps='ms')
+    for ver in all_versions:
+        _check_roundtrip(table, expected, coerce_timestamps='ms', version=ver)
 
-    # Using Parquet version 2.0, coercing to milliseconds or microseconds
-    # is allowed
     expected = pa.Table.from_arrays([a_us, a_us, a_us, a_us], names)
-    _check_roundtrip(table, expected, version='2.6', coerce_timestamps='us')
+    for ver in all_versions:
+        _check_roundtrip(table, expected, version=ver, coerce_timestamps='us')
 
     # TODO: after pyarrow allows coerce_timestamps='ns', tests like the
     # following should pass ...

Review Comment:
   Yes, on the C++ side it appears to be implemented!
   
   > Expected 'WriteTable(*input_table, ::arrow::default_memory_pool(), CreateOutputStream(), input_table->num_rows(), properties, arrow_coerce_to_nanos_properties)' to fail with NotImplemented, but got OK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #36137:
URL: https://github.com/apache/arrow/pull/36137#issuecomment-1623108750

   @github-actions crossbow submit -g integration


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] anjakefala commented on pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "anjakefala (via GitHub)" <gi...@apache.org>.
anjakefala commented on PR #36137:
URL: https://github.com/apache/arrow/pull/36137#issuecomment-1622637510

   `AsofJoinTest` needs to be updated as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #36137:
URL: https://github.com/apache/arrow/pull/36137#issuecomment-1623654234

   There might still be a few cases where the test coverage for all options can be improved (see some of the comments), but if needed, let's do that in a follow-up PR. 
   I am going to merge this now, because we have another PR waiting on this one that we also want to get in before the feature freeze.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche merged pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche merged PR #36137:
URL: https://github.com/apache/arrow/pull/36137


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36137:
URL: https://github.com/apache/arrow/pull/36137#issuecomment-1623112161

   Revision: 932de523ac2782c1d492be87cafe7bf64e11384e
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-34c5b677fb](https://github.com/ursacomputing/crossbow/branches/all?query=actions-34c5b677fb)
   
   |Task|Status|
   |----|------|
   |test-conda-python-3.10-hdfs-2.9.2|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-34c5b677fb-github-test-conda-python-3.10-hdfs-2.9.2)](https://github.com/ursacomputing/crossbow/actions/runs/5472585221/jobs/9964946833)|
   |test-conda-python-3.10-hdfs-3.2.1|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-34c5b677fb-github-test-conda-python-3.10-hdfs-3.2.1)](https://github.com/ursacomputing/crossbow/actions/runs/5472587140/jobs/9964951121)|
   |test-conda-python-3.10-pandas-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-34c5b677fb-github-test-conda-python-3.10-pandas-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5472586899/jobs/9964950480)|
   |test-conda-python-3.10-pandas-nightly|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-34c5b677fb-github-test-conda-python-3.10-pandas-nightly)](https://github.com/ursacomputing/crossbow/actions/runs/5472587369/jobs/9964951554)|
   |test-conda-python-3.10-spark-master|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-34c5b677fb-github-test-conda-python-3.10-spark-master)](https://github.com/ursacomputing/crossbow/actions/runs/5472586067/jobs/9964948694)|
   |test-conda-python-3.11-dask-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-34c5b677fb-github-test-conda-python-3.11-dask-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5472587916/jobs/9964953053)|
   |test-conda-python-3.11-dask-upstream_devel|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-34c5b677fb-github-test-conda-python-3.11-dask-upstream_devel)](https://github.com/ursacomputing/crossbow/actions/runs/5472585808/jobs/9964947847)|
   |test-conda-python-3.11-pandas-upstream_devel|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-34c5b677fb-github-test-conda-python-3.11-pandas-upstream_devel)](https://github.com/ursacomputing/crossbow/actions/runs/5472587570/jobs/9964952103)|
   |test-conda-python-3.8-pandas-1.0|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-34c5b677fb-github-test-conda-python-3.8-pandas-1.0)](https://github.com/ursacomputing/crossbow/actions/runs/5472587703/jobs/9964952486)|
   |test-conda-python-3.8-spark-v3.1.2|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-34c5b677fb-github-test-conda-python-3.8-spark-v3.1.2)](https://github.com/ursacomputing/crossbow/actions/runs/5472588129/jobs/9964953463)|
   |test-conda-python-3.9-pandas-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-34c5b677fb-github-test-conda-python-3.9-pandas-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5472586316/jobs/9964949207)|
   |test-conda-python-3.9-spark-v3.2.0|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-34c5b677fb-github-test-conda-python-3.9-spark-v3.2.0)](https://github.com/ursacomputing/crossbow/actions/runs/5472586585/jobs/9964949708)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] anjakefala commented on a diff in pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "anjakefala (via GitHub)" <gi...@apache.org>.
anjakefala commented on code in PR #36137:
URL: https://github.com/apache/arrow/pull/36137#discussion_r1252273616


##########
cpp/src/parquet/column_writer_test.cc:
##########
@@ -649,7 +649,7 @@ TYPED_TEST(TestPrimitiveWriter, DictionaryFallbackVersion1_0) {
 }
 
 TYPED_TEST(TestPrimitiveWriter, DictionaryFallbackVersion2_0) {
-  this->TestDictionaryFallbackEncoding(ParquetVersion::PARQUET_2_4);
+  this->TestDictionaryFallbackEncoding(ParquetVersion::PARQUET_2_6);

Review Comment:
   I am working on having the tests cover at least 2.4 and 2.6! I started something locally, and will push once it builds.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #36137:
URL: https://github.com/apache/arrow/pull/36137#issuecomment-1632581662

   Conbench analyzed the 6 benchmark runs on commit `22f7e597`.
   
   There were no benchmark performance regressions. 🎉
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14982060365) has more details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #36137:
URL: https://github.com/apache/arrow/pull/36137#issuecomment-1609671954

   For C++, the test failures are in
   
   ```
    [  FAILED  ] 2 tests, listed below:
   [  FAILED  ] TestArrowReadWrite.DateTimeTypes
   [  FAILED  ] TestArrowReadWrite.ParquetVersionTimestampDifferences
   ```
   
   and I suppose those will also be updated to reflect the new defaults (nanoseconds are preserved), or updated to test both old default of 2.4 and new of 2.6


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on a diff in pull request #36137: GH-35746: [Parquet][C++][Python] Switch default Parquet version to 2.6

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #36137:
URL: https://github.com/apache/arrow/pull/36137#discussion_r1250547461


##########
cpp/src/parquet/arrow/arrow_schema_test.cc:
##########
@@ -891,7 +890,7 @@ TEST_F(TestConvertArrowSchema, ArrowFields) {
        LogicalType::Timestamp(true, LogicalType::TimeUnit::MICROS), ParquetType::INT64,
        -1},
       {"timestamp(nanosecond, CET)", ::arrow::timestamp(::arrow::TimeUnit::NANO, "CET"),
-       LogicalType::Timestamp(true, LogicalType::TimeUnit::MICROS), ParquetType::INT64,
+       LogicalType::Timestamp(true, LogicalType::TimeUnit::NANOS), ParquetType::INT64,

Review Comment:
   Should we extract NANOS, and test with both 2_4 and 2_6? Since 2.4 might still be used for some time



##########
cpp/src/parquet/column_writer_test.cc:
##########
@@ -649,7 +649,7 @@ TYPED_TEST(TestPrimitiveWriter, DictionaryFallbackVersion1_0) {
 }
 
 TYPED_TEST(TestPrimitiveWriter, DictionaryFallbackVersion2_0) {
-  this->TestDictionaryFallbackEncoding(ParquetVersion::PARQUET_2_4);
+  this->TestDictionaryFallbackEncoding(ParquetVersion::PARQUET_2_6);

Review Comment:
   Emmm should we traverse all 2_0, 2_4, 2_6 ? Or just 2_0 is ok?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org