Posted to commits@arrow.apache.org by jo...@apache.org on 2023/06/13 12:37:54 UTC

[arrow] branch main updated: GH-35858: [Python] disallow none schema parquet writer (#36011)

This is an automated email from the ASF dual-hosted git repository.

jorisvandenbossche pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new 1ddeaab30b GH-35858: [Python] disallow none schema parquet writer (#36011)
1ddeaab30b is described below

commit 1ddeaab30b905b97c1c17a41daa1b3cd923e91d9
Author: Weston Pace <we...@gmail.com>
AuthorDate: Tue Jun 13 05:37:39 2023 -0700

    GH-35858: [Python] disallow none schema parquet writer (#36011)
    
    ### Rationale for this change
    
    Previously, passing in None for the schema would cause a segmentation fault.
    
    ### What changes are included in this PR?
    
    Now a TypeError is raised instead.
    
    ### Are these changes tested?
    
    Yes, a new unit test was added.
    
    ### Are there any user-facing changes?
    
    No
    * Closes: #35858
    
    Authored-by: Weston Pace <we...@gmail.com>
    Signed-off-by: Joris Van den Bossche <jo...@gmail.com>
---
 python/pyarrow/_parquet.pyx                         | 2 +-
 python/pyarrow/tests/parquet/test_parquet_writer.py | 8 ++++++++
 2 files changed, 9 insertions(+), 1 deletion(-)
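
Note: the one-line fix relies on Cython's `not None` clause. By default, an argument typed as an extension class also accepts None; adding `not None` makes Cython raise a TypeError before the function body runs. The sketch below imitates that check in plain Python so the behavior can be tried without building pyarrow. The names `Schema`, `Writer`, `check_not_none`, and `try_create` are hypothetical stand-ins for illustration only, not the pyarrow classes, and the error text is only an approximation of what Cython emits.

```python
class Schema:
    """Hypothetical stand-in for pyarrow.Schema (illustration only)."""

    def __init__(self, names):
        self.names = list(names)


def check_not_none(name, value, expected_type):
    """Approximate the check behind a Cython `Type arg not None` argument.

    Rejects None (and any other value of the wrong type) with a TypeError
    instead of letting it reach code that would dereference it.
    """
    if not isinstance(value, expected_type):
        raise TypeError(
            f"Argument {name!r} has incorrect type "
            f"(expected {expected_type.__name__}, got {type(value).__name__})"
        )
    return value


class Writer:
    """Hypothetical stand-in for ParquetWriter (illustration only)."""

    def __init__(self, where, schema):
        self.where = where
        # Equivalent in spirit to declaring `Schema schema not None`.
        self.schema = check_not_none("schema", schema, Schema)


def try_create(where, schema):
    """Return "ok" on success, or a description of the TypeError raised."""
    try:
        Writer(where, schema)
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "ok"


if __name__ == "__main__":
    print(try_create("some_path", Schema(["x"])))
    print(try_create("some_path", None))
```

With the real library, the equivalent check is the one in the new unit test: `pq.ParquetWriter("some_path", None)` is expected to raise TypeError rather than crash the interpreter.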

diff --git a/python/pyarrow/_parquet.pyx b/python/pyarrow/_parquet.pyx
index 2fc0494cbc..f9cd5289c7 100644
--- a/python/pyarrow/_parquet.pyx
+++ b/python/pyarrow/_parquet.pyx
@@ -1691,7 +1691,7 @@ cdef class ParquetWriter(_Weakrefable):
         int64_t dictionary_pagesize_limit
         object store_schema
 
-    def __cinit__(self, where, Schema schema, use_dictionary=None,
+    def __cinit__(self, where, Schema schema not None, use_dictionary=None,
                   compression=None, version=None,
                   write_statistics=None,
                   MemoryPool memory_pool=None,
diff --git a/python/pyarrow/tests/parquet/test_parquet_writer.py b/python/pyarrow/tests/parquet/test_parquet_writer.py
index 6ae4307135..e6fbd97053 100644
--- a/python/pyarrow/tests/parquet/test_parquet_writer.py
+++ b/python/pyarrow/tests/parquet/test_parquet_writer.py
@@ -93,6 +93,14 @@ def test_validate_schema_write_table(tempdir):
         with pytest.raises(ValueError):
             w.write_table(simple_table)
 
+def test_parquet_invalid_writer():
+
+    with pytest.raises(TypeError):
+        some_schema = pa.schema([pa.field("x", pa.int32())])
+        pq.ParquetWriter(None, some_schema)
+
+    with pytest.raises(TypeError):
+        pq.ParquetWriter("some_path", None)
 
 @pytest.mark.pandas
 @parametrize_legacy_dataset