You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/13 22:50:54 UTC

[GitHub] [arrow] westonpace opened a new pull request #10717: ARROW-13091: [Python] Added compression_level to IpcWriteOptions

westonpace opened a new pull request #10717:
URL: https://github.com/apache/arrow/pull/10717


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#discussion_r672334238



##########
File path: python/pyarrow/tests/test_feather.py
##########
@@ -599,6 +602,9 @@ def test_v2_set_chunksize():
 
 
 @pytest.mark.pandas
+@pytest.mark.lz4
+@pytest.mark.snappy
+@pytest.mark.zstd

Review comment:
       Could we expand this as:
   
   ```
   pytest.mark.parametrize("codec", [
       pytest.param("lz4", marks=pytest.mark.lz4),
       pytest.param("snappy", marks=pytest.mark.snappy),
       pytest.param("zstd", marks=pytest.mark.zstd),
   ])
   ```
   
   By letting pytest to iterate through the cases rather than doing it ourselves?
   
   This would enable us to run this case just for the available plugins.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#issuecomment-882636939


   (I accidentally submitted master to Crossbow in an unrelated build and noticed a new failure that looked related to here, want to confirm that)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#issuecomment-882744641


   Seems like we have more version issues in our crossbow builds: https://github.com/apache/arrow/pull/10659#issuecomment-882712592


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#discussion_r671352602



##########
File path: python/pyarrow/tests/test_ipc.py
##########
@@ -329,6 +329,36 @@ def test_stream_simple_roundtrip(stream_fixture, use_legacy_ipc_format):
         reader.read_next_batch()
 
 
+def test_compression_roundtrip():
+    sink = io.BytesIO()
+    values = np.random.randint(0, 10, 10000)
+    table = pa.Table.from_arrays([values], names=["values"])
+
+    options = pa.ipc.IpcWriteOptions(compression='zstd')

Review comment:
       Should this test be gated on zstd being built?

##########
File path: python/pyarrow/ipc.pxi
##########
@@ -158,9 +160,14 @@ cdef class IpcWriteOptions(_Weakrefable):
     def compression(self, value):
         if value is None:
             self.c_options.codec.reset()
-        else:
+        elif isinstance(value, str):
             self.c_options.codec = shared_ptr[CCodec](GetResultValue(
                 CCodec.Create(_ensure_compression(value))).release())
+        elif isinstance(value, Codec):
+            self.c_options.codec = (<Codec>value).wrapped
+        else:
+            raise Exception(

Review comment:
       raise TypeError?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#issuecomment-882635401


   Revision: 91a6ea48bf342955d86980ebd795398066845030
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-611](https://github.com/ursacomputing/crossbow/branches/all?query=actions-611)
   
   |Task|Status|
   |----|------|
   |homebrew-r-autobrew|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-611-github-homebrew-r-autobrew)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-611-github-homebrew-r-autobrew)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#issuecomment-882635401


   Revision: 91a6ea48bf342955d86980ebd795398066845030
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-611](https://github.com/ursacomputing/crossbow/branches/all?query=actions-611)
   
   |Task|Status|
   |----|------|
   |homebrew-r-autobrew|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-611-github-homebrew-r-autobrew)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-611-github-homebrew-r-autobrew)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
westonpace commented on pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#issuecomment-881734525


   I think this is ready for merge.  @kszucs Any last thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#issuecomment-882744641


   Seems like we have more version issues in our crossbow builds: https://github.com/apache/arrow/pull/10659#issuecomment-882712592


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#discussion_r671526013



##########
File path: python/pyarrow/tests/test_ipc.py
##########
@@ -329,6 +329,40 @@ def test_stream_simple_roundtrip(stream_fixture, use_legacy_ipc_format):
         reader.read_next_batch()
 
 
+@pytest.mark.zstd
+def test_compression_roundtrip():
+    if not pa.Codec.is_available('zstd'):
+        pytest.skip("{} support is not built".format('zstd'))

Review comment:
       Sorry, but - is this check redundant with the mark?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#issuecomment-882634344






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] amol- commented on a change in pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
amol- commented on a change in pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#discussion_r669634388



##########
File path: python/pyarrow/tests/test_ipc.py
##########
@@ -329,6 +329,42 @@ def test_stream_simple_roundtrip(stream_fixture, use_legacy_ipc_format):
         reader.read_next_batch()
 
 
+def test_compression_roundtrip():
+    # The ability to set a seed this way is not present on older versions of
+    # numpy (currently in our python 3.6 CI build).  Some inputs might just
+    # happen to compress the same between the two levels so using seeded
+    # random numbers is neccesary
+    if not hasattr(np.random, 'default_rng'):
+        pytest.skip('Requires newer version of numpy')
+    sink = io.BytesIO()
+    rng = np.random.default_rng(seed=42)
+    values = rng.integers(0, 10, 100000)
+    table = pa.Table.from_arrays([values], names=["values"])
+
+    options = pa.ipc.IpcWriteOptions(compression='zstd', compression_level=1)
+    writer = pa.ipc.RecordBatchFileWriter(sink, table.schema, options=options)

Review comment:
       Not really a major thing, but given that the `RecordBatchFileWriter` can act as a context manager and sometimes people look at the tests to learn how to use things, it might be a good idea to write this in the form of
   
   ```
   with pa.ipc.RecordBatchFileWriter(sink, table.schema, options=options) as writer:
      writer.write_table(table)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#issuecomment-880633496


   @westonpace We already have a `Codec` [python wrapper exposed](https://github.com/apache/arrow/blob/master/python/pyarrow/io.pxi#L1618), could we add an optional `level` argument to its constructor and reuse that from the `IpcWriteOptions`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#discussion_r672331156



##########
File path: python/pyarrow/ipc.pxi
##########
@@ -158,9 +160,14 @@ cdef class IpcWriteOptions(_Weakrefable):
     def compression(self, value):
         if value is None:
             self.c_options.codec.reset()
-        else:
+        elif isinstance(value, str):
             self.c_options.codec = shared_ptr[CCodec](GetResultValue(
                 CCodec.Create(_ensure_compression(value))).release())
+        elif isinstance(value, Codec):
+            self.c_options.codec = (<Codec>value).wrapped

Review comment:
       Just a nit, but we could use `.unwrap()` here.

##########
File path: python/pyarrow/tests/test_feather.py
##########
@@ -599,6 +602,9 @@ def test_v2_set_chunksize():
 
 
 @pytest.mark.pandas
+@pytest.mark.lz4
+@pytest.mark.snappy
+@pytest.mark.zstd

Review comment:
       Could we expand this as:
   
   ```
   pytest.mark.parametrize("codec", [
       pytest.param("lz4", marks=pytest.mark.lz4),
       pytest.param("snappy", marks=pytest.mark.snappy),
       pytest.param("zstd", marks=pytest.mark.zstd),
   ])
   ```
   
   By letting pytest to iterate through the cases rather than doing it ourselves?
   
   This would enable us to run this case just for the available plugins.

##########
File path: python/pyarrow/tests/test_feather.py
##########
@@ -599,6 +602,9 @@ def test_v2_set_chunksize():
 
 
 @pytest.mark.pandas
+@pytest.mark.lz4
+@pytest.mark.snappy
+@pytest.mark.zstd

Review comment:
       Could we expand this as:
   
   ```python
   pytest.mark.parametrize("codec", [
       pytest.param("lz4", marks=pytest.mark.lz4),
       pytest.param("snappy", marks=pytest.mark.snappy),
       pytest.param("zstd", marks=pytest.mark.zstd),
   ])
   ```
   
   By letting pytest to iterate through the cases rather than doing it ourselves?
   
   This would enable us to run this case just for the available plugins.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on a change in pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#discussion_r672525066



##########
File path: python/pyarrow/ipc.pxi
##########
@@ -158,9 +160,14 @@ cdef class IpcWriteOptions(_Weakrefable):
     def compression(self, value):
         if value is None:
             self.c_options.codec.reset()
-        else:
+        elif isinstance(value, str):
             self.c_options.codec = shared_ptr[CCodec](GetResultValue(
                 CCodec.Create(_ensure_compression(value))).release())
+        elif isinstance(value, Codec):
+            self.c_options.codec = (<Codec>value).wrapped

Review comment:
       There already was an `unwrap()` but it returned `CCodec*` and not `shared_ptr<CCoded>`.  I think I'd need to investigate where the current `unwrap()` was used.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs closed pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
kszucs closed pull request #10717:
URL: https://github.com/apache/arrow/pull/10717


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#issuecomment-882635401


   Revision: 91a6ea48bf342955d86980ebd795398066845030
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-611](https://github.com/ursacomputing/crossbow/branches/all?query=actions-611)
   
   |Task|Status|
   |----|------|
   |homebrew-r-autobrew|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-611-github-homebrew-r-autobrew)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-611-github-homebrew-r-autobrew)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#discussion_r672331156



##########
File path: python/pyarrow/ipc.pxi
##########
@@ -158,9 +160,14 @@ cdef class IpcWriteOptions(_Weakrefable):
     def compression(self, value):
         if value is None:
             self.c_options.codec.reset()
-        else:
+        elif isinstance(value, str):
             self.c_options.codec = shared_ptr[CCodec](GetResultValue(
                 CCodec.Create(_ensure_compression(value))).release())
+        elif isinstance(value, Codec):
+            self.c_options.codec = (<Codec>value).wrapped

Review comment:
       Just a nit, but we could use `.unwrap()` here.

##########
File path: python/pyarrow/tests/test_feather.py
##########
@@ -599,6 +602,9 @@ def test_v2_set_chunksize():
 
 
 @pytest.mark.pandas
+@pytest.mark.lz4
+@pytest.mark.snappy
+@pytest.mark.zstd

Review comment:
       Could we expand this as:
   
   ```
   pytest.mark.parametrize("codec", [
       pytest.param("lz4", marks=pytest.mark.lz4),
       pytest.param("snappy", marks=pytest.mark.snappy),
       pytest.param("zstd", marks=pytest.mark.zstd),
   ])
   ```
   
   By letting pytest to iterate through the cases rather than doing it ourselves?
   
   This would enable us to run this case just for the available plugins.

##########
File path: python/pyarrow/tests/test_feather.py
##########
@@ -599,6 +602,9 @@ def test_v2_set_chunksize():
 
 
 @pytest.mark.pandas
+@pytest.mark.lz4
+@pytest.mark.snappy
+@pytest.mark.zstd

Review comment:
       Could we expand this as:
   
   ```python
   pytest.mark.parametrize("codec", [
       pytest.param("lz4", marks=pytest.mark.lz4),
       pytest.param("snappy", marks=pytest.mark.snappy),
       pytest.param("zstd", marks=pytest.mark.zstd),
   ])
   ```
   
   By letting pytest to iterate through the cases rather than doing it ourselves?
   
   This would enable us to run this case just for the available plugins.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
westonpace commented on pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#issuecomment-881155699


   @kszucs Done.  I also added some helper methods to make working with compression_level a little easier.  It ended up being a bit more involved than I expected.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#issuecomment-882634344


   @github-actions crossbow submit homebrew-r-autobrew


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#issuecomment-882634344






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#issuecomment-882707954


   See ARROW-13384 as followup


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on a change in pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#discussion_r672525066



##########
File path: python/pyarrow/ipc.pxi
##########
@@ -158,9 +160,14 @@ cdef class IpcWriteOptions(_Weakrefable):
     def compression(self, value):
         if value is None:
             self.c_options.codec.reset()
-        else:
+        elif isinstance(value, str):
             self.c_options.codec = shared_ptr[CCodec](GetResultValue(
                 CCodec.Create(_ensure_compression(value))).release())
+        elif isinstance(value, Codec):
+            self.c_options.codec = (<Codec>value).wrapped

Review comment:
       There already was an `unwrap()` but it returned `CCodec*` and not `shared_ptr<CCoded>`.  I think I'd need to investigate where the current `unwrap()` was used.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on a change in pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#discussion_r672525066



##########
File path: python/pyarrow/ipc.pxi
##########
@@ -158,9 +160,14 @@ cdef class IpcWriteOptions(_Weakrefable):
     def compression(self, value):
         if value is None:
             self.c_options.codec.reset()
-        else:
+        elif isinstance(value, str):
             self.c_options.codec = shared_ptr[CCodec](GetResultValue(
                 CCodec.Create(_ensure_compression(value))).release())
+        elif isinstance(value, Codec):
+            self.c_options.codec = (<Codec>value).wrapped

Review comment:
       There already was an `unwrap()` but it returned `CCodec*` and not `shared_ptr<CCoded>`.  I think I'd need to investigate where the current `unwrap()` was used.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10717: ARROW-13091: [Python] Added compression_level to IpcWriteOptions

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#issuecomment-879456403


   https://issues.apache.org/jira/browse/ARROW-13091


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs closed pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
kszucs closed pull request #10717:
URL: https://github.com/apache/arrow/pull/10717


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#issuecomment-882744641


   Seems like we have more version issues in our crossbow builds: https://github.com/apache/arrow/pull/10659#issuecomment-882712592


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#discussion_r672331156



##########
File path: python/pyarrow/ipc.pxi
##########
@@ -158,9 +160,14 @@ cdef class IpcWriteOptions(_Weakrefable):
     def compression(self, value):
         if value is None:
             self.c_options.codec.reset()
-        else:
+        elif isinstance(value, str):
             self.c_options.codec = shared_ptr[CCodec](GetResultValue(
                 CCodec.Create(_ensure_compression(value))).release())
+        elif isinstance(value, Codec):
+            self.c_options.codec = (<Codec>value).wrapped

Review comment:
       Just a nit, but we could use `.unwrap()` here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#discussion_r672334238



##########
File path: python/pyarrow/tests/test_feather.py
##########
@@ -599,6 +602,9 @@ def test_v2_set_chunksize():
 
 
 @pytest.mark.pandas
+@pytest.mark.lz4
+@pytest.mark.snappy
+@pytest.mark.zstd

Review comment:
       Could we expand this as:
   
   ```python
   pytest.mark.parametrize("codec", [
       pytest.param("lz4", marks=pytest.mark.lz4),
       pytest.param("snappy", marks=pytest.mark.snappy),
       pytest.param("zstd", marks=pytest.mark.zstd),
   ])
   ```
   
   By letting pytest to iterate through the cases rather than doing it ourselves?
   
   This would enable us to run this case just for the available plugins.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs closed pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
kszucs closed pull request #10717:
URL: https://github.com/apache/arrow/pull/10717


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on a change in pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#discussion_r671529465



##########
File path: python/pyarrow/tests/test_ipc.py
##########
@@ -329,6 +329,40 @@ def test_stream_simple_roundtrip(stream_fixture, use_legacy_ipc_format):
         reader.read_next_batch()
 
 
+@pytest.mark.zstd
+def test_compression_roundtrip():
+    if not pa.Codec.is_available('zstd'):
+        pytest.skip("{} support is not built".format('zstd'))

Review comment:
       Don't be sorry :).  Thanks for catching it.  I've pushed a fix.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on a change in pull request #10717: ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #10717:
URL: https://github.com/apache/arrow/pull/10717#discussion_r670944292



##########
File path: python/pyarrow/tests/test_ipc.py
##########
@@ -329,6 +329,42 @@ def test_stream_simple_roundtrip(stream_fixture, use_legacy_ipc_format):
         reader.read_next_batch()
 
 
+def test_compression_roundtrip():
+    # The ability to set a seed this way is not present on older versions of
+    # numpy (currently in our python 3.6 CI build).  Some inputs might just
+    # happen to compress the same between the two levels so using seeded
+    # random numbers is neccesary
+    if not hasattr(np.random, 'default_rng'):
+        pytest.skip('Requires newer version of numpy')
+    sink = io.BytesIO()
+    rng = np.random.default_rng(seed=42)
+    values = rng.integers(0, 10, 100000)
+    table = pa.Table.from_arrays([values], names=["values"])
+
+    options = pa.ipc.IpcWriteOptions(compression='zstd', compression_level=1)
+    writer = pa.ipc.RecordBatchFileWriter(sink, table.schema, options=options)

Review comment:
       Switched.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org