You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/08 13:13:45 UTC

[GitHub] [arrow] raulcd opened a new pull request, #12837: ARROW-15824: [Python] Make pyarrow.parquet a package

raulcd opened a new pull request, #12837:
URL: https://github.com/apache/arrow/pull/12837

   This PR makes `pyarrow.parquet` a package and moves `pyarrow.parquet_encryption` to `pyarrow.parquet.encryption`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] raulcd commented on a diff in pull request #12837: ARROW-15824: [Python] Make pyarrow.parquet a package

Posted by GitBox <gi...@apache.org>.
raulcd commented on code in PR #12837:
URL: https://github.com/apache/arrow/pull/12837#discussion_r849226398


##########
docs/source/python/parquet.rst:
##########
@@ -683,7 +683,7 @@ For example, in order to use the ``MyKmsClient`` defined above:
 
    crypto_factory = CryptoFactory(kms_client_factory)
 
-An :download:`example <../../../python/examples/parquet_encryption/sample_vault_kms_client.py>`
+An :download:`example <../../../python/examples/parquet.encryption/sample_vault_kms_client.py>`

Review Comment:
   Thanks for the comment, you are correct, I have reverted back to the current examples directory: `python/examples/parquet_encryption/sample_vault_kms_client.py`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #12837: ARROW-15824: [Python] Make pyarrow.parquet a package

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on PR #12837:
URL: https://github.com/apache/arrow/pull/12837#issuecomment-1186804839

   I certainly agree that it is indeed not a typical pattern to put so much code in the init file (it also makes it more difficult to open the file by name in an editor ..). As Raúl asked above, we could also move all this code into `pyarrow/parquet/parquet.py` (or some other name). 
   
   I think one small advantage of the `__init__.py` in this case is that it leaves the path of the functions the same (`pyarrow.parquet.read_table`, instead of `pyarrow.parquet.parquet.read_table`), and whenever the actual path is not the public path, you will always get users that will start using full non-public path. But don't know if that's enough to say "practicality beats purity" in this case :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on pull request #12837: ARROW-15824: [Python] Make pyarrow.parquet a package

Posted by GitBox <gi...@apache.org>.
AlenkaF commented on PR #12837:
URL: https://github.com/apache/arrow/pull/12837#issuecomment-1096592935

   Thanks for this, looks good to me 👍 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #12837: ARROW-15824: [Python] Make pyarrow.parquet a package

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #12837:
URL: https://github.com/apache/arrow/pull/12837#issuecomment-1187226522

   We could certainly move all code to `pyarrow/parquet/core.py` and just import the APIs from the `__init__.py`. Someone just needs to open a JIRA and submit a PR :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #12837: ARROW-15824: [Python] Make pyarrow.parquet a package

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #12837:
URL: https://github.com/apache/arrow/pull/12837#issuecomment-1098682002

   Benchmark runs are scheduled for baseline = ccaef092c297b8af5d375bc542d908cda8fd415e and contender = 421e8852344f7b184a2f7992bdc08a93c1c8f0b1. 421e8852344f7b184a2f7992bdc08a93c1c8f0b1 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/2e9d87d7fd6e47f28db724865b2cd65d...78df5caedd8b4806b8e2b2ca503a4fb9/)
   [Finished :arrow_down:1.09% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/933ccb229ab440cba868d29028c0f6c2...c83db1ea11fd4a888533515d103bb31c/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/e82c67e137f24b6296b68fc3359899ee...7a3dc9a23dac44ad8dd03f0f7eb3fe62/)
   [Finished :arrow_down:0.3% :arrow_up:0.17%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/afd4a2f941274236a8a02be32c3d7404...462a6eb2a8604535b032af0518dff48d/)
   Buildkite builds:
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/499| `421e8852` ec2-t3-xlarge-us-east-2>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/485| `421e8852` test-mac-arm>
   [Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/485| `421e8852` ursa-i9-9960x>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/495| `421e8852` ursa-thinkcentre-m75q>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/498| `ccaef092` ec2-t3-xlarge-us-east-2>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/484| `ccaef092` test-mac-arm>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/484| `ccaef092` ursa-i9-9960x>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/494| `ccaef092` ursa-thinkcentre-m75q>
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] andersonm-ibm commented on a diff in pull request #12837: ARROW-15824: [Python] Make pyarrow.parquet a package

Posted by GitBox <gi...@apache.org>.
andersonm-ibm commented on code in PR #12837:
URL: https://github.com/apache/arrow/pull/12837#discussion_r848555757


##########
docs/source/python/parquet.rst:
##########
@@ -683,7 +683,7 @@ For example, in order to use the ``MyKmsClient`` defined above:
 
    crypto_factory = CryptoFactory(kms_client_factory)
 
-An :download:`example <../../../python/examples/parquet_encryption/sample_vault_kms_client.py>`
+An :download:`example <../../../python/examples/parquet.encryption/sample_vault_kms_client.py>`

Review Comment:
   Either the examples folder should change to `parquet/encryption/` for consistency and this would change to  `parquet/encryption/sample_vault_kms_client.py` , or this should stay parquet_encryption.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] raulcd commented on pull request #12837: ARROW-15824: [Python] Make pyarrow.parquet a package

Posted by GitBox <gi...@apache.org>.
raulcd commented on PR #12837:
URL: https://github.com/apache/arrow/pull/12837#issuecomment-1187329368

   I've created a ticket to track it https://issues.apache.org/jira/browse/ARROW-17106 and tagged as good-first-issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] amol- closed pull request #12837: ARROW-15824: [Python] Make pyarrow.parquet a package

Posted by GitBox <gi...@apache.org>.
amol- closed pull request #12837: ARROW-15824: [Python] Make pyarrow.parquet a package
URL: https://github.com/apache/arrow/pull/12837


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] raulcd commented on pull request #12837: ARROW-15824: [Python] Make pyarrow.parquet a package

Posted by GitBox <gi...@apache.org>.
raulcd commented on PR #12837:
URL: https://github.com/apache/arrow/pull/12837#issuecomment-1092885454

   @jorisvandenbossche I've opted to just move the previous `parquet.py` to the `parquet/__init__.py` package file. Do you think we should also split the implementation from the `__init__.py` to its own implementation file (like `parquet/parquet.py`) and only expose the API on `__init__.py`?
   From my understanding this is the minimal change for the requested ticket.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #12837: ARROW-15824: [Python] Make pyarrow.parquet a package

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12837:
URL: https://github.com/apache/arrow/pull/12837#issuecomment-1092847176

   https://issues.apache.org/jira/browse/ARROW-15824


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wesm commented on pull request #12837: ARROW-15824: [Python] Make pyarrow.parquet a package

Posted by GitBox <gi...@apache.org>.
wesm commented on PR #12837:
URL: https://github.com/apache/arrow/pull/12837#issuecomment-1186654950

   I apologize for the bikeshed moment, but I was debugging something and found that much of pyarrow/parquet.py had been relocated to pyarrow/parquet/__init__.py -- I have often believed that having non-trivial implementation code in __init__.py files was an antipattern (and we have some code in pyarrow/__init__.py and maybe elsewhere that might be better moved to a utility module). However, my beliefs may not be aligned with broader community practices in the Python community nowadays. Thoughts from other frequent Python contributors (@jorisvandenbossche, @pitrou)? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org