Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/11 18:33:08 UTC

[GitHub] [arrow] itamarst opened a new pull request #10917: ARROW-9226: [Python] Support core-site.xml default filesystem.

itamarst opened a new pull request #10917:
URL: https://github.com/apache/arrow/pull/10917


   Fixes https://issues.apache.org/jira/browse/ARROW-9226
   
   The functionality for reading `core-site.xml` lives inside `libhdfs`; this PR only exposes it through the new API, as `pyarrow.hdfs.connect()` already did.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] itamarst commented on pull request #10917: ARROW-9226: [Python] Support core-site.xml default filesystem.

Posted by GitBox <gi...@apache.org>.
itamarst commented on pull request #10917:
URL: https://github.com/apache/arrow/pull/10917#issuecomment-897059588


   I have tested this with a locally configured setup, and @jwminton will be testing it as well with a more sophisticated setup. Basic setup:
   
   Start up a server or two: `mapred minicluster -Dnamenodes=2 -format -nnport 9030`
   
   Edit `etc/hadoop/core-site.xml` in `$HADOOP_HOME` so it points at these servers:
   
   ```xml
   <property>
       <name>fs.defaultFS</name>
       <value>hdfs://localhost:9030</value>
       <description>Where HDFS NameNode can be found on the network</description>
   </property>
   ```
   
   The following program should give the same results for `example.py localhost 9030` and `example.py default 0` (the latter will get the host/port from the `core-site.xml` config file we edited above):
   
   ```python
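   import sys
   
   import pyarrow.fs
   
   # Usage: example.py <host> <port>
   # Passing "default" and 0 takes the host/port from core-site.xml.
   hdfs_interface = pyarrow.fs.HadoopFileSystem(host=sys.argv[1], port=int(sys.argv[2]))
   
   # A FileSelector lists the entries under "/" rather than describing "/" itself.
   print("ls 1:")
   print(hdfs_interface.get_file_info(pyarrow.fs.FileSelector("/")))
   listing = hdfs_interface.get_file_info(pyarrow.fs.FileSelector("/"))
   print("ls 2:")
   # Unpack the list so that sep="\n" puts one entry on each line.
   print(*listing, sep="\n")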
   ```
   
   Thanks to @jwminton for figuring out the above.
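
As a rough illustration of what the `default` host resolves to, the sketch below reads `fs.defaultFS` from a `core-site.xml` document and splits it into host and port. It is not how pyarrow or `libhdfs` implement the lookup (that happens inside `libhdfs`); `CORE_SITE_XML` and `default_fs` are hypothetical names made up for this example, and the sample document assumes the usual `<configuration>` root element around the property shown above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical stand-in for $HADOOP_HOME/etc/hadoop/core-site.xml.
CORE_SITE_XML = """\
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9030</value>
    </property>
</configuration>
"""


def default_fs(xml_text):
    """Return (host, port) from the fs.defaultFS property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            # An empty default keeps a missing <value> from raising.
            value = prop.findtext("value", default="").strip()
            parts = urlsplit(value)
            return parts.hostname, parts.port
    return None


print(default_fs(CORE_SITE_XML))
```

With the configuration edited as described above, these are the same host and port that `example.py localhost 9030` passes explicitly.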



[GitHub] [arrow] pitrou commented on pull request #10917: ARROW-9226: [Python] Support core-site.xml default filesystem.

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10917:
URL: https://github.com/apache/arrow/pull/10917#issuecomment-897858936


   Will do then. Thanks a lot for contributing this.



[GitHub] [arrow] pitrou commented on pull request #10917: ARROW-9226: [Python] Support core-site.xml default filesystem.

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10917:
URL: https://github.com/apache/arrow/pull/10917#issuecomment-897430263


   Well, that was easier than I imagined :-) Should I wait for further confirmation that it works as expected?



[GitHub] [arrow] pitrou commented on pull request #10917: ARROW-9226: [Python] Support core-site.xml default filesystem.

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10917:
URL: https://github.com/apache/arrow/pull/10917#issuecomment-897859723


   @github-actions crossbow submit test-*hdfs*



[GitHub] [arrow] github-actions[bot] commented on pull request #10917: ARROW-9226: [Python] Support core-site.xml default filesystem.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10917:
URL: https://github.com/apache/arrow/pull/10917#issuecomment-897056822


   https://issues.apache.org/jira/browse/ARROW-9226



[GitHub] [arrow] pitrou closed pull request #10917: ARROW-9226: [Python] Support core-site.xml default filesystem.

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #10917:
URL: https://github.com/apache/arrow/pull/10917


   



[GitHub] [arrow] github-actions[bot] commented on pull request #10917: ARROW-9226: [Python] Support core-site.xml default filesystem.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10917:
URL: https://github.com/apache/arrow/pull/10917#issuecomment-897860312


   Revision: 92a4cabb95e42f878dbee0ef1627ef25edb93571
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-772](https://github.com/ursacomputing/crossbow/branches/all?query=actions-772)
   
   |Task|Status|
   |----|------|
   |test-conda-python-3.7-hdfs-2.9.2|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-772-github-test-conda-python-3.7-hdfs-2.9.2)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-772-github-test-conda-python-3.7-hdfs-2.9.2)|
   |test-conda-python-3.7-hdfs-3.2.1|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-772-github-test-conda-python-3.7-hdfs-3.2.1)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-772-github-test-conda-python-3.7-hdfs-3.2.1)|



[GitHub] [arrow] itamarst commented on pull request #10917: ARROW-9226: [Python] Support core-site.xml default filesystem.

Posted by GitBox <gi...@apache.org>.
itamarst commented on pull request #10917:
URL: https://github.com/apache/arrow/pull/10917#issuecomment-897831509


   Can just merge if you're happy with it.

