You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/09 09:13:30 UTC

[GitHub] [arrow] jorisvandenbossche opened a new pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

jorisvandenbossche opened a new pull request #8149:
URL: https://github.com/apache/arrow/pull/8149


   WIP, still need to deprecate hdfs as well


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8149:
URL: https://github.com/apache/arrow/pull/8149#discussion_r500982607



##########
File path: python/pyarrow/filesystem.py
##########
@@ -237,12 +238,28 @@ class LocalFileSystem(FileSystem):
 
     _instance = None
 
+    def __init__(self):
+        warnings.warn(
+            "pyarrow.filesystem.LocalFileSystem is deprecated as of 2.0.0, "
+            "please use pyarrow.fs.LocalFileSystem instead",
+            DeprecationWarning, stacklevel=2)

Review comment:
       Put a helper in `pyarrow.util`? (and use `FutureWarning`?)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8149:
URL: https://github.com/apache/arrow/pull/8149#discussion_r500982254



##########
File path: python/pyarrow/__init__.py
##########
@@ -190,23 +191,58 @@ def show_versions():
                          SerializationCallbackError,
                          DeserializationCallbackError)
 
-from pyarrow.filesystem import FileSystem, LocalFileSystem
-
-from pyarrow.hdfs import HadoopFileSystem
 import pyarrow.hdfs as hdfs
 
 from pyarrow.ipc import serialize_pandas, deserialize_pandas
 import pyarrow.ipc as ipc
 
-
-localfs = LocalFileSystem.get_instance()
-
 from pyarrow.serialization import (default_serialization_context,
                                    register_default_serialization_handlers,
                                    register_torch_serialization_handlers)
 
 import pyarrow.types as types
 
+
+# deprecated filesystems
+
+from pyarrow.filesystem import FileSystem as _FileSystem, LocalFileSystem as _LocalFileSystem
+from pyarrow.hdfs import HadoopFileSystem as _HadoopFileSystem
+
+_localfs = _LocalFileSystem._get_instance()
+
+
+_msg = "pyarrow.{0} is deprecated as of 2.0.0, please use pyarrow.fs.{1} instead."
+
+
+if _sys.version_info >= (3, 7):
+    def __getattr__(name):
+        if name == "localfs":

Review comment:
       Looks like you want to write a loop and/or helper function to avoid all this copy/pasting?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #8149:
URL: https://github.com/apache/arrow/pull/8149


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #8149:
URL: https://github.com/apache/arrow/pull/8149#issuecomment-704897703


   @pitrou @kszucs more feedback here?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #8149:
URL: https://github.com/apache/arrow/pull/8149#issuecomment-701980265


   We *could* in theory also add a deprecation in each method on the legacy HadoopFileSystem, so that any use outside of passing it to one of our own functions accepting a filesystem gets deprecated, but not the construction of the filesystem object itself (and then we can eventually return a different object, and for now convert it internally). 
   
   But, even if we do the effort for that, we are still left with this `pyarrow.hdfs` namespace which only has the `connect()` function long term ..
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #8149:
URL: https://github.com/apache/arrow/pull/8149#discussion_r501021863



##########
File path: python/pyarrow/__init__.py
##########
@@ -190,23 +191,58 @@ def show_versions():
                          SerializationCallbackError,
                          DeserializationCallbackError)
 
-from pyarrow.filesystem import FileSystem, LocalFileSystem
-
-from pyarrow.hdfs import HadoopFileSystem
 import pyarrow.hdfs as hdfs
 
 from pyarrow.ipc import serialize_pandas, deserialize_pandas
 import pyarrow.ipc as ipc
 
-
-localfs = LocalFileSystem.get_instance()
-
 from pyarrow.serialization import (default_serialization_context,
                                    register_default_serialization_handlers,
                                    register_torch_serialization_handlers)
 
 import pyarrow.types as types
 
+
+# deprecated filesystems
+
+from pyarrow.filesystem import FileSystem as _FileSystem, LocalFileSystem as _LocalFileSystem
+from pyarrow.hdfs import HadoopFileSystem as _HadoopFileSystem
+
+_localfs = _LocalFileSystem._get_instance()
+
+
+_msg = "pyarrow.{0} is deprecated as of 2.0.0, please use pyarrow.fs.{1} instead."
+
+
+if _sys.version_info >= (3, 7):
+    def __getattr__(name):
+        if name == "localfs":

Review comment:
       Put the info in a dict, and simplified this




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #8149:
URL: https://github.com/apache/arrow/pull/8149#discussion_r501034872



##########
File path: python/pyarrow/filesystem.py
##########
@@ -237,12 +238,28 @@ class LocalFileSystem(FileSystem):
 
     _instance = None
 
+    def __init__(self):
+        warnings.warn(
+            "pyarrow.filesystem.LocalFileSystem is deprecated as of 2.0.0, "
+            "please use pyarrow.fs.LocalFileSystem instead",
+            DeprecationWarning, stacklevel=2)

Review comment:
       Moved a template to `pyarrow.util`. Can also make it a function raising the warning if you prefer




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8149:
URL: https://github.com/apache/arrow/pull/8149#discussion_r501068821



##########
File path: python/pyarrow/filesystem.py
##########
@@ -431,7 +447,18 @@ def _ensure_filesystem(fs):
             # In case its a simple LocalFileSystem (e.g. dask) use native arrow
             # FS
             elif mro.__name__ == 'LocalFileSystem':
-                return LocalFileSystem.get_instance()
+                return LocalFileSystem._get_instance()
+
+        try:
+            import fsspec

Review comment:
       `fsspec` seems to take some time to import, we should probably lookup directly in `sys.modules` instead.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8149:
URL: https://github.com/apache/arrow/pull/8149#discussion_r500981913



##########
File path: python/pyarrow/__init__.py
##########
@@ -190,23 +191,58 @@ def show_versions():
                          SerializationCallbackError,
                          DeserializationCallbackError)
 
-from pyarrow.filesystem import FileSystem, LocalFileSystem
-
-from pyarrow.hdfs import HadoopFileSystem
 import pyarrow.hdfs as hdfs
 
 from pyarrow.ipc import serialize_pandas, deserialize_pandas
 import pyarrow.ipc as ipc
 
-
-localfs = LocalFileSystem.get_instance()
-
 from pyarrow.serialization import (default_serialization_context,
                                    register_default_serialization_handlers,
                                    register_torch_serialization_handlers)
 
 import pyarrow.types as types
 
+
+# deprecated filesystems
+
+from pyarrow.filesystem import FileSystem as _FileSystem, LocalFileSystem as _LocalFileSystem
+from pyarrow.hdfs import HadoopFileSystem as _HadoopFileSystem
+
+_localfs = _LocalFileSystem._get_instance()
+
+
+_msg = "pyarrow.{0} is deprecated as of 2.0.0, please use pyarrow.fs.{1} instead."
+
+
+if _sys.version_info >= (3, 7):
+    def __getattr__(name):
+        if name == "localfs":
+            _warnings.warn(_msg.format("localfs", "LocalFileSystem"),
+                           DeprecationWarning, stacklevel=2)
+            return _localfs
+        elif name == "FileSystem":
+            _warnings.warn(_msg.format("FileSystem", "FileSystem"),
+                           DeprecationWarning, stacklevel=2)
+            return _FileSystem
+        elif name == "LocalFileSystem":
+            _warnings.warn(_msg.format("LocalFileSystem", "LocalFileSystem"),
+                           DeprecationWarning, stacklevel=2)
+            return _LocalFileSystem
+        elif name == "HadoopFileSystem":
+            _warnings.warn(_msg.format("HadoopFileSystem", "HadoopFileSystem"),
+                           DeprecationWarning, stacklevel=2)

Review comment:
       We use `FutureWarning` elsewhere.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #8149:
URL: https://github.com/apache/arrow/pull/8149#issuecomment-704827008


   @github-actions crossbow submit -g integration


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8149:
URL: https://github.com/apache/arrow/pull/8149#issuecomment-704833499


   Revision: dfdc62246bb4bb450bbbaba2b4247d3edeeb2264
   
   Submitted crossbow builds: [ursa-labs/crossbow @ actions-607](https://github.com/ursa-labs/crossbow/branches/all?query=actions-607)
   
   |Task|Status|
   |----|------|
   |test-conda-python-3.6-pandas-0.23|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-607-github-test-conda-python-3.6-pandas-0.23)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-607-github-test-conda-python-3.6-pandas-0.23)|
   |test-conda-python-3.7-dask-latest|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-607-github-test-conda-python-3.7-dask-latest)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-607-github-test-conda-python-3.7-dask-latest)|
   |test-conda-python-3.7-hdfs-2.9.2|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-607-github-test-conda-python-3.7-hdfs-2.9.2)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-607-github-test-conda-python-3.7-hdfs-2.9.2)|
   |test-conda-python-3.7-kartothek-latest|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-607-github-test-conda-python-3.7-kartothek-latest)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-607-github-test-conda-python-3.7-kartothek-latest)|
   |test-conda-python-3.7-kartothek-master|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-607-github-test-conda-python-3.7-kartothek-master)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-607-github-test-conda-python-3.7-kartothek-master)|
   |test-conda-python-3.7-pandas-latest|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-607-github-test-conda-python-3.7-pandas-latest)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-607-github-test-conda-python-3.7-pandas-latest)|
   |test-conda-python-3.7-pandas-master|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-607-github-test-conda-python-3.7-pandas-master)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-607-github-test-conda-python-3.7-pandas-master)|
   |test-conda-python-3.7-spark-branch-3.0|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-607-github-test-conda-python-3.7-spark-branch-3.0)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-607-github-test-conda-python-3.7-spark-branch-3.0)|
   |test-conda-python-3.7-turbodbc-latest|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-607-github-test-conda-python-3.7-turbodbc-latest)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-607-github-test-conda-python-3.7-turbodbc-latest)|
   |test-conda-python-3.7-turbodbc-master|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-607-github-test-conda-python-3.7-turbodbc-master)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-607-github-test-conda-python-3.7-turbodbc-master)|
   |test-conda-python-3.8-dask-master|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-607-github-test-conda-python-3.8-dask-master)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-607-github-test-conda-python-3.8-dask-master)|
   |test-conda-python-3.8-jpype|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-607-github-test-conda-python-3.8-jpype)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-607-github-test-conda-python-3.8-jpype)|
   |test-conda-python-3.8-pandas-latest|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-607-github-test-conda-python-3.8-pandas-latest)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-607-github-test-conda-python-3.8-pandas-latest)|
   |test-conda-python-3.8-spark-master|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-607-github-test-conda-python-3.8-spark-master)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-607-github-test-conda-python-3.8-spark-master)|


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #8149:
URL: https://github.com/apache/arrow/pull/8149#discussion_r501026239



##########
File path: python/pyarrow/filesystem.py
##########
@@ -237,12 +238,28 @@ class LocalFileSystem(FileSystem):
 
     _instance = None
 
+    def __init__(self):
+        warnings.warn(
+            "pyarrow.filesystem.LocalFileSystem is deprecated as of 2.0.0, "
+            "please use pyarrow.fs.LocalFileSystem instead",
+            DeprecationWarning, stacklevel=2)

Review comment:
       There is a `_deprecate_class` in ``pyarrow.util``, but that serves a different purpose (it aliases the old name to the new class, but with warning). 
   
   And if it is just a helper for raising the warning, I am not sure that is worth it (compared to using a template message that can be filled in). 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8149:
URL: https://github.com/apache/arrow/pull/8149#issuecomment-689442469


   https://issues.apache.org/jira/browse/ARROW-9645


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8149:
URL: https://github.com/apache/arrow/pull/8149#issuecomment-701529217


   I think we should deprecate `pa.hdfs.connect`. We cannot return something else from it...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #8149:
URL: https://github.com/apache/arrow/pull/8149#issuecomment-701502891


   The main remaining items is `pa.hdfs.connect()`. It feels a bit annoying to deprecate this *again*, after we let people migrate from `HdfsClient` to `connect` (eg https://stackoverflow.com/questions/47400987/hdfs-connect-vs-hdfsclient-in-pyarrow). 
   
   I was thinking to keep `pa.hdfs.connect`, but letting it return the new filesystem? But since the returned object has a different API, that would also break a lot of code ..
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org