Posted to reviews@spark.apache.org by "HyukjinKwon (via GitHub)" <gi...@apache.org> on 2023/05/24 04:42:43 UTC

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41292: [SPARK-43768][PYTHON][CONNECT] Python dependency management support in Python Spark Connect

HyukjinKwon commented on code in PR #41292:
URL: https://github.com/apache/spark/pull/41292#discussion_r1203417125


##########
python/pyspark/sql/connect/client/artifact.py:
##########
@@ -154,29 +163,52 @@ def _parse_artifacts(self, path_or_uri: str, pyfile: bool) -> List[Artifact]:
                 sys.path.insert(1, local_path)
                 artifact = new_pyfile_artifact(name, LocalFile(local_path))
                 importlib.invalidate_caches()
+            elif archive and (
+                name.endswith(".zip")
+                or name.endswith(".jar")
+                or name.endswith(".tar.gz")
+                or name.endswith(".tgz")
+                or name.endswith(".tar")
+            ):
+                assert any(name.endswith(s) for s in (".zip", ".jar", ".tar.gz", ".tgz", ".tar"))
+
+                if parsed.fragment != "":
+                    # Minimal fix for the workaround of fragment handling in URI.
+                    # This has a limitation - hash(#) in the file name would not work.
+                    if "#" in local_path:
+                        raise ValueError("'#' in the path is not supported for adding an archive.")
+                    name = f"{name}#{parsed.fragment}"

Review Comment:
   This is actually a pretty ugly workaround to support the fragment in the URI (but I believe it is the minimal change). Maybe we should pass a URI instead of a file path in `Artifact`s (?), but I would like to avoid touching the whole implementation in this PR. cc @hvanhovell @vicennial
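   For reference, a minimal sketch of what the fragment re-attachment does, using only Python's standard `urllib.parse`; the helper name `attach_fragment` is hypothetical and not part of the PR:

   ```python
   from urllib.parse import urlparse

   def attach_fragment(name: str, path_or_uri: str) -> str:
       # Hypothetical helper mirroring the workaround above: carry the URI
       # fragment on the artifact name so it is not lost downstream.
       parsed = urlparse(path_or_uri)
       if parsed.fragment != "":
           # A literal '#' in the local path would be indistinguishable from
           # the fragment separator once re-attached, so reject it up front.
           if "#" in parsed.path:
               raise ValueError("'#' in the path is not supported for adding an archive.")
           return f"{name}#{parsed.fragment}"
       return name

   # The fragment typically names the directory the archive is unpacked into.
   print(attach_fragment("deps.tar.gz", "/tmp/deps.tar.gz#environment"))
   # -> deps.tar.gz#environment
   ```

   Passing the full URI down into `Artifact` would avoid this string surgery, but that refactoring is out of scope for this PR.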



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
