Posted to issues@spark.apache.org by "Julian Fleischer (Jira)" <ji...@apache.org> on 2021/02/15 01:52:00 UTC

[jira] [Created] (SPARK-34438) Python Driver is not correctly detected using presigned URLs

Julian Fleischer created SPARK-34438:
----------------------------------------

             Summary: Python Driver is not correctly detected using presigned URLs
                 Key: SPARK-34438
                 URL: https://issues.apache.org/jira/browse/SPARK-34438
             Project: Spark
          Issue Type: Bug
          Components: Spark Submit
    Affects Versions: 3.0.1, 3.0.0, 3.0.2, 3.1.0
            Reporter: Julian Fleischer


In AWS one can generate so-called presigned URLs. spark-submit accepts URLs for the driver program, e.g. {{http://my-web-server/driver.py}}. A presigned URL, however, carries its signature in the query string, e.g. {{http://my-web-server/driver.py?signature}}.

However, the check that decides whether the given URL points to a Python driver simply tests whether the whole string ends in {{.py}}. The presigned URL does not pass this test, because it ends in {{signature}}.
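Here is a minimal Scala sketch of the failure. It assumes the detection reduces to a plain {{endsWith(".py")}} test on the raw resource string. The real method in {{SparkSubmit.scala}} may also special-case the PySpark shell, and {{naiveIsPython}} is a hypothetical name:

```scala
// Hypothetical stand-in for the suffix test described above: it inspects the
// whole resource string, query string included.
object NaiveCheck {
  def naiveIsPython(res: String): Boolean =
    res != null && res.endsWith(".py")
}

// A plain URL ends in ".py" and is detected as a Python driver...
println(NaiveCheck.naiveIsPython("http://my-web-server/driver.py"))           // true
// ...but the presigned variant ends in the signature, so detection fails.
println(NaiveCheck.naiveIsPython("http://my-web-server/driver.py?signature")) // false
```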

The relevant check is in {{SparkSubmit.scala}}, Line 1051 (commit tagged {{v3.0.1}}):

[https://github.com/apache/spark/blob/v3.0.1/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1051] 

Here is a more realistic example URL:

{{https://bucket-name.s3.us-east-1.amazonaws.com/driver.py?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIATBNPKWPCNUMWMLUR%2F20210214%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210214T062047Z&X-Amz-Expires=172800&X-Amz-SignedHeaders=host&X-Amz-Signature=49ef39b6bb7090001af9312692788892551916a6ac0ff6c961ce52efb9acc235}}

A fix could be to parse the given path as a {{java.net.URI}} and check only whether its path component ends in {{.py}}, rather than the whole string.
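The following sketch shows the proposed fix. It assumes that only the path component should decide whether the resource is a Python file. {{UriCheck.isPythonResource}} is a hypothetical helper, not an existing Spark API. Some strings do not parse as a {{java.net.URI}} (e.g. Windows paths with backslashes), and some URIs have no path. In both cases the helper falls back to the old suffix test on the raw string:

```scala
import java.net.URI
import scala.util.Try

// Hypothetical helper sketching the proposed fix: only the path component of
// the URI has to end in ".py". The query string (where a presigned URL keeps
// its signature) and any fragment are ignored. If the string does not parse as
// a URI, or the URI has no path, the raw string is tested instead.
object UriCheck {
  def isPythonResource(res: String): Boolean =
    res != null && {
      val path = Try(new URI(res)).toOption
        .flatMap(uri => Option(uri.getPath))
        .getOrElse(res)
      path.endsWith(".py")
    }
}

println(UriCheck.isPythonResource("http://my-web-server/driver.py?signature")) // true
println(UriCheck.isPythonResource("http://my-web-server/app.jar?x=.py"))       // false
println(UriCheck.isPythonResource("driver.py"))                                // true
```

A path-based check also stops matching URLs such as {{http://my-web-server/app.jar?x=.py}}. The plain suffix test would treat such a URL as a Python driver.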

To work around this issue, I currently append a fragment to the URL so that it ends in {{.py}}, i.e. {{http://my-web-server/driver.py?signature#.py}}. This does work.
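You can check the workaround with {{java.net.URI}} alone. The {{#.py}} fragment makes the raw string end in {{.py}}. The path and the query, including the signature, still parse unchanged. This illustrative sketch covers only how the string is parsed:

```scala
import java.net.URI

// The workaround from the report: append a "#.py" fragment so that the raw
// string ends in ".py" again.
val presigned = "http://my-web-server/driver.py?signature"
val workaround = presigned + "#.py"

println(workaround.endsWith(".py")) // true: a plain suffix test now passes

// Parsing shows the fragment is kept apart from the path and the query.
val uri = new URI(workaround)
println(uri.getPath)     // /driver.py
println(uri.getQuery)    // signature
println(uri.getFragment) // .py
```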



