Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2021/04/05 06:22:00 UTC

[jira] [Assigned] (SPARK-34955) ADD JAR command cannot add jar files which contains whitespaces in the path

     [ https://issues.apache.org/jira/browse/SPARK-34955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34955:
------------------------------------

    Assignee: Apache Spark  (was: Kousuke Saruta)

> ADD JAR command cannot add jar files which contains whitespaces in the path
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-34955
>                 URL: https://issues.apache.org/jira/browse/SPARK-34955
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1
>            Reporter: Kousuke Saruta
>            Assignee: Apache Spark
>            Priority: Major
>
> The ADD JAR command cannot add jar files whose path contains whitespace.
> If we have `/some/path/test file.jar` and execute the following command:
> {code}
> ADD JAR "/some/path/test file.jar";
> {code}
> The following exception is thrown.
> {code}
> 21/04/05 10:40:38 ERROR SparkSQLDriver: Failed in [add jar "/some/path/test file.jar"]
> java.lang.IllegalArgumentException: Illegal character in path at index 9: /some/path/test file.jar
> 	at java.net.URI.create(URI.java:852)
> 	at org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:129)
> 	at org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:34)
> 	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> 	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> 	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
> {code}
> This is because `HiveSessionStateBuilder` and `SessionStateBuilder` don't check whether the given path is in URI form or is a plain path; they always treat it as a URI.
> In URI form, whitespace must be encoded as `%20`, so `/some/path/test file.jar` is rejected.
> We can resolve this part by checking whether the given path is in URI form or not.
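One way to picture that check is the sketch below. It is only an illustration and not the fix applied in Spark: the class `JarPathToUri` and the method `toUri` are hypothetical names, and the rule "a string that parses as a URI with a scheme is already in URI form" is an assumption. Anything else is treated as a plain path and converted with `File.toURI`, which percent-encodes the space.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not Spark's actual implementation.
final class JarPathToUri {

    // Returns the input unchanged if it already parses as a URI with a scheme,
    // otherwise treats it as a plain file system path and converts it.
    static URI toUri(String pathOrUri) {
        try {
            URI uri = new URI(pathOrUri);
            if (uri.getScheme() != null) {
                return uri; // already in URI form, e.g. file:/some/path/test%20file.jar
            }
        } catch (URISyntaxException e) {
            // Not a valid URI (for example, it contains a raw space):
            // fall through and handle it as a plain path.
        }
        // File.toURI encodes characters that are illegal in a URI, such as spaces.
        return new File(pathOrUri).toURI();
    }

    public static void main(String[] args) {
        try {
            // Same call that appears at the top of the stack trace.
            URI.create("/some/path/test file.jar");
        } catch (IllegalArgumentException e) {
            System.out.println("URI.create rejected the raw path: " + e.getMessage());
        }
        System.out.println(toUri("/some/path/test file.jar"));
        System.out.println(toUri("file:/some/path/test%20file.jar"));
    }
}
```

A real implementation would need more care than this, for example with Windows drive letters such as `C:/dir/a.jar`, which parse as a URI whose scheme is `C`.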
> Unfortunately, if we fix this part, another problem occurs.
> When we execute the `ADD JAR` command, Hive's `ADD JAR` command is executed in `HiveClientImpl.addJar`, which transitively invokes `AddResourceProcessor.run`.
> In `AddResourceProcessor.run`, the command line is simply split on `\\s+`, so the path is also split into `/some/path/test` and `file.jar` before being passed to `ss.add_resources`.
> https://github.com/apache/hive/blob/f1e87137034e4ecbe39a859d4ef44319800016d7/ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java#L56-L75
> So, the command still fails.
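The effect of splitting on `\\s+` can be reproduced in isolation. The sketch below is not Hive's code: the class `WhitespaceSplitDemo`, the method `tokenize`, and the exact argument string (a resource type followed by the path) are assumptions made for illustration. Only the regular expression comes from the description above.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: mimics splitting a command line on runs of whitespace.
final class WhitespaceSplitDemo {

    // Splits the argument string on one or more whitespace characters.
    static List<String> tokenize(String commandArgs) {
        return Arrays.asList(commandArgs.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        // The leading "jar" token is an assumed resource type.
        // prints [jar, /some/path/test, file.jar]
        System.out.println(tokenize("jar /some/path/test file.jar"));
    }
}
```

Because the splitter has no notion of quoting, the single path arrives as two separate tokens, neither of which names an existing file.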
> Even if we convert the path to URI form, such as `file:/some/path/test%20file.jar`, and execute the following command:
> {code}
> ADD JAR "file:/some/path/test%20file.jar";
> {code}
> The following exception is thrown.
> {code}
> 21/04/05 10:40:53 ERROR SessionState: file:/some/path/test%20file.jar does not exist
> java.lang.IllegalArgumentException: file:/some/path/test%20file.jar does not exist
> 	at org.apache.hadoop.hive.ql.session.SessionState.validateFiles(SessionState.java:1168)
> 	at org.apache.hadoop.hive.ql.session.SessionState$ResourceType.preHook(SessionState.java:1289)
> 	at org.apache.hadoop.hive.ql.session.SessionState$ResourceType$1.preHook(SessionState.java:1278)
> 	at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1378)
> 	at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1336)
> 	at org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:74)
> {code}
> The reason is that `Utilities.realFile`, invoked in `SessionState.validateFiles`, returns `null` because `fs.exists(path)` returns `false`.
> https://github.com/apache/hive/blob/f1e87137034e4ecbe39a859d4ef44319800016d7/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L1052-L1064
> `fs.exists` checks the existence of the given path by comparing the string representation of Hadoop's `Path`.
> The string representation of `Path` is similar to a URI, but it is actually different:
> `Path` doesn't encode the given path.
> For example, the URI form of `/some/path/jar file.jar` is `file:/some/path/jar%20file.jar` but the `Path` form of it is `file:/some/path/jar file.jar`. So `fs.exists` returns `false`.
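The mismatch between the encoded and the literal name can be shown without Hadoop. The sketch below is a stand-in under stated assumptions: it uses `java.net.URI` and `java.nio.file` rather than Hadoop's `Path` and `fs.exists`, and the class `EncodedVersusLiteralName` and its methods are hypothetical. It shows only that `jar%20file.jar`, read literally, names a different file than `jar file.jar`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only. java.nio.file.Path is used here, not org.apache.hadoop.fs.Path.
final class EncodedVersusLiteralName {

    // Builds the URI form of an absolute local path; the multi-argument
    // URI constructor percent-encodes illegal characters such as spaces.
    static String uriForm(String absolutePath) {
        try {
            return new URI("file", null, absolutePath, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Creates "jar file.jar" in a temporary directory and reports whether the
    // literal name and the percent-encoded name exist: {literal, encoded}.
    static boolean[] lookup() {
        try {
            Path dir = Files.createTempDirectory("spark-34955-demo");
            Path jar = Files.createFile(dir.resolve("jar file.jar"));
            boolean literal = Files.exists(dir.resolve("jar file.jar"));
            boolean encoded = Files.exists(dir.resolve("jar%20file.jar"));
            Files.delete(jar);
            Files.delete(dir);
            return new boolean[] {literal, encoded};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(uriForm("/some/path/jar file.jar"));
        boolean[] found = lookup();
        System.out.println("literal name exists: " + found[0]);
        System.out.println("encoded name exists: " + found[1]);
    }
}
```

Any code that takes the encoded URI string and uses it as a file name without decoding it first ends up looking for the second, nonexistent file.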



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
