Posted to issues@spark.apache.org by "Song Jun (JIRA)" <ji...@apache.org> on 2017/01/22 12:09:27 UTC

[jira] [Updated] (SPARK-19332) table's location should check if a URI is legal

     [ https://issues.apache.org/jira/browse/SPARK-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Song Jun updated SPARK-19332:
-----------------------------
    Description: 
SPARK-19257 changes the type of `CatalogStorageFormat`'s `locationUri` to `URI`, but this causes some problems:

1. `CatalogTable` and `CatalogTablePartition` use the same class, `CatalogStorageFormat`.
2. The type `URI` is fine for `CatalogTable`, but it is not appropriate for `CatalogTablePartition`.
3. The location of a table partition can contain unencoded whitespace, and
  building a URI from such a location throws an exception. For example, `/path/2014-01-01 00%3A00%3A00` is a partition location that contains whitespace.
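
To illustrate point 3, here is a minimal sketch using only `java.net.URI` from the JDK. The class and method names (`PartitionLocationDemo`, `rejectedByUri`) are made up for this example and are not part of Spark; the sketch only shows that the single-argument `URI` constructor rejects the example location because of the raw space.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not Spark code.
final class PartitionLocationDemo {

    // Returns true when the single-argument URI constructor rejects the string.
    static boolean rejectedByUri(String location) {
        try {
            new URI(location);
            return false;
        } catch (URISyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Partition location from the description: it contains a raw space.
        System.out.println(rejectedByUri("/path/2014-01-01 00%3A00%3A00")); // prints "true"
        // The same kind of path without whitespace parses fine.
        System.out.println(rejectedByUri("/path/2014-01-01"));              // prints "false"
    }
}
```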

So changing the type to `URI` is bad for `CatalogTablePartition`.

I also found that Hive had the same issue, HIVE-6185.
Before Hive 0.13 the location was a URI; after that change, Hive switched it to Path and does some checks at DDL time:

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java#L1553
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java#L3732

So I think we can do the URI check for the table's location, and it is not appropriate to change the type to URI.
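
A rough sketch of what such a check on a table's location could look like, written against plain `java.net.URI`. The helper `checkTableLocation`, the class `TableLocationCheck`, and the example `hdfs://namenode:8020/...` address are assumptions for illustration only; this is not the actual Spark or Hive implementation, and the location stays a string until it is validated.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, not Spark code.
final class TableLocationCheck {

    // Validates a table location string at DDL time and returns the parsed URI.
    // Throws IllegalArgumentException when the string is not a legal URI.
    static URI checkTableLocation(String location) {
        try {
            return new URI(location);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Illegal table location: " + location, e);
        }
    }

    public static void main(String[] args) {
        // A legal location is accepted and parsed.
        URI ok = checkTableLocation("hdfs://namenode:8020/warehouse/t1");
        System.out.println(ok.getPath()); // prints "/warehouse/t1"

        // A location with a raw space is reported as illegal.
        try {
            checkTableLocation("/warehouse/my table");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "Illegal table location: /warehouse/my table"
        }
    }
}
```

With a check like this applied only to table locations, partition locations can keep their raw string or path form.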


> table's location should check if a URI is legal
> -----------------------------------------------
>
>                 Key: SPARK-19332
>                 URL: https://issues.apache.org/jira/browse/SPARK-19332
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Song Jun


