Posted to issues@spark.apache.org by "Gabor Somogyi (JIRA)" <ji...@apache.org> on 2018/01/02 10:59:00 UTC

[jira] [Commented] (SPARK-21687) Spark SQL should set createTime for Hive partition

    [ https://issues.apache.org/jira/browse/SPARK-21687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16307877#comment-16307877 ] 

Gabor Somogyi commented on SPARK-21687:
---------------------------------------

I would like to work on this. Please let me know if somebody has already started.

> Spark SQL should set createTime for Hive partition
> --------------------------------------------------
>
>                 Key: SPARK-21687
>                 URL: https://issues.apache.org/jira/browse/SPARK-21687
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Chaozhong Yang
>            Priority: Minor
>
> In Spark SQL, we often use `insert overwrite table t partition(p=xx)` to create a partition for a partitioned table. `createTime` is important information for managing the data lifecycle, e.g. TTL.
> However, we found that Spark SQL doesn't call setCreateTime in `HiveClientImpl#toHivePartition` as follows:
> {code:scala}
> def toHivePartition(
>       p: CatalogTablePartition,
>       ht: HiveTable): HivePartition = {
>     val tpart = new org.apache.hadoop.hive.metastore.api.Partition
>     val partValues = ht.getPartCols.asScala.map { hc =>
>       p.spec.get(hc.getName).getOrElse {
>         throw new IllegalArgumentException(
>           s"Partition spec is missing a value for column '${hc.getName}': ${p.spec}")
>       }
>     }
>     val storageDesc = new StorageDescriptor
>     val serdeInfo = new SerDeInfo
>     p.storage.locationUri.map(CatalogUtils.URIToString(_)).foreach(storageDesc.setLocation)
>     p.storage.inputFormat.foreach(storageDesc.setInputFormat)
>     p.storage.outputFormat.foreach(storageDesc.setOutputFormat)
>     p.storage.serde.foreach(serdeInfo.setSerializationLib)
>     serdeInfo.setParameters(p.storage.properties.asJava)
>     storageDesc.setSerdeInfo(serdeInfo)
>     tpart.setDbName(ht.getDbName)
>     tpart.setTableName(ht.getTableName)
>     tpart.setValues(partValues.asJava)
>     tpart.setSd(storageDesc)
>     new HivePartition(ht, tpart)
>   }
> {code}
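
For illustration only, here is a minimal sketch of what setting the field could look like. It is not the actual patch. It assumes hive-metastore is on the classpath, and `CreateTimeSketch`, `epochSeconds` and `withCreateTime` are hypothetical names made up for this example. The only Hive calls used are `Partition#setCreateTime` and `Partition#getCreateTime`, which work with an Int holding seconds since the epoch, so a millisecond timestamp has to be converted first.

{code:scala}
import java.util.concurrent.TimeUnit
import org.apache.hadoop.hive.metastore.api.Partition

object CreateTimeSketch {
  // The metastore Partition keeps createTime as an Int in seconds since the epoch,
  // so a millisecond timestamp is converted and narrowed here.
  def epochSeconds(millis: Long): Int =
    TimeUnit.MILLISECONDS.toSeconds(millis).toInt

  // Hypothetical helper: stamps the given metastore Partition with a creation time.
  // Defaults to "now" when the caller has no better timestamp.
  def withCreateTime(
      tpart: Partition,
      millis: Long = System.currentTimeMillis()): Partition = {
    tpart.setCreateTime(epochSeconds(millis))
    tpart
  }

  def main(args: Array[String]): Unit = {
    // 1514890740000L ms corresponds to 2018-01-02 10:59:00 UTC.
    val tpart = withCreateTime(new Partition, millis = 1514890740000L)
    println(tpart.getCreateTime)
  }
}
{code}

In `toHivePartition` the equivalent step would presumably sit next to the other `tpart.set...` calls, before `new HivePartition(ht, tpart)`. Where the timestamp should come from (current time versus a value carried on `CatalogTablePartition`) is left open here.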


