Posted to issues@spark.apache.org by "Gabor Somogyi (JIRA)" <ji...@apache.org> on 2018/01/02 10:59:00 UTC
[jira] [Commented] (SPARK-21687) Spark SQL should set createTime for Hive partition
[ https://issues.apache.org/jira/browse/SPARK-21687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16307877#comment-16307877 ]
Gabor Somogyi commented on SPARK-21687:
---------------------------------------
I would like to work on this. Please let me know if somebody has already started.
> Spark SQL should set createTime for Hive partition
> --------------------------------------------------
>
> Key: SPARK-21687
> URL: https://issues.apache.org/jira/browse/SPARK-21687
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.1.0, 2.2.0
> Reporter: Chaozhong Yang
> Priority: Minor
>
> In Spark SQL, we often use `insert overwrite table t partition(p=xx)` to create a partition in a partitioned table. `createTime` is important information for managing the data lifecycle, e.g. TTL.
> However, we found that Spark SQL doesn't call `setCreateTime` in `HiveClientImpl#toHivePartition`, as follows:
> {code:scala}
> def toHivePartition(
>     p: CatalogTablePartition,
>     ht: HiveTable): HivePartition = {
>   val tpart = new org.apache.hadoop.hive.metastore.api.Partition
>   val partValues = ht.getPartCols.asScala.map { hc =>
>     p.spec.get(hc.getName).getOrElse {
>       throw new IllegalArgumentException(
>         s"Partition spec is missing a value for column '${hc.getName}': ${p.spec}")
>     }
>   }
>   val storageDesc = new StorageDescriptor
>   val serdeInfo = new SerDeInfo
>   p.storage.locationUri.map(CatalogUtils.URIToString(_)).foreach(storageDesc.setLocation)
>   p.storage.inputFormat.foreach(storageDesc.setInputFormat)
>   p.storage.outputFormat.foreach(storageDesc.setOutputFormat)
>   p.storage.serde.foreach(serdeInfo.setSerializationLib)
>   serdeInfo.setParameters(p.storage.properties.asJava)
>   storageDesc.setSerdeInfo(serdeInfo)
>   tpart.setDbName(ht.getDbName)
>   tpart.setTableName(ht.getTableName)
>   tpart.setValues(partValues.asJava)
>   tpart.setSd(storageDesc)
>   new HivePartition(ht, tpart)
> }
> {code}
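For illustration only, here is a minimal sketch of what setting the creation time could look like. It is not a proposed patch. It assumes the Hive metastore `Partition` stores `createTime` as a 32-bit count of seconds since the epoch, so a millisecond timestamp has to be converted first. `PartitionStub`, `CreateTimeSketch`, `toEpochSeconds` and `stampCreateTime` are hypothetical names introduced here so the example runs without Hive on the classpath. Where the timestamp should come from (for example the current time at partition creation) is left open.

```scala
import java.util.concurrent.TimeUnit

// Hypothetical stand-in for org.apache.hadoop.hive.metastore.api.Partition,
// reduced to the single field discussed here so the sketch runs without Hive.
final class PartitionStub {
  private var createTime: Int = 0
  def setCreateTime(seconds: Int): Unit = { createTime = seconds }
  def getCreateTime: Int = createTime
}

object CreateTimeSketch {
  // Convert a millisecond timestamp to whole epoch seconds (truncating).
  // Math.toIntExact throws ArithmeticException rather than overflowing silently.
  def toEpochSeconds(millis: Long): Int =
    Math.toIntExact(TimeUnit.MILLISECONDS.toSeconds(millis))

  // Hypothetical helper: stamp the partition with the given creation time.
  def stampCreateTime(tpart: PartitionStub, createdAtMillis: Long): PartitionStub = {
    tpart.setCreateTime(toEpochSeconds(createdAtMillis))
    tpart
  }
}
```

In `toHivePartition`, the analogous call would sit next to the other `tpart.set...` calls, before `new HivePartition(ht, tpart)` is returned.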