Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/06/27 00:24:00 UTC

[jira] [Work logged] (HIVE-25779) Deduplicate SerDe Info

     [ https://issues.apache.org/jira/browse/HIVE-25779?focusedWorklogId=784900&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-784900 ]

ASF GitHub Bot logged work on HIVE-25779:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Jun/22 00:23
            Start Date: 27/Jun/22 00:23
    Worklog Time Spent: 10m 
      Work Description: github-actions[bot] commented on PR #3221:
URL: https://github.com/apache/hive/pull/3221#issuecomment-1166693492

   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.




Issue Time Tracking
-------------------

    Worklog Id:     (was: 784900)
    Time Spent: 50m  (was: 40m)

> Deduplicate SerDe Info
> ----------------------
>
>                 Key: HIVE-25779
>                 URL: https://issues.apache.org/jira/browse/HIVE-25779
>             Project: Hive
>          Issue Type: New Feature
>          Components: Standalone Metastore
>            Reporter: Yu-Wen Lai
>            Assignee: Yu-Wen Lai
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> The proposal is to reuse serde info in the same way we already reuse column descriptors (HIVE-2246).
> Currently, we store the metadata for partitions as PARTITIONS (N partitions) -> SDS (N locations) -> SERDES (N entries). However, all the SERDES entries for the partitions in a table are the same unless a serde is explicitly specified. That is, each storage descriptor has its own exclusive serde info, but the partitions' serde infos are mostly identical to the table's. By reusing the serde info, we can save some database storage and improve the performance of queries from HMS to the backend database.
> For backward compatibility, we also need to introduce a config for this feature, because there will be issues if an old HMS instance and a new HMS instance with this feature are running together. With this feature, we need to check whether anything else references a serde before deleting it, but an old instance will just delete it.
> The other thing we need to take care of is custom serdes. If a partition's serde is modified, we need to create a new record in SERDES so that we don't interfere with other partitions.
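
The sharing, delete-time reference check, and copy-on-modify behaviour described above can be illustrated with a minimal in-memory sketch. This is not the patch in PR #3221 and not the metastore schema: SerdeStore, its methods, and the dedup_enabled flag are hypothetical names standing in for the SDS and SERDES tables and the proposed config.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SerdeStore:
    """Hypothetical in-memory stand-in for the SDS and SERDES tables."""
    dedup_enabled: bool = True  # stands in for the proposed config flag
    serdes: Dict[int, dict] = field(default_factory=dict)  # serde_id -> serde info
    sds: Dict[int, int] = field(default_factory=dict)      # sd_id -> serde_id
    _next_serde_id: int = 1

    def _new_serde(self, info: dict) -> int:
        serde_id = self._next_serde_id
        self._next_serde_id += 1
        self.serdes[serde_id] = dict(info)
        return serde_id

    def add_sd(self, sd_id: int, info: dict, share_with: Optional[int] = None) -> None:
        """Attach a serde to a storage descriptor, reusing an identical existing row."""
        if self.dedup_enabled and share_with is not None:
            shared_id = self.sds[share_with]
            if self.serdes[shared_id] == info:
                self.sds[sd_id] = shared_id
                return
        self.sds[sd_id] = self._new_serde(info)

    def ref_count(self, serde_id: int) -> int:
        """Number of storage descriptors that point at this serde row."""
        return sum(1 for s in self.sds.values() if s == serde_id)

    def alter_serde(self, sd_id: int, info: dict) -> None:
        """Modify one descriptor's serde without touching rows shared by others."""
        serde_id = self.sds[sd_id]
        if self.ref_count(serde_id) > 1:
            self.sds[sd_id] = self._new_serde(info)  # shared row: write a new record
        else:
            self.serdes[serde_id] = dict(info)       # exclusive row: update in place

    def drop_sd(self, sd_id: int) -> None:
        """Drop a descriptor and delete its serde only if nothing else references it."""
        serde_id = self.sds.pop(sd_id)
        if self.ref_count(serde_id) == 0:
            del self.serdes[serde_id]


store = SerdeStore()
lazy = {"lib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
store.add_sd(1, lazy)                 # table-level storage descriptor
store.add_sd(2, lazy, share_with=1)   # partition reuses the table's serde row
store.add_sd(3, lazy, share_with=1)
store.alter_serde(3, {"lib": "org.apache.hadoop.hive.serde2.OpenCSVSerde"})
store.drop_sd(2)                      # serde row survives: the table still uses it
```

An old instance, as described in the issue, behaves like drop_sd without the ref_count check, which is why mixing old and new instances would delete rows that other descriptors still point at.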



--
This message was sent by Atlassian Jira
(v8.20.7#820007)