Posted to issues@hive.apache.org by "Mithun Radhakrishnan (JIRA)" <ji...@apache.org> on 2019/06/14 20:34:00 UTC

[jira] [Commented] (HIVE-21877) Change HCatTableInfo to not be transient in PartInfo

    [ https://issues.apache.org/jira/browse/HIVE-21877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16864420#comment-16864420 ] 

Mithun Radhakrishnan commented on HIVE-21877:
---------------------------------------------

Pasting your question from the PR here:

{quote}
While using Hcatalog with Apache Beam, we ran into an issue with HCatTableInfo being null during serialization. I don't see a reason why it should be transient. However, there might be use-cases that I may not be aware of and might require it to be transient. Would love to hear some feedback regardless.
{quote}

This has to do with HIVE-9845. It would not be a good idea to make {{HCatTableInfo}} non-transient. Doing so would make Pig/HCatLoader, as well as {{HCatInputFormat}}, inefficient for large partition sets.
{{HCatTableInfo}} contains table information that is static for all partitions within a partition set for a given table. {{PartInfo}} is the variable part. Serializing the same table information repeatedly, once for each {{PartInfo}} in a partition set, increases the split meta-info for a Hadoop job to unreasonable lengths.

I would advise perusing the HCat code to see how {{HCatTableInfo}} is restored after deserialization.
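
For illustration only, here is a minimal sketch of the general Java pattern being described: a transient field that comes back null after deserialization, and an owning object that stores the shared data once and re-attaches it on read. The classes {{TableInfo}}, {{PartInfo}}, {{JobInfo}} and the {{roundTrip}} helper below are hypothetical stand-ins written for this example. They are not the actual HCatalog classes, and the real restoration logic in HCat may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; NOT the real HCatalog classes.
public class TransientSketch {

    /** Static, table-level data shared by every partition of a table. */
    public static class TableInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String dbName;
        public final String tableName;

        public TableInfo(String dbName, String tableName) {
            this.dbName = dbName;
            this.tableName = tableName;
        }
    }

    /** Per-partition data: the part that varies within a partition set. */
    public static class PartInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String location;
        // transient: skipped by Java serialization, so it is null after reading.
        private transient TableInfo tableInfo;

        public PartInfo(String location, TableInfo tableInfo) {
            this.location = location;
            this.tableInfo = tableInfo;
        }

        public TableInfo getTableInfo() { return tableInfo; }
        public void setTableInfo(TableInfo tableInfo) { this.tableInfo = tableInfo; }
    }

    /** Owner that stores the table info once and re-attaches it on read. */
    public static class JobInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        public final TableInfo tableInfo;
        public final List<PartInfo> partitions = new ArrayList<>();

        public JobInfo(TableInfo tableInfo) { this.tableInfo = tableInfo; }

        // Serialization hook: restore the transient reference in each partition.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            for (PartInfo p : partitions) {
                p.setTableInfo(tableInfo);
            }
        }
    }

    /** Serializes a value to bytes and reads it back. */
    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TableInfo table = new TableInfo("default", "events");
        PartInfo part = new PartInfo("/warehouse/events/dt=2019-06-14", table);

        // Serialized on its own, the partition loses its table info.
        PartInfo alone = roundTrip(part);
        System.out.println(alone.getTableInfo() == null); // prints "true"

        // Serialized through the owner, the table info is restored on read.
        JobInfo job = new JobInfo(table);
        job.partitions.add(part);
        JobInfo restored = roundTrip(job);
        System.out.println(restored.partitions.get(0).getTableInfo().tableName); // prints "events"
    }
}
```

In this sketch, code that deserializes a {{PartInfo}} by itself has to call {{setTableInfo}} before using the table information; otherwise it hits the same kind of null reference described in the question.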

> Change HCatTableInfo to not be transient in PartInfo
> ----------------------------------------------------
>
>                 Key: HIVE-21877
>                 URL: https://issues.apache.org/jira/browse/HIVE-21877
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Ankit Jhalaria
>            Assignee: Ankit Jhalaria
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since HCatTableInfo is serializable, removing the transient annotation from it. We were running into NPE during serialization while using HCatalogIO with Beam.


