Posted to dev@hive.apache.org by "Ratandeep Ratti (JIRA)" <ji...@apache.org> on 2018/04/20 17:17:00 UTC

[jira] [Created] (HIVE-19256) UDF which shapes the input data according to the specified schema

Ratandeep Ratti created HIVE-19256:
--------------------------------------

             Summary: UDF which shapes the input data according to the specified schema
                 Key: HIVE-19256
                 URL: https://issues.apache.org/jira/browse/HIVE-19256
             Project: Hive
          Issue Type: New Feature
            Reporter: Ratandeep Ratti
            Assignee: Ratandeep Ratti


We use this UDF a lot in our org. It takes an object and a Hive schema and makes sure the output object matches the schema completely. In some respects it is similar to the {{named_struct}} UDF, which can be used to select columns from a struct, but it is more general: it works not only on structs but on all Hive data types (except union). The schema can also specify certain valid type conversions (e.g. int -> double).
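
To make the idea concrete, here is a minimal, dependency-free Java sketch of the recursive projection described above. It is not the actual UDF: a real implementation would presumably extend Hive's GenericUDF and work through ObjectInspectors, while this sketch assumes a hypothetical in-memory model where structs are Maps keyed by field name, arrays are Lists, and the type tree uses made-up record names (Prim, Struct, Array, MapT). Filling missing fields with null and the small set of allowed primitive conversions are also assumptions for illustration. Requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "shape a value to a schema"; not Hive's implementation.
public final class GenericProjectSketch {

    // Minimal stand-in for Hive type descriptors (NOT Hive's TypeInfo classes).
    interface Type {}
    record Prim(String name) implements Type {}                       // e.g. "int", "double", "string"
    record Struct(LinkedHashMap<String, Type> fields) implements Type {}
    record Array(Type element) implements Type {}
    record MapT(Type key, Type value) implements Type {}

    // Walk the value and the target type together, keeping only what the type describes.
    static Object project(Object value, Type type) {
        if (value == null) return null;                                 // nulls pass through unchanged
        if (type instanceof Struct s) {
            Map<?, ?> in = (Map<?, ?>) value;                           // struct modeled as name -> value
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<String, Type> f : s.fields().entrySet()) {
                // Only fields named in the schema are copied; extra input fields are dropped.
                // A field missing from the input becomes null (assumption).
                out.put(f.getKey(), project(in.get(f.getKey()), f.getValue()));
            }
            return out;
        }
        if (type instanceof Array a) {
            List<Object> out = new ArrayList<>();
            for (Object e : (List<?>) value) out.add(project(e, a.element()));
            return out;
        }
        if (type instanceof MapT m) {
            Map<Object, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                out.put(project(e.getKey(), m.key()), project(e.getValue(), m.value()));
            }
            return out;
        }
        return convertPrimitive(value, ((Prim) type).name());
    }

    // Only a few conversions are sketched here (assumption; Hive supports more).
    static Object convertPrimitive(Object v, String target) {
        if (target.equals("double") && v instanceof Number n) return n.doubleValue();
        if (target.equals("bigint") && (v instanceof Integer || v instanceof Long)) return ((Number) v).longValue();
        if (target.equals("int") && v instanceof Integer) return v;
        if (target.equals("string") && v instanceof String) return v;
        throw new IllegalArgumentException("cannot convert " + v.getClass().getSimpleName() + " to " + target);
    }

    public static void main(String[] args) {
        // Input: struct<a:int,extra:string>; target schema: struct<a:double>.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("a", 1);
        row.put("extra", "dropped");
        LinkedHashMap<String, Type> fields = new LinkedHashMap<>();
        fields.put("a", new Prim("double"));
        System.out.println(project(row, new Struct(fields)));          // prints {a=1.0}
    }
}
```

The extra field is removed and the int is widened to a double, which mirrors what the sample calls further down describe.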

One scenario where this is quite useful is ensuring that a Hive view created with a specific schema always exposes columns matching that schema. In Hive today, when a view is created, nested columns newly added to the underlying table can leak out through the view, even though the user never wanted this behavior. This leaking happens only for nested columns, not for top-level columns, so Hive's behavior is inconsistent in this regard.

Sample usage of the UDF
{code}
generic_project(col, "struct<a:array<struct<c:int,d:string>>>") // Returns data matching the given schema; fields of col that are not part of the schema are removed.

generic_project(col, "struct<a:double>") // If col is a struct whose field a is an int, a is cast to double.
{code}
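
The second argument in these samples is a Hive type string, so the UDF has to turn it into a type tree before projecting. Inside Hive this would most likely be done with TypeInfoUtils.getTypeInfoFromTypeString; the Java sketch below is a hypothetical, dependency-free recursive-descent parser for the subset used above (struct, array, map and simple primitives). The class and record names are made up, and parameterized types such as decimal(10,2) or varchar(10) are deliberately not handled. Union types are rejected, matching the "except union" limitation stated earlier. Requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for schema strings like "struct<a:array<struct<c:int,d:string>>>".
public final class HiveTypeStringSketch {

    // Parsed type: a category ("struct", "array", "map" or a primitive name) plus children.
    // For structs, fieldNames.get(i) names children.get(i); otherwise fieldNames is empty.
    record TypeNode(String category, List<String> fieldNames, List<TypeNode> children) {
        @Override public String toString() {                  // renders back to a canonical type string
            return switch (category) {
                case "struct" -> {
                    StringBuilder b = new StringBuilder("struct<");
                    for (int i = 0; i < children.size(); i++) {
                        if (i > 0) b.append(',');
                        b.append(fieldNames.get(i)).append(':').append(children.get(i));
                    }
                    yield b.append('>').toString();
                }
                case "array" -> "array<" + children.get(0) + ">";
                case "map" -> "map<" + children.get(0) + "," + children.get(1) + ">";
                default -> category;
            };
        }
    }

    private final String s;
    private int pos = 0;

    private HiveTypeStringSketch(String s) { this.s = s.replace(" ", ""); }  // ignore spaces

    static TypeNode parse(String typeString) {
        HiveTypeStringSketch p = new HiveTypeStringSketch(typeString);
        TypeNode t = p.parseType();
        if (p.pos != p.s.length()) throw new IllegalArgumentException("trailing input at " + p.pos);
        return t;
    }

    private TypeNode parseType() {
        String name = readIdentifier();
        List<String> names = new ArrayList<>();
        List<TypeNode> kids = new ArrayList<>();
        switch (name) {
            case "struct" -> {                                // struct<name:type,name:type,...>
                expect('<');
                do {
                    names.add(readIdentifier());
                    expect(':');
                    kids.add(parseType());
                } while (tryConsume(','));
                expect('>');
            }
            case "array" -> { expect('<'); kids.add(parseType()); expect('>'); }
            case "map" -> { expect('<'); kids.add(parseType()); expect(','); kids.add(parseType()); expect('>'); }
            case "uniontype" -> throw new IllegalArgumentException("union types are not supported");
            default -> { }                                    // primitive such as int, double, string
        }
        return new TypeNode(name, names, kids);
    }

    private String readIdentifier() {
        int start = pos;
        while (pos < s.length() && (Character.isLetterOrDigit(s.charAt(pos)) || s.charAt(pos) == '_')) pos++;
        if (start == pos) throw new IllegalArgumentException("identifier expected at " + pos);
        return s.substring(start, pos);
    }

    private boolean tryConsume(char c) {
        if (pos < s.length() && s.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void expect(char c) {
        if (!tryConsume(c)) throw new IllegalArgumentException("expected '" + c + "' at " + pos);
    }

    public static void main(String[] args) {
        TypeNode t = parse("struct<a:array<struct<c:int,d:string>>>");
        System.out.println(t.fieldNames());                   // prints [a]
        System.out.println(t.children().get(0));              // prints array<struct<c:int,d:string>>
    }
}
```

A tree like this is what a projection step would walk alongside the input value.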





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)