Posted to dev@hive.apache.org by "Ratandeep Ratti (JIRA)" <ji...@apache.org> on 2015/04/10 17:51:13 UTC

[jira] [Created] (HIVE-10301) Enhancing View registration and access with dynamic dependency artifact resolution

Ratandeep Ratti created HIVE-10301:
--------------------------------------

             Summary: Enhancing View registration and access with dynamic dependency artifact resolution
                 Key: HIVE-10301
                 URL: https://issues.apache.org/jira/browse/HIVE-10301
             Project: Hive
          Issue Type: New Feature
            Reporter: Ratandeep Ratti
            Assignee: Ratandeep Ratti


Now that Hive has dynamic dependency artifact resolution (HIVE-9664), I think we can improve the process of creating and accessing views whose definitions involve UDFs.
 
An example will illustrate what I'm suggesting.

Say we have a simple view definition which involves a UDF:
{code}
hive
> add jar udf-0.0.1.jar;
> create temporary function fn as 'examples.FunctionUDF';
> create view v as select fn(*) from db.table;
{code}

Once the session is closed, the view still exists but the function does not.
In a new session, querying the view fails because Hive cannot find the function, unless we manually re-register the dependency and the function.

I suggest the following improvement for registering views that involve functions/UDFs (provided the UDF is available in some Ivy/Maven repo):
{code}
hive
> create view v 
tblproperties(
   'dependencies' = 'ivy://org.company.udfs:udf:0.0.1',
   'functions'    = 'fn:examples.FunctionUDF'
)  as select fn(*) from db.table;
{code}

The view's metadata now contains the artifact coordinates as well as the function-to-class mapping. Hive can make use of this information during view registration and view access.

Before a view is created or accessed, Hive will download the artifact from the specified ivy coordinates and register a temporary function with the specified class mapping.

Note that in the example above the user does not have to enter the "add jar" and "create temporary function" commands.
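
For comparison, here is roughly what the user would otherwise have to type by hand in every new session before querying the view. This is only a sketch of the equivalent manual steps, not a description of how Hive would implement the feature; it assumes the ivy:// URL form of "add jar" introduced by HIVE-9664 and reuses the coordinates and class name from the example.
{code}
hive
> add jar ivy://org.company.udfs:udf:0.0.1;
> create temporary function fn as 'examples.FunctionUDF';
> select * from v;
{code}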


There are a few things to think through, though:

1. What if a view 'v' with function and dependency metadata depends on another view which also has its own function and dependency metadata?
In this case we can traverse the view dependency chain/graph and register the functions and dependencies for all views before the view 'v' is accessed. Note that there could be function name conflicts in this case. Maybe we could resolve this by prefixing the db and table name to function names during view registration?

2. Certain queries may require some configuration to be set before the query is executed. Could we also specify the configuration setting as part of the table properties?
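
To make the name conflict in point 1 concrete, here is a hypothetical pair of views that both map the name "fn", but to different classes. The second artifact, the class examples.OtherUDF, and the prefixed names shown in the comments are assumptions made up for illustration; the exact prefix format is still undecided.
{code}
hive
> create view db.v1
tblproperties(
   'dependencies' = 'ivy://org.company.udfs:udf:0.0.1',
   'functions'    = 'fn:examples.FunctionUDF'
)  as select fn(*) from db.table;
> create view db.v2
tblproperties(
   'dependencies' = 'ivy://org.company.udfs:other-udf:0.0.2',
   'functions'    = 'fn:examples.OtherUDF'
)  as select fn(*) from db.v1;
-- Querying db.v2 needs both definitions of "fn" at once.
-- With db/view-name prefixes Hive might register them as, e.g.:
--   db_v1_fn -> examples.FunctionUDF
--   db_v2_fn -> examples.OtherUDF
{code}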


I'll update the bug again with the formal metadata parameters that could be supported and the complete view creation syntax.

Please do update with your thoughts and concerns.


