Posted to dev@hive.apache.org by "Raghotham Murthy (JIRA)" <ji...@apache.org> on 2009/08/15 04:38:14 UTC

[jira] Created: (HIVE-758) [contrib] function to load data from hive to hbase

[contrib] function to load data from hive to hbase
--------------------------------------------------

                 Key: HIVE-758
                 URL: https://issues.apache.org/jira/browse/HIVE-758
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Query Processor
            Reporter: Raghotham Murthy
            Priority: Minor


Support a query like: SELECT hbase_put('hive_hbase_table', rowid, colfamily, col, value, ts) FROM src;

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-758) function to load data from hive to hbase

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghotham Murthy updated HIVE-758:
----------------------------------

    Attachment: hive-758.2.patch

Added a warning to the description. However, it looks like I cannot describe a temporary function; I get an error:
{code}
hive> ADD FILE /data/users/rmurthy/dev/hive/build/dist/lib/hive_contrib.jar;                          
hive> CREATE TEMPORARY FUNCTION hbase_put AS 'org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut';
OK
Time taken: 0.263 seconds
hive> DESCRIBE FUNCTION hash_put;                                                                     
FAILED: Error in metadata: java.lang.NullPointerException
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
{code}

> function to load data from hive to hbase
> ----------------------------------------
>
>                 Key: HIVE-758
>                 URL: https://issues.apache.org/jira/browse/HIVE-758
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Contrib
>            Reporter: Raghotham Murthy
>            Priority: Minor
>         Attachments: hive-758.1.patch, hive-758.2.patch
>
>
> Support a query like: SELECT hbase_put('hive_hbase_table', rowid, colfamily, col, value, ts) FROM src;



[jira] Updated: (HIVE-758) function to load data from hive to hbase

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-758:
--------------------------------

    Component/s: UDF

> function to load data from hive to hbase
> ----------------------------------------
>
>                 Key: HIVE-758
>                 URL: https://issues.apache.org/jira/browse/HIVE-758
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: HBase Handler, UDF
>            Reporter: Raghotham Murthy
>            Priority: Minor
>         Attachments: hive-758.1.patch, hive-758.2.patch
>
>
> Support a query like: SELECT hbase_put('hive_hbase_table', rowid, colfamily, col, value, ts) FROM src;



[jira] Updated: (HIVE-758) [contrib] function to load data from hive to hbase

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghotham Murthy updated HIVE-758:
----------------------------------

    Attachment: hive-758.1.patch

This is a UDAF that loads the data; it returns the total number of rows loaded. Currently, HBASE_HOME, HIVE_HOME, and HADOOP_HOME must be set before running the test script at contrib/src/test/scripts/udaf_hbase_put_test.sh. We need this script because hive -f does not yet support running shell commands from within a file.
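For anyone unfamiliar with the old-style UDAF contract, here is a minimal, dependency-free sketch of how an evaluator like this could count the rows it loads. It is not the code in hive-758.1.patch: `CellWriter` and `HbasePutEvaluatorSketch` are hypothetical names, and `CellWriter` stands in for the real HBase client call. Only the iterate() parameter list comes from this issue (it matches the iterate(String, String, String, String, String, int) signature reported for UDAFHbasePut$UDAFHbasePutEvaluator).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the HBase client call that writes one cell.
interface CellWriter {
    void put(String table, String row, String family, String qualifier, String value, int ts);
}

// Sketch of an old-style UDAF evaluator (init / iterate / terminatePartial / merge / terminate)
// that writes one cell per input row and returns the number of rows written.
class HbasePutEvaluatorSketch {
    private final CellWriter writer;
    private int count; // rows written so far by this evaluator instance

    HbasePutEvaluatorSketch(CellWriter writer) {
        this.writer = writer;
        init();
    }

    // Resets the aggregation state.
    public void init() {
        count = 0;
    }

    // Called once per input row; the write is a side effect of the aggregation.
    public boolean iterate(String table, String row, String family, String qualifier,
                           String value, int ts) {
        writer.put(table, row, family, qualifier, value, ts);
        count++;
        return true;
    }

    // Partial result sent from the map side to the reduce side.
    public Integer terminatePartial() {
        return count;
    }

    // Adds a partial count produced by another task.
    public boolean merge(Integer partial) {
        if (partial != null) {
            count += partial;
        }
        return true;
    }

    // Final result: total number of rows loaded.
    public Integer terminate() {
        return count;
    }

    public static void main(String[] args) {
        List<String> puts = new ArrayList<>();
        CellWriter recorder = (t, r, f, q, v, ts) -> puts.add(t + "/" + r + "/" + f + ":" + q + "@" + ts);

        // Two "map tasks", each with its own evaluator.
        HbasePutEvaluatorSketch task1 = new HbasePutEvaluatorSketch(recorder);
        task1.iterate("hive_hbase_table", "r1", "cf", "c1", "a", 0);
        task1.iterate("hive_hbase_table", "r2", "cf", "c1", "b", 0);
        HbasePutEvaluatorSketch task2 = new HbasePutEvaluatorSketch(recorder);
        task2.iterate("hive_hbase_table", "r3", "cf", "c1", "c", 0);

        // The "reducer" only sees the partial counts, never the rows themselves.
        HbasePutEvaluatorSketch reducer = new HbasePutEvaluatorSketch(recorder);
        reducer.merge(task1.terminatePartial());
        reducer.merge(task2.terminatePartial());
        System.out.println(reducer.terminate() + " rows loaded, " + puts.size() + " puts"); // 3 rows loaded, 3 puts
    }
}
```

The design point the sketch tries to show: the HBase writes happen inside iterate() on the map side, and only the per-task counts travel through terminatePartial()/merge(), which is why the query's result is a row count rather than the data.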

> [contrib] function to load data from hive to hbase
> --------------------------------------------------
>
>                 Key: HIVE-758
>                 URL: https://issues.apache.org/jira/browse/HIVE-758
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Raghotham Murthy
>            Priority: Minor
>         Attachments: hive-758.1.patch
>
>
> Support a query like: SELECT hbase_put('hive_hbase_table', rowid, colfamily, col, value, ts) FROM src;



[jira] Commented: (HIVE-758) [contrib] function to load data from hive to hbase

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744229#action_12744229 ] 

Zheng Shao commented on HIVE-758:
---------------------------------

Both this issue and HIVE-645 should add a javadoc comment/description to the UDAF/UDF warning users about possible side effects (from failed tasks and speculative execution).
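To make the side-effect concern concrete, here is a toy model (no HBase classes) of a versioned cell store in which a version is identified by row, column and timestamp, and rewriting the same coordinates replaces that version, loosely following HBase's data model. It replays a task attempt the way a retry or speculative attempt would. Assumptions: the replayed attempt writes the same values, and the timestamps 1000/2000 are arbitrary; `VersionedStoreSketch` is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy versioned cell store: a version is keyed by row, column and timestamp,
// and writing the same key again replaces that version (no HBase classes involved).
class VersionedStoreSketch {
    private final Map<String, String> versions = new TreeMap<>();

    void put(String row, String column, long ts, String value) {
        versions.put(row + "/" + column + "@" + ts, value);
    }

    int versionCount() {
        return versions.size();
    }

    // One task attempt writing every row with the given timestamp.
    static void runAttempt(VersionedStoreSketch store, String[][] rows, long ts) {
        for (String[] r : rows) {
            store.put(r[0], r[1], ts, r[2]);
        }
    }

    public static void main(String[] args) {
        String[][] rows = { {"r1", "cf:c1", "a"}, {"r2", "cf:c1", "b"} };

        // Constant ts argument: the second attempt rewrites the same versions,
        // so the store ends up unchanged.
        VersionedStoreSketch constantTs = new VersionedStoreSketch();
        runAttempt(constantTs, rows, 0L);
        runAttempt(constantTs, rows, 0L);

        // Timestamp chosen per attempt (e.g. wall-clock time): each attempt adds new versions.
        VersionedStoreSketch perAttemptTs = new VersionedStoreSketch();
        runAttempt(perAttemptTs, rows, 1000L);
        runAttempt(perAttemptTs, rows, 2000L);

        System.out.println(constantTs.versionCount() + " vs " + perAttemptTs.versionCount()); // 2 vs 4
    }
}
```

With a constant ts the duplicate writes converge on the same cells; with a per-attempt timestamp the replay leaves extra versions behind. In both cases the writes made by a failed or losing attempt are not rolled back, which is the behavior the javadoc warning would need to describe.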


> [contrib] function to load data from hive to hbase
> --------------------------------------------------
>
>                 Key: HIVE-758
>                 URL: https://issues.apache.org/jira/browse/HIVE-758
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Raghotham Murthy
>            Priority: Minor
>         Attachments: hive-758.1.patch
>
>
> Support a query like: SELECT hbase_put('hive_hbase_table', rowid, colfamily, col, value, ts) FROM src;



[jira] Updated: (HIVE-758) function to load data from hive to hbase

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghotham Murthy updated HIVE-758:
----------------------------------

    Component/s:     (was: Query Processor)
                 Contrib
        Summary: function to load data from hive to hbase  (was: [contrib] function to load data from hive to hbase)

> function to load data from hive to hbase
> ----------------------------------------
>
>                 Key: HIVE-758
>                 URL: https://issues.apache.org/jira/browse/HIVE-758
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Contrib
>            Reporter: Raghotham Murthy
>            Priority: Minor
>         Attachments: hive-758.1.patch
>
>
> Support a query like: SELECT hbase_put('hive_hbase_table', rowid, colfamily, col, value, ts) FROM src;



[jira] Updated: (HIVE-758) function to load data from hive to hbase

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-758:
----------------------------

    Component/s:     (was: Contrib)
                 HBase Handler

> function to load data from hive to hbase
> ----------------------------------------
>
>                 Key: HIVE-758
>                 URL: https://issues.apache.org/jira/browse/HIVE-758
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: HBase Handler
>            Reporter: Raghotham Murthy
>            Priority: Minor
>         Attachments: hive-758.1.patch, hive-758.2.patch
>
>
> Support a query like: SELECT hbase_put('hive_hbase_table', rowid, colfamily, col, value, ts) FROM src;



[jira] Commented: (HIVE-758) function to load data from hive to hbase

Posted by "SeanM (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804938#action_12804938 ] 

SeanM commented on HIVE-758:
----------------------------

This UDAF works well, but I've encountered two gotchas:

*Nested queries with a WHERE clause that filters out records throw an exception, even if the table contains no null values at all*
{noformat} 
Hive> SELECT hbase_put("test", rowid, "data", colfamily, value, 0) FROM ( SELECT * FROM some_table WHERE value = "some_value") t1;

java.lang.RuntimeException: Error while closing operators
	at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:232)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public boolean org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator.iterate(java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int)  on object org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator@70e35d5 of class org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator with arguments {null, null, null, null, null, null} of size 6
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:799)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:462)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
	at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:211)
	... 4 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public boolean org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator.iterate(java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int)  on object org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator@70e35d5 of class org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator with arguments {null, null, null, null, null, null} of size 6
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:661)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.iterate(GenericUDAFBridge.java:167)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:110)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:768)
	... 12 more
Caused by: java.lang.IllegalArgumentException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:638)
	... 15 more


{noformat}

It's just a hunch, but could the UDAF's iterate() be getting called with null values for the rows that were filtered out?
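The innermost cause in the trace above can be reproduced without Hive. Assuming the bridge passes the six arguments straight through Method.invoke (as the FunctionRegistry.invoke frames suggest), a null in the last parameter, which is a primitive int in the reported iterate() signature, is enough to raise IllegalArgumentException before iterate() even runs; the five String parameters accept null without complaint. The class and method names below are hypothetical. Note also that the trace shows the call coming from GroupByOperator.closeOp rather than from per-row processing, which suggests a single all-null call made when the operator closes; that reading is an inference from the trace, not something confirmed here.

```java
import java.lang.reflect.Method;

// Reproduces "Caused by: java.lang.IllegalArgumentException" outside Hive:
// Method.invoke cannot unbox a null argument into a primitive int parameter.
class NullToPrimitiveDemo {
    // Same parameter types as the evaluator's iterate(String, String, String, String, String, int).
    public boolean iteratePrimitive(String t, String r, String f, String q, String v, int ts) {
        return true;
    }

    // Boxed variant: the null timestamp reaches the method body instead of failing in invoke().
    public boolean iterateBoxed(String t, String r, String f, String q, String v, Integer ts) {
        return ts != null;
    }

    // Invokes the named method reflectively with six nulls, as in the trace.
    static String callWithAllNulls(String methodName, Class<?> tsType) {
        try {
            Method m = NullToPrimitiveDemo.class.getMethod(methodName,
                    String.class, String.class, String.class, String.class, String.class, tsType);
            Object[] allNulls = new Object[6]; // {null, null, null, null, null, null}
            return String.valueOf(m.invoke(new NullToPrimitiveDemo(), allNulls));
        } catch (IllegalArgumentException e) {
            return "IllegalArgumentException";
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithAllNulls("iteratePrimitive", int.class)); // IllegalArgumentException
        System.out.println(callWithAllNulls("iterateBoxed", Integer.class)); // false
    }
}
```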


*Null values*
The UDAF is very sensitive to null values. If you use map or array types, or any field that may be null, wrap it in an IF expression for safety:
{noformat}
if (some_field IS NULL, "", some_field)
{noformat}
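As an alternative to guarding every column in the query, the same guard could live in the evaluator itself. The following is a hypothetical sketch, not part of either attached patch: the timestamp is boxed so a null can reach the method, rows with a null table, row key, family or qualifier are skipped (they cannot address a cell), and a null value is written as "", mirroring the IF expression above. The default timestamp of 0 for a null ts is an assumption.

```java
// Hypothetical null-tolerant variant of the evaluator's iterate(): the timestamp is boxed,
// rows that cannot address a cell are skipped, and a null value is written as "".
class NullSafeIterateSketch {
    int written;   // rows that would have been put into HBase
    int skipped;   // rows ignored because part of the cell address was null
    String lastValue;
    long lastTs;

    public boolean iterate(String table, String row, String family, String qualifier,
                           String value, Integer ts) {
        if (table == null || row == null || family == null || qualifier == null) {
            skipped++;
            return true; // keep aggregating instead of failing the task
        }
        lastValue = (value == null) ? "" : value; // same effect as IF(value IS NULL, "", value)
        lastTs = (ts == null) ? 0L : ts;          // hypothetical default timestamp
        // The real HBase put would happen here; it is omitted in this sketch.
        written++;
        return true;
    }

    public static void main(String[] args) {
        NullSafeIterateSketch e = new NullSafeIterateSketch();
        e.iterate("hive_hbase_table", "r1", "cf", "c1", "v", 5);
        e.iterate(null, null, null, null, null, null); // the all-null call seen in the trace
        e.iterate("hive_hbase_table", "r2", "cf", "c1", null, null);
        System.out.println(e.written + " written, " + e.skipped + " skipped"); // 2 written, 1 skipped
    }
}
```

The trade-off: skipping inside the UDAF hides bad rows, and the returned row count no longer matches the number of input rows, so keeping a skipped count (and surfacing it somewhere) seems worthwhile if this route is taken.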




> function to load data from hive to hbase
> ----------------------------------------
>
>                 Key: HIVE-758
>                 URL: https://issues.apache.org/jira/browse/HIVE-758
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Contrib
>            Reporter: Raghotham Murthy
>            Priority: Minor
>         Attachments: hive-758.1.patch, hive-758.2.patch
>
>
> Support a query like: SELECT hbase_put('hive_hbase_table', rowid, colfamily, col, value, ts) FROM src;
