Posted to dev@hive.apache.org by "Raghotham Murthy (JIRA)" <ji...@apache.org> on 2009/08/15 04:38:14 UTC
[jira] Created: (HIVE-758) [contrib] function to load data from hive to hbase
[contrib] function to load data from hive to hbase
--------------------------------------------------
Key: HIVE-758
URL: https://issues.apache.org/jira/browse/HIVE-758
Project: Hadoop Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Raghotham Murthy
Priority: Minor
Support a query like: SELECT hbase_put('hive_hbase_table', rowid, colfamily, col, value, ts) FROM src;
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-758) function to load data from hive to hbase
Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghotham Murthy updated HIVE-758:
----------------------------------
Attachment: hive-758.2.patch
Added a warning to the description. However, it doesn't look like I can describe a temporary function. I get an error:
{code}
hive> ADD FILE /data/users/rmurthy/dev/hive/build/dist/lib/hive_contrib.jar;
hive> CREATE TEMPORARY FUNCTION hbase_put AS 'org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut';
OK
Time taken: 0.263 seconds
hive> DESCRIBE FUNCTION hash_put;
FAILED: Error in metadata: java.lang.NullPointerException
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
{code}
[jira] Updated: (HIVE-758) function to load data from hive to hbase
Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-758:
--------------------------------
Component/s: UDF
[jira] Updated: (HIVE-758) [contrib] function to load data from hive to hbase
Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghotham Murthy updated HIVE-758:
----------------------------------
Attachment: hive-758.1.patch
This is a UDAF to load the data. It returns the total number of rows loaded. Currently, we need to set HBASE_HOME, HIVE_HOME and HADOOP_HOME before running the test script at contrib/src/test/scripts/udaf_hbase_put_test.sh. We need to use this script because hive -f currently does not support running shell commands within a file.
[jira] Commented: (HIVE-758) [contrib] function to load data from hive to hbase
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744229#action_12744229 ]
Zheng Shao commented on HIVE-758:
---------------------------------
Both this and HIVE-645 should put a javadoc comment/description on the UDAF/UDF warning people about the possibility of side effects (failed tasks, speculative executions).
[jira] Updated: (HIVE-758) function to load data from hive to hbase
Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghotham Murthy updated HIVE-758:
----------------------------------
Component/s: (was: Query Processor)
Contrib
Summary: function to load data from hive to hbase (was: [contrib] function to load data from hive to hbase)
[jira] Updated: (HIVE-758) function to load data from hive to hbase
Posted by "John Sichi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Sichi updated HIVE-758:
----------------------------
Component/s: (was: Contrib)
HBase Handler
[jira] Commented: (HIVE-758) function to load data from hive to hbase
Posted by "SeanM (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804938#action_12804938 ]
SeanM commented on HIVE-758:
----------------------------
This UDAF works well, but I've encountered two gotchas:
*Nested queries with a WHERE clause that filters out records throw an exception, even if the table contains no null values at all*
{noformat}
Hive> SELECT hbase_put("test", rowid, "data", colfamily, value, 0) FROM ( SELECT * FROM some_table WHERE value = "some_value") t1;
java.lang.RuntimeException: Error while closing operators
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:232)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public boolean org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator.iterate(java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int) on object org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator@70e35d5 of class org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator with arguments {null, null, null, null, null, null} of size 6
at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:799)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:462)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:211)
... 4 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public boolean org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator.iterate(java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int) on object org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator@70e35d5 of class org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator with arguments {null, null, null, null, null, null} of size 6
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:661)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.iterate(GenericUDAFBridge.java:167)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:110)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:768)
... 12 more
Caused by: java.lang.IllegalArgumentException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:638)
... 15 more
{noformat}
It's just a hunch, but it seems like the UDAF's iterate() is being called with null values for rows that were filtered out.
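The innermost cause in the trace above is a java.lang.IllegalArgumentException raised under Method.invoke, and the iterate() signature it reports ends in a primitive int. One possible reading, offered as a sketch and not as a confirmed diagnosis of the patch, is that reflection cannot unbox a null into that int parameter. The stand-alone example below only reproduces this JDK behavior; NullIterateDemo and its nested Evaluator are hypothetical stand-ins, not Hive or HIVE-758 classes.

```java
import java.lang.reflect.Method;

class NullIterateDemo {
    // Hypothetical stand-in for the evaluator. Only the parameter types are
    // copied from the stack trace: five Strings followed by a primitive int.
    public static class Evaluator {
        public boolean iterate(String table, String rowId, String family,
                               String column, String value, int ts) {
            return true;
        }
    }

    // Calls iterate() reflectively and reports what happened:
    // "none" on success, otherwise the simple name of the exception.
    static String invokeIterate(Object... args) {
        try {
            Method m = Evaluator.class.getMethod("iterate",
                    String.class, String.class, String.class,
                    String.class, String.class, int.class);
            m.invoke(new Evaluator(), args);
            return "none";
        } catch (IllegalArgumentException e) {
            // Thrown by Method.invoke when an argument cannot be converted
            // to the declared parameter type, e.g. null for a primitive int.
            return "IllegalArgumentException";
        } catch (ReflectiveOperationException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] unused) {
        // Non-null arguments: the call goes through.
        System.out.println(invokeIterate("t", "r", "f", "c", "v", 0));
        // Null Strings are accepted as long as the int is supplied.
        System.out.println(invokeIterate(null, null, null, null, null, 0));
        // All six arguments null, as in the trace: the null int is rejected.
        System.out.println(invokeIterate(null, null, null, null, null, null));
    }
}
```

In this stand-in, null String arguments alone do not trigger the exception; only the null passed for the int parameter does. Whether Hive really invokes iterate() with an all-null row when the filter leaves a mapper with no input is not established by this thread.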
*Null values*
The UDAF is very sensitive to null values. If you are using map or array types, or any field that may be null, wrap the field in an if construct for safety:
{noformat}
if (some_field IS NULL, "", some_field)
{noformat}
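The same guard could in principle live inside the function instead of in every query. The following is a minimal sketch under stated assumptions: NullTolerantEvaluatorSketch is a hypothetical class, not the evaluator from the attached patches. It assumes a boxed Integer timestamp that defaults to 0 when null, skips rows that have no table name or row key, and treats the remaining nulls as empty strings, mirroring the if(...) expression above. HBase writes are replaced by an in-memory list so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

class NullTolerantEvaluatorSketch {
    // Stand-in for HBase writes: each accepted row is recorded as a string.
    private final List<String> puts = new ArrayList<>();
    private long rowsLoaded = 0;

    // Same idea as if(some_field IS NULL, "", some_field) in the query.
    private static String orEmpty(String s) {
        return s == null ? "" : s;
    }

    // Uses a boxed Integer so that a null timestamp can be received at all.
    public boolean iterate(String table, String rowId, String family,
                           String column, String value, Integer ts) {
        if (table == null || rowId == null) {
            return true; // nothing addressable to write; skip the row
        }
        long stamp = (ts == null) ? 0L : ts; // assumed default timestamp
        puts.add(table + "/" + rowId + "/" + orEmpty(family) + ":"
                + orEmpty(column) + "=" + orEmpty(value) + "@" + stamp);
        rowsLoaded++;
        return true;
    }

    public long getRowsLoaded() {
        return rowsLoaded;
    }

    public List<String> getPuts() {
        return puts;
    }

    public static void main(String[] args) {
        NullTolerantEvaluatorSketch e = new NullTolerantEvaluatorSketch();
        e.iterate(null, null, null, null, null, null); // skipped
        e.iterate("test", "row1", "data", "col", null, null); // null value and ts
        System.out.println(e.getRowsLoaded() + " row(s): " + e.getPuts());
    }
}
```

Silently skipping rows changes the returned row count, so a real implementation might prefer to log or count the skipped rows; that choice is left open here.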