Posted to issues@tajo.apache.org by eminency <gi...@git.apache.org> on 2016/05/17 03:44:33 UTC

[GitHub] tajo pull request: TAJO-2027: Writing Hive UDF integration documen...

GitHub user eminency opened a pull request:

    https://github.com/apache/tajo/pull/1026

    TAJO-2027: Writing Hive UDF integration document

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/eminency/tajo hiveudf_doc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/1026.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1026
    
----
commit f5791ab1f5c7e24d8b2152378693e729adb7226f
Author: Jongyoung Park <em...@gmail.com>
Date:   2016-05-13T09:14:28Z

    on working

commit 2ed8625385d0f1402b118c1dcfd18937dd72f715
Author: Jongyoung Park <em...@gmail.com>
Date:   2016-05-13T09:42:17Z

    missed adding

commit 7552d8c66f98d0fcd2024a70912245072dd6f6c4
Author: Jongyoung Park <em...@gmail.com>
Date:   2016-05-16T08:32:58Z

    Basically done

commit 10c495564501511b1680d4b1848239b8cf25bf4e
Author: Jongyoung Park <em...@gmail.com>
Date:   2016-05-17T03:42:14Z

    misc

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-2027: Writing Hive UDF integration documen...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/1026#discussion_r63645472
  
    --- Diff: tajo-docs/src/main/sphinx/functions/hivefunc.rst ---
    @@ -0,0 +1,81 @@
    +##############
    +Hive Functions
    +##############
    +
    +Tajo provides a feature to use Hive functions directly without re-compilation or additional code.
    +
    +*************
    +Configuration
    +*************
    +
    +Only thing to do is registering path to a directory for jar files containing your hive functions.
    +You can do this by set ``tajo.function.hive.code-dir`` in ``tajo-site.xml`` like the following.
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.function.hive.code-dir</name>
    +    <value>/path/to/hive/function/jar</value>
    +  </property>
    +
    +.. note::
    +  The path should be one in local filesystem. HDFS directory is not supported because of JAVA URI compatability problem.
    +
    +.. warning::
    +
    +  The path must point to a directory, not a file. And multiple directory entries are not allowed.
    +  However, it is possible to load multiple jar files.
    +
    +***************
    +Using in detail
    +***************
    +
    +=============
    +Function Name
    +=============
    +
    +Tajo reads hive functions override ``org.apache.hadoop.hive.ql.exec.UDF`` class. Function name is used as specified in
    +``@Description`` annotation. If it doesn't exist, Tajo uses full qualified class name as function name. For example,
    +it can be like this : ``select com_example_hive_udf_myupper('abcd')``, so it is recommended to use Description annotation.
    +
    +And if some function conflict occurs, it may throw ``AmbiguousFunctionException``. This conflict means about function signature,
    +not only about function name.
    +
    +============================
    +Parameter type / Return type
    +============================
    +
    +Hive uses *Writable* type of Hadoop in functions, but Tajo uses its internal *Datum* type.
    +Because Tajo doesn't support a kind of pluggable type system yet, only some Writable types are supported currently by internal converting.
    --- End diff --
    
    Right, I will fix it.



[GitHub] tajo pull request: TAJO-2027: Writing Hive UDF integration documen...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/1026#discussion_r63620568
  
    --- Diff: tajo-docs/src/main/sphinx/functions/hivefunc.rst ---
    @@ -0,0 +1,81 @@
    +##############
    +Hive Functions
    +##############
    +
    +Tajo provides a feature to use Hive functions directly without re-compilation or additional code.
    +
    +*************
    +Configuration
    +*************
    +
    +Only thing to do is registering path to a directory for jar files containing your hive functions.
    +You can do this by set ``tajo.function.hive.code-dir`` in ``tajo-site.xml`` like the following.
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.function.hive.code-dir</name>
    +    <value>/path/to/hive/function/jar</value>
    +  </property>
    +
    +.. note::
    +  The path should be one in local filesystem. HDFS directory is not supported because of JAVA URI compatability problem.
    +
    +.. warning::
    +
    +  The path must point to a directory, not a file. And multiple directory entries are not allowed.
    +  However, it is possible to load multiple jar files.
    +
    +***************
    +Using in detail
    +***************
    +
    +=============
    +Function Name
    +=============
    +
    +Tajo reads hive functions override ``org.apache.hadoop.hive.ql.exec.UDF`` class. Function name is used as specified in
    +``@Description`` annotation. If it doesn't exist, Tajo uses full qualified class name as function name. For example,
    +it can be like this : ``select com_example_hive_udf_myupper('abcd')``, so it is recommended to use Description annotation.
    +
    +And if some function conflict occurs, it may throw ``AmbiguousFunctionException``. This conflict means about function signature,
    +not only about function name.
    +
    +============================
    +Parameter type / Return type
    +============================
    +
    +Hive uses *Writable* type of Hadoop in functions, but Tajo uses its internal *Datum* type.
    +Because Tajo doesn't support a kind of pluggable type system yet, only some Writable types are supported currently by internal converting.
    --- End diff --
    
    I think the statement 'Because Tajo doesn't support a kind of pluggable type system yet' is unnecessary. IMO, it may cause misunderstanding.



[GitHub] tajo pull request: TAJO-2027: Writing Hive UDF integration documen...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on the pull request:

    https://github.com/apache/tajo/pull/1026#issuecomment-219938008
  
    @hyunsik , it's done. Please check.



[GitHub] tajo pull request: TAJO-2027: Writing Hive UDF integration documen...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/1026#issuecomment-220729472
  
    +1 the patch looks good to me. I'll commit it soon.



[GitHub] tajo pull request: TAJO-2027: Writing Hive UDF integration documen...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/1026#issuecomment-220732160
  
    See http://tajo.apache.org/docs/devel/functions/hivefunc.html



[GitHub] tajo pull request: TAJO-2027: Writing Hive UDF integration documen...

Posted by eminency <gi...@git.apache.org>.
Github user eminency closed the pull request at:

    https://github.com/apache/tajo/pull/1026



[GitHub] tajo pull request: TAJO-2027: Writing Hive UDF integration documen...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/1026#discussion_r63645423
  
    --- Diff: tajo-docs/src/main/sphinx/functions/hivefunc.rst ---
    @@ -0,0 +1,81 @@
    +##############
    +Hive Functions
    +##############
    +
    +Tajo provides a feature to use Hive functions directly without re-compilation or additional code.
    +
    +*************
    +Configuration
    +*************
    +
    +Only thing to do is registering path to a directory for jar files containing your hive functions.
    +You can do this by set ``tajo.function.hive.code-dir`` in ``tajo-site.xml`` like the following.
    --- End diff --
    
    ok



[GitHub] tajo pull request: TAJO-2027: Writing Hive UDF integration documen...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/1026#discussion_r63620183
  
    --- Diff: tajo-docs/src/main/sphinx/functions/hivefunc.rst ---
    @@ -0,0 +1,81 @@
    +##############
    +Hive Functions
    +##############
    +
    +Tajo provides a feature to use Hive functions directly without re-compilation or additional code.
    +
    +*************
    +Configuration
    +*************
    +
    +Only thing to do is registering path to a directory for jar files containing your hive functions.
    +You can do this by set ``tajo.function.hive.code-dir`` in ``tajo-site.xml`` like the following.
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.function.hive.code-dir</name>
    +    <value>/path/to/hive/function/jar</value>
    +  </property>
    +
    +.. note::
    +  The path should be one in local filesystem. HDFS directory is not supported because of JAVA URI compatability problem.
    +
    +.. warning::
    +
    +  The path must point to a directory, not a file. And multiple directory entries are not allowed.
    +  However, it is possible to load multiple jar files.
    +
    +***************
    +Using in detail
    +***************
    +
    +=============
    +Function Name
    +=============
    +
    +Tajo reads hive functions override ``org.apache.hadoop.hive.ql.exec.UDF`` class. Function name is used as specified in
    +``@Description`` annotation. If it doesn't exist, Tajo uses full qualified class name as function name. For example,
    +it can be like this : ``select com_example_hive_udf_myupper('abcd')``, so it is recommended to use Description annotation.
    +
    +And if some function conflict occurs, it may throw ``AmbiguousFunctionException``. This conflict means about function signature,
    --- End diff --
    
    Does 'conflict' here mean duplication?



[GitHub] tajo pull request: TAJO-2027: Writing Hive UDF integration documen...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/1026#issuecomment-220732091
  
    I also updated the documentation, but I missed the 'close' tag in the commit log. Could you close this ticket?



[GitHub] tajo pull request: TAJO-2027: Writing Hive UDF integration documen...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/1026#discussion_r63645666
  
    --- Diff: tajo-docs/src/main/sphinx/functions/hivefunc.rst ---
    @@ -0,0 +1,81 @@
    +##############
    +Hive Functions
    +##############
    +
    +Tajo provides a feature to use Hive functions directly without re-compilation or additional code.
    +
    +*************
    +Configuration
    +*************
    +
    +Only thing to do is registering path to a directory for jar files containing your hive functions.
    +You can do this by set ``tajo.function.hive.code-dir`` in ``tajo-site.xml`` like the following.
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.function.hive.code-dir</name>
    +    <value>/path/to/hive/function/jar</value>
    +  </property>
    +
    +.. note::
    +  The path should be one in local filesystem. HDFS directory is not supported because of JAVA URI compatability problem.
    +
    +.. warning::
    +
    +  The path must point to a directory, not a file. And multiple directory entries are not allowed.
    +  However, it is possible to load multiple jar files.
    +
    +***************
    +Using in detail
    +***************
    +
    +=============
    +Function Name
    +=============
    +
    +Tajo reads hive functions override ``org.apache.hadoop.hive.ql.exec.UDF`` class. Function name is used as specified in
    +``@Description`` annotation. If it doesn't exist, Tajo uses full qualified class name as function name. For example,
    +it can be like this : ``select com_example_hive_udf_myupper('abcd')``, so it is recommended to use Description annotation.
    +
    +And if some function conflict occurs, it may throw ``AmbiguousFunctionException``. This conflict means about function signature,
    --- End diff --
    
    Yes, that word would be better.



[GitHub] tajo pull request: TAJO-2027: Writing Hive UDF integration documen...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on the pull request:

    https://github.com/apache/tajo/pull/1026#issuecomment-220787028
  
    Thanks, I'll close this one.



[GitHub] tajo pull request: TAJO-2027: Writing Hive UDF integration documen...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/1026#discussion_r63620353
  
    --- Diff: tajo-docs/src/main/sphinx/functions/hivefunc.rst ---
    @@ -0,0 +1,81 @@
    +##############
    +Hive Functions
    +##############
    +
    +Tajo provides a feature to use Hive functions directly without re-compilation or additional code.
    +
    +*************
    +Configuration
    +*************
    +
    +Only thing to do is registering path to a directory for jar files containing your hive functions.
    +You can do this by set ``tajo.function.hive.code-dir`` in ``tajo-site.xml`` like the following.
    --- End diff --
    
    ```hive.jar-dir``` would be a more fitting name for this config. I know this is out of scope for this issue, but we need to improve it before releasing.



[GitHub] tajo pull request: TAJO-2027: Writing Hive UDF integration documen...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/1026#discussion_r64113791
  
    --- Diff: tajo-docs/src/main/sphinx/functions/hivefunc.rst ---
    @@ -0,0 +1,81 @@
    +##############
    +Hive Functions
    +##############
    +
    +Tajo provides a feature to use Hive functions directly without re-compilation or additional code.
    +
    +*************
    +Configuration
    +*************
    +
    +Only thing to do is registering path to a directory for jar files containing your hive functions.
    +You can do this by set ``tajo.function.hive.code-dir`` in ``tajo-site.xml`` like the following.
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.function.hive.code-dir</name>
    +    <value>/path/to/hive/function/jar</value>
    +  </property>
    +
    +.. note::
    +  The path should be one in local filesystem. HDFS directory is not supported because of JAVA URI compatability problem.
    +
    +.. warning::
    +
    +  The path must point to a directory, not a file. And multiple directory entries are not allowed.
    +  However, it is possible to load multiple jar files.
    +
    +***************
    +Using in detail
    +***************
    +
    +=============
    +Function Name
    +=============
    +
    +Tajo reads hive functions override ``org.apache.hadoop.hive.ql.exec.UDF`` class. Function name is used as specified in
    +``@Description`` annotation. If it doesn't exist, Tajo uses full qualified class name as function name. For example,
    +it can be like this : ``select com_example_hive_udf_myupper('abcd')``, so it is recommended to use Description annotation.
    +
    +And if some function conflict occurs, it may throw ``AmbiguousFunctionException``. This conflict means about function signature,
    --- End diff --
    
    If so, DuplicateFunctionException would be more appropriate for this case. Anyway, that is also out of scope for this issue.

