Posted to issues@hivemall.apache.org by myui <gi...@git.apache.org> on 2018/02/27 07:33:37 UTC

[GitHub] incubator-hivemall pull request #135: [WIP] Merge Brickhouse functions

GitHub user myui opened a pull request:

    https://github.com/apache/incubator-hivemall/pull/135

    [WIP] Merge Brickhouse functions

    ## What changes were proposed in this pull request?
    
    Merge [brickhouse](https://github.com/klout/brickhouse) functions.
    
    ## What type of PR is it?
    
    Feature
    
    ## What is the Jira issue?
    
    https://issues.apache.org/jira/browse/HIVEMALL-145
    
    ## How was this patch tested?
    
    unit tests and manual tests
    
    ## How to use this feature?
    
    (to appear)
    
    ## Checklist
    
    - [ ] Did you apply source code formatter, i.e., `mvn formatter:format`, for your commit?
    - [ ] Did you run system tests on Hive (or Spark)?
    - [ ] +1 from Klout members

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/myui/incubator-hivemall merge_brickhouse

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/135.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #135
    
----
commit 662b1c432016f8de9eae2da945c2b293430e495d
Author: Makoto Yui <my...@...>
Date:   2018-02-27T07:28:58Z

    Added Klout, which provided an SGA to the ASF, to NOTICE

----


---

[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    @maropu Could you check, if possible, whether `to_json` and `from_json` work on Spark?
    
    I'm not sure whether HCatalog is available in the Spark environment.
    https://github.com/apache/incubator-hivemall/pull/135/commits/dd99307ef49f0507870573efdf5f2ae8da240cca#diff-357e4854869b2e21c38b1b437f11095aR56


---

[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    If you are using `v0.5.0`, then you need to use [the DDLs for v0.5.0](https://github.com/apache/incubator-hivemall/blob/v0.5.0/resources/ddl/define-all.hive).
    
    The DDLs linked from [the distribution page](http://hivemall.incubator.apache.org/download.html) point to the corresponding release branches.
    
    The [installation manual](http://hivemall.incubator.apache.org/userguide/getting_started/installation.html) could be improved, though.


---

[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    ```sql
    create temporary function array_slice as 'hivemall.tools.array.ArraySliceUDF';
    
    select 
      array_slice(
       array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
       0, -- offset
       2 -- length
      ),
      array_slice(
       array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
       6, -- offset
       3 -- length
      ),
      array_slice(
       array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
       6, -- offset
       10 -- length
      ),
      array_slice(
       array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
       6 -- offset
      ),
      array_slice(
       array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
       -3 -- offset
      ),
      array_slice(
       array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
       -3, -- offset
       2 -- length
      );
    ```
    
    > ["zero","one"]  ["six","seven","eight"] ["six","seven","eight","nine","ten"]    ["six","seven","eight","nine","ten"]    ["eight","nine","ten"]  ["eight","nine"]
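The six results above imply a set of offset and length rules. Here is a minimal plain-Python model of `array_slice` that makes those rules explicit. It is reverse-engineered from the outputs shown and is not Hivemall's Java implementation. Clamping an out-of-range offset to the array bounds is an assumption, because the examples do not cover that case.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def array_slice(values: Sequence[T], offset: int, length: Optional[int] = None) -> List[T]:
    """Model of array_slice(values, offset[, length]) as suggested by the outputs above."""
    n = len(values)
    # A negative offset counts from the end: -3 on an 11-element array starts at index 8.
    start = offset + n if offset < 0 else offset
    # Assumption: out-of-range offsets are clamped to the array bounds.
    start = max(0, min(start, n))
    # No length means "to the end"; a length running past the end is truncated.
    end = n if length is None else min(n, start + length)
    return list(values[start:end])
```

For example, `array_slice(nums, -3, 2)` on the eleven number words yields `["eight", "nine"]`, matching the last column above.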


---

[GitHub] incubator-hivemall issue #135: [WIP][HIVEMALL-145] Merge Brickhouse function...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    ```sql
    select generate_series(2,4);
    
    value
    2
    3
    4
    
    select generate_series(5,1,-2);
    
    value
    5
    3
    1
    
    select generate_series(4,3);
    
    (no rows returned)
    
    select date_add(current_date(),value) as `date`,value from (select generate_series(1,3)) t;
    
    date    value
    2018-04-21      1
    2018-04-22      2
    2018-04-23      3
    
    WITH input as (
     select 1 as c1, 10 as c2, 3 as step
     UNION ALL
     select 10, 2, -3
    )
    select generate_series(c1, c2, step) as series from input;
    
    series
    1
    4
    7
    10
    10
    7
    4
    ```
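The results above are consistent with a series whose end value is inclusive. Here is a minimal Python model of that behavior. It assumes `step` defaults to 1, as in `generate_series(2,4)`. The thread does not show how Hivemall handles `step = 0`, so this sketch simply lets Python's `range()` raise `ValueError`.

```python
def generate_series(start: int, end: int, step: int = 1) -> range:
    """Model of generate_series(start, end[, step]) with an inclusive end value."""
    # range() excludes its stop value, so move the stop one past `end`
    # in the direction of travel (up for positive steps, down for negative).
    return range(start, end + (1 if step > 0 else -1), step)
```

`list(generate_series(4, 3))` is empty, mirroring the query above that returns no rows.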


---

[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    ```sql
    create temporary function conditional_emit as 'hivemall.tools.array.ConditionalEmitUDTF';
    
    WITH input as (
       select array(true, false, true) as conditions, array("one", "two", "three") as features
       UNION ALL
       select array(true, true, false), array("four", "five", "six")
    )
    select
      conditional_emit(
         conditions, features
      )
    from 
      input;
    ```
    
    |feature|
    |:-:|
    |one|
    |three|
    |four|
    |five|
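The output above suggests that `conditional_emit` emits the i-th feature of a row whenever the i-th condition is true. Here is a minimal Python model of that per-row rule. It assumes both arrays have the same length. The sketch silently truncates to the shorter array via `zip`, which may differ from how Hivemall handles mismatched lengths.

```python
from typing import List, Sequence


def conditional_emit(conditions: Sequence[bool], features: Sequence[str]) -> List[str]:
    """Return the features whose matching condition is true (one input row)."""
    return [feature for cond, feature in zip(conditions, features) if cond]
```

Applying it to each input row in turn and concatenating the results reproduces the `one, three, four, five` table.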


---

[GitHub] incubator-hivemall issue #135: [WIP][HIVEMALL-145] Merge Brickhouse function...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    @jeromebanks I'm considering merging this PR. Could you review it, if possible?


---

[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    @paulojblack You need to use the up-to-date DDLs, since we updated the DDL for the `subarray` UDF in https://github.com/apache/incubator-hivemall/pull/135/commits/7003006e1c27cae66d5aa5c91fccf27a21b105a7
    
    Using [define-all.hive](https://github.com/apache/incubator-hivemall/blob/master/resources/ddl/define-all.hive) from the master branch, it works without errors in my environment.


---

[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions

Posted by paulojblack <gi...@git.apache.org>.
Github user paulojblack commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    That makes sense; I figured a change like that wasn't made blindly. Consider it a heads-up on the docs, then!


---

[GitHub] incubator-hivemall issue #135: [WIP][HIVEMALL-145] Merge Brickhouse function...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    ```sql
    create temporary function merge_maps as 'hivemall.tools.map.MergeMapsUDAF';
    
    create table test as 
     SELECT map('A',10,'B',20,'C',30) as m
     UNION ALL 
     SELECT map('A',11,'D',40,'E',50) as m;
    
    SELECT merge_maps(m) FROM test;
    ```
    
    > {"A":11,"B":20,"C":30,"D":40,"E":50}
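The result above is a key-wise union in which a later map overrides an earlier value for the same key (`'A'` becomes 11). Here is a minimal Python model of that merge. It assumes rows are folded in the order listed. In a distributed aggregation the arrival order of rows is generally not guaranteed, so which value wins on a duplicate key may not be deterministic.

```python
from typing import Dict, Iterable, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def merge_maps(maps: Iterable[Dict[K, V]]) -> Dict[K, V]:
    """Fold maps into one; on duplicate keys the map seen later wins."""
    merged: Dict[K, V] = {}
    for m in maps:
        merged.update(m)
    return merged
```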


---

[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    @jeromebanks I added you to [the committer list](http://hivemall.incubator.apache.org/team-list.html) in https://github.com/apache/incubator-hivemall/pull/135/commits/e956e98fdc80f3498cad8e55bc88cac1518c5f30
    FYI



---

[GitHub] incubator-hivemall issue #135: [WIP][HIVEMALL-145] Merge Brickhouse function...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    I'm going to merge this PR into master. If you find any problems, please comment here.


---

[GitHub] incubator-hivemall issue #135: [WIP][HIVEMALL-145] Merge Brickhouse function...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    For the K-Minimum Values (KMV) and sketch-related code, I'll create another JIRA ticket.
    
    For the other UDFs, we accept incoming PRs:
    https://docs.google.com/spreadsheets/d/1gtFNcTvPR9OZAsbobj2D9d37tOx4nAoSlib9CLdEDQg/edit#gid=0


---

[GitHub] incubator-hivemall issue #135: [WIP][HIVEMALL-145] Merge Brickhouse function...

Posted by jeromebanks <gi...@git.apache.org>.
Github user jeromebanks commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    @myui Sure. I've been silently lurking, not sure when to step in. Looks fine in general, +1. I will do a more in-depth review, however.
    --- @jeromebanks
        On Tuesday, June 5, 2018, 3:23:58 AM PDT, Makoto YUI <no...@github.com> wrote:  
     
     
    @jeromebanks I'm considering to merge this PR. Could you review if possible?
    


---

[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions

Posted by paulojblack <gi...@git.apache.org>.
Github user paulojblack commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    Just a heads-up: I think some of the changes here that were recently pushed to master have broken the setup described in the Getting Started docs.
    
    Specifically, I'm having trouble with the changes in https://github.com/apache/incubator-hivemall/blob/master/resources/ddl/define-all.hive. After commenting out lines 409-413, it works as expected.


---

[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    @jeromebanks Merging of the Brickhouse functions is in progress in this PR. FYI
    
    We still need to add unit tests, improve the quality of the functions, and add documentation.


---

[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    @myui Spark already has these functions: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3118


---

[GitHub] incubator-hivemall pull request #135: [HIVEMALL-145] Merge Brickhouse functi...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-hivemall/pull/135


---

[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    @paulojblack Thank you for the comments. I will confirm it and fix master.


---

[GitHub] incubator-hivemall pull request #135: [WIP][HIVEMALL-145] Merge Brickhouse f...

Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on a diff in the pull request:

    https://github.com/apache/incubator-hivemall/pull/135#discussion_r192006959
  
    --- Diff: core/src/main/java/hivemall/tools/sanity/AssertUDF.java ---
    @@ -25,8 +25,10 @@
     
     @Description(name = "assert",
             value = "_FUNC_(boolean condition) or _FUNC_(boolean condition, string errMsg)"
    -                + "- Throws HiveException if condition is not met")
    -@UDFType(deterministic = true, stateful = false)
    +                + "- Throws HiveException if condition is not met",
    +        extended = "SELECT count(1) FROM stock_price WHERE assert(price > 0.0);\n"
    +                + "SELECT count(1) FROM stock_price WHRE assert(price > 0.0, 'price MUST be more than 0.0')")
    --- End diff --
    
    typo `s/WHRE/WHERE/`


---

[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    Still WIP: [reviewing](https://docs.google.com/spreadsheets/d/1gtFNcTvPR9OZAsbobj2D9d37tOx4nAoSlib9CLdEDQg/edit#gid=0) which functions to merge.


---

[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    ```sql
    create temporary function moving_avg as 'hivemall.statistics.MovingAverageUDTF';
    
    select moving_avg(x, 3) from (select explode(array(1,2,3,4,5,6,7)) as x) series;
    select moving_avg(x, 3) from (select explode(array(1.0,2.0,3.0,4.0,5.0,6.0,7.0)) as x) series;
    ```
    
    |avg|
    |:-:|
    |1.0|
    |1.5|
    |2.0|
    |3.0|
    |4.0|
    |5.0|
    |6.0|
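The output above is a trailing average over up to `3` values. The first rows average over however many values have been seen so far. Here is a minimal Python sketch of that windowing. It keeps a running sum over a bounded `deque`, which is a design choice for this sketch and not necessarily how `MovingAverageUDTF` computes the average. With non-integer inputs, the running-sum subtraction can accumulate small floating-point error.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def moving_avg(xs: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the last `window` values (fewer at the start)."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf: Deque[float] = deque(maxlen=window)
    total = 0.0
    for x in xs:
        if len(buf) == window:
            total -= buf[0]  # oldest value, about to be evicted by append()
        buf.append(x)
        total += x
        yield total / len(buf)
```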



---

[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    @paulojblack In general, we recommend using the [official ASF releases](http://hivemall.incubator.apache.org/download.html) rather than the master branch.
    
    If you do use the master branch, use the latest DDLs, and use them with caution.


---

[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    ```sql
    select 
      NAMED_STRUCT("Name", "John", "age", 31),
      to_json(
         NAMED_STRUCT("Name", "John", "age", 31)
      ),
      to_json(
         NAMED_STRUCT("Name", "John", "age", 31),
         array('Name', 'age')
      ),
      to_json(
         NAMED_STRUCT("Name", "John", "age", 31),
         array('name', 'age')
      ),
      to_json(
         NAMED_STRUCT("Name", "John", "age", 31),
         array('age')
      ),
      to_json(
         NAMED_STRUCT("Name", "John", "age", 31),
         array()
      ),
      to_json(
         null,
         array()
      ),
      to_json(
        struct("123", "456", 789, array(314,007)),
        array('ti','si','i','bi')
      ),
      to_json(
        struct("123", "456", 789, array(314,007)),
        'ti,si,i,bi'
      ),
      to_json(
        struct("123", "456", 789, array(314,007))
      ),
      to_json(
        NAMED_STRUCT("country", "japan", "city", "tokyo")
      ),
      to_json(
        NAMED_STRUCT("country", "japan", "city", "tokyo"), 
        array('city')
      ),
      to_json(
        ARRAY(
          NAMED_STRUCT("country", "japan", "city", "tokyo"), 
          NAMED_STRUCT("country", "japan", "city", "osaka")
        )
      ),
      to_json(
        ARRAY(
          NAMED_STRUCT("country", "japan", "city", "tokyo"), 
          NAMED_STRUCT("country", "japan", "city", "osaka")
        ),
        array('city')
      );
    ```
    
    > {"name":"John","age":31}        {"name":"John","age":31}        {"Name":"John","age":31}        {"name":"John","age":31}        {"age":31}      {}      NULL     {"ti":"123","si":"456","i":789,"bi":[314,7]}    {"ti":"123","si":"456","i":789,"bi":[314,7]}    {"col1":"123","col2":"456","col3":789,"col4":[314,7]}      {"country":"japan","city":"tokyo"}      {"city":"tokyo"}        [{"country":"japan","city":"tokyo"},{"country":"japan","city":"osaka"}]    [{"country":"japan","city":"tokyo"},{"country":"japan","city":"osaka"}]
    
    ```sql
    select
      from_json(
        '{ "person" : { "name" : "makoto" , "age" : 37 } }',
        'struct<name:string,age:int>', 
        array('person')
      ),
      from_json(
        '[0.1,1.1,2.2]',
        'array<double>'
      ),
      from_json(to_json(
        ARRAY(
          NAMED_STRUCT("country", "japan", "city", "tokyo"), 
          NAMED_STRUCT("country", "japan", "city", "osaka")
        )
      ),'array<struct<country:string,city:string>>'),
      from_json(to_json(
        ARRAY(
          NAMED_STRUCT("country", "japan", "city", "tokyo"), 
          NAMED_STRUCT("country", "japan", "city", "osaka")
        ),
        array('city')
      ), 'array<struct<country:string,city:string>>'),
      from_json(to_json(
        ARRAY(
          NAMED_STRUCT("country", "japan", "city", "tokyo"), 
          NAMED_STRUCT("country", "japan", "city", "osaka")
        )
      ),'array<struct<city:string>>');
    ```
    
    > {"name":"makoto","age":37}      [0.1,1.1,2.2]   [{"country":"japan","city":"tokyo"},{"country":"japan","city":"osaka"}] [{"country":"japan","city":"tokyo"},{"country":"japan","city":"osaka"}]    [{"city":"tokyo"},{"city":"osaka"}]
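The `from_json` examples above combine two steps. First, they descend into the JSON along a path (`array('person')`). Second, they keep only the struct fields named in the target type: `array<struct<city:string>>` drops `country`. Here is a minimal Python sketch of those two steps. `from_json_sketch` and its `fields`/`path` parameters are hypothetical stand-ins for the Hive type string and path argument, not Hivemall's API. The sketch also performs no type conversion.

```python
import json
from typing import Any, Optional, Sequence


def from_json_sketch(text: str, fields: Optional[Sequence[str]] = None,
                     path: Sequence[str] = ()) -> Any:
    """Parse JSON, follow `path`, then project each object onto `fields`.

    Hypothetical helper: `fields` plays the role of the struct field list in a
    Hive type such as 'struct<name:string,age:int>'; None means "keep as-is".
    """
    value = json.loads(text)
    for key in path:  # e.g. path=["person"] selects value["person"]
        value = value[key]

    def project(obj: Any) -> Any:
        if fields is None or not isinstance(obj, dict):
            return obj
        # Missing fields become None, analogous to NULL struct fields.
        return {f: obj.get(f) for f in fields}

    if isinstance(value, list):
        return [project(v) for v in value]
    return project(value)
```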


---

[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/135
  
    @maropu I deprecated SubarrayUDF in favor of ArraySliceUDF. FYI


---