Posted to issues@hivemall.apache.org by myui <gi...@git.apache.org> on 2018/02/27 07:33:37 UTC
[GitHub] incubator-hivemall pull request #135: [WIP] Merge Brickhouse functions
GitHub user myui opened a pull request:
https://github.com/apache/incubator-hivemall/pull/135
[WIP] Merge Brickhouse functions
## What changes were proposed in this pull request?
Merge [brickhouse](https://github.com/klout/brickhouse) functions.
## What type of PR is it?
Feature
## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-145
## How was this patch tested?
unit tests and manual tests
## How to use this feature?
(to appear)
## Checklist
- [ ] Did you apply source code formatter, i.e., `mvn formatter:format`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?
- [ ] +1 from Klout members
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/myui/incubator-hivemall merge_brickhouse
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-hivemall/pull/135.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #135
----
commit 662b1c432016f8de9eae2da945c2b293430e495d
Author: Makoto Yui <my...@...>
Date: 2018-02-27T07:28:58Z
Added Klout to NOTICE that provided a SGA to ASF
----
---
[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
@maropu Could you check whether `to_json` and `from_json` work on Spark, if possible?
I'm not sure whether HCatalog is available in a Spark environment.
https://github.com/apache/incubator-hivemall/pull/135/commits/dd99307ef49f0507870573efdf5f2ae8da240cca#diff-357e4854869b2e21c38b1b437f11095aR56
---
[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
If you are using `v0.5.0`, then you need to use [the DDL for v0.5.0](https://github.com/apache/incubator-hivemall/blob/v0.5.0/resources/ddl/define-all.hive).
The DDL links on [the distribution page](http://hivemall.incubator.apache.org/download.html) point to the corresponding release branches.
The [installation manual](http://hivemall.incubator.apache.org/userguide/getting_started/installation.html) could be improved, though.
---
[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
```sql
create temporary function array_slice as 'hivemall.tools.array.ArraySliceUDF';
select
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
0, -- offset
2 -- length
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
6, -- offset
3 -- length
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
6, -- offset
10 -- length
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
6 -- offset
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
-3 -- offset
),
array_slice(
array("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"),
-3, -- offset
2 -- length
);
```
> ["zero","one"] ["six","seven","eight"] ["six","seven","eight","nine","ten"] ["six","seven","eight","nine","ten"] ["eight","nine","ten"] ["eight","nine"]
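The six results above suggest the following slicing rules: a negative offset counts back from the end of the array, an omitted length means "to the end", and a length that runs past the end is clamped. Below is a minimal Python sketch of those rules, inferred only from this example. It is a reference model, not the `ArraySliceUDF` implementation, and treating a negative length as zero is an assumption that the example does not cover.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def array_slice(values: Sequence[T], offset: int, length: Optional[int] = None) -> List[T]:
    """Reference model of the slicing behaviour shown in the example above."""
    n = len(values)
    # A negative offset counts back from the end (inferred from the -3 cases).
    start = offset + n if offset < 0 else offset
    # Keep the start position inside the array bounds.
    start = max(0, min(start, n))
    if length is None:
        # No length given: take everything up to the end.
        end = n
    else:
        # Assumption: a negative length is treated as zero; oversized lengths are clamped.
        end = min(n, start + max(0, length))
    return list(values[start:end])


words = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]
print(array_slice(words, -3, 2))
```

Running the sketch against the same inputs reproduces the row shown above, which makes it a convenient way to sanity-check expectations before writing a Hive query.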
---
[GitHub] incubator-hivemall issue #135: [WIP][HIVEMALL-145] Merge Brickhouse function...
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
```sql
select generate_series(2,4);
-- value
-- 2
-- 3
-- 4

select generate_series(5,1,-2);
-- value
-- 5
-- 3
-- 1

select generate_series(4,3);
-- (no return)

select date_add(current_date(),value) as `date`,value from (select generate_series(1,3)) t;
-- date        value
-- 2018-04-21  1
-- 2018-04-22  2
-- 2018-04-23  3

WITH input as (
  select 1 as c1, 10 as c2, 3 as step
  UNION ALL
  select 10, 2, -3
)
select generate_series(c1, c2, step) as series from input;
-- series
-- 1
-- 4
-- 7
-- 10
-- 10
-- 7
-- 4
```
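The results above indicate that both bounds are inclusive, that the step may be negative, and that nothing is emitted when the start already lies past the end. Below is a minimal Python sketch of that behaviour, inferred from the outputs only. It is not the UDTF's actual code, and rejecting a zero step is an assumption, since the thread does not show that case.

```python
from typing import Iterator


def generate_series(start: int, end: int, step: int = 1) -> Iterator[int]:
    """Yield start, start+step, ... up to and including end."""
    if step == 0:
        # Assumption: a zero step would never terminate, so reject it.
        raise ValueError("step must be non-zero")
    value = start
    # Ascending series stop once value exceeds end; descending once it drops below.
    while (value <= end) if step > 0 else (value >= end):
        yield value
        value += step


print(list(generate_series(5, 1, -2)))
```

Each input row produces its own series, so the last query in the example emits the values for `(1, 10, 3)` followed by those for `(10, 2, -3)`.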
---
[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
```sql
create temporary function conditional_emit as 'hivemall.tools.array.ConditionalEmitUDTF';
WITH input as (
select array(true, false, true) as conditions, array("one", "two", "three") as features
UNION ALL
select array(true, true, false), array("four", "five", "six")
)
select
conditional_emit(
conditions, features
)
from
input;
```
|feature|
|:-:|
|one|
|three|
|four|
|five|
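
Judging from the output above, each input row emits one output row per position whose condition is true. Below is a minimal Python sketch of that per-row behaviour. It is a reference model inferred from the example, not the `ConditionalEmitUDTF` code, and raising an error on mismatched lengths is an assumption.

```python
from typing import Iterator, Sequence


def conditional_emit(conditions: Sequence[bool], features: Sequence[str]) -> Iterator[str]:
    """Yield features[i] for every position i where conditions[i] is true."""
    if len(conditions) != len(features):
        # Assumption: the two arrays are expected to have equal length.
        raise ValueError("conditions and features must have the same length")
    for condition, feature in zip(conditions, features):
        if condition:
            yield feature


rows = [
    ([True, False, True], ["one", "two", "three"]),
    ([True, True, False], ["four", "five", "six"]),
]
for conditions, features in rows:
    for feature in conditional_emit(conditions, features):
        print(feature)
```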
---
[GitHub] incubator-hivemall issue #135: [WIP][HIVEMALL-145] Merge Brickhouse function...
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
@jeromebanks I'm considering merging this PR. Could you review it if possible?
---
[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
@paulojblack You need to use up-to-date DDLs, since we updated the DDLs for the `subarray` UDF in https://github.com/apache/incubator-hivemall/pull/135/commits/7003006e1c27cae66d5aa5c91fccf27a21b105a7
With [define-all.hive](https://github.com/apache/incubator-hivemall/blob/master/resources/ddl/define-all.hive) from the master branch, it works without errors in my environment.
---
[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions
Posted by paulojblack <gi...@git.apache.org>.
Github user paulojblack commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
That makes sense; I figured a change like that wasn't made blindly. Consider it a heads-up on the docs then!
---
[GitHub] incubator-hivemall issue #135: [WIP][HIVEMALL-145] Merge Brickhouse function...
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
```sql
create temporary function merge_maps as 'hivemall.tools.map.MergeMapsUDAF';
create table test as
SELECT map('A',10,'B',20,'C',30) as m
UNION ALL
SELECT map('A',11,'D',40,'E',50) as m;
SELECT merge_maps(m) FROM test;
-- > {"A":11,"B":20,"C":30,"D":40,"E":50}
```
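In the example above, key `A` appears in both rows and the merged map keeps the value 11 from the second row. Below is a minimal Python sketch of a merge with that "later value wins" outcome. It is a model inferred from this single result, not the `MergeMapsUDAF` code. In a real aggregation the row order is generally not guaranteed, so which duplicate wins should be treated as an assumption, as should skipping NULL maps.

```python
from typing import Dict, Iterable, Optional


def merge_maps(maps: Iterable[Optional[Dict[str, int]]]) -> Dict[str, int]:
    """Merge maps in iteration order; a later value overwrites an earlier one."""
    merged: Dict[str, int] = {}
    for m in maps:
        if m is None:
            # Assumption: NULL maps are skipped rather than raising an error.
            continue
        merged.update(m)
    return merged


rows = [
    {"A": 10, "B": 20, "C": 30},
    {"A": 11, "D": 40, "E": 50},
]
print(merge_maps(rows))
```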
---
[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
@jeromebanks I added you to [the committer list](http://hivemall.incubator.apache.org/team-list.html) with https://github.com/apache/incubator-hivemall/pull/135/commits/e956e98fdc80f3498cad8e55bc88cac1518c5f30
FYI
---
[GitHub] incubator-hivemall issue #135: [WIP][HIVEMALL-145] Merge Brickhouse function...
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
I'm going to merge this PR to master. If you find any problem, please comment here.
---
[GitHub] incubator-hivemall issue #135: [WIP][HIVEMALL-145] Merge Brickhouse function...
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
For the K-Minimum Values (KMV) and sketch-related code, I'll create another JIRA ticket.
For other UDFs, we accept incoming PRs.
https://docs.google.com/spreadsheets/d/1gtFNcTvPR9OZAsbobj2D9d37tOx4nAoSlib9CLdEDQg/edit#gid=0
---
[GitHub] incubator-hivemall issue #135: [WIP][HIVEMALL-145] Merge Brickhouse function...
Posted by jeromebanks <gi...@git.apache.org>.
Github user jeromebanks commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
@myui - Sure. I've been silently lurking, not sure when to step in. Looks fine in general, +1. I will do a more in-depth review, however.
--- @jeromebanks
On Tuesday, June 5, 2018, 3:23:58 AM PDT, Makoto YUI <no...@github.com> wrote:
@jeromebanks I'm considering to merge this PR. Could you review if possible?
---
[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions
Posted by paulojblack <gi...@git.apache.org>.
Github user paulojblack commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
Just a heads-up: I think some of the changes from this PR that were recently pushed to master have broken the setup described in the getting started docs.
Specifically, I'm having trouble with the changes in https://github.com/apache/incubator-hivemall/blob/master/resources/ddl/define-all.hive. After commenting out lines 409-413, it works as expected.
---
[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
@jeromebanks FYI, the merging of Brickhouse functions is in progress in this PR.
We still need to add unit tests, improve the quality of the functions, and add documentation.
---
[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
@myui Spark already has these functions: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3118
---
[GitHub] incubator-hivemall pull request #135: [HIVEMALL-145] Merge Brickhouse functi...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-hivemall/pull/135
---
[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
@paulojblack Thank you for the comments. I will confirm the problem and fix master.
---
[GitHub] incubator-hivemall pull request #135: [WIP][HIVEMALL-145] Merge Brickhouse f...
Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/135#discussion_r192006959
--- Diff: core/src/main/java/hivemall/tools/sanity/AssertUDF.java ---
@@ -25,8 +25,10 @@
@Description(name = "assert",
value = "_FUNC_(boolean condition) or _FUNC_(boolean condition, string errMsg)"
- + "- Throws HiveException if condition is not met")
-@UDFType(deterministic = true, stateful = false)
+ + "- Throws HiveException if condition is not met",
+ extended = "SELECT count(1) FROM stock_price WHERE assert(price > 0.0);\n"
+ + "SELECT count(1) FROM stock_price WHRE assert(price > 0.0, 'price MUST be more than 0.0')")
--- End diff --
typo `s/WHRE/WHERE/`
---
[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
Still WIP for [reviewing](https://docs.google.com/spreadsheets/d/1gtFNcTvPR9OZAsbobj2D9d37tOx4nAoSlib9CLdEDQg/edit#gid=0) functions to merge.
---
[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
```sql
create temporary function moving_avg as 'hivemall.statistics.MovingAverageUDTF';
select moving_avg(x, 3) from (select explode(array(1,2,3,4,5,6,7)) as x) series;
select moving_avg(x, 3) from (select explode(array(1.0,2.0,3.0,4.0,5.0,6.0,7.0)) as x) series;
```
|avg|
|:-:|
|1.0|
|1.5|
|2.0|
|3.0|
|4.0|
|5.0|
|6.0|
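
The output above is consistent with a trailing window: each row is the mean of the current value and up to two preceding values, so the first rows average over fewer than three elements. Below is a minimal Python sketch of that behaviour. It is a reference model inferred from the table, not the `MovingAverageUDTF` implementation, and rejecting a window size below 1 is an assumption.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def moving_avg(values: Iterable[float], window_size: int) -> Iterator[float]:
    """Yield the mean of the last `window_size` values seen so far."""
    if window_size < 1:
        # Assumption: a window must hold at least one value.
        raise ValueError("window_size must be at least 1")
    window: Deque[float] = deque(maxlen=window_size)
    total = 0.0
    for value in values:
        if len(window) == window_size:
            # The oldest value is about to be evicted by append().
            total -= window[0]
        window.append(value)
        total += value
        yield total / len(window)


print(list(moving_avg([1, 2, 3, 4, 5, 6, 7], 3)))
```

Keeping a running total avoids re-summing the window for every row, which matters little for a window of 3 but keeps the cost per row constant for larger windows.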
---
[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
@paulojblack Generally, we recommend using [official ASF releases](http://hivemall.incubator.apache.org/download.html) rather than the master branch.
If you do use the master branch, use the latest DDLs, with caution.
---
[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
```sql
select
NAMED_STRUCT("Name", "John", "age", 31),
to_json(
NAMED_STRUCT("Name", "John", "age", 31)
),
to_json(
NAMED_STRUCT("Name", "John", "age", 31),
array('Name', 'age')
),
to_json(
NAMED_STRUCT("Name", "John", "age", 31),
array('name', 'age')
),
to_json(
NAMED_STRUCT("Name", "John", "age", 31),
array('age')
),
to_json(
NAMED_STRUCT("Name", "John", "age", 31),
array()
),
to_json(
null,
array()
),
to_json(
struct("123", "456", 789, array(314,007)),
array('ti','si','i','bi')
),
to_json(
struct("123", "456", 789, array(314,007)),
'ti,si,i,bi'
),
to_json(
struct("123", "456", 789, array(314,007))
),
to_json(
NAMED_STRUCT("country", "japan", "city", "tokyo")
),
to_json(
NAMED_STRUCT("country", "japan", "city", "tokyo"),
array('city')
),
to_json(
ARRAY(
NAMED_STRUCT("country", "japan", "city", "tokyo"),
NAMED_STRUCT("country", "japan", "city", "osaka")
)
),
to_json(
ARRAY(
NAMED_STRUCT("country", "japan", "city", "tokyo"),
NAMED_STRUCT("country", "japan", "city", "osaka")
),
array('city')
);
```
> {"name":"John","age":31} {"name":"John","age":31} {"Name":"John","age":31} {"name":"John","age":31} {"age":31} {} NULL {"ti":"123","si":"456","i":789,"bi":[314,7]} {"ti":"123","si":"456","i":789,"bi":[314,7]} {"col1":"123","col2":"456","col3":789,"col4":[314,7]} {"country":"japan","city":"tokyo"} {"city":"tokyo"} [{"country":"japan","city":"tokyo"},{"country":"japan","city":"osaka"}] [{"country":"japan","city":"tokyo"},{"country":"japan","city":"osaka"}]
```sql
select
from_json(
'{ "person" : { "name" : "makoto" , "age" : 37 } }',
'struct<name:string,age:int>',
array('person')
),
from_json(
'[0.1,1.1,2.2]',
'array<double>'
),
from_json(to_json(
ARRAY(
NAMED_STRUCT("country", "japan", "city", "tokyo"),
NAMED_STRUCT("country", "japan", "city", "osaka")
)
),'array<struct<country:string,city:string>>'),
from_json(to_json(
ARRAY(
NAMED_STRUCT("country", "japan", "city", "tokyo"),
NAMED_STRUCT("country", "japan", "city", "osaka")
),
array('city')
), 'array<struct<country:string,city:string>>'),
from_json(to_json(
ARRAY(
NAMED_STRUCT("country", "japan", "city", "tokyo"),
NAMED_STRUCT("country", "japan", "city", "osaka")
)
),'array<struct<city:string>>');
```
> {"name":"makoto","age":37} [0.1,1.1,2.2] [{"country":"japan","city":"tokyo"},{"country":"japan","city":"osaka"}] [{"country":"japan","city":"tokyo"},{"country":"japan","city":"osaka"}] [{"city":"tokyo"},{"city":"osaka"}]
---
[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
@maropu FYI, I deprecated SubarrayUDF; use ArraySliceUDF instead.
---