Posted to issues@hivemall.apache.org by takuti <gi...@git.apache.org> on 2017/12/13 08:28:58 UTC
[GitHub] incubator-hivemall pull request #126: [HIVEMALL-162] Support L1 normalizatio...
GitHub user takuti opened a pull request:
https://github.com/apache/incubator-hivemall/pull/126
[HIVEMALL-162] Support L1 normalization
## What changes were proposed in this pull request?
Support `l1_normalize` in a similar manner to `l2_normalize`
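For readers unfamiliar with the operation, here is a minimal sketch of what L1 normalization of a `feature:weight` vector might look like in plain Java. It is not the code in this patch: the class and method names are hypothetical, it uses `double` rather than `float`, and skipping null entries and returning zeros for an all-zero vector are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; L1NormalizeSketch is a hypothetical name, not a Hivemall class.
final class L1NormalizeSketch {

    // Divides every weight by the sum of absolute weights, so the absolute
    // values of the resulting weights sum to 1. A feature written without
    // ":weight" is treated as having weight 1.
    static List<String> l1Normalize(List<String> ftvec) {
        if (ftvec == null) {
            return null;
        }
        final int n = ftvec.size();
        final String[] names = new String[n];
        final double[] weights = new double[n];
        double absSum = 0.0;
        for (int i = 0; i < n; i++) {
            String s = ftvec.get(i);
            if (s == null) {
                continue; // assumption: null entries are skipped
            }
            String[] ft = s.split(":");
            if (ft.length == 1) {
                weights[i] = 1.0;
            } else if (ft.length == 2) {
                weights[i] = Double.parseDouble(ft[1]);
            } else {
                throw new IllegalArgumentException("Invalid feature value representation: " + s);
            }
            names[i] = ft[0];
            absSum += Math.abs(weights[i]);
        }
        final List<String> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            if (names[i] == null) {
                continue;
            }
            // assumption: an all-zero vector stays all-zero instead of dividing by zero
            double w = (absSum == 0.0) ? 0.0 : weights[i] / absSum;
            out.add(names[i] + ":" + w);
        }
        return out;
    }
}
```

With this sketch, the input `["a:1.0", "b:-3.0"]` has an absolute sum of 4, so the weights become 0.25 and -0.75.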
## What type of PR is it?
Feature
## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-162
## How was this patch tested?
Unit test
## Checklist
(Please remove this section if not needed; check `x` for YES, blank for NO)
- [x] Did you apply source code formatter, i.e., `mvn formatter:format`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/takuti/incubator-hivemall l1-normalize
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-hivemall/pull/126.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #126
----
commit 44d1bd9a9d2273b3d020d9dd1b4094b1a15c54f2
Author: Takuya Kitazawa <k....@gmail.com>
Date: 2017-12-13T08:26:15Z
Support L1 normalization
----
---
[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
https://github.com/apache/incubator-hivemall/commit/2fa6fb99dd059c2003829e9c455668835e26be24 fixed CI error.
---
[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
LGTM. Please merge it after EMR testing.
---
[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization
Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
Ah, good point. I've updated the document.
---
[GitHub] incubator-hivemall pull request #126: [HIVEMALL-162] Support L1 normalizatio...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-hivemall/pull/126
---
[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
@takuti
`normalize` may later be introduced as a default Hive UDF, e.g., for Unicode normalization.
So the name `l1_normalize` is preferred.
Could you add gitbook documentation for it?
---
[GitHub] incubator-hivemall pull request #126: [HIVEMALL-162] Support L1 normalizatio...
Posted by myui <gi...@git.apache.org>.
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/126#discussion_r157173567
--- Diff: core/src/main/java/hivemall/ftvec/scaling/L1NormalizationUDF.java ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.ftvec.scaling;
+
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDF;
+import org.apache.hadoop.hive.ql.udf.UDFType;
+import org.apache.hadoop.io.Text;
+
+import java.util.Arrays;
+import java.util.List;
+
+@Description(name = "l1_normalize", value = "_FUNC_(ftvec string) - Returned a L1 normalized value")
+@UDFType(deterministic = true, stateful = false)
+public final class L1NormalizationUDF extends UDF {
+
+ public List<Text> evaluate(final List<Text> ftvecs) {
+ if (ftvecs == null) {
+ return null;
+ }
+ double absoluteSum = 0.d;
+ final int numFeatures = ftvecs.size();
+ final String[] features = new String[numFeatures];
+ final float[] weights = new float[numFeatures];
+ for (int i = 0; i < numFeatures; i++) {
+ Text ftvec = ftvecs.get(i);
+ if (ftvec == null) {
+ continue;
+ }
+ String s = ftvec.toString();
+ final String[] ft = s.split(":");
+ final int ftlen = ft.length;
+ if (ftlen == 1) {
+ features[i] = ft[0];
+ weights[i] = 1.f;
+ absoluteSum += 1.d;
+ } else if (ftlen == 2) {
+ features[i] = ft[0];
+ float v = Float.parseFloat(ft[1]);
+ weights[i] = v;
+ absoluteSum += Math.abs(v);
+ } else {
+ throw new IllegalArgumentException("Invalid feature value representation: " + s);
--- End diff --
This is my bad; the same pattern already exists in `L2NormalizationUDF`:
https://github.com/apache/incubator-hivemall/blob/master/core/src/main/java/hivemall/ftvec/scaling/L2NormalizationUDF.java#L62
A HiveException (or UDFArgumentException) is expected here. A RuntimeException such as IllegalArgumentException should not be thrown.
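A rough sketch of the suggested change, for illustration only. To keep it compilable without Hive on the classpath, `SketchHiveException` is a hypothetical stand-in for `org.apache.hadoop.hive.ql.metadata.HiveException`, and `FeatureParseSketch.parseWeight` is a made-up helper, not a method in this patch.

```java
// Hypothetical stand-in for org.apache.hadoop.hive.ql.metadata.HiveException,
// which is likewise a checked exception (it extends Exception).
class SketchHiveException extends Exception {
    SketchHiveException(String message) {
        super(message);
    }

    SketchHiveException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Illustrative helper only; not part of the patch under review.
final class FeatureParseSketch {

    // Parses the weight of a "feature" or "feature:weight" string and reports
    // malformed input through a checked exception instead of a RuntimeException.
    static float parseWeight(String s) throws SketchHiveException {
        final String[] ft = s.split(":");
        if (ft.length == 1) {
            return 1.f; // no explicit weight
        }
        if (ft.length == 2) {
            try {
                return Float.parseFloat(ft[1]);
            } catch (NumberFormatException e) {
                throw new SketchHiveException("Invalid weight in: " + s, e);
            }
        }
        throw new SketchHiveException("Invalid feature value representation: " + s);
    }
}
```

Because the exception is checked, every caller, including unit tests, has to catch it or declare it with `throws`.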
---
[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization
Posted by lemire <gi...@git.apache.org>.
Github user lemire commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
@myui
We had a broken release due to a new layout. This is fixed. Sorry for the problems.
---
[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
Please link and close this issue as well.
https://issues.apache.org/jira/browse/HIVEMALL-59
---
[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization
Posted by lemire <gi...@git.apache.org>.
Github user lemire commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
@myui There are many of us.
If you need help, ping us.
---
[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
@takuti CI is failing: the unit test does not compile.
```
[ERROR] /home/travis/build/apache/incubator-hivemall/core/src/test/java/hivemall/ftvec/scaling/L1NormalizationUDFTest.java:[63,41] unreported exception org.apache.hadoop.hive.ql.metadata.HiveException; must be caught or declared to be thrown
```
---
[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
It seems a recent RoaringBitmap release caused the problem.
https://github.com/RoaringBitmap/RoaringBitmap/issues/197
We need to pin the versions of the libraries we depend on.
https://github.com/apache/incubator-hivemall/blob/master/core/pom.xml#L142
I'll fix it.
---
[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
@lemire 👍 Thank you for creating a great library.
---
[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization
Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
Oh, thanks! Fixed and directly pushed to master.
---
[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization
Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
Another option is to provide a generic `normalize` interface like [sklearn](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.normalize.html)
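To make the idea concrete, a generic interface could look roughly like the sketch below. It only mirrors the `norm` parameter of sklearn's `normalize` (`l1`, `l2`, `max`); the class name, the signature, and the handling of all-zero vectors are assumptions for illustration, not a proposed Hivemall API.

```java
import java.util.Locale;

// Illustrative sketch of a single normalize() entry point selected by norm name.
final class NormalizeSketch {

    // Returns a new array with x divided by the chosen norm of x.
    static double[] normalize(double[] x, String norm) {
        double denom = 0.0;
        switch (norm.toLowerCase(Locale.ROOT)) {
            case "l1": // sum of absolute values
                for (double v : x) {
                    denom += Math.abs(v);
                }
                break;
            case "l2": // Euclidean length
                for (double v : x) {
                    denom += v * v;
                }
                denom = Math.sqrt(denom);
                break;
            case "max": // largest absolute value
                for (double v : x) {
                    denom = Math.max(denom, Math.abs(v));
                }
                break;
            default:
                throw new IllegalArgumentException("Unknown norm: " + norm);
        }
        final double[] out = new double[x.length];
        if (denom == 0.0) {
            return out; // assumption: an all-zero vector is returned as zeros
        }
        for (int i = 0; i < x.length; i++) {
            out[i] = x[i] / denom;
        }
        return out;
    }
}
```

For example, `normalize(new double[] {1.0, -3.0}, "l1")` divides by 4, while the same call with `"max"` divides by 3.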
---