Posted to issues@hivemall.apache.org by takuti <gi...@git.apache.org> on 2017/12/13 08:28:58 UTC

[GitHub] incubator-hivemall pull request #126: [HIVEMALL-162] Support L1 normalizatio...

GitHub user takuti opened a pull request:

    https://github.com/apache/incubator-hivemall/pull/126

    [HIVEMALL-162] Support L1 normalization

    ## What changes were proposed in this pull request?
    
    Support `l1_normalize` in a similar manner to `l2_normalize`
    
    ## What type of PR is it?
    
    Feature
    
    ## What is the Jira issue?
    
    https://issues.apache.org/jira/browse/HIVEMALL-162
    
    ## How was this patch tested?
    
    Unit test
    
    ## Checklist
    
    (Please remove this section if not needed; check `x` for YES, blank for NO)
    
    - [x] Did you apply source code formatter, i.e., `mvn formatter:format`, for your commit?
    - [ ] Did you run system tests on Hive (or Spark)?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/takuti/incubator-hivemall l1-normalize

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/126.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #126
    
----
commit 44d1bd9a9d2273b3d020d9dd1b4094b1a15c54f2
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-12-13T08:26:15Z

    Support L1 normalization

----


---

[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/126
  
    https://github.com/apache/incubator-hivemall/commit/2fa6fb99dd059c2003829e9c455668835e26be24 fixed the CI error.


---

[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/126
  
    LGTM. Please merge it after EMR testing.



---

[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization

Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/126
  
    Ah, good point. I've updated the document.


---

[GitHub] incubator-hivemall pull request #126: [HIVEMALL-162] Support L1 normalizatio...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-hivemall/pull/126


---

[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/126
  
    @takuti 
    
    `normalize` may later be introduced as a built-in Hive UDF, for example for Unicode normalization.
    So, `l1_normalize` is preferred.
    
    Could you add gitbook documentation about it?


---

[GitHub] incubator-hivemall pull request #126: [HIVEMALL-162] Support L1 normalizatio...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on a diff in the pull request:

    https://github.com/apache/incubator-hivemall/pull/126#discussion_r157173567
  
    --- Diff: core/src/main/java/hivemall/ftvec/scaling/L1NormalizationUDF.java ---
    @@ -0,0 +1,79 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package hivemall.ftvec.scaling;
    +
    +import org.apache.hadoop.hive.ql.exec.Description;
    +import org.apache.hadoop.hive.ql.exec.UDF;
    +import org.apache.hadoop.hive.ql.udf.UDFType;
    +import org.apache.hadoop.io.Text;
    +
    +import java.util.Arrays;
    +import java.util.List;
    +
    +@Description(name = "l1_normalize", value = "_FUNC_(ftvec string) - Returned a L1 normalized value")
    +@UDFType(deterministic = true, stateful = false)
    +public final class L1NormalizationUDF extends UDF {
    +
    +    public List<Text> evaluate(final List<Text> ftvecs) {
    +        if (ftvecs == null) {
    +            return null;
    +        }
    +        double absoluteSum = 0.d;
    +        final int numFeatures = ftvecs.size();
    +        final String[] features = new String[numFeatures];
    +        final float[] weights = new float[numFeatures];
    +        for (int i = 0; i < numFeatures; i++) {
    +            Text ftvec = ftvecs.get(i);
    +            if (ftvec == null) {
    +                continue;
    +            }
    +            String s = ftvec.toString();
    +            final String[] ft = s.split(":");
    +            final int ftlen = ft.length;
    +            if (ftlen == 1) {
    +                features[i] = ft[0];
    +                weights[i] = 1.f;
    +                absoluteSum += 1.d;
    +            } else if (ftlen == 2) {
    +                features[i] = ft[0];
    +                float v = Float.parseFloat(ft[1]);
    +                weights[i] = v;
    +                absoluteSum += Math.abs(v);
    +            } else {
    +                throw new IllegalArgumentException("Invalid feature value representation: " + s);
    --- End diff --
    
    This is my bad; `L2NormalizationUDF` does the same thing:
    https://github.com/apache/incubator-hivemall/blob/master/core/src/main/java/hivemall/ftvec/scaling/L2NormalizationUDF.java#L62
    
    A HiveException (or UDFArgumentException) is expected here. A RuntimeException should not be thrown.
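
For illustration, the review point above can be sketched in plain Java without Hive on the classpath. This is only a sketch under stated assumptions: `InvalidFeatureException` is a hypothetical checked exception standing in for `HiveException` / `UDFArgumentException`, the class and method names are made up, and the output step (dividing each weight by the absolute sum, returning 0 when the sum is 0) is assumed, because the quoted diff is cut off before that part.

```java
import java.util.ArrayList;
import java.util.List;

final class L1NormalizeSketch {

    /** Hypothetical checked exception; stands in for HiveException / UDFArgumentException. */
    static final class InvalidFeatureException extends Exception {
        InvalidFeatureException(String message) {
            super(message);
        }
    }

    /** L1-normalizes "feature[:weight]" strings; a missing weight is treated as 1. */
    static List<String> l1Normalize(final List<String> ftvecs) throws InvalidFeatureException {
        if (ftvecs == null) {
            return null;
        }
        final int numFeatures = ftvecs.size();
        final String[] features = new String[numFeatures];
        final float[] weights = new float[numFeatures];
        double absoluteSum = 0.d;
        for (int i = 0; i < numFeatures; i++) {
            final String s = ftvecs.get(i);
            if (s == null) {
                continue; // null entries are skipped, as in the quoted diff
            }
            final String[] ft = s.split(":");
            if (ft.length == 1) {
                features[i] = ft[0];
                weights[i] = 1.f;
                absoluteSum += 1.d;
            } else if (ft.length == 2) {
                features[i] = ft[0];
                final float v;
                try {
                    v = Float.parseFloat(ft[1]);
                } catch (NumberFormatException e) {
                    // Assumption: unparsable weights are also reported as a checked exception.
                    throw new InvalidFeatureException("Invalid feature value: " + s);
                }
                weights[i] = v;
                absoluteSum += Math.abs(v);
            } else {
                // Checked exception instead of IllegalArgumentException (a RuntimeException).
                throw new InvalidFeatureException("Invalid feature value representation: " + s);
            }
        }
        // Assumed output step: weight / sum of absolute weights (0 if the sum is 0).
        final List<String> normalized = new ArrayList<>(numFeatures);
        for (int i = 0; i < numFeatures; i++) {
            if (features[i] == null) {
                continue;
            }
            final float w = (absoluteSum == 0.d) ? 0.f : (float) (weights[i] / absoluteSum);
            normalized.add(features[i] + ":" + w);
        }
        return normalized;
    }
}
```

With this sketch, an input such as `["a:1", "b:3"]` has an absolute sum of 4, so the weights become 0.25 and 0.75, while a malformed entry such as `"a:1:2"` forces the caller to handle or declare the checked exception.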


---

[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization

Posted by lemire <gi...@git.apache.org>.
Github user lemire commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/126
  
    @myui
    
    We had a broken release due to a new layout. This is fixed. Sorry for the problems.


---

[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/126
  
    Please link and close this issue as well. 
    https://issues.apache.org/jira/browse/HIVEMALL-59 


---

[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization

Posted by lemire <gi...@git.apache.org>.
Github user lemire commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/126
  
    @myui There are many of us. 
    
    If you need help, ping us.


---

[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/126
  
    @takuti A CI error is happening. The unit test fails to compile:
    
    ```
    [ERROR] /home/travis/build/apache/incubator-hivemall/core/src/test/java/hivemall/ftvec/scaling/L1NormalizationUDFTest.java:[63,41] unreported exception org.apache.hadoop.hive.ql.metadata.HiveException; must be caught or declared to be thrown
    ```
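
For context, this kind of compiler error appears when a method starts declaring a checked exception and an existing caller neither catches nor declares it. The following is a minimal sketch of the two usual remedies, not the actual fix applied in this PR: `UdfException`, `parseWeight`, and the caller methods are hypothetical names, and the real test would involve `HiveException` and a JUnit test method instead.

```java
final class CheckedExceptionCallerSketch {

    /** Hypothetical checked exception; stands in for HiveException. */
    static final class UdfException extends Exception {
        UdfException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for a UDF method that now declares a checked exception. */
    static float parseWeight(final String s) throws UdfException {
        try {
            return Float.parseFloat(s);
        } catch (NumberFormatException e) {
            throw new UdfException("Invalid weight: " + s);
        }
    }

    /** Remedy 1: the caller declares the exception (for a test method, add a throws clause). */
    static float callerThatDeclares(final String s) throws UdfException {
        return parseWeight(s);
    }

    /** Remedy 2: the caller catches the exception and handles it locally. */
    static boolean rejects(final String s) {
        try {
            parseWeight(s);
            return false;
        } catch (UdfException e) {
            return true;
        }
    }
}
```

Removing the `throws UdfException` clause from `callerThatDeclares` would reproduce the "unreported exception ... must be caught or declared to be thrown" message at compile time.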


---

[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/126
  
    It seems a recent RoaringBitmap release caused a problem.
    https://github.com/RoaringBitmap/RoaringBitmap/issues/197
    
    We need to pin the versions of the libraries we depend on.
    https://github.com/apache/incubator-hivemall/blob/master/core/pom.xml#L142
    
    I'll fix it.


---

[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/126
  
    @lemire 👍  Thank you for creating a great library.


---

[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization

Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/126
  
    Oh, thanks! Fixed and directly pushed to master.


---

[GitHub] incubator-hivemall issue #126: [HIVEMALL-162] Support L1 normalization

Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/126
  
    Another option is to provide a generic `normalize` interface like [sklearn](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.normalize.html)


---