Posted to issues@hivemall.apache.org by takuti <gi...@git.apache.org> on 2017/03/18 03:28:08 UTC

[GitHub] incubator-hivemall pull request #63: [HIVEMALL-90] Refine incomplete AUC UDA...

GitHub user takuti opened a pull request:

    https://github.com/apache/incubator-hivemall/pull/63

    [HIVEMALL-90] Refine incomplete AUC UDAF implementation

    ## What changes were proposed in this pull request?
    
    Since the AUC UDAF (classification) did not work correctly for some merge orders, this PR fixes the issue by modifying the UDAF's `merge()` and `terminate()` implementations.
    
    Moreover, the unit tests are refined accordingly, and a utility method is added to **HiveUtils**.
    
    ## What type of PR is it?
    
    Bug Fix
    
    ## What is the Jira issue?
    
    https://issues.apache.org/jira/browse/HIVEMALL-90
    
    ## How was this patch tested?
    
    - Unit test
    - Manual test on EMR
    
    ## How to use this feature?
    
    Usage is unchanged from the [current AUC UDAF](https://hivemall.incubator.apache.org/userguide/eval/auc.html).
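
The `merge()`/`terminate()` fix described in the pull request above can be illustrated with a minimal Java sketch of order-independent AUC merging. It is not the committed Hivemall code. It assumes that each partial aggregate covers a contiguous range of scores sorted in descending order, that equal scores never straddle two partials, and that each partial records its highest score (`indexScore`), its trapezoid area measured from its own origin, and its FP/TP counts. The names `PartialAuc`, `Partial`, `build`, and `merge` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PartialAuc {

    /** Hypothetical partial aggregate for one contiguous, descending-sorted score range. */
    static final class Partial {
        final double indexScore; // highest score in this range; used to order partials
        final double area;       // unnormalized trapezoid area, measured from local (0, 0)
        final long fp;           // negatives in this range
        final long tp;           // positives in this range

        Partial(double indexScore, double area, long fp, long tp) {
            this.indexScore = indexScore;
            this.area = area;
            this.fp = fp;
            this.tp = tp;
        }
    }

    /** Builds a partial from non-empty (score, label) arrays sorted by descending score; label 1 = positive. */
    static Partial build(double[] scores, int[] labels) {
        long fp = 0, tp = 0, fpPrev = 0, tpPrev = 0;
        double area = 0.0;
        double scorePrev = Double.NaN; // NaN equals no score, so the first row opens a step
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] != scorePrev) {
                // close the trapezoid of the previous distinct score; tied rows share one segment
                area += (fp - fpPrev) * (tp + tpPrev) / 2.0;
                fpPrev = fp;
                tpPrev = tp;
                scorePrev = scores[i];
            }
            if (labels[i] == 1) {
                tp++;
            } else {
                fp++;
            }
        }
        area += (fp - fpPrev) * (tp + tpPrev) / 2.0; // close the last trapezoid
        return new Partial(scores[0], area, fp, tp);
    }

    /** Merges partials arriving in any order by first sorting them by descending indexScore. */
    static double merge(List<Partial> partials) {
        List<Partial> ordered = new ArrayList<>(partials);
        ordered.sort(Comparator.comparingDouble((Partial p) -> p.indexScore).reversed());
        double area = 0.0;
        long fpTotal = 0, tpTotal = 0;
        for (Partial p : ordered) {
            // this partial's curve starts at (fpTotal, tpTotal); lifting it by tpTotal adds fp * tpTotal
            area += p.area + (double) p.fp * tpTotal;
            fpTotal += p.fp;
            tpTotal += p.tp;
        }
        if (fpTotal == 0 || tpTotal == 0) {
            return Double.NaN; // AUC is undefined without both classes
        }
        return area / ((double) fpTotal * tpTotal);
    }

    public static void main(String[] args) {
        Partial a = build(new double[] {0.9, 0.8}, new int[] {1, 0});
        Partial b = build(new double[] {0.7, 0.6}, new int[] {1, 1});
        Partial c = build(new double[] {0.5, 0.4}, new int[] {0, 0});
        // equals the AUC of a single pass over all six rows, whatever the list order
        System.out.println(merge(List.of(c, a, b)));
    }
}
```

Sorting by `indexScore` before folding is what makes the result independent of arrival order in this sketch, and the `fp * tpTotal` term accounts for the positives ranked above each partial. The sketch also shows why tied scores must stay in one partial: `build` turns a run of ties into a single diagonal segment, which a split would break into a staircase.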

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/takuti/incubator-hivemall fix-auc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/63.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #63
    
----
commit da1578207fb9bc629455c503914757b90506ab66
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-03-16T02:15:01Z

    Update AUC UDAF test to support all 3! = 6 merge orders

commit 5cc090fa95513dcf5db3855d5c5671cf61f45dae
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-03-17T03:37:05Z

    Support arbitrary merge order

commit e4737fe57a555fc5719e51c1fa2881e18a44fd74
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-03-17T03:44:11Z

    Update test case: there are two samples which have the same score

commit 5e91bbd367708d1f1e28dc00e0c64c95dfc6a66a
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-03-17T04:14:22Z

    Fix typo

commit c0645fe74cbd0a1412747b470ead229083d03351
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-03-17T05:57:47Z

    Carefully initialize accumulated partial area and (previous) TP/FP count

commit 627192cad19857d6a6ad92dfac18893576391053
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-03-17T07:14:53Z

    Merge partial result from left to right

commit a99648685a317784f5a4e2b13ed64b18cffdc4e4
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-03-17T14:11:15Z

    Update AUC UDAF Test w/ larger sample set

commit 92b7cbc64db00a4c9994b4693a25527790ad0cee
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-03-17T14:13:12Z

    Same scores should be passed to the same reducer

commit 49375798bdce6644b77da537b2501fcb303cb8bd
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-03-18T03:20:28Z

    Refactor

----
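
The commits above mention testing all 3! = 6 merge orders. A hedged sketch of such a check, written in plain Java rather than against the Hive test harness: it enumerates every ordering of three hand-computed partials (illustrative values, not taken from the PR's tests) and compares a naive arrival-order fold with a fold that first sorts by `indexScore`. `MergeOrderCheck` and its members are hypothetical names, and the naive fold only demonstrates order dependence; it is not the pre-fix Hivemall code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MergeOrderCheck {

    /** Hypothetical partial: highest score in its range, unnormalized area, FP and TP counts. */
    static final class Partial {
        final double indexScore;
        final double area;
        final long fp;
        final long tp;

        Partial(double indexScore, double area, long fp, long tp) {
            this.indexScore = indexScore;
            this.area = area;
            this.fp = fp;
            this.tp = tp;
        }
    }

    /** Hand-computed partials for rows (0.9,+)(0.8,-) | (0.7,+)(0.6,+) | (0.5,-)(0.4,-). */
    static List<Partial> samplePartials() {
        List<Partial> ps = new ArrayList<>();
        ps.add(new Partial(0.9, 1.0, 1, 1));
        ps.add(new Partial(0.7, 0.0, 0, 2));
        ps.add(new Partial(0.5, 0.0, 2, 0));
        return ps;
    }

    /** Folds partials in the given order, lifting each one by the positives already seen. */
    static double fold(List<Partial> ordered) {
        double area = 0.0;
        long fpTotal = 0, tpTotal = 0;
        for (Partial p : ordered) {
            area += p.area + (double) p.fp * tpTotal;
            fpTotal += p.fp;
            tpTotal += p.tp;
        }
        return area / ((double) fpTotal * tpTotal);
    }

    /** Order-independent variant: sort by descending indexScore, then fold. */
    static double sortedMerge(List<Partial> arrival) {
        List<Partial> ordered = new ArrayList<>(arrival);
        ordered.sort(Comparator.comparingDouble((Partial p) -> p.indexScore).reversed());
        return fold(ordered);
    }

    /** Collects all n! orderings of items via swap-based backtracking; items is restored afterwards. */
    static void permute(List<Partial> items, int k, List<List<Partial>> out) {
        if (k == items.size()) {
            out.add(new ArrayList<>(items));
            return;
        }
        for (int i = k; i < items.size(); i++) {
            Collections.swap(items, k, i);
            permute(items, k + 1, out);
            Collections.swap(items, k, i);
        }
    }

    public static void main(String[] args) {
        List<List<Partial>> orders = new ArrayList<>();
        permute(samplePartials(), 0, orders);
        Set<Double> naive = new HashSet<>();
        Set<Double> sorted = new HashSet<>();
        for (List<Partial> order : orders) {
            naive.add(fold(order));         // arrival-order fold: varies with the permutation
            sorted.add(sortedMerge(order)); // sorted fold: one value for every permutation
        }
        System.out.println(orders.size() + " orders; distinct naive results: " + naive.size()
                + "; distinct sorted results: " + sorted.size());
    }
}
```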


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hivemall pull request #63: [HIVEMALL-90] Refine incomplete AUC UDA...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on a diff in the pull request:

    https://github.com/apache/incubator-hivemall/pull/63#discussion_r110553873
  
    --- Diff: core/src/main/java/hivemall/evaluation/AUCUDAF.java ---
    @@ -204,21 +256,53 @@ public void merge(AggregationBuffer agg, Object partial) throws HiveException {
                     return;
                 }
     
    -            Object aObj = internalMergeOI.getStructFieldData(partial, aField);
    -            Object scorePrevObj = internalMergeOI.getStructFieldData(partial, scorePrevField);
    +            Object indexScoreObj = internalMergeOI.getStructFieldData(partial, indexScoreField);
    +            Object areaObj = internalMergeOI.getStructFieldData(partial, areaField);
                 Object fpObj = internalMergeOI.getStructFieldData(partial, fpField);
                 Object tpObj = internalMergeOI.getStructFieldData(partial, tpField);
                 Object fpPrevObj = internalMergeOI.getStructFieldData(partial, fpPrevField);
                 Object tpPrevObj = internalMergeOI.getStructFieldData(partial, tpPrevField);
    -            double a = PrimitiveObjectInspectorFactory.writableDoubleObjectInspector.get(aObj);
    -            double scorePrev = PrimitiveObjectInspectorFactory.writableDoubleObjectInspector.get(scorePrevObj);
    +            Object areaPartialMapObj = internalMergeOI.getStructFieldData(partial, areaPartialMapField);
    +            Object fpPartialMapObj = internalMergeOI.getStructFieldData(partial, fpPartialMapField);
    +            Object tpPartialMapObj = internalMergeOI.getStructFieldData(partial, tpPartialMapField);
    +            Object fpPrevPartialMapObj = internalMergeOI.getStructFieldData(partial, fpPrevPartialMapField);
    +            Object tpPrevPartialMapObj = internalMergeOI.getStructFieldData(partial, tpPrevPartialMapField);
    +
    +            double indexScore = PrimitiveObjectInspectorFactory.writableDoubleObjectInspector.get(indexScoreObj);
    +            double area = PrimitiveObjectInspectorFactory.writableDoubleObjectInspector.get(areaObj);
                 long fp = PrimitiveObjectInspectorFactory.writableLongObjectInspector.get(fpObj);
                 long tp = PrimitiveObjectInspectorFactory.writableLongObjectInspector.get(tpObj);
                 long fpPrev = PrimitiveObjectInspectorFactory.writableLongObjectInspector.get(fpPrevObj);
                 long tpPrev = PrimitiveObjectInspectorFactory.writableLongObjectInspector.get(tpPrevObj);
     
    +            Map<Double, Double> areaPartialMap = (Map<Double, Double>) ObjectInspectorFactory.getStandardMapObjectInspector(
    --- End diff --
    
    ```java
    // Both key and value OIs are Writable-based: this OI describes Map<DoubleWritable, LongWritable>
    ObjectInspectorFactory.getStandardMapObjectInspector(
      PrimitiveObjectInspectorFactory.writableDoubleObjectInspector,
      PrimitiveObjectInspectorFactory.writableLongObjectInspector
    )
    ```
    
    Invalid cast: this OI describes `Map<DoubleWritable, LongWritable>`, not `Map<Double, Double>`.
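
Why the cast flagged in the comment above is invalid can be reproduced in plain Java. Because generic type arguments are erased, an unchecked cast such as `(Map<Double, Double>)` succeeds at runtime; the `ClassCastException` appears only later, when a value is read as a `Double`. This sketch uses `Long` values as a stand-in for `LongWritable` (no Hive classes are involved), and `UncheckedMapCastDemo`, `partialFromUpstream`, and `convert` are hypothetical names. With Hive's Writable types, a comparable conversion would presumably unwrap each `DoubleWritable`/`LongWritable` with its `get()` method instead of going through `Number`.

```java
import java.util.HashMap;
import java.util.Map;

class UncheckedMapCastDemo {

    /** Hypothetical stand-in for a partial-result map whose values are Long, not Double. */
    static Map<?, ?> partialFromUpstream() {
        Map<Object, Object> m = new HashMap<>();
        m.put(0.9, 3L);
        m.put(0.5, 7L);
        return m;
    }

    /** Returns true if reading a value through the mistyped view throws ClassCastException. */
    @SuppressWarnings("unchecked")
    static boolean uncheckedCastFailsLate() {
        // Succeeds at runtime: type arguments are erased, so nothing is checked here.
        Map<Double, Double> wrong = (Map<Double, Double>) partialFromUpstream();
        try {
            double first = wrong.get(0.9); // the compiler-inserted cast to Double runs here
            System.out.println("unexpectedly read " + first);
            return false;
        } catch (ClassCastException e) {
            return true; // a Long is not a Double
        }
    }

    /** Converts each entry explicitly instead of casting the whole map. */
    static Map<Double, Long> convert(Map<?, ?> raw) {
        Map<Double, Long> out = new HashMap<>();
        for (Map.Entry<?, ?> e : raw.entrySet()) {
            out.put(((Number) e.getKey()).doubleValue(), ((Number) e.getValue()).longValue());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("late ClassCastException: " + uncheckedCastFailsLate());
        System.out.println("converted: " + convert(partialFromUpstream()));
    }
}
```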

[GitHub] incubator-hivemall pull request #63: [HIVEMALL-90] Refine incomplete AUC UDA...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-hivemall/pull/63

[GitHub] incubator-hivemall pull request #63: [HIVEMALL-90] Refine incomplete AUC UDA...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on a diff in the pull request:

    https://github.com/apache/incubator-hivemall/pull/63#discussion_r110553945
  
    --- Diff: core/src/main/java/hivemall/evaluation/AUCUDAF.java ---
    @@ -204,21 +256,53 @@ public void merge(AggregationBuffer agg, Object partial) throws HiveException {
                     return;
                 }
     
    -            Object aObj = internalMergeOI.getStructFieldData(partial, aField);
    -            Object scorePrevObj = internalMergeOI.getStructFieldData(partial, scorePrevField);
    +            Object indexScoreObj = internalMergeOI.getStructFieldData(partial, indexScoreField);
    +            Object areaObj = internalMergeOI.getStructFieldData(partial, areaField);
                 Object fpObj = internalMergeOI.getStructFieldData(partial, fpField);
                 Object tpObj = internalMergeOI.getStructFieldData(partial, tpField);
                 Object fpPrevObj = internalMergeOI.getStructFieldData(partial, fpPrevField);
                 Object tpPrevObj = internalMergeOI.getStructFieldData(partial, tpPrevField);
    -            double a = PrimitiveObjectInspectorFactory.writableDoubleObjectInspector.get(aObj);
    -            double scorePrev = PrimitiveObjectInspectorFactory.writableDoubleObjectInspector.get(scorePrevObj);
    +            Object areaPartialMapObj = internalMergeOI.getStructFieldData(partial, areaPartialMapField);
    +            Object fpPartialMapObj = internalMergeOI.getStructFieldData(partial, fpPartialMapField);
    +            Object tpPartialMapObj = internalMergeOI.getStructFieldData(partial, tpPartialMapField);
    +            Object fpPrevPartialMapObj = internalMergeOI.getStructFieldData(partial, fpPrevPartialMapField);
    +            Object tpPrevPartialMapObj = internalMergeOI.getStructFieldData(partial, tpPrevPartialMapField);
    +
    +            double indexScore = PrimitiveObjectInspectorFactory.writableDoubleObjectInspector.get(indexScoreObj);
    +            double area = PrimitiveObjectInspectorFactory.writableDoubleObjectInspector.get(areaObj);
                 long fp = PrimitiveObjectInspectorFactory.writableLongObjectInspector.get(fpObj);
                 long tp = PrimitiveObjectInspectorFactory.writableLongObjectInspector.get(tpObj);
                 long fpPrev = PrimitiveObjectInspectorFactory.writableLongObjectInspector.get(fpPrevObj);
                 long tpPrev = PrimitiveObjectInspectorFactory.writableLongObjectInspector.get(tpPrevObj);
     
    +            Map<Double, Double> areaPartialMap = (Map<Double, Double>) ObjectInspectorFactory.getStandardMapObjectInspector(
    +                PrimitiveObjectInspectorFactory.writableDoubleObjectInspector,
    +                PrimitiveObjectInspectorFactory.writableLongObjectInspector).getMap(
    +                HiveUtils.castLazyBinaryObject(areaPartialMapObj));
    +
    +            Map<Double, Long> fpPartialMap = (Map<Double, Long>) ObjectInspectorFactory.getStandardMapObjectInspector(
    --- End diff --
    
    `terminatePartial()` returns Writable objects, but they are received here as Java objects.

[GitHub] incubator-hivemall issue #63: [HIVEMALL-90] Refine incomplete AUC UDAF imple...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/63
  
    @takuti The terminatePartial/merge OIs are invalid.

[GitHub] incubator-hivemall pull request #63: [HIVEMALL-90] Refine incomplete AUC UDA...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on a diff in the pull request:

    https://github.com/apache/incubator-hivemall/pull/63#discussion_r110553905
  
    --- Diff: core/src/main/java/hivemall/evaluation/AUCUDAF.java ---
    @@ -35,7 +39,9 @@
     import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
     import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.AbstractAggregationBuffer;
     import org.apache.hadoop.hive.serde2.io.DoubleWritable;
    +import org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryMap;
    --- End diff --
    
    Unused import

[GitHub] incubator-hivemall issue #63: [HIVEMALL-90] Refine incomplete AUC UDAF imple...

Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/63
  
    @myui Fixed. Please check them.

[GitHub] incubator-hivemall pull request #63: [HIVEMALL-90] Refine incomplete AUC UDA...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on a diff in the pull request:

    https://github.com/apache/incubator-hivemall/pull/63#discussion_r110554001
  
    --- Diff: core/src/main/java/hivemall/evaluation/AUCUDAF.java ---
    @@ -188,13 +234,19 @@ public void iterate(AggregationBuffer agg, Object[] parameters) throws HiveExcep
             public Object terminatePartial(AggregationBuffer agg) throws HiveException {
                 ClassificationAUCAggregationBuffer myAggr = (ClassificationAUCAggregationBuffer) agg;
     
    -            Object[] partialResult = new Object[6];
    -            partialResult[0] = new DoubleWritable(myAggr.a);
    -            partialResult[1] = new DoubleWritable(myAggr.scorePrev);
    +            Object[] partialResult = new Object[11];
    +            partialResult[0] = new DoubleWritable(myAggr.indexScore);
    +            partialResult[1] = new DoubleWritable(myAggr.area);
                 partialResult[2] = new LongWritable(myAggr.fp);
                 partialResult[3] = new LongWritable(myAggr.tp);
                 partialResult[4] = new LongWritable(myAggr.fpPrev);
                 partialResult[5] = new LongWritable(myAggr.tpPrev);
    +            partialResult[6] = myAggr.areaPartialMap;
    --- End diff --
    
    Please revise the OI: should it be `javaDoubleObjectInspector` instead of `writableDoubleObjectInspector`?

[GitHub] incubator-hivemall issue #63: [HIVEMALL-90] Refine incomplete AUC UDAF imple...

Posted by coveralls <gi...@git.apache.org>.
Github user coveralls commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/63
  
    
    [![Coverage Status](https://coveralls.io/builds/10658042/badge)](https://coveralls.io/builds/10658042)
    
    Coverage increased (+0.2%) to 36.945% when pulling **49375798bdce6644b77da537b2501fcb303cb8bd on takuti:fix-auc** into **cb63532aa117b22092ced6b116ed2e4047cae447 on apache:master**.

[GitHub] incubator-hivemall issue #63: [HIVEMALL-90] Refine incomplete AUC UDAF imple...

Posted by coveralls <gi...@git.apache.org>.
Github user coveralls commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/63
  
    
    [![Coverage Status](https://coveralls.io/builds/11029894/badge)](https://coveralls.io/builds/11029894)
    
    Coverage increased (+0.6%) to 37.294% when pulling **810f5409eba8dff131e5d3b44069fb1182fa46cc on takuti:fix-auc** into **cb63532aa117b22092ced6b116ed2e4047cae447 on apache:master**.

[GitHub] incubator-hivemall issue #63: [HIVEMALL-90] Refine incomplete AUC UDAF imple...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/63
  
    @takuti LGTM. Merged with small modifications. Thanks!