You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hivemall.apache.org by takuti <gi...@git.apache.org> on 2017/03/03 08:52:47 UTC

[GitHub] incubator-hivemall pull request #58: [WIP][HIVEMALL-24] Scalable field-aware...

GitHub user takuti opened a pull request:

    https://github.com/apache/incubator-hivemall/pull/58

    [WIP][HIVEMALL-24] Scalable field-aware factorization machines

    ## What changes were proposed in this pull request?
    
    Make `ffm_predict` function more scalable by creating its UDAF implementation
    
    ## What type of PR is it?
    
    Improvement
    
    ## What is the Jira issue?
    
    https://issues.apache.org/jira/browse/HIVEMALL-24
    
    ## How was this patch tested?
    
    Not yet
    
    ## How to use this feature?
    
    https://gist.github.com/takuti/c49dfe2d06cb12bf69bad30213d0afc3


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/takuti/incubator-hivemall ffm-predict-udaf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/58.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #58
    
----
commit 591e3b0f255e6523167157ed2e68c9499b4ca2cd
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-03-03T08:48:12Z

    Implement -ffm option in `feature_pairs`

commit 343f704dc4eeee25f195e298ee014368db1fef9e
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-03-03T08:48:26Z

    Create FFMPredictGenericUDAF

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hivemall issue #58: [WIP][HIVEMALL-24] Scalable field-aware factor...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/58
  
    I'm working on this issue in https://github.com/apache/incubator-hivemall/pull/105 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hivemall pull request #58: [WIP][HIVEMALL-24] Scalable field-aware...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-hivemall/pull/58


---

[GitHub] incubator-hivemall pull request #58: [WIP][HIVEMALL-24] Scalable field-aware...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on a diff in the pull request:

    https://github.com/apache/incubator-hivemall/pull/58#discussion_r129273987
  
    --- Diff: core/src/main/java/hivemall/fm/FFMPredictGenericUDAF.java ---
    @@ -0,0 +1,272 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package hivemall.fm;
    +
    +import hivemall.utils.hadoop.HiveUtils;
    +import hivemall.utils.hadoop.WritableUtils;
    +import org.apache.hadoop.hive.ql.exec.Description;
    +import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    +import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
    +import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
    +import org.apache.hadoop.hive.ql.metadata.HiveException;
    +import org.apache.hadoop.hive.ql.parse.SemanticException;
    +import org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver;
    +import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
    +import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.AbstractAggregationBuffer;
    +import org.apache.hadoop.hive.serde2.io.DoubleWritable;
    +import org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray;
    +import org.apache.hadoop.hive.serde2.objectinspector.*;
    +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
    +import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    +import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
    +import org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableDoubleObjectInspector;
    +import org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo;
    +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
    +
    +import javax.annotation.Nonnull;
    +import javax.annotation.Nullable;
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +@Description(
    +        name = "ffm_predict",
    +        value = "_FUNC_(Float Wi, Float Wj, array<float> Vifj, array<float> Vjfi, float Xi, float Xj)"
    +                + " - Returns a prediction value in Double")
    +public final class FFMPredictGenericUDAF extends AbstractGenericUDAFResolver {
    +
    +    private FFMPredictGenericUDAF() {}
    +
    +    @Override
    +    public Evaluator getEvaluator(TypeInfo[] typeInfo) throws SemanticException {
    +        if (typeInfo.length != 5) {
    +            throw new UDFArgumentLengthException(
    +                "Expected argument length is 6 but given argument length was " + typeInfo.length);
    +        }
    +        if (!HiveUtils.isNumberTypeInfo(typeInfo[0])) {
    +            throw new UDFArgumentTypeException(0,
    +                "Number type is expected for the first argument Wi: " + typeInfo[0].getTypeName());
    +        }
    +        if (typeInfo[1].getCategory() != Category.LIST) {
    +            throw new UDFArgumentTypeException(1,
    +                "List type is expected for the second argument Vifj: " + typeInfo[1].getTypeName());
    +        }
    +        if (typeInfo[2].getCategory() != Category.LIST) {
    +            throw new UDFArgumentTypeException(2,
    +                "List type is expected for the third argument Vjfi: " + typeInfo[2].getTypeName());
    +        }
    +        ListTypeInfo typeInfo1 = (ListTypeInfo) typeInfo[1];
    +        if (!HiveUtils.isNumberTypeInfo(typeInfo1.getListElementTypeInfo())) {
    +            throw new UDFArgumentTypeException(1,
    +                "Number type is expected for the element type of list Vifj: "
    +                        + typeInfo1.getTypeName());
    +        }
    +        ListTypeInfo typeInfo2 = (ListTypeInfo) typeInfo[2];
    +        if (!HiveUtils.isNumberTypeInfo(typeInfo2.getListElementTypeInfo())) {
    +            throw new UDFArgumentTypeException(2,
    +                "Number type is expected for the element type of list Vjfi: "
    +                        + typeInfo1.getTypeName());
    +        }
    +        if (!HiveUtils.isNumberTypeInfo(typeInfo[3])) {
    +            throw new UDFArgumentTypeException(3,
    +                "Number type is expected for the third argument Xi: " + typeInfo[3].getTypeName());
    +        }
    +        if (!HiveUtils.isNumberTypeInfo(typeInfo[4])) {
    +            throw new UDFArgumentTypeException(4,
    +                "Number type is expected for the third argument Xi: " + typeInfo[4].getTypeName());
    +        }
    +        return new Evaluator();
    +    }
    +
    +    public static class Evaluator extends GenericUDAFEvaluator {
    +
    +        // input OI
    +        private PrimitiveObjectInspector wiOI;
    +        private ListObjectInspector vijOI;
    +        private ListObjectInspector vjiOI;
    +        private PrimitiveObjectInspector xiOI;
    +        private PrimitiveObjectInspector xjOI;
    +
    +        // merge OI
    +        private StructObjectInspector internalMergeOI;
    +        private StructField sumField;
    +
    +        public Evaluator() {}
    +
    +        @Override
    +        public ObjectInspector init(Mode mode, ObjectInspector[] parameters) throws HiveException {
    +            assert (parameters.length == 5);
    +            super.init(mode, parameters);
    +
    +            // initialize input
    +            if (mode == Mode.PARTIAL1 || mode == Mode.COMPLETE) {// from original data
    +                this.wiOI = HiveUtils.asDoubleCompatibleOI(parameters[0]);
    +                this.vijOI = HiveUtils.asListOI(parameters[1]);
    +                this.vjiOI = HiveUtils.asListOI(parameters[2]);
    +                this.xiOI = HiveUtils.asDoubleCompatibleOI(parameters[3]);
    +                this.xjOI = HiveUtils.asDoubleCompatibleOI(parameters[4]);
    +            } else {// from partial aggregation
    +                StructObjectInspector soi = (StructObjectInspector) parameters[0];
    --- End diff --
    
    doubleOI is enough


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hivemall pull request #58: [WIP][HIVEMALL-24] Scalable field-aware...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on a diff in the pull request:

    https://github.com/apache/incubator-hivemall/pull/58#discussion_r129273852
  
    --- Diff: core/src/main/java/hivemall/fm/FFMPredictGenericUDAF.java ---
    @@ -0,0 +1,272 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package hivemall.fm;
    +
    +import hivemall.utils.hadoop.HiveUtils;
    +import hivemall.utils.hadoop.WritableUtils;
    +import org.apache.hadoop.hive.ql.exec.Description;
    +import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    +import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
    +import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
    +import org.apache.hadoop.hive.ql.metadata.HiveException;
    +import org.apache.hadoop.hive.ql.parse.SemanticException;
    +import org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver;
    +import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
    +import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.AbstractAggregationBuffer;
    +import org.apache.hadoop.hive.serde2.io.DoubleWritable;
    +import org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray;
    +import org.apache.hadoop.hive.serde2.objectinspector.*;
    +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
    +import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    +import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
    +import org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableDoubleObjectInspector;
    +import org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo;
    +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
    +
    +import javax.annotation.Nonnull;
    +import javax.annotation.Nullable;
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +@Description(
    +        name = "ffm_predict",
    +        value = "_FUNC_(Float Wi, Float Wj, array<float> Vifj, array<float> Vjfi, float Xi, float Xj)"
    +                + " - Returns a prediction value in Double")
    +public final class FFMPredictGenericUDAF extends AbstractGenericUDAFResolver {
    +
    +    private FFMPredictGenericUDAF() {}
    +
    +    @Override
    +    public Evaluator getEvaluator(TypeInfo[] typeInfo) throws SemanticException {
    +        if (typeInfo.length != 5) {
    +            throw new UDFArgumentLengthException(
    +                "Expected argument length is 6 but given argument length was " + typeInfo.length);
    +        }
    +        if (!HiveUtils.isNumberTypeInfo(typeInfo[0])) {
    +            throw new UDFArgumentTypeException(0,
    +                "Number type is expected for the first argument Wi: " + typeInfo[0].getTypeName());
    +        }
    +        if (typeInfo[1].getCategory() != Category.LIST) {
    +            throw new UDFArgumentTypeException(1,
    +                "List type is expected for the second argument Vifj: " + typeInfo[1].getTypeName());
    +        }
    +        if (typeInfo[2].getCategory() != Category.LIST) {
    +            throw new UDFArgumentTypeException(2,
    +                "List type is expected for the third argument Vjfi: " + typeInfo[2].getTypeName());
    +        }
    +        ListTypeInfo typeInfo1 = (ListTypeInfo) typeInfo[1];
    +        if (!HiveUtils.isNumberTypeInfo(typeInfo1.getListElementTypeInfo())) {
    +            throw new UDFArgumentTypeException(1,
    +                "Number type is expected for the element type of list Vifj: "
    +                        + typeInfo1.getTypeName());
    +        }
    +        ListTypeInfo typeInfo2 = (ListTypeInfo) typeInfo[2];
    +        if (!HiveUtils.isNumberTypeInfo(typeInfo2.getListElementTypeInfo())) {
    +            throw new UDFArgumentTypeException(2,
    +                "Number type is expected for the element type of list Vjfi: "
    +                        + typeInfo1.getTypeName());
    +        }
    +        if (!HiveUtils.isNumberTypeInfo(typeInfo[3])) {
    +            throw new UDFArgumentTypeException(3,
    +                "Number type is expected for the third argument Xi: " + typeInfo[3].getTypeName());
    +        }
    +        if (!HiveUtils.isNumberTypeInfo(typeInfo[4])) {
    +            throw new UDFArgumentTypeException(4,
    +                "Number type is expected for the third argument Xi: " + typeInfo[4].getTypeName());
    +        }
    +        return new Evaluator();
    +    }
    +
    +    public static class Evaluator extends GenericUDAFEvaluator {
    +
    +        // input OI
    +        private PrimitiveObjectInspector wiOI;
    +        private ListObjectInspector vijOI;
    +        private ListObjectInspector vjiOI;
    +        private PrimitiveObjectInspector xiOI;
    +        private PrimitiveObjectInspector xjOI;
    +
    +        // merge OI
    +        private StructObjectInspector internalMergeOI;
    +        private StructField sumField;
    +
    +        public Evaluator() {}
    +
    +        @Override
    +        public ObjectInspector init(Mode mode, ObjectInspector[] parameters) throws HiveException {
    +            assert (parameters.length == 5);
    +            super.init(mode, parameters);
    +
    +            // initialize input
    +            if (mode == Mode.PARTIAL1 || mode == Mode.COMPLETE) {// from original data
    +                this.wiOI = HiveUtils.asDoubleCompatibleOI(parameters[0]);
    +                this.vijOI = HiveUtils.asListOI(parameters[1]);
    +                this.vjiOI = HiveUtils.asListOI(parameters[2]);
    +                this.xiOI = HiveUtils.asDoubleCompatibleOI(parameters[3]);
    +                this.xjOI = HiveUtils.asDoubleCompatibleOI(parameters[4]);
    +            } else {// from partial aggregation
    +                StructObjectInspector soi = (StructObjectInspector) parameters[0];
    +                this.internalMergeOI = soi;
    +                this.sumField = soi.getStructFieldRef("sum");
    +            }
    +
    +            // initialize output
    +            final ObjectInspector outputOI;
    +            if (mode == Mode.PARTIAL1 || mode == Mode.PARTIAL2) {// terminatePartial
    +                outputOI = internalMergeOI();
    +            } else {
    +                outputOI = PrimitiveObjectInspectorFactory.writableDoubleObjectInspector;
    +            }
    +            return outputOI;
    +        }
    +
    +        private static StructObjectInspector internalMergeOI() {
    +            ArrayList<String> fieldNames = new ArrayList<String>();
    +            ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
    +
    +            fieldNames.add("sum");
    +            fieldOIs.add(PrimitiveObjectInspectorFactory.writableDoubleObjectInspector);
    +
    +            return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    +        }
    +
    +        @Override
    +        public FFMPredictAggregationBuffer getNewAggregationBuffer() throws HiveException {
    +            FFMPredictAggregationBuffer myAggr = new FFMPredictAggregationBuffer();
    +            reset(myAggr);
    +            return myAggr;
    +        }
    +
    +        @Override
    +        public void reset(@SuppressWarnings("deprecation") AggregationBuffer agg)
    +                throws HiveException {
    +            FFMPredictAggregationBuffer myAggr = (FFMPredictAggregationBuffer) agg;
    +            myAggr.reset();
    +        }
    +
    +        @Override
    +        public void iterate(@SuppressWarnings("deprecation") AggregationBuffer agg,
    +                Object[] parameters) throws HiveException {
    +            if (parameters[0] == null) {
    +                return;
    +            }
    +            FFMPredictAggregationBuffer myAggr = (FFMPredictAggregationBuffer) agg;
    +
    +            double wi = PrimitiveObjectInspectorUtils.getDouble(parameters[0], wiOI);
    +            if (parameters[3] == null && parameters[4] == null) {// Xi and Xj are null => global bias `w0`
    +                myAggr.addW0(wi);
    +            } else if (parameters[4] == null) {// Only Xi is nonnull => linear combination `wi` * `xi`
    +                double xi = PrimitiveObjectInspectorUtils.getDouble(parameters[3], xiOI);
    +                myAggr.addWiXi(wi, xi);
    +            } else {// both Xi and Xj are nonnull => <Vifj, Vjfi> Xi Xj
    +                if (parameters[1] == null || parameters[2] == null) {
    +                    throw new UDFArgumentException("The second and third arguments (Vij, Vji) must not be null");
    +                }
    +
    +                List<Float> vij = (List<Float>) vijOI.getList(parameters[1]);
    +                List<Float> vji = (List<Float>) vjiOI.getList(parameters[2]);
    +
    +                if (vij.size() != vji.size()) {
    +                    throw new HiveException("Mismatch in the number of factors");
    +                }
    +
    +                double xi = PrimitiveObjectInspectorUtils.getDouble(parameters[3], xiOI);
    +                double xj = PrimitiveObjectInspectorUtils.getDouble(parameters[4], xjOI);
    +
    +                myAggr.addViVjXiXj(vij, vji, xi, xj);
    +            }
    +        }
    +
    +        @Override
    +        public Object terminatePartial(@SuppressWarnings("deprecation") AggregationBuffer agg)
    +                throws HiveException {
    +            FFMPredictAggregationBuffer myAggr = (FFMPredictAggregationBuffer) agg;
    +
    +            final Object[] partialResult = new Object[1];
    +            return partialResult;
    --- End diff --
    
    partialResult is not filled..


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hivemall issue #58: [WIP][HIVEMALL-24] Scalable field-aware factor...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/58
  
    ```sql
    WITH testing_fm_exploded as (
      select
        t1.rowid, 
        t2.i,
        t2.j,
        t2.Xi,
        t2.Xj
      from
        testing t1
        LATERAL VIEW ffm_pairs(features) t2 as i, j, Xi, Xj
    )
    select
      ffm_predict( -- UDAF
        t1.Xi,
        t1.Xj,
        p1.Wi,
        p1.Vi, -- Vij
        p2.Vi  -- Vji
      ) as predicted
    from 
      testing_fm_exploded t1
      LEFT OUTER JOIN fm_model p1 ON (p1.i = t1.i)
      LEFT OUTER JOIN fm_model p2 ON (p2.i = t1.j and p2.modelid = p1.modelid)  
    ```



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hivemall issue #58: [WIP][HIVEMALL-24] Scalable field-aware factor...

Posted by coveralls <gi...@git.apache.org>.
Github user coveralls commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/58
  
    
    [![Coverage Status](https://coveralls.io/builds/10431698/badge)](https://coveralls.io/builds/10431698)
    
    Coverage decreased (-0.3%) to 36.46% when pulling **343f704dc4eeee25f195e298ee014368db1fef9e on takuti:ffm-predict-udaf** into **1cccf66829e2c9f6b6e51046c1fc8c19c9aff51b on apache:master**.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hivemall pull request #58: [WIP][HIVEMALL-24] Scalable field-aware...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on a diff in the pull request:

    https://github.com/apache/incubator-hivemall/pull/58#discussion_r129274058
  
    --- Diff: core/src/main/java/hivemall/fm/FFMPredictGenericUDAF.java ---
    @@ -0,0 +1,272 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package hivemall.fm;
    +
    +import hivemall.utils.hadoop.HiveUtils;
    +import hivemall.utils.hadoop.WritableUtils;
    +import org.apache.hadoop.hive.ql.exec.Description;
    +import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    +import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
    +import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
    +import org.apache.hadoop.hive.ql.metadata.HiveException;
    +import org.apache.hadoop.hive.ql.parse.SemanticException;
    +import org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver;
    +import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
    +import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.AbstractAggregationBuffer;
    +import org.apache.hadoop.hive.serde2.io.DoubleWritable;
    +import org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray;
    +import org.apache.hadoop.hive.serde2.objectinspector.*;
    +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
    +import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    +import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
    +import org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableDoubleObjectInspector;
    +import org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo;
    +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
    +
    +import javax.annotation.Nonnull;
    +import javax.annotation.Nullable;
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +@Description(
    +        name = "ffm_predict",
    +        value = "_FUNC_(Float Wi, Float Wj, array<float> Vifj, array<float> Vjfi, float Xi, float Xj)"
    +                + " - Returns a prediction value in Double")
    +public final class FFMPredictGenericUDAF extends AbstractGenericUDAFResolver {
    +
    +    private FFMPredictGenericUDAF() {}
    +
    +    @Override
    +    public Evaluator getEvaluator(TypeInfo[] typeInfo) throws SemanticException {
    +        if (typeInfo.length != 5) {
    +            throw new UDFArgumentLengthException(
    +                "Expected argument length is 6 but given argument length was " + typeInfo.length);
    +        }
    +        if (!HiveUtils.isNumberTypeInfo(typeInfo[0])) {
    +            throw new UDFArgumentTypeException(0,
    +                "Number type is expected for the first argument Wi: " + typeInfo[0].getTypeName());
    +        }
    +        if (typeInfo[1].getCategory() != Category.LIST) {
    +            throw new UDFArgumentTypeException(1,
    +                "List type is expected for the second argument Vifj: " + typeInfo[1].getTypeName());
    +        }
    +        if (typeInfo[2].getCategory() != Category.LIST) {
    +            throw new UDFArgumentTypeException(2,
    +                "List type is expected for the third argument Vjfi: " + typeInfo[2].getTypeName());
    +        }
    +        ListTypeInfo typeInfo1 = (ListTypeInfo) typeInfo[1];
    +        if (!HiveUtils.isNumberTypeInfo(typeInfo1.getListElementTypeInfo())) {
    +            throw new UDFArgumentTypeException(1,
    +                "Number type is expected for the element type of list Vifj: "
    +                        + typeInfo1.getTypeName());
    +        }
    +        ListTypeInfo typeInfo2 = (ListTypeInfo) typeInfo[2];
    +        if (!HiveUtils.isNumberTypeInfo(typeInfo2.getListElementTypeInfo())) {
    +            throw new UDFArgumentTypeException(2,
    +                "Number type is expected for the element type of list Vjfi: "
    +                        + typeInfo1.getTypeName());
    +        }
    +        if (!HiveUtils.isNumberTypeInfo(typeInfo[3])) {
    +            throw new UDFArgumentTypeException(3,
    +                "Number type is expected for the third argument Xi: " + typeInfo[3].getTypeName());
    +        }
    +        if (!HiveUtils.isNumberTypeInfo(typeInfo[4])) {
    +            throw new UDFArgumentTypeException(4,
    +                "Number type is expected for the third argument Xi: " + typeInfo[4].getTypeName());
    +        }
    +        return new Evaluator();
    +    }
    +
    +    public static class Evaluator extends GenericUDAFEvaluator {
    +
    +        // input OI
    +        private PrimitiveObjectInspector wiOI;
    +        private ListObjectInspector vijOI;
    +        private ListObjectInspector vjiOI;
    +        private PrimitiveObjectInspector xiOI;
    +        private PrimitiveObjectInspector xjOI;
    +
    +        // merge OI
    +        private StructObjectInspector internalMergeOI;
    +        private StructField sumField;
    +
    +        public Evaluator() {}
    +
    +        @Override
    +        public ObjectInspector init(Mode mode, ObjectInspector[] parameters) throws HiveException {
    +            assert (parameters.length == 5);
    +            super.init(mode, parameters);
    +
    +            // initialize input
    +            if (mode == Mode.PARTIAL1 || mode == Mode.COMPLETE) {// from original data
    +                this.wiOI = HiveUtils.asDoubleCompatibleOI(parameters[0]);
    +                this.vijOI = HiveUtils.asListOI(parameters[1]);
    +                this.vjiOI = HiveUtils.asListOI(parameters[2]);
    +                this.xiOI = HiveUtils.asDoubleCompatibleOI(parameters[3]);
    +                this.xjOI = HiveUtils.asDoubleCompatibleOI(parameters[4]);
    +            } else {// from partial aggregation
    +                StructObjectInspector soi = (StructObjectInspector) parameters[0];
    +                this.internalMergeOI = soi;
    +                this.sumField = soi.getStructFieldRef("sum");
    +            }
    +
    +            // initialize output
    +            final ObjectInspector outputOI;
    +            if (mode == Mode.PARTIAL1 || mode == Mode.PARTIAL2) {// terminatePartial
    +                outputOI = internalMergeOI();
    --- End diff --
    
    doubleOI is enough.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hivemall issue #58: [WIP][HIVEMALL-24] Scalable field-aware factor...

Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/58
  
    ### feature_pairs()
    
    ```sql
    drop temporary function if exists feature_pairs;
    create temporary function feature_pairs as 'hivemall.ftvec.pairing.FeaturePairsUDTF';
    
    select
      t1.rowid,
      t2.i,
      t2.j,
      t2.Xi,
      t2.Xj
    from (
      select 1 as rowid, array("1:3:1", "1:11:2", "2:14:3", "3:19:4") as features
    ) t1
    LATERAL VIEW feature_pairs(features, "-ffm -feature_hashing 10 -num_fields 5") t2 as i, j, Xi, Xj;
    ```
    
    Output:
    
    ```
    t1.rowid        t2.i    t2.j    t2.xi   t2.xj
    1       0       NULL    NULL    NULL
    1       0       NULL    1.0     NULL
    1       16      56      1.0     2.0
    1       17      71      1.0     3.0
    1       18      96      1.0     4.0
    1       1       NULL    2.0     NULL
    1       57      71      2.0     3.0
    1       58      96      2.0     4.0
    1       2       NULL    3.0     NULL
    1       73      97      3.0     4.0
    1       3       NULL    4.0     NULL
    ```
    
    Todo:
    
    - Support a case that `field` is not specified e.g., `11:2`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---