Posted to user@hive.apache.org by Saurabh Mishra <sa...@outlook.com> on 2012/10/20 16:41:44 UTC

Hive UDAF Limitation on Internally used Collections

Hi,

I am trying to write a UDAF that merges a group of rows into a single string with no duplicates, where the duplicate check is case-insensitive. (I am not aware of a built-in function for this; I looked through https://cwiki.apache.org/Hive/languagemanual-udf.html and found nothing.)
The problem is that using this UDAF throws the following error:
java.lang.NoSuchMethodException: java.util.Set.<init>()
I noticed that I am using a Set inside the UDAF to store the results, which makes me think there may be some limitation on the collections that can be used inside a Hive UDAF, UDF or UDTF.
Could someone please provide some information on this?
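
To make the intended result concrete, the merge described above can be sketched in plain Java, with no Hive classes involved. The class name CaseInsensitiveMergeSketch and the method merge are hypothetical names chosen for illustration, and the sketch assumes that null inputs should simply be skipped, which the description does not specify.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of Hive: a plain-Java version of the merge
// the UDAF is meant to perform on one group of values.
class CaseInsensitiveMergeSketch {

    static final String SEPARATOR = ";";

    static String merge(List<String> entries) {
        // A TreeSet keeps the upper-cased values both unique and sorted.
        TreeSet<String> unique = new TreeSet<>();
        for (String entry : entries) {
            if (entry != null) { // assumption: null values are ignored
                unique.add(entry.toUpperCase());
            }
        }
        // String.join returns an empty string for an empty set.
        return String.join(SEPARATOR, unique);
    }

    public static void main(String[] args) {
        // prints HADOOP;HIVE;PIG
        System.out.println(merge(Arrays.asList("hive", "Pig", "HIVE", "pig", "Hadoop")));
    }
}
```

Upper-casing before insertion mirrors what the UDAF below does, so the merged string is always returned in upper case rather than in the casing of the first occurrence.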

Here is the UDAF class I created; the full error stack trace follows at the end:
/**
 * 
 */
package com.thomsonreuters.ims.util.udaf;

import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

/**
 * UDAF that returns the values of a group of entries merged into one concatenated string, with duplicates removed
 * case-insensitively.
 * 
 * @author Saurabh
 */
public class UDAFCaseInsensitiveDistinctMerge extends UDAF {

    /**
     * Default separator, used unless overridden.
     */
    private static final String DEFAULT_SEPARATOR = ";";

    /**
     * Nested Class to Store the Updated Set of Unique Entries and provide methods to interact with the stored Set of
     * Entries.
     * 
     * @author Saurabh
     * 
     */
    public static class UniqueEntries {
        private Set<String> uniqueEntries = new LinkedHashSet<String>();

        /**
         * Add to the Set of String, after converting the String to Upper Case
         * 
         * @param entry
         */
        public void addToSet(String entry) {
            uniqueEntries.add(entry.toUpperCase());
        }

        /**
         * Return the Unique values stored in the Set after sorting and separated by the default Separator.
         * 
         * @return mergedString
         */
        public String getMergedString() {
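            // toArray() without an argument returns Object[], so request a String[] explicitly.
            String[] uniqueArray = uniqueEntries.toArray(new String[uniqueEntries.size()]);
            // Sort the array that is actually iterated below.
            Arrays.sort(uniqueArray);
            StringBuilder stringBuilder = new StringBuilder();
            for (String entry : uniqueArray) {
                // Prepend the separator to every entry except the first; this also handles an empty set.
                if (stringBuilder.length() > 0) {
                    stringBuilder.append(DEFAULT_SEPARATOR);
                }
                stringBuilder.append(entry);
            }

            return stringBuilder.toString();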
        }

        /**
         * Add a Collection of Strings to the Existing String Set
         * 
         * @param entriesCollection
         */
        public void addCollectionToSet(Set<String> entriesCollection) {
            uniqueEntries.addAll(entriesCollection);
        }

        /**
         * Retrieve the Stored String Set
         * 
         * @return stringSet
         */
        public Set<String> getUniqueEntriesCollection() {
            return uniqueEntries;
        }

        /**
         * Clear the Entries of the String Set.
         */
        public void clear() {
            uniqueEntries.clear();
        }
    }

    /**
     * Private Constructor to Prevent the Instantiation of the Class.
     */
    private UDAFCaseInsensitiveDistinctMerge() {
        // Prevent Instantiation
    }

    /**
     * Provided Evaluator Implementation to Process the Group of Rows Passed to the UDAF.
     * 
     * @author Saurabh
     * 
     */
    public class UDAFCaseInsensitiveDistinctMergeEvaluator implements UDAFEvaluator {

        private UniqueEntries uniqueEntries;

        /**
         * Constructor to initialize the intermediate Storage Class and invoking the init method.
         */
        public UDAFCaseInsensitiveDistinctMergeEvaluator() {
            super();
            uniqueEntries = new UniqueEntries();
            init();
        }

        /**
         * Initializing Method. Clearing the Stored String Set.
         */
        public void init() {
            uniqueEntries.clear();
        }

        /**
         * Iterating over the group of rows and passing their values to the String set for non duplicate storage.
         * 
         * @param entry
         * @return
         */
        public boolean iterate(String entry) {
            // Skip NULL column values so that toUpperCase() is never called on null.
            if (entry != null) {
                uniqueEntries.addToSet(entry);
            }
            return true;
        }

        /**
         * Handle for Partially Terminated UDAF Evaluation Call.
         * 
         * @return
         */
        public UniqueEntries terminatePartial() {
            return uniqueEntries;
        }

        /**
         * Handler for resuming Partially Terminated Evaluation Call.
         * 
         * @param previousEntries
         * @return
         */
        public boolean merge(UniqueEntries previousEntries) {
            uniqueEntries.addCollectionToSet(previousEntries.getUniqueEntriesCollection());
            return true;
        }

        /**
         * Finalizing call to return the result of passed set of values.
         * 
         * @return
         */
        public String terminate() {
            return uniqueEntries.getMergedString();
        }
    }
}

FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.NoSuchMethodException: java.util.Set.<init>())
java.lang.RuntimeException: java.lang.NoSuchMethodException: java.util.Set.<init>()
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
        at org.apache.hadoop.hive.serde2.objectinspector.ReflectionStructObjectInspector.create(ReflectionStructObjectInspector.java:166)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.<init>(ObjectInspectorConverters.java:214)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:116)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.<init>(ObjectInspectorConverters.java:210)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:116)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ConversionHelper.<init>(GenericUDFUtils.java:300)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.init(GenericUDAFBridge.java:129)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFInfo(SemanticAnalyzer.java:2181)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator1(SemanticAnalyzer.java:2469)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:3180)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5422)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6018)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6603)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:513)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.lang.NoSuchMethodException: java.util.Set.<init>()
        at java.lang.Class.getConstructor0(Class.java:2706)
        at java.lang.Class.getDeclaredConstructor(Class.java:1985)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)
        ... 24 more
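
The "Caused by" frames above show Class.getDeclaredConstructor being called for java.util.Set from ReflectionUtils.newInstance. That part can be reproduced without Hive: an interface has no constructors, so asking for its no-argument constructor fails, while a concrete class such as LinkedHashSet succeeds. The sketch below shows only this reflection behaviour. The class name InterfaceConstructorSketch and the helper hasNoArgConstructor are hypothetical, and the link to the Set-typed field in UniqueEntries is an assumption drawn from the trace, not something confirmed against the Hive source.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical demo, independent of Hive and Hadoop.
class InterfaceConstructorSketch {

    // Returns true if the type declares a no-argument constructor.
    static boolean hasNoArgConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            // For Set.class the message is "java.util.Set.<init>()",
            // matching the exception text in the trace.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Set: " + hasNoArgConstructor(Set.class));                     // false
        System.out.println("LinkedHashSet: " + hasNoArgConstructor(LinkedHashSet.class)); // true
    }
}
```

If that assumption holds, the failure would come from the declared type of the partial-aggregation state (the interface Set) rather than from the use of a collection as such.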