Posted to user@hive.apache.org by Chalcy Raja <Ch...@careerbuilder.com> on 2012/10/12 22:33:32 UTC

Hive bug in Partition

We are using Hive 8 as part of CDH4.0.1.  We noticed a bug in Partition.

Say my Hive table is partitioned by mydate and, when dropping a partition, I mistype the name of the partition column, like
Alter table mytable drop partition (yourdate='2012-09-10'). This drops all of the mytable partitions (mydate). The whole table is empty. This is scary.

I could not find any Jira for that. Has anybody experienced this bug?  I can create a Jira for it if nobody has heard of it.

Thanks,
Chalcy

Hive UDAF Limitation on Internally used Collections

Posted by Saurabh Mishra <sa...@outlook.com>.
Hi,

I am trying to write a UDAF that merges a group of rows into a single string with no duplicates. The duplicate check is case insensitive. (I am not aware of a built-in function for this; I looked on the https://cwiki.apache.org/Hive/languagemanual-udf.html page and found nothing.)
The problem I am facing is that using this UDAF throws an error:
java.lang.NoSuchMethodException: java.util.Set.<init>()
I noticed that I was using a Set collection object inside the UDAF to store the results. This made me think that there might be some limitation on the collections that can be used inside a Hive UDAF, UDF or UDTF.
Could someone kindly provide some information on this?

Here is the UDAF class I created. The entire error stack trace is attached at the end:
/**
 * 
 */
package com.thomsonreuters.ims.util.udaf;

import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

/**
 * UDAF to return, merged and concatenated value of group of entries, after sorting out duplicates, case insensitively
 * 
 * @author Saurabh
 */
public class UDAFCaseInsensitiveDistinctMerge extends UDAF {

    /**
     * Default separator, used unless overridden.
     */
    private static final String DEFAULT_SEPARATOR = ";";

    /**
     * Nested Class to Store the Updated Set of Unique Entries and provide methods to interact with the stored Set of
     * Entries.
     * 
     * @author Saurabh
     * 
     */
    public static class UniqueEntries {
        private Set<String> uniqueEntries = new LinkedHashSet<String>();

        /**
         * Add to the Set of String, after converting the String to Upper Case
         * 
         * @param entry
         */
        public void addToSet(String entry) {
            uniqueEntries.add(entry.toUpperCase());
        }

        /**
         * Return the Unique values stored in the Set after sorting and separated by the default Separator.
         * 
         * @return mergedString
         */
        public String getMergedString() {
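            // toArray() without an argument returns Object[], so pass a typed array instead of casting.
            String[] uniqueArray = uniqueEntries.toArray(new String[0]);
            // Sort the array that is actually iterated below.
            Arrays.sort(uniqueArray);
            StringBuilder stringBuilder = new StringBuilder();
            for (String entry : uniqueArray) {
                stringBuilder.append(entry).append(DEFAULT_SEPARATOR);
            }

            // Guard against an empty set before trimming the trailing separator.
            return stringBuilder.length() == 0 ? "" : stringBuilder.substring(0, stringBuilder.length() - 1);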
        }

        /**
         * Add a Collection of Strings to the Existing String Set
         * 
         * @param entriesCollection
         */
        public void addCollectionToSet(Set<String> entriesCollection) {
            uniqueEntries.addAll(entriesCollection);
        }

        /**
         * Retrieve the Stored String Set
         * 
         * @return stringSet
         */
        public Set<String> getUniqueEntriesCollection() {
            return uniqueEntries;
        }

        /**
         * Clear the Entries of the String Set.
         */
        public void clear() {
            uniqueEntries.clear();
        }
    }

    /**
     * Private Constructor to Prevent the Instantiation of the Class.
     */
    private UDAFCaseInsensitiveDistinctMerge() {
        // Prevent Instantiation
    }

    /**
     * Provided Evaluator Implementation to Process the Group of Rows Passed to the UDAF.
     * 
     * @author Saurabh
     * 
     */
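    public static class UDAFCaseInsensitiveDistinctMergeEvaluator implements UDAFEvaluator {
        // Declared static so the evaluator can be created reflectively without an enclosing instance.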

        private UniqueEntries uniqueEntries;

        /**
         * Constructor to initialize the intermediate Storage Class and invoking the init method.
         */
        public UDAFCaseInsensitiveDistinctMergeEvaluator() {
            super();
            uniqueEntries = new UniqueEntries();
            init();
        }

        /**
         * Initializing Method. Clearing the Stored String Set.
         */
        public void init() {
            uniqueEntries.clear();
        }

        /**
         * Iterating over the group of rows and passing their values to the String set for non duplicate storage.
         * 
         * @param entry
         * @return
         */
        public boolean iterate(String entry) {
            uniqueEntries.addToSet(entry);
            return true;
        }

        /**
         * Handle for Partially Terminated UDAF Evaluation Call.
         * 
         * @return
         */
        public UniqueEntries terminatePartial() {
            return uniqueEntries;
        }

        /**
         * Handler for resuming Partially Terminated Evaluation Call.
         * 
         * @param previousEntries
         * @return
         */
        public boolean merge(UniqueEntries previousEntries) {
            uniqueEntries.addCollectionToSet(previousEntries.getUniqueEntriesCollection());
            return true;
        }

        /**
         * Finalizing call to return the result of passed set of values.
         * 
         * @return
         */
        public String terminate() {
            return uniqueEntries.getMergedString();
        }
    }
}

FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.NoSuchMethodException: java.util.Set.<init>())
java.lang.RuntimeException: java.lang.NoSuchMethodException: java.util.Set.<init>()
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
        at org.apache.hadoop.hive.serde2.objectinspector.ReflectionStructObjectInspector.create(ReflectionStructObjectInspector.java:166)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.<init>(ObjectInspectorConverters.java:214)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:116)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.<init>(ObjectInspectorConverters.java:210)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:116)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ConversionHelper.<init>(GenericUDFUtils.java:300)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.init(GenericUDAFBridge.java:129)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFInfo(SemanticAnalyzer.java:2181)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator1(SemanticAnalyzer.java:2469)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:3180)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5422)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6018)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6603)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:513)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.lang.NoSuchMethodException: java.util.Set.<init>()
        at java.lang.Class.getConstructor0(Class.java:2706)
        at java.lang.Class.getDeclaredConstructor(Class.java:1985)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)
        ... 24 more
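
Reading the stack trace above: the failure comes from ReflectionStructObjectInspector.create calling ReflectionUtils.newInstance, which looks up a no-argument constructor on java.util.Set. That suggests (as an inference from the trace, not a confirmed diagnosis) that the reflection-based inspector tries to instantiate the declared type of a field in the partial-result class, and an interface has no constructor to find. A minimal sketch that reproduces just that lookup outside Hive, using only the JDK; the class name InterfaceConstructorDemo and its helper method are hypothetical:

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class InterfaceConstructorDemo {

    /**
     * Returns true if reflection can find a no-argument constructor on the
     * given type. This mirrors the getDeclaredConstructor() call that appears
     * in the stack trace; it is not Hive code.
     */
    public static boolean hasNoArgConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            // Interfaces such as java.util.Set end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.util.Set: " + hasNoArgConstructor(Set.class));
        System.out.println("java.util.LinkedHashSet: " + hasNoArgConstructor(LinkedHashSet.class));
    }
}
```

If that reading is right, declaring the field with a concrete class, or returning a simpler partial result such as a String from terminatePartial(), might avoid the error. Neither option has been verified against this Hive version.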


RE: Hive bug in Partition

Posted by Chalcy Raja <Ch...@careerbuilder.com>.
Of course we have a trash policy in place, so I am not worried about losing data instantly.  It drops all the partitions of the table one by one.

We found the bug by accident and then realized how scary it would be to lose all the partitions.  Fixing it may prevent some accidental drops.

Thanks,
Chalcy

From: Viral Bajaria [mailto:viral.bajaria@gmail.com]
Sent: Friday, October 12, 2012 5:24 PM
To: user@hive.apache.org
Subject: Re: Hive bug in Partition

Not sure about this bug and don't have a test table right now on which I can try this or rather don't want to try this on a table :-)

Did you lose the entire table? If your HDFS trash policy is the default 24 hours, you should be able to recover and run a quick load command to restore the metadata in the Hive metastore.

FWIW, we run drop commands using an abstract syntax and not directly from the command-line and so we avoid these typo issues by doing that.

On Fri, Oct 12, 2012 at 1:33 PM, Chalcy Raja <Ch...@careerbuilder.com> wrote:

We are using Hive 8 as part of CDH4.0.1.  We noticed a bug in Partition.

Say my Hive table is partitioned by mydate and, when dropping a partition, I mistype the name of the partition column, like
Alter table mytable drop partition (yourdate='2012-09-10'). This drops all of the mytable partitions (mydate). The whole table is empty. This is scary.

I could not find any Jira for that. Has anybody experienced this bug?  I can create a Jira for it if nobody has heard of it.

Thanks,
Chalcy


Re: Hive bug in Partition

Posted by Viral Bajaria <vi...@gmail.com>.
Not sure about this bug and don't have a test table right now on which I
can try this or rather don't want to try this on a table :-)

Did you lose the entire table? If your HDFS trash policy is the default 24
hours, you should be able to recover and run a quick load command to
restore the metadata in hive metastore.

FWIW, we run drop commands using an abstract syntax and not directly from
the command-line and so we avoid these typo issues by doing that.
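
The message does not say what that abstraction looks like. One possible shape, as a sketch only: build the drop statement in code and reject any key that is not a known partition column before the statement reaches Hive. The class DropPartitionGuard, its method and its parameters are hypothetical, and the list of partition columns is assumed to be supplied by the caller rather than read from the metastore.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class DropPartitionGuard {

    /**
     * Builds an ALTER TABLE ... DROP PARTITION statement, refusing any key that
     * is not one of the table's partition columns. Simplifications: names are
     * compared case-sensitively and values are not escaped.
     */
    public static String buildDropStatement(String table,
                                            List<String> partitionColumns,
                                            Map<String, String> spec) {
        if (spec.isEmpty()) {
            throw new IllegalArgumentException("Empty partition spec for " + table);
        }
        StringBuilder clause = new StringBuilder();
        for (Map.Entry<String, String> entry : spec.entrySet()) {
            if (!partitionColumns.contains(entry.getKey())) {
                throw new IllegalArgumentException(
                        "Unknown partition column: " + entry.getKey());
            }
            if (clause.length() > 0) {
                clause.append(", ");
            }
            clause.append(entry.getKey()).append("='").append(entry.getValue()).append("'");
        }
        return "ALTER TABLE " + table + " DROP PARTITION (" + clause + ")";
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("mydate");
        // Correct column name: the statement is built.
        System.out.println(buildDropStatement("mytable", columns,
                Collections.singletonMap("mydate", "2012-09-10")));
        // Mistyped column name: rejected before anything is sent to Hive.
        try {
            buildDropStatement("mytable", columns,
                    Collections.singletonMap("yourdate", "2012-09-10"));
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

A check like this only guards statements that go through the wrapper; it does not change how Hive itself handles an unknown partition column.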

On Fri, Oct 12, 2012 at 1:33 PM, Chalcy Raja
<Ch...@careerbuilder.com> wrote:

> We are using Hive 8 as part of CDH4.0.1.  We noticed a bug in Partition.
>
> Say if my hive table is partitioned by *mydate* and to drop a partition I
> mistype the name of the partition, like,
>
> Alter table mytable drop partition (*yourdate*='2012-09-10') , this will
> drop all the mytable partitions (*mydate*).  The whole table is empty.
> This is scary.
>
> I could not find any Jira for that. Has anybody experienced this bug?  I
> can create a Jira for this bug if nobody has heard of this bug.
>
> Thanks,
>
> Chalcy