Posted to dev@hive.apache.org by Terje Marthinussen <tm...@gmail.com> on 2010/10/05 13:38:22 UTC

Analyze table compute statistics errors and OOM

I just tested analyze table with a trunk build (from yesterday, Oct 4th).

I tried various variations of it (with and without partition specifications),
but regardless of what I try, I get one of the following:
--
analyze table normalized compute statistics;

FAILED: Error in semantic analysis: Table is partitioned and partition
specification is needed
--
Fair enough if it is not supported, but running it without a partition
specification seems to be supported according to the docs at
http://wiki.apache.org/hadoop/Hive/StatsDev ?

--
analyze table normalized  partition(intdate) compute statistics;
FAILED: Error in semantic analysis: line 1:36 Dynamic partition cannot be
the parent of a static partition intdate
--
OK, I may understand this (or maybe not :)). It may be good to add some notes
about it on the wiki, though.

Then the OOM:
analyze table normalized
partition(intdate,country,logtype,service,hostname,filedate,filedate_ext)
compute statistics;
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.zip.InflaterInputStream.<init>(InflaterInputStream.java:71)
    at java.util.zip.ZipFile$1.<init>(ZipFile.java:212)
    at java.util.zip.ZipFile.getInputStream(ZipFile.java:212)
    at java.util.zip.ZipFile.getInputStream(ZipFile.java:180)
    at java.util.jar.JarFile.getManifestFromReference(JarFile.java:167)
    at java.util.jar.JarFile.getManifest(JarFile.java:148)
    at sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:696)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:228)
    at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:262)
    at org.datanucleus.jdo.state.JDOStateManagerImpl.isLoaded(JDOStateManagerImpl.java:2020)
    at org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoGetsortCols(MStorageDescriptor.java)
    at org.apache.hadoop.hive.metastore.model.MStorageDescriptor.getSortCols(MStorageDescriptor.java:206)
    at org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:759)
    at org.apache.hadoop.hive.metastore.ObjectStore.convertToPart(ObjectStore.java:859)
    at org.apache.hadoop.hive.metastore.ObjectStore.convertToParts(ObjectStore.java:896)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:886)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$21.run(HiveMetaStore.java:1333)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$21.run(HiveMetaStore.java:1330)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.executeWithRetry(HiveMetaStore.java:234)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:1330)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_ps(HiveMetaStore.java:1760)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitions(HiveMetaStoreClient.java:515)
    at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:1267)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.setupStats(SemanticAnalyzer.java:5793)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genTablePlan(SemanticAnalyzer.java:5603)


The actual stack trace differs between executions of analyze.

Another version:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:597)
    at java.lang.StringBuilder.append(StringBuilder.java:212)
    at org.datanucleus.JDOClassLoaderResolver.newCacheKey(JDOClassLoaderResolver.java:382)
    at org.datanucleus.JDOClassLoaderResolver.classForName(JDOClassLoaderResolver.java:173)
    at org.datanucleus.JDOClassLoaderResolver.classForName(JDOClassLoaderResolver.java:412)
    at org.datanucleus.store.mapped.mapping.EmbeddedMapping.getJavaType(EmbeddedMapping.java:574)
    at org.datanucleus.store.mapped.mapping.EmbeddedMapping.getObject(EmbeddedMapping.java:455)
    at org.datanucleus.store.mapped.scostore.ListStoreIterator.<init>(ListStoreIterator.java:94)
    at org.datanucleus.store.rdbms.scostore.RDBMSListStoreIterator.<init>(RDBMSListStoreIterator.java:41)
    at org.datanucleus.store.rdbms.scostore.RDBMSJoinListStore.listIterator(RDBMSJoinListStore.java:158)
    at org.datanucleus.store.mapped.scostore.AbstractListStore.listIterator(AbstractListStore.java:84)
    at org.datanucleus.store.mapped.scostore.AbstractListStore.iterator(AbstractListStore.java:74)
    at org.datanucleus.store.types.sco.backed.List.loadFromStore(List.java:241)
    at org.datanucleus.store.types.sco.backed.List.iterator(List.java:494)
    at org.apache.hadoop.hive.metastore.ObjectStore.convertToFieldSchemas(ObjectStore.java:706)
    at org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:759)
    at org.apache.hadoop.hive.metastore.ObjectStore.convertToPart(ObjectStore.java:859)
    at org.apache.hadoop.hive.metastore.ObjectStore.convertToParts(ObjectStore.java:896)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:886)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$21.run(HiveMetaStore.java:1333)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$21.run(HiveMetaStore.java:1330)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.executeWithRetry(HiveMetaStore.java:234)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:1330)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_ps(HiveMetaStore.java:1760)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitions(HiveMetaStoreClient.java:515)
    at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:1267)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.setupStats(SemanticAnalyzer.java:5793)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genTablePlan(SemanticAnalyzer.java:5603)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5834)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6432)


It makes no difference whether I limit this to a single partition or try any
other variation of the partition specification.

It is a SequenceFile-based table with dynamic and static partitions, as well
as compression.

Best regards,
Terje

Re: Analyze table compute statistics errors and OOM

Posted by Ning Zhang <nz...@facebook.com>.
On Oct 5, 2010, at 4:38 AM, Terje Marthinussen wrote:

> I just tested analyze table with a trunk build (from yesterday, Oct 4th).
> 
> I tried various variations of it (with and without partition specifications),
> but regardless of what I try, I get one of the following:
> --
> analyze table normalized compute statistics;
> 
> FAILED: Error in semantic analysis: Table is partitioned and partition
> specification is needed
> --
> Fair enough if it is not supported, but running it without a partition
> specification seems to be supported according to the docs at
> http://wiki.apache.org/hadoop/Hive/StatsDev ?
> 
Sorry, the design spec was out of date. I've updated the wiki to reflect the syntax change.

> --
> analyze table normalized  partition(intdate) compute statistics;
> FAILED: Error in semantic analysis: line 1:36 Dynamic partition cannot be
> the parent of a static partition intdate
> --
If you have multiple partition columns, their order is important since it reflects the hierarchical DFS directory structure. So you have to specify the parent partition first, then the sub-partitions, and the partition spec has to map to *one* HDFS directory subtree. So partition (parent='val', subpart) is allowed, but partition (parent, subpart='val') is not, nor is a spec that omits the parent entirely.
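
To illustrate (a sketch with a hypothetical partition value; this assumes the table is partitioned by intdate first, then country, as in the original query):

```sql
-- Allowed: static parent, dynamic child -- maps to one directory subtree.
-- '20101004' is a made-up value for illustration.
ANALYZE TABLE normalized PARTITION (intdate='20101004', country) COMPUTE STATISTICS;

-- Not allowed: dynamic parent above a static child.
ANALYZE TABLE normalized PARTITION (intdate, country='jp') COMPUTE STATISTICS;

-- Not allowed: a child column given without its parent in the spec.
ANALYZE TABLE normalized PARTITION (country='jp') COMPUTE STATISTICS;
```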

> OK, I may understand this (or maybe not :)). It may be good to add some notes
> about it on the wiki, though.
> 
This may be a bug when the partition spec includes non-partition columns; I'll verify and file a JIRA for it. In the partition spec you can only include partition columns, in the order they appear in the CREATE TABLE DDL.
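
In other words (a hypothetical DDL for illustration; the column names beyond those mentioned in this thread are made up):

```sql
-- Partition columns are declared in a fixed order in the DDL:
CREATE TABLE normalized (line STRING)
PARTITIONED BY (intdate STRING, country STRING, logtype STRING);

-- The partition spec may name only those columns, in the same order:
ANALYZE TABLE normalized PARTITION (intdate='20101004', country, logtype) COMPUTE STATISTICS;

-- Naming a non-partition column, such as the data column "line", should be
-- rejected -- and may currently trigger the bug described above:
-- ANALYZE TABLE normalized PARTITION (line='foo') COMPUTE STATISTICS;
```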

> [remainder of original message, including the OOM stack traces, snipped]