Posted to dev@carbondata.apache.org by xm_zzc <44...@qq.com> on 2017/07/28 09:52:10 UTC
org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
execute, tree:
Hi guys:
I ran CarbonData (branch master) + Spark 2.1.1 in yarn-client mode, and
an error occurred when I executed a select SQL statement. The details are as follows:
My env: CarbonData (branch master, 2456 commits) + Spark 2.1.1, running in
yarn-client mode;
spark shell: /opt/spark2/bin/spark-shell --master yarn --deploy-mode
client --files
/opt/spark2/conf/log4j_all.properties#log4j.properties,/opt/spark2/conf/carbon.properties
--driver-memory 6g --num-executors 6 --executor-memory 5g --executor-cores 1
--driver-library-path :/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
--jars
/opt/spark2/carbonlib/carbondata_2.11-1.2.0-shade-hadoop2.6.0-cdh5.7.1.jar;
carbon.properties:
carbon.storelocation=hdfs://hdtcluster/carbon_store
carbon.ddl.base.hdfs.url=hdfs://hdtcluster/carbon_base_path
carbon.bad.records.action=FORCE
carbon.badRecords.location=/opt/carbondata/badrecords
carbon.use.local.dir=true
carbon.use.multiple.temp.dir=true
carbon.sort.file.buffer.size=20
carbon.graph.rowset.size=100000
carbon.number.of.cores.while.loading=6
carbon.sort.size=500000
carbon.enableXXHash=true
carbon.number.of.cores.while.compacting=2
carbon.compaction.level.threshold=2,4
carbon.major.compaction.size=1024
carbon.enable.auto.load.merge=true
carbon.number.of.cores=4
carbon.inmemory.record.size=120000
carbon.enable.quick.filter=false
carbon.timestamp.format=yyyy-MM-dd HH:mm:ss
carbon.date.format=yyyy-MM-dd
carbon.lock.type=HDFSLOCK
enable.unsafe.columnpage=true
my code:
import org.apache.spark.sql.SaveMode
import org.apache.carbondata.core.util.CarbonProperties
import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
sc.setLogLevel("DEBUG")
val carbon =
SparkSession.builder().appName("TestCarbonData").config(sc.getConf)
.getOrCreateCarbonSession("hdfs://hdtcluster/carbon_store",
"/opt/carbondata/carbon.metastore")
carbon.conf.set("spark.sql.parquet.binaryAsString", true)
val testParquet = carbon.read.parquet("/tmp/cp_hundred_million")
testParquet.createOrReplaceTempView("test_distinct")
val orderedCols = carbon.sql("""
select chan, acarea, cache, code, rt, ts, fcip, url, size, host,
bsize, upsize, fvarf, fratio,
ua, uabro, uabrov, uaos, uaptfm, uadvc, msecdl, refer, pdate,
ptime, ftype
from test_distinct
""")
println(orderedCols.count())
carbon.sql("""
| CREATE TABLE IF NOT EXISTS carbondata_hundred_million_pr1198 (
| chan string,
| acarea string,
| cache string,
| code int,
| rt string,
| ts int,
| fcip string,
| url string,
| size bigint,
| host string,
| bsize bigint,
| upsize bigint,
| fvarf string,
| fratio int,
| ua string,
| uabro string,
| uabrov string,
| uaos string,
| uaptfm string,
| uadvc string,
| msecdl bigint,
| refer string,
| pdate string,
| ptime string,
| ftype string
| )
| STORED BY 'carbondata'
| TBLPROPERTIES('DICTIONARY_INCLUDE'='chan, acarea, cache, rt,
ts, fcip, ua, uabro, uabrov, uaos, uaptfm, uadvc, refer, ftype',
| 'NO_INVERTED_INDEX'='pdate, ptime',
| 'TABLE_BLOCKSIZE'='512'
| )
""".stripMargin)
carbon.catalog.listDatabases.show(false)
carbon.catalog.listTables.show(false)
orderedCols.write
.format("carbondata")
.option("tableName", "carbondata_hundred_million_pr1198")
.option("tempCSV", "false")
.option("compress", "true")
.option("single_pass", "true")
.mode(SaveMode.Append)
.save()
carbon.sql("""
select count(1) from
default.carbondata_hundred_million_pr1198
""").show(100)
carbon.sql("""
SHOW SEGMENTS FOR TABLE
default.carbondata_hundred_million_pr1198 limit 100
""").show
Data loading was successful, but when the select SQL was executed, an error
occurred:
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute,
tree:
Exchange SinglePartition
+- *HashAggregate(keys=[], functions=[partial_count(1)],
output=[count#253L])
+- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :default,
Table name :carbondata_hundred_million_pr1198, Schema
:Some(StructType(StructField(chan,StringType,true),
StructField(acarea,StringType,true), StructField(cache,StringType,true),
StructField(code,IntegerType,true), StructField(rt,StringType,true),
StructField(ts,IntegerType,true), StructField(fcip,StringType,true),
StructField(url,StringType,true), StructField(size,LongType,true),
StructField(host,StringType,true), StructField(bsize,LongType,true),
StructField(upsize,LongType,true), StructField(fvarf,StringType,true),
StructField(fratio,IntegerType,true), StructField(ua,StringType,true),
StructField(uabro,StringType,true), StructField(uabrov,StringType,true),
StructField(uaos,StringType,true), StructField(uaptfm,StringType,true),
StructField(uadvc,StringType,true), StructField(msecdl,LongType,true),
StructField(refer,StringType,true), StructField(pdate,StringType,true),
StructField(ptime,StringType,true), StructField(ftype,StringType,true))) ]
default.carbondata_hundred_million_pr1198[]
at
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
at
org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:112)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at
org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:235)
at
org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141)
at
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:368)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at
org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:225)
at
org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:308)
at
org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at
org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386)
at
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
at
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
at
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2128)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2127)
at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2818)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2127)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2342)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
at org.apache.spark.sql.Dataset.show(Dataset.scala:638)
at org.apache.spark.sql.Dataset.show(Dataset.scala:597)
... 53 elided
Caused by: java.lang.ClassCastException: cannot assign instance of
scala.collection.immutable.List$SerializationProxy to field
scala.collection.convert.Wrappers$SeqWrapper.underlying of type
scala.collection.Seq in instance of
scala.collection.convert.Wrappers$SeqWrapper
at
java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
at
java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at
org.apache.carbondata.hadoop.util.ObjectSerializationUtil.convertStringToObject(ObjectSerializationUtil.java:99)
at
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getTableInfo(CarbonTableInputFormat.java:124)
at
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:134)
at
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:268)
at
org.apache.carbondata.spark.rdd.CarbonScanRDD.getPartitions(CarbonScanRDD.scala:82)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
at
org.apache.spark.sql.execution.exchange.ShuffleExchange$.prepareShuffleDependency(ShuffleExchange.scala:261)
at
org.apache.spark.sql.execution.exchange.ShuffleExchange.prepareShuffleDependency(ShuffleExchange.scala:84)
at
org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:121)
at
org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:112)
at
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
... 85 more
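The root cause sits in `ObjectSerializationUtil.convertStringToObject`, which Java-deserializes the table schema from an encoded string; the `ClassCastException` fires because the serialized graph contains a Scala collection wrapper (`Wrappers$SeqWrapper`) whose `List` field comes back as a `List$SerializationProxy` under the spark-shell classloader. A minimal sketch of that serialize-to-string pattern (hypothetical class and method names, not CarbonData's actual implementation) shows the round trip that breaks; with plain JDK collections in the graph it completes fine:

```java
import java.io.*;
import java.util.*;

// Illustrative sketch of the pattern behind ObjectSerializationUtil:
// Java-serialize an object, Base64-encode it into a string, and later
// decode + deserialize it. In the report, the decode step throws because
// the stream contains Scala wrapper classes that cannot be reassembled;
// plain JDK collections, as used here, round-trip safely.
public class SerializeDemo {
    // Serialize an object and encode the bytes as a Base64 string.
    static String encode(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bos.toByteArray());
    }

    // Decode the Base64 string and deserialize the object graph.
    static Object decode(String s) throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(s);
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> cols = new ArrayList<>(Arrays.asList("chan", "acarea", "cache"));
        String encoded = encode((Serializable) cols);
        @SuppressWarnings("unchecked")
        List<String> decoded = (List<String>) decode(encoded);
        System.out.println(decoded.equals(cols)); // prints "true"
    }
}
```

The general way to keep this pattern safe is to copy collection contents into plain JDK collections (e.g. `new ArrayList<>(...)`) before serialization, so no REPL- or Scala-specific wrapper classes end up in the stream.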
When I ran CarbonData (branch master, 2445 commits) + Spark 2.1.1, it was
successful. Please help me.
Thanks.
--
View this message in context: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/org-apache-spark-sql-catalyst-errors-package-TreeNodeException-execute-tree-tp19021.html
Sent from the Apache CarbonData Dev Mailing List archive at Nabble.com.
Re: org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
execute, tree:
Posted by xm_zzc <44...@qq.com>.
This problem has been resolved by Jacky; please see the PR:
https://github.com/apache/carbondata/pull/1211.
--
View this message in context: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/org-apache-spark-sql-catalyst-errors-package-TreeNodeException-execute-tree-tp19021p19129.html
Sent from the Apache CarbonData Dev Mailing List archive at Nabble.com.