Posted to user@spark.apache.org by Anand Nalya <an...@gmail.com> on 2015/10/27 16:35:09 UTC

SparkSQL on hive error

Hi,

I have a partitioned Hive table (Avro) that I can query fine from the Hive
CLI.

When using SparkSQL, I'm able to query some of the partitions, but I get an
exception on others.

The query is:

sqlContext.sql("select * from myTable where source='http' and date =
20150812").take(5).foreach(println)

The exception is:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage
2.0 (TID 5, node1): java.lang.IllegalArgumentException: Error: type
expected at the position 0 of
'BIGINT:INT:INT:INT:INT:string:INT:string:string:string:string:string:string:string:string:string:string:string:string:string:string:INT:INT:string:BIGINT:string:string:BIGINT:BIGINT:string:string:string:string:string:FLOAT:FLOAT:string:string:string:string:BIGINT:BIGINT:string:string:string:string:string:string:BIGINT:string:string'
but 'BIGINT' is found.
    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:348)
    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331)
    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:392)
    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305)
    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:762)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:105)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$4$$anonfun$9.apply(TableReader.scala:191)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$4$$anonfun$9.apply(TableReader.scala:188)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Any pointers on what might be wrong here?
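
Looking at the message, the type parser seems to trip on the uppercase type
names (BIGINT, INT, FLOAT) in this partition's metadata, while the lowercase
ones (string) are the form it normally sees. If it helps, the failure appears
reproducible with just the Hive classes; a rough sketch from spark-shell
(assuming the hive-serde classes are on the classpath, as they are when using
HiveContext):

import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils

// lowercase type names (the usual columns.types format) parse fine
TypeInfoUtils.getTypeInfosFromTypeString("bigint:int:string")

// uppercase names, as in the failing partition's metadata, hit the same
// "Error: type expected at the position 0 ..." IllegalArgumentException
TypeInfoUtils.getTypeInfosFromTypeString("BIGINT:INT:string")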

Regards,
Anand

RE: SparkSQL on hive error

Posted by "Cheng, Hao" <ha...@intel.com>.
Hi Anand, can you paste the table creation statement? I'd like to reproduce this locally first, and by the way, which version are you using?
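
If it's easier, something like the following should capture what I'm after.
This is just a sketch; myTable and the partition values below are placeholders
taken from your query:

// from spark-shell: the Spark version in use
sc.version

// from the Hive CLI: the table DDL, plus the failing partition's own column
// and storage/SerDe details (partition-level schema can differ from the
// table-level one):
//   SHOW CREATE TABLE myTable;
//   DESCRIBE FORMATTED myTable PARTITION (source='http', date=20150812);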

Hao
