Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2014/10/13 22:48:33 UTC

[jira] [Resolved] (SPARK-3807) SparkSql does not work for tables created using custom serde

     [ https://issues.apache.org/jira/browse/SPARK-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust resolved SPARK-3807.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.2.0

Issue resolved by pull request 2674
[https://github.com/apache/spark/pull/2674]

> SparkSql does not work for tables created using custom serde
> ------------------------------------------------------------
>
>                 Key: SPARK-3807
>                 URL: https://issues.apache.org/jira/browse/SPARK-3807
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.1.0
>            Reporter: chirag aggarwal
>             Fix For: 1.2.0, 1.1.1
>
>
> Spark SQL crashes when selecting from a table created with a custom SerDe.
> Example:
> ----------------
> CREATE EXTERNAL TABLE table_name PARTITIONED BY (a int) ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer" WITH SERDEPROPERTIES ("serialization.format"="org.apache.thrift.protocol.TBinaryProtocol","serialization.class"="ser_class") STORED AS SEQUENCEFILE;
> Running a query such as 'select * from table_name limit 1' then produces the following exception:
> ERROR CliDriver: org.apache.hadoop.hive.serde2.SerDeException: java.lang.NullPointerException 
> at org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer.initialize(ThriftDeserializer.java:68) 
> at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:80) 
> at org.apache.spark.sql.hive.execution.HiveTableScan.addColumnMetadataToConf(HiveTableScan.scala:86) 
> at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:100) 
> at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188) 
> at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188) 
> at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:364) 
> at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:184) 
> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) 
> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) 
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) 
> at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) 
> at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) 
> at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:280) 
> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) 
> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) 
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) 
> at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) 
> at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402) 
> at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400) 
> at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:406) 
> at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:406) 
> at org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:406) 
> at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:59) 
> at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:291) 
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) 
> at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226) 
> at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) 
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) 
> at java.lang.reflect.Method.invoke(Unknown Source) 
> at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328) 
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) 
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
> Caused by: java.lang.NullPointerException
> After fixing this issue, a second problem appeared: when a query referred to some of the table's columns, Spark SQL could not resolve those references.


