Posted to user@spark.apache.org by AlexModestov <Al...@gmail.com> on 2016/02/19 13:59:40 UTC

an error when I read data from parquet

Hello everybody,

I use both the Python API and the Scala API. With the Python API I can read the data without problems:

"sqlContext = SQLContext(sc)
data_full = sqlContext.read.parquet("---")"

But when I use Scala:

"val sqlContext = new SQLContext(sc)
val data_full = sqlContext.read.parquet("---")"

I get the following error (I am using Spark Notebook, which may be relevant):
"java.lang.ExceptionInInitializerError
        at sun.misc.Unsafe.ensureClassInitialized(Native Method)
        at sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43)
        at sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:140)
        at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1057)
        at java.lang.reflect.Field.getFieldAccessor(Field.java:1038)
        at java.lang.reflect.Field.get(Field.java:379)
        at notebook.kernel.Repl.getModule$1(Repl.scala:203)
        at notebook.kernel.Repl.iws$1(Repl.scala:212)
        at notebook.kernel.Repl.liftedTree1$1(Repl.scala:219)
        at notebook.kernel.Repl.evaluate(Repl.scala:199)
        at notebook.client.ReplCalculator$$anonfun$15$$anon$1$$anonfun$29.apply(ReplCalculator.scala:378)
        at notebook.client.ReplCalculator$$anonfun$15$$anon$1$$anonfun$29.apply(ReplCalculator.scala:375)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.NoSuchMethodException: org.apache.spark.io.SnappyCompressionCodec.<init>(org.apache.spark.SparkConf)
        at java.lang.Class.getConstructor0(Class.java:2892)
        at java.lang.Class.getConstructor(Class.java:1723)
        at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:71)
        at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:65)
        at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73)
        at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:80)
        at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
        at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
        at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1326)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:108)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
        at org.apache.spark.sql.DataFrame.toJSON(DataFrame.scala:1724)
        at notebook.front.widgets.DataFrameView$class.notebook$front$widgets$DataFrameView$$json(DataFrame.scala:40)
        at notebook.front.widgets.DataFrameWidget.notebook$front$widgets$DataFrameView$$json$lzycompute(DataFrame.scala:64)
        at notebook.front.widgets.DataFrameWidget.notebook$front$widgets$DataFrameView$$json(DataFrame.scala:64)
        at notebook.front.widgets.DataFrameView$class.$init$(DataFrame.scala:41)
        at notebook.front.widgets.DataFrameWidget.<init>(DataFrame.scala:69)
        at notebook.front.ExtraLowPriorityRenderers$dataFrameAsTable$.render(renderer.scala:13)
        at notebook.front.ExtraLowPriorityRenderers$dataFrameAsTable$.render(renderer.scala:12)
        at notebook.front.Widget$.fromRenderer(Widget.scala:32)
        at
$line19.$rendered$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$.<init>(<console>:92)
        at
$line19.$rendered$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$.<clinit>(<console>)
        ... 20 more"



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/an-error-when-I-read-data-from-parquet-tp26277.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: an error when I read data from parquet

Posted by Jorge Sánchez <jo...@gmail.com>.
Hi Alex,

It seems there is a problem with Spark Notebook. I suggest you follow the
issue there (or you could try Apache Zeppelin or spark-shell directly if
notebooks are not a requirement):

https://github.com/andypetrella/spark-notebook/issues/380
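
While that issue is open, one way to narrow the problem down is to look at which copy of the codec class the notebook actually loads. A NoSuchMethodException on a constructor looked up by reflection often means the class on the classpath comes from a different version than the caller expects; that is an assumption here, not something confirmed in this thread. The sketch below uses only JDK reflection. The helper name describeClass is hypothetical and is not part of Spark or Spark Notebook.

```scala
import scala.util.Try

// Hypothetical helper (not part of Spark or Spark Notebook): reports where a
// class was loaded from and which public constructors it exposes.
def describeClass(name: String): String = {
  val loader = Thread.currentThread().getContextClassLoader
  // initialize = false: look the class up without running static initializers.
  Try(Class.forName(name, false, loader)).map { cls =>
    // getCodeSource can be null, e.g. for classes from the bootstrap loader.
    val location = Option(cls.getProtectionDomain.getCodeSource)
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)
      .getOrElse("<unknown location>")
    val ctors = cls.getConstructors.map(_.toString).mkString("\n    ")
    s"$name\n  loaded from: $location\n  constructors:\n    $ctors"
  }.getOrElse(s"$name is not on the classpath")
}

// The class named in the "Caused by" line of the stack trace.
println(describeClass("org.apache.spark.io.SnappyCompressionCodec"))
```

Running this in both Spark Notebook and spark-shell and comparing the jar location and the constructor list should show whether the two environments see the same Spark build. If the constructor taking org.apache.spark.SparkConf is missing in the notebook only, that would support the classpath explanation.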

Regards.
