Posted to dev@zeppelin.apache.org by "Lin, Yunfeng " <yu...@citi.com> on 2016/02/02 21:28:28 UTC

csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Hi guys,

I load the spark-csv dependencies via %dep; they are picked up in %spark but not in %sql when using Apache Zeppelin 0.5.6 with Spark 1.6.0. Everything works fine in Zeppelin 0.5.5 with Spark 1.5, though.

Do you have similar problems?

I am loading spark csv dependencies (https://github.com/databricks/spark-csv)

Using:
%dep
z.load("PATH/commons-csv-1.1.jar")
z.load("PATH/spark-csv_2.10-1.3.0.jar")
z.load("PATH/univocity-parsers-1.5.1.jar:)
z.load("PATH/scala-library-2.10.5.jar")

I am able to load a CSV from HDFS using the DataFrame API in Spark. It runs perfectly fine.
%spark
val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "false") // Use finrst line of all files as header
    .option("inferSchema", "true") // Automatically infer data types
    .load("hdfs://sd-6f48-7fe6:8020/tmp/people.txt")   // this is a file in HDFS
df.registerTempTable("people")
df.show()

This also works:
%spark
val df2=sqlContext.sql("select * from people")
df2.show()

But this doesn't work....
%sql
select * from people

java.lang.ClassNotFoundException: com.databricks.spark.csv.CsvRelation$$anonfun$1$$anonfun$2 at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:270) at org.apache.spark.util.InnerClosureFinder$$anon$4.visitMethodInsn(ClosureCleaner.scala:435) at org.apache.xbean.asm5.ClassReader.a(Unknown Source) at org.apache.xbean.asm5.ClassReader.b(Unknown Source) at org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:84) at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:187) at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122) at org.apache.spark.SparkContext.clean(SparkContext.scala:2055) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:707) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:706) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706) at com.databricks.spark.csv.CsvRelation.tokenRdd(CsvRelation.scala:90) at com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:104) at com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:152) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:274) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:273) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:352) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:269) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:60) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:349) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47) at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45) at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52) at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52) at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2134) at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413) at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:297) at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:144) at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57) at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93) at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300) at org.apache.zeppelin.scheduler.Job.run(Job.java:169) at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)



I notice that in the source code of the Spark interpreter (class org.apache.zeppelin.spark.ZeppelinContext) there is a difference between Spark 1.5 and Spark 1.6: with Spark 1.5 Zeppelin uses SQLContext, while with Spark 1.6 it uses HiveContext. Unfortunately, whether zeppelin.spark.useHiveContext is set to true or false, %sql just can't find the csv dependencies ...


try {
  // Use reflection because of classname returned by queryExecution changes from
  // Spark <1.5.2 org.apache.spark.sql.SQLContext$QueryExecution
  // Spark 1.6.0> org.apache.spark.sql.hive.HiveContext$QueryExecution
  Object qe = df.getClass().getMethod("queryExecution").invoke(df);
  Object a = qe.getClass().getMethod("analyzed").invoke(qe);
  scala.collection.Seq seq = (scala.collection.Seq) a.getClass().getMethod("output").invoke(a);

  columns = (List<Attribute>) scala.collection.JavaConverters.seqAsJavaListConverter(seq)
                                                             .asJava();
} catch (NoSuchMethodException | SecurityException | IllegalAccessException
    | IllegalArgumentException | InvocationTargetException e) {
  throw new InterpreterException(e);
}
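
A quick way to see which context the notebook is actually running on (a small sketch, assuming it is run as a %spark paragraph):

%spark
// Prints org.apache.spark.sql.hive.HiveContext or org.apache.spark.sql.SQLContext,
// depending on the zeppelin.spark.useHiveContext setting.
println(sqlContext.getClass.getName)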

Yunfeng

Re: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Posted by Jonathan Kelly <jo...@gmail.com>.
BTW, this sounds very similar to
https://issues.apache.org/jira/browse/ZEPPELIN-297, which affects %pyspark
and was fixed in Zeppelin 0.5.5.

On Tue, Feb 2, 2016 at 12:32 PM Jonathan Kelly <jo...@gmail.com>
wrote:

> Hey, I just ran into that same exact issue yesterday and wasn't sure if I
> was doing something wrong or what. Glad to know it's not just me!
> Unfortunately I have not yet had the time to look any deeper into it. Would
> you mind filing a JIRA if there isn't already one?
>
> On Tue, Feb 2, 2016 at 12:29 PM Lin, Yunfeng <yu...@citi.com> wrote:
>
>> Hi guys,
>>
>>
>>
>> I load spark-csv dependencies in %spark, but not in %sql using apache
>> zeppelin 0.5.6 with spark 1.6.0. Everything is working fine in zeppelin
>> 0.5.5 with spark 1.5 through
>>
>>
>>
>> Do you have similar problems?
>>
>>
>>
>> I am loading spark csv dependencies (
>> https://github.com/databricks/spark-csv)
>>
>>
>>
>> Using:
>>
>> %dep
>>
>> z.load(“PATH/commons-csv-1.1.jar”)
>>
>> z.load(“PATH/spark-csv_2.10-1.3.0.jar”)
>>
>> z.load(“PATH/univocity-parsers-1.5.1.jar:)
>>
>> z.load(“PATH/scala-library-2.10.5.jar”)
>>
>>
>>
>> I am able to load a csv from hdfs using data frame API in spark. It is
>> running perfect fine.
>>
>> %spark
>>
>> val df = sqlContext.read
>>
>>     .format("com.databricks.spark.csv")
>>
>>     .option("header", "false") // Use finrst line of all files as header
>>
>>     .option("inferSchema", "true") // Automatically infer data types
>>
>>     .load("hdfs://sd-6f48-7fe6:8020/tmp/people.txt")   // this is a file
>> in HDFS
>>
>> df.registerTempTable("people")
>>
>> df.show()
>>
>>
>>
>> This also work:
>>
>> %spark
>>
>> val df2=sqlContext.sql(“select * from people”)
>>
>> df2.show()
>>
>>
>>
>> But this doesn’t work….
>>
>> %sql
>>
>> select * from people
>>
>>
>>
>> java.lang.ClassNotFoundException:
>> com.databricks.spark.csv.CsvRelation$$anonfun$1$$anonfun$2 at
>> java.net.URLClassLoader$1.run(URLClassLoader.java:366) at
>> java.net.URLClassLoader$1.run(URLClassLoader.java:355) at
>> java.security.AccessController.doPrivileged(Native Method) at
>> java.net.URLClassLoader.findClass(URLClassLoader.java:354) at
>> java.lang.ClassLoader.loadClass(ClassLoader.java:425) at
>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at
>> java.lang.ClassLoader.loadClass(ClassLoader.java:358) at
>> java.lang.Class.forName0(Native Method) at
>> java.lang.Class.forName(Class.java:270) at
>> org.apache.spark.util.InnerClosureFinder$$anon$4.visitMethodInsn(ClosureCleaner.scala:435)
>> at org.apache.xbean.asm5.ClassReader.a(Unknown Source) at
>> org.apache.xbean.asm5.ClassReader.b(Unknown Source) at
>> org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at
>> org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at
>> org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:84)
>> at
>> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:187)
>> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122) at
>> org.apache.spark.SparkContext.clean(SparkContext.scala:2055) at
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:707) at
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:706) at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at
>> org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706) at
>> com.databricks.spark.csv.CsvRelation.tokenRdd(CsvRelation.scala:90) at
>> com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:104) at
>> com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:152) at
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
>> at
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
>> at
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:274)
>> at
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:273)
>> at
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:352)
>> at
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:269)
>> at
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:60)
>> at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>> at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>> at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>> at
>> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:349)
>> at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>> at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>
>> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>> at
>> org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
>> at
>> org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
>> at
>> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
>> at
>> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
>> at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2134) at
>> org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413) at
>> org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495) at
>> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:297)
>> at
>> org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:144)
>> at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
>> at
>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>> at
>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
>> at org.apache.zeppelin.scheduler.Job.run(Job.java:169) at
>> org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>>
>>
>>
>>
>>
>>
>>
>> I notice that in the source code of spark interpreter (class:
>> org.apache.zeppelin.spark.ZeppelinContext. There are something different
>> between spark 1.5 and spark 1.6. Spark 1.5 is using SQLContext while Spark
>> 1.6 is HiveContext. Unfortunately, no matter true or false are set in
>> zeppelin.spark.useHiveContext, %sql just can’t find csv dependencies …
>>
>>
>>
>>
>>
>> try {
>>   // Use reflection because of classname returned by queryExecution
>> changes from
>>   // Spark <1.5.2 org.apache.spark.sql.SQLContext$QueryExecution
>>   // Spark 1.6.0> org.apache.spark.sql.hive.HiveContext$QueryExecution
>>   Object qe = df.getClass().getMethod("queryExecution").invoke(df);
>>   Object a = qe.getClass().getMethod("analyzed").invoke(qe);
>>   scala.collection.Seq seq = (scala.collection.Seq)
>> a.getClass().getMethod("output").invoke(a);
>>
>>   columns = (List<Attribute>) scala.collection.JavaConverters.
>> *seqAsJavaListConverter*(seq)
>>                                                              .asJava();
>> } catch (NoSuchMethodException | SecurityException |
>> IllegalAccessException
>>     | IllegalArgumentException | InvocationTargetException e) {
>>   throw new InterpreterException(e);
>> }
>>
>>
>>
>> Yunfeng
>>
>

Re: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Posted by Benjamin Kim <bb...@gmail.com>.
Hi Mina,

I pulled the latest and saw the new artifacts entry section for interpreters. I like that solution better. It’s a nice addition.

Thanks,
Ben


> On Feb 2, 2016, at 7:36 PM, Lin, Yunfeng <yu...@citi.com> wrote:
> 
> Thanks, Mina, I confirm that the workaround works!
> 
> -----Original Message-----
> From: Mina Lee [mailto:minalee@apache.org <ma...@apache.org>] 
> Sent: Tuesday, February 02, 2016 6:22 PM
> To: users
> Cc: dev@zeppelin.incubator.apache.org <ma...@zeppelin.incubator.apache.org>
> Subject: Re: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6
> 
> @Janathan Zeppelin community plans to release every 3 months so I expect next release will be around end of April.
> 
> @Lin The workaround I can think of right now is adding libraries to ZEPPELIN_CLASSPATH in bin/interpreter.sh To do this,
>  1. place all libraries you need(commons-csv-1.1.jar, spark-csv_2.10-1.
> 3.0.jar, univocity-parsers-1.5.1.jar) under one specific directory. Note that there should be only files not subdirectories. I will name it as "/my/path/spark-csv" for the ease of explanation.
>  2. open bin/interpreter.sh
>  3. Add following code `addJarInDir "/mydir/path/spark-csv"` to line 130.(I assume that you are using zeppelin-0.5.6-incubating)
>  4. restart Zeppelin
> 
> Hope this helps
> 
> 
> On Tue, Feb 2, 2016 at 1:30 PM, Lin, Yunfeng <yu...@citi.com> wrote:
> 
>> Thanks! Is there a possible workaround for 0.5.6 before releasing 0.6.0?
>> 
>> 
>> 
>> *From:* Jonathan Kelly [mailto:jonathakamzn@gmail.com]
>> *Sent:* Tuesday, February 02, 2016 4:19 PM
>> *To:* dev@zeppelin.incubator.apache.org; users
>> *Subject:* Re: csv dependencies loaded in %spark but not %sql in spark 
>> 1.6/zeppelin 0.5.6
>> 
>> 
>> 
>> Awesome, thank you! BTW, I know that the Zeppelin 0.5.6 release was 
>> only very recently, but do you happen to know yet when you plan on 
>> releasing 0.6.0?
>> 
>> 
>> 
>> On Tue, Feb 2, 2016 at 1:07 PM mina lee <mi...@apache.org> wrote:
>> 
>> This issue has been fixed few days ago in master branch.
>> 
>> Here is the PR
>> https://github.com/apache/incubator-zeppelin/pull/673
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apach <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apach>
>> e_incubator-2Dzeppelin_pull_673&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2
>> BXWa66OlJ_NWqk5P310M6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI
>> 5rpP5SoqI8JJ2UWEY&s=hkFOUihaS8xvGHC2psZm_eTCpaIbp9JvGoR0Y6XV2wc&e=>
>> 
>> And related issues filed in JIRA before
>> https://issues.apache.org/jira/browse/ZEPPELIN-194 <https://issues.apache.org/jira/browse/ZEPPELIN-194>
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.or <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.or>
>> g_jira_browse_ZEPPELIN-2D194&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXW
>> a66OlJ_NWqk5P310M6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI5rp
>> P5SoqI8JJ2UWEY&s=AvJcgiZOLR_OmESYH5osZSMnYd9_A6DBRo9CxjYsxuY&e=>
>> https://issues.apache.org/jira/browse/ZEPPELIN-381 <https://issues.apache.org/jira/browse/ZEPPELIN-381>
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.or <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.or>
>> g_jira_browse_ZEPPELIN-2D381&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXW
>> a66OlJ_NWqk5P310M6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI5rp
>> P5SoqI8JJ2UWEY&s=6iFF2WKrNwEJZ68upi94jfS_2wCVPZHeZE0aTzS4slc&e=>
>> 
>> With the latest master branch, we recommend you to load dependencies 
>> via interpreter setting menu instead of %dep interpreter.
>> 
>> If you want to know how to set dependencies with latest master branch, 
>> please check doc < 
>> https://zeppelin.incubator.apache.org/docs/0.6.0-incubating-SNAPSHOT/m <https://zeppelin.incubator.apache.org/docs/0.6.0-incubating-SNAPSHOT/m>
>> anual/dependencymanagement.html 
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__zeppelin.incubat <https://urldefense.proofpoint.com/v2/url?u=https-3A__zeppelin.incubat>
>> or.apache.org <http://or.apache.org/>_docs_0.6.0-2Dincubating-2DSNAPSHOT_manual_dependencymana
>> gement.html&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXWa66OlJ_NWqk5P310M
>> 6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI5rpP5SoqI8JJ2UWEY&s=
>> 8vqDxxyOWAchE7NS0L2BLpav1-tIPMXDwE0ndsqln-E&e=>
>>> 
>> and
>> let me know if it works.
>> 
>> Cheers,
>> Mina
>> 
>> On Tue, Feb 2, 2016 at 12:50 PM, Lin, Yunfeng <yunfeng.lin@citi.com <ma...@citi.com>>
>> wrote:
>> 
>>> I’ve created an issue in jira
>>> 
>>> 
>>> 
>>> https://issues.apache.org/jira/browse/ZEPPELIN-648 <https://issues.apache.org/jira/browse/ZEPPELIN-648>
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.or <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.or>
>> g_jira_browse_ZEPPELIN-2D648&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXW
>> a66OlJ_NWqk5P310M6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI5rp
>> P5SoqI8JJ2UWEY&s=hlbpHl6qhaZZD5ZWy6DeFySuZEZt3JysxHFYer2shG8&e=>
>>> 
>>> 
>>> 
>>> *From:* Benjamin Kim [mailto:bbuild11@gmail.com <ma...@gmail.com>]
>>> *Sent:* Tuesday, February 02, 2016 3:34 PM
>>> *To:* users@zeppelin.incubator.apache.org <ma...@zeppelin.incubator.apache.org>
>>> *Cc:* dev@zeppelin.incubator.apache.org <ma...@zeppelin.incubator.apache.org>
>>> *Subject:* Re: csv dependencies loaded in %spark but not %sql in 
>>> spark 1.6/zeppelin 0.5.6
>>> 
>>> 
>>> 
>>> Same here. I want to know the answer too.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Feb 2, 2016, at 12:32 PM, Jonathan Kelly <jonathakamzn@gmail.com <ma...@gmail.com>>
>>> wrote:
>>> 
>>> 
>>> 
>>> Hey, I just ran into that same exact issue yesterday and wasn't sure 
>>> if I was doing something wrong or what. Glad to know it's not just me!
>>> Unfortunately I have not yet had the time to look any deeper into it.
>> Would
>>> you mind filing a JIRA if there isn't already one?
>>> 
>>> 
>>> 
>>> On Tue, Feb 2, 2016 at 12:29 PM Lin, Yunfeng <yunfeng.lin@citi.com <ma...@citi.com>>
>> wrote:
>>> 
>>> Hi guys,
>>> 
>>> 
>>> 
>>> I load spark-csv dependencies in %spark, but not in %sql using 
>>> apache zeppelin 0.5.6 with spark 1.6.0. Everything is working fine 
>>> in zeppelin
>>> 0.5.5 with spark 1.5 through
>>> 
>>> 
>>> 
>>> Do you have similar problems?
>>> 
>>> 
>>> 
>>> I am loading spark csv dependencies ( 
>>> https://github.com/databricks/spark-csv <https://github.com/databricks/spark-csv>
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_datab <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_datab>
>> ricks_spark-2Dcsv&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXWa66OlJ_NWqk
>> 5P310M6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI5rpP5SoqI8JJ2U
>> WEY&s=gsqKqOCjZHR_BW221s_VLo9A7XweGQR50Me_YKEH0UM&e=>
>>> <
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_databr <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_databr>
>> icks_spark-2Dcsv&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXWa66OlJ_NWqk5
>> P310M6mGfus8eDC5O4J0-nePFY&m=dSXRZCZNlnU1tx9rtyX9UWfdjT0EPbafKr2NyIrXP
>> -o&s=zUPPWKYhZiNUuIUWmlXXGF_94ImGHQ4qHpFCU0xSEzg&e=
>>> 
>>> )
>>> 
>>> 
>>> 
>>> Using:
>>> 
>>> %dep
>>> 
>>> z.load(“PATH/commons-csv-1.1.jar”)
>>> 
>>> z.load(“PATH/spark-csv_2.10-1.3.0.jar”)
>>> 
>>> z.load(“PATH/univocity-parsers-1.5.1.jar:)
>>> 
>>> z.load(“PATH/scala-library-2.10.5.jar”)
>>> 
>>> 
>>> 
>>> I am able to load a csv from hdfs using data frame API in spark. It 
>>> is running perfect fine.
>>> 
>>> %spark
>>> 
>>> val df = sqlContext.read
>>> 
>>>    .format("com.databricks.spark.csv")
>>> 
>>>    .option("header", "false") // Use finrst line of all files as 
>>> header
>>> 
>>>    .option("inferSchema", "true") // Automatically infer data types
>>> 
>>>    .load("hdfs://sd-6f48-7fe6:8020/tmp/people.txt <hdfs://sd-6f48-7fe6:8020/tmp/people.txt>")   // this is a file
>>> in HDFS
>>> 
>>> df.registerTempTable("people")
>>> 
>>> df.show()
>>> 
>>> 
>>> 
>>> This also work:
>>> 
>>> %spark
>>> 
>>> val df2=sqlContext.sql(“select * from people”)
>>> 
>>> df2.show()
>>> 
>>> 
>>> 
>>> But this doesn’t work….
>>> 
>>> %sql
>>> 
>>> select * from people
>>> 
>>> 
>>> 
>>> java.lang.ClassNotFoundException:
>>> com.databricks.spark.csv.CsvRelation$$anonfun$1$$anonfun$2 at
>>> java.net.URLClassLoader$1.run(URLClassLoader.java:366) at
>>> java.net.URLClassLoader$1.run(URLClassLoader.java:355) at 
>>> java.security.AccessController.doPrivileged(Native Method) at
>>> java.net.URLClassLoader.findClass(URLClassLoader.java:354) at
>>> java.lang.ClassLoader.loadClass(ClassLoader.java:425) at
>>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at
>>> java.lang.ClassLoader.loadClass(ClassLoader.java:358) at 
>>> java.lang.Class.forName0(Native Method) at
>>> java.lang.Class.forName(Class.java:270) at
>>> 
>> org.apache.spark.util.InnerClosureFinder$$anon$4.visitMethodInsn(Closu
>> reCleaner.scala:435)
>>> at org.apache.xbean.asm5.ClassReader.a(Unknown Source) at 
>>> org.apache.xbean.asm5.ClassReader.b(Unknown Source) at 
>>> org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at 
>>> org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at
>>> 
>> org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCl
>> eaner.scala:84)
>>> at
>>> 
>> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCle
>> aner$$clean(ClosureCleaner.scala:187)
>>> at 
>>> org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122
>>> )
>> at
>>> org.apache.spark.SparkContext.clean(SparkContext.scala:2055) at
>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:70
>>> 7) at
>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:70
>>> 6) at
>>> 
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.sc
>> ala:150)
>>> at
>>> 
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.sc
>> ala:111)
>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at
>>> org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706) at
>>> com.databricks.spark.csv.CsvRelation.tokenRdd(CsvRelation.scala:90) 
>>> at
>>> com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:104
>>> ) at
>>> com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:152
>>> ) at
>>> 
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun
>> $4.apply(DataSourceStrategy.scala:64)
>>> at
>>> 
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun
>> $4.apply(DataSourceStrategy.scala:64)
>>> at
>>> 
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun
>> $pruneFilterProject$1.apply(DataSourceStrategy.scala:274)
>>> at
>>> 
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun
>> $pruneFilterProject$1.apply(DataSourceStrategy.scala:273)
>>> at
>>> 
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFi
>> lterProjectRaw(DataSourceStrategy.scala:352)
>>> at
>>> 
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFi
>> lterProject(DataSourceStrategy.scala:269)
>>> at
>>> 
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(D
>> ataSourceStrategy.scala:60)
>>> at
>>> 
>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(Q
>> ueryPlanner.scala:58)
>>> at
>>> 
>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(Q
>> ueryPlanner.scala:58)
>>> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at
>>> 
>> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.
>> scala:59)
>>> at
>>> 
>> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPla
>> nner.scala:54)
>>> at
>>> 
>> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(S
>> parkStrategies.scala:349)
>>> at
>>> 
>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(Q
>> ueryPlanner.scala:58)
>>> at
>>> 
>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(Q
>> ueryPlanner.scala:58)
>>> 
>>> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at
>>> 
>> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.
>> scala:59)
>>> at
>>> 
>> org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(Que
>> ryExecution.scala:47)
>>> at
>>> 
>> org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution
>> .scala:45)
>>> at
>>> 
>> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(
>> QueryExecution.scala:52)
>>> at
>>> 
>> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecut
>> ion.scala:52)
>>> at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2134) 
>>> at
>>> org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413) at
>>> org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495) at 
>>> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
>>> 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.j
>> ava:57)
>>> at
>>> 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccess
>> orImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:606) at
>> org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:
>> 297)
>>> at
>>> 
>> org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterp
>> reter.java:144)
>>> at
>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(Class
>> loaderInterpreter.java:57)
>>> at
>>> 
>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpen
>> Interpreter.java:93)
>>> at
>>> 
>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$Interpr
>> etJob.jobRun(RemoteInterpreterServer.java:300)
>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:169) at
>>> org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java
>>> :134)
>>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471
>> )
>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
>>> 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a
>> ccess$201(ScheduledThreadPoolExecutor.java:178)
>>> at
>>> 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r
>> un(ScheduledThreadPoolExecutor.java:292)
>>> at
>>> 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
>> ava:1145)
>>> at
>>> 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
>> java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> I notice that in the source code of spark interpreter (class:
>>> org.apache.zeppelin.spark.ZeppelinContext. There are something 
>>> different between spark 1.5 and spark 1.6. Spark 1.5 is using 
>>> SQLContext while
>> Spark
>>> 1.6 is HiveContext. Unfortunately, no matter true or false are set 
>>> in zeppelin.spark.useHiveContext, %sql just can’t find csv 
>>> dependencies …
>>> 
>>> 
>>> 
>>> 
>>> 
>>> try {
>>>  // Use reflection because of classname returned by queryExecution 
>>> changes from
>>>  // Spark <1.5.2 org.apache.spark.sql.SQLContext$QueryExecution
>>>  // Spark 1.6.0> org.apache.spark.sql.hive.HiveContext$QueryExecution
>>>  Object qe = df.getClass().getMethod("queryExecution").invoke(df);
>>>  Object a = qe.getClass().getMethod("analyzed").invoke(qe);
>>>  scala.collection.Seq seq = (scala.collection.Seq) 
>>> a.getClass().getMethod("output").invoke(a);
>>> 
>>>  columns = (List<Attribute>) scala.collection.JavaConverters.
>>> *seqAsJavaListConverter*(seq)
>>> 
>>> .asJava(); } catch (NoSuchMethodException | SecurityException | 
>>> IllegalAccessException
>>>    | IllegalArgumentException | InvocationTargetException e) {
>>>  throw new InterpreterException(e); }
>>> 
>>> 
>>> 
>>> Yunfeng


RE: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Posted by "Lin, Yunfeng " <yu...@citi.com>.
Thanks, Mina, I confirm that the workaround works!

-----Original Message-----
From: Mina Lee [mailto:minalee@apache.org] 
Sent: Tuesday, February 02, 2016 6:22 PM
To: users
Cc: dev@zeppelin.incubator.apache.org
Subject: Re: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

@Jonathan Zeppelin community plans to release every 3 months so I expect next release will be around end of April.

@Lin The workaround I can think of right now is adding the libraries to ZEPPELIN_CLASSPATH in bin/interpreter.sh. To do this,
  1. place all the libraries you need (commons-csv-1.1.jar, spark-csv_2.10-1.3.0.jar, univocity-parsers-1.5.1.jar) under one specific directory. Note that it should contain only files, not subdirectories. I will call it "/my/path/spark-csv" for ease of explanation.
  2. open bin/interpreter.sh
  3. add the line `addJarInDir "/my/path/spark-csv"` at line 130 (I assume that you are using zeppelin-0.5.6-incubating)
  4. restart Zeppelin (a quick sanity check is sketched below)

Hope this helps
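
(A small sanity check after step 4, sketched under the assumption that the paragraph is run in %spark after the restart: jars added via addJarInDir end up on the interpreter process classpath, which is also what %sql sees.)

%spark
// If this resolves without a ClassNotFoundException, the spark-csv jar is on the
// interpreter JVM's classpath rather than only on the %dep-augmented REPL classpath.
println(ClassLoader.getSystemClassLoader.loadClass("com.databricks.spark.csv.DefaultSource"))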


On Tue, Feb 2, 2016 at 1:30 PM, Lin, Yunfeng <yu...@citi.com> wrote:

> Thanks! Is there a possible workaround for 0.5.6 before releasing 0.6.0?
>
>
>
> *From:* Jonathan Kelly [mailto:jonathakamzn@gmail.com]
> *Sent:* Tuesday, February 02, 2016 4:19 PM
> *To:* dev@zeppelin.incubator.apache.org; users
> *Subject:* Re: csv dependencies loaded in %spark but not %sql in spark 
> 1.6/zeppelin 0.5.6
>
>
>
> Awesome, thank you! BTW, I know that the Zeppelin 0.5.6 release was 
> only very recently, but do you happen to know yet when you plan on 
> releasing 0.6.0?
>
>
>
> On Tue, Feb 2, 2016 at 1:07 PM mina lee <mi...@apache.org> wrote:
>
> This issue has been fixed few days ago in master branch.
>
> Here is the PR
> https://github.com/apache/incubator-zeppelin/pull/673
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apach
> e_incubator-2Dzeppelin_pull_673&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2
> BXWa66OlJ_NWqk5P310M6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI
> 5rpP5SoqI8JJ2UWEY&s=hkFOUihaS8xvGHC2psZm_eTCpaIbp9JvGoR0Y6XV2wc&e=>
>
> And related issues filed in JIRA before
> https://issues.apache.org/jira/browse/ZEPPELIN-194
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.or
> g_jira_browse_ZEPPELIN-2D194&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXW
> a66OlJ_NWqk5P310M6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI5rp
> P5SoqI8JJ2UWEY&s=AvJcgiZOLR_OmESYH5osZSMnYd9_A6DBRo9CxjYsxuY&e=>
> https://issues.apache.org/jira/browse/ZEPPELIN-381
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.or
> g_jira_browse_ZEPPELIN-2D381&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXW
> a66OlJ_NWqk5P310M6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI5rp
> P5SoqI8JJ2UWEY&s=6iFF2WKrNwEJZ68upi94jfS_2wCVPZHeZE0aTzS4slc&e=>
>
> With the latest master branch, we recommend you to load dependencies 
> via interpreter setting menu instead of %dep interpreter.
>
> If you want to know how to set dependencies with latest master branch, 
> please check doc < 
> https://zeppelin.incubator.apache.org/docs/0.6.0-incubating-SNAPSHOT/m
> anual/dependencymanagement.html 
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__zeppelin.incubat
> or.apache.org_docs_0.6.0-2Dincubating-2DSNAPSHOT_manual_dependencymana
> gement.html&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXWa66OlJ_NWqk5P310M
> 6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI5rpP5SoqI8JJ2UWEY&s=
> 8vqDxxyOWAchE7NS0L2BLpav1-tIPMXDwE0ndsqln-E&e=>
> >
> and
> let me know if it works.
>
> Cheers,
> Mina
>
> On Tue, Feb 2, 2016 at 12:50 PM, Lin, Yunfeng <yu...@citi.com>
> wrote:
>
> > I’ve created an issue in jira
> >
> >
> >
> > https://issues.apache.org/jira/browse/ZEPPELIN-648
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.or
> g_jira_browse_ZEPPELIN-2D648&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXW
> a66OlJ_NWqk5P310M6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI5rp
> P5SoqI8JJ2UWEY&s=hlbpHl6qhaZZD5ZWy6DeFySuZEZt3JysxHFYer2shG8&e=>
> >
> >
> >
> > *From:* Benjamin Kim [mailto:bbuild11@gmail.com]
> > *Sent:* Tuesday, February 02, 2016 3:34 PM
> > *To:* users@zeppelin.incubator.apache.org
> > *Cc:* dev@zeppelin.incubator.apache.org
> > *Subject:* Re: csv dependencies loaded in %spark but not %sql in 
> > spark 1.6/zeppelin 0.5.6
> >
> >
> >
> > Same here. I want to know the answer too.
> >
> >
> >
> >
> >
> > On Feb 2, 2016, at 12:32 PM, Jonathan Kelly <jo...@gmail.com>
> > wrote:
> >
> >
> >
> > Hey, I just ran into that same exact issue yesterday and wasn't sure 
> > if I was doing something wrong or what. Glad to know it's not just me!
> > Unfortunately I have not yet had the time to look any deeper into it.
> Would
> > you mind filing a JIRA if there isn't already one?
> >
> >
> >
> > On Tue, Feb 2, 2016 at 12:29 PM Lin, Yunfeng <yu...@citi.com>
> wrote:
> >
> > Hi guys,
> >
> >
> >
> > I load spark-csv dependencies in %spark, but not in %sql using 
> > apache zeppelin 0.5.6 with spark 1.6.0. Everything is working fine 
> > in zeppelin
> > 0.5.5 with spark 1.5 through
> >
> >
> >
> > Do you have similar problems?
> >
> >
> >
> > I am loading spark csv dependencies ( 
> > https://github.com/databricks/spark-csv
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_datab
> ricks_spark-2Dcsv&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXWa66OlJ_NWqk
> 5P310M6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI5rpP5SoqI8JJ2U
> WEY&s=gsqKqOCjZHR_BW221s_VLo9A7XweGQR50Me_YKEH0UM&e=>
> > <
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_databr
> icks_spark-2Dcsv&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXWa66OlJ_NWqk5
> P310M6mGfus8eDC5O4J0-nePFY&m=dSXRZCZNlnU1tx9rtyX9UWfdjT0EPbafKr2NyIrXP
> -o&s=zUPPWKYhZiNUuIUWmlXXGF_94ImGHQ4qHpFCU0xSEzg&e=
> >
> > )
> >
> >
> >
> > Using:
> >
> > %dep
> >
> > z.load(“PATH/commons-csv-1.1.jar”)
> >
> > z.load(“PATH/spark-csv_2.10-1.3.0.jar”)
> >
> > z.load(“PATH/univocity-parsers-1.5.1.jar:)
> >
> > z.load(“PATH/scala-library-2.10.5.jar”)
> >
> >
> >
> > I am able to load a csv from hdfs using data frame API in spark. It 
> > is running perfect fine.
> >
> > %spark
> >
> > val df = sqlContext.read
> >
> >     .format("com.databricks.spark.csv")
> >
> >     .option("header", "false") // Use finrst line of all files as 
> > header
> >
> >     .option("inferSchema", "true") // Automatically infer data types
> >
> >     .load("hdfs://sd-6f48-7fe6:8020/tmp/people.txt")   // this is a file
> > in HDFS
> >
> > df.registerTempTable("people")
> >
> > df.show()
> >
> >
> >
> > This also work:
> >
> > %spark
> >
> > val df2=sqlContext.sql(“select * from people”)
> >
> > df2.show()
> >
> >
> >
> > But this doesn’t work….
> >
> > %sql
> >
> > select * from people
> >
> >
> >
> > java.lang.ClassNotFoundException:
> > com.databricks.spark.csv.CsvRelation$$anonfun$1$$anonfun$2 at
> > java.net.URLClassLoader$1.run(URLClassLoader.java:366) at
> > java.net.URLClassLoader$1.run(URLClassLoader.java:355) at 
> > java.security.AccessController.doPrivileged(Native Method) at
> > java.net.URLClassLoader.findClass(URLClassLoader.java:354) at
> > java.lang.ClassLoader.loadClass(ClassLoader.java:425) at
> > sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at
> > java.lang.ClassLoader.loadClass(ClassLoader.java:358) at 
> > java.lang.Class.forName0(Native Method) at
> > java.lang.Class.forName(Class.java:270) at
> >
> org.apache.spark.util.InnerClosureFinder$$anon$4.visitMethodInsn(Closu
> reCleaner.scala:435)
> > at org.apache.xbean.asm5.ClassReader.a(Unknown Source) at 
> > org.apache.xbean.asm5.ClassReader.b(Unknown Source) at 
> > org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at 
> > org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at
> >
> org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCl
> eaner.scala:84)
> > at
> >
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCle
> aner$$clean(ClosureCleaner.scala:187)
> > at 
> > org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122
> > )
> at
> > org.apache.spark.SparkContext.clean(SparkContext.scala:2055) at
> > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:70
> > 7) at
> > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:70
> > 6) at
> >
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.sc
> ala:150)
> > at
> >
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.sc
> ala:111)
> > at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at
> > org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706) at
> > com.databricks.spark.csv.CsvRelation.tokenRdd(CsvRelation.scala:90) 
> > at
> > com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:104
> > ) at
> > com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:152
> > ) at
> >
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun
> $4.apply(DataSourceStrategy.scala:64)
> > at
> >
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun
> $4.apply(DataSourceStrategy.scala:64)
> > at
> >
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun
> $pruneFilterProject$1.apply(DataSourceStrategy.scala:274)
> > at
> >
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun
> $pruneFilterProject$1.apply(DataSourceStrategy.scala:273)
> > at
> >
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFi
> lterProjectRaw(DataSourceStrategy.scala:352)
> > at
> >
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFi
> lterProject(DataSourceStrategy.scala:269)
> > at
> >
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(D
> ataSourceStrategy.scala:60)
> > at
> >
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(Q
> ueryPlanner.scala:58)
> > at
> >
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(Q
> ueryPlanner.scala:58)
> > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at
> >
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.
> scala:59)
> > at
> >
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPla
> nner.scala:54)
> > at
> >
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(S
> parkStrategies.scala:349)
> > at
> >
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(Q
> ueryPlanner.scala:58)
> > at
> >
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(Q
> ueryPlanner.scala:58)
> >
> > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at
> >
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.
> scala:59)
> > at
> >
> org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(Que
> ryExecution.scala:47)
> > at
> >
> org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution
> .scala:45)
> > at
> >
> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(
> QueryExecution.scala:52)
> > at
> >
> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecut
> ion.scala:52)
> > at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2134) 
> > at
> > org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413) at
> > org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495) at 
> > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.j
> ava:57)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccess
> orImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606) at
> org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:
> 297)
> > at
> >
> org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterp
> reter.java:144)
> > at
> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(Class
> loaderInterpreter.java:57)
> > at
> >
> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpen
> Interpreter.java:93)
> > at
> >
> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$Interpr
> etJob.jobRun(RemoteInterpreterServer.java:300)
> > at org.apache.zeppelin.scheduler.Job.run(Job.java:169) at
> > org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java
> > :134)
> > at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471
> )
> > at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.a
> ccess$201(ScheduledThreadPoolExecutor.java:178)
> > at
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.r
> un(ScheduledThreadPoolExecutor.java:292)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
> ava:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> java:615)
> > at java.lang.Thread.run(Thread.java:745)
> >
> >
> >
> >
> >
> >
> >
> > I notice that in the source code of spark interpreter (class:
> > org.apache.zeppelin.spark.ZeppelinContext. There are something 
> > different between spark 1.5 and spark 1.6. Spark 1.5 is using 
> > SQLContext while
> Spark
> > 1.6 is HiveContext. Unfortunately, no matter true or false are set 
> > in zeppelin.spark.useHiveContext, %sql just can’t find csv 
> > dependencies …
> >
> >
> >
> >
> >
> > try {
> >   // Use reflection because of classname returned by queryExecution 
> > changes from
> >   // Spark <1.5.2 org.apache.spark.sql.SQLContext$QueryExecution
> >   // Spark 1.6.0> org.apache.spark.sql.hive.HiveContext$QueryExecution
> >   Object qe = df.getClass().getMethod("queryExecution").invoke(df);
> >   Object a = qe.getClass().getMethod("analyzed").invoke(qe);
> >   scala.collection.Seq seq = (scala.collection.Seq) 
> > a.getClass().getMethod("output").invoke(a);
> >
> >   columns = (List<Attribute>) scala.collection.JavaConverters.
> > *seqAsJavaListConverter*(seq)
> >                                                              
> > .asJava(); } catch (NoSuchMethodException | SecurityException | 
> > IllegalAccessException
> >     | IllegalArgumentException | InvocationTargetException e) {
> >   throw new InterpreterException(e); }
> >
> >
> >
> > Yunfeng
> >
> >
> >
>
>

> java:615)
> > at java.lang.Thread.run(Thread.java:745)
> >
> >
> >
> >
> >
> >
> >
> > I notice that in the source code of the spark interpreter (class:
> > org.apache.zeppelin.spark.ZeppelinContext) there is something
> > different between spark 1.5 and spark 1.6: Spark 1.5 uses SQLContext
> > while Spark 1.6 uses HiveContext. Unfortunately, no matter whether
> > zeppelin.spark.useHiveContext is set to true or false, %sql just
> > can't find the csv dependencies ...
> >
> >
> >
> >
> >
> > try {
> >   // Use reflection because of classname returned by queryExecution
> >   // changes from
> >   // Spark <1.5.2 org.apache.spark.sql.SQLContext$QueryExecution
> >   // Spark 1.6.0> org.apache.spark.sql.hive.HiveContext$QueryExecution
> >   Object qe = df.getClass().getMethod("queryExecution").invoke(df);
> >   Object a = qe.getClass().getMethod("analyzed").invoke(qe);
> >   scala.collection.Seq seq = (scala.collection.Seq)
> >       a.getClass().getMethod("output").invoke(a);
> >
> >   columns = (List<Attribute>) scala.collection.JavaConverters
> >       .seqAsJavaListConverter(seq)
> >       .asJava();
> > } catch (NoSuchMethodException | SecurityException | IllegalAccessException
> >     | IllegalArgumentException | InvocationTargetException e) {
> >   throw new InterpreterException(e);
> > }
> >
> >
> >
> > Yunfeng
> >
> >
> >
>
>

Re: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Posted by Mina Lee <mi...@apache.org>.
@Jonathan The Zeppelin community plans to release every 3 months, so I expect
the next release will be around the end of April.

@Lin The workaround I can think of right now is adding the libraries to
ZEPPELIN_CLASSPATH in bin/interpreter.sh.
To do this,
  1. place all libraries you need (commons-csv-1.1.jar,
spark-csv_2.10-1.3.0.jar, univocity-parsers-1.5.1.jar) under one specific
directory. Note that it should contain only files, not subdirectories. I
will call it "/my/path/spark-csv" for ease of explanation.
  2. open bin/interpreter.sh
  3. Add the line `addJarInDir "/my/path/spark-csv"` around line 130
(I assume you are using zeppelin-0.5.6-incubating); see the sketch below
  4. restart Zeppelin
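
A minimal sketch of step 3 (the comment lines are only placeholders; the
exact neighbouring addJarInDir calls around line 130 may differ in your
copy of interpreter.sh):

  # bin/interpreter.sh -- next to the existing addJarInDir calls that
  # build up ZEPPELIN_CLASSPATH
  addJarInDir "/my/path/spark-csv"   # directory containing commons-csv-1.1.jar,
                                     # spark-csv_2.10-1.3.0.jar and
                                     # univocity-parsers-1.5.1.jar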

Hope this helps


On Tue, Feb 2, 2016 at 1:30 PM, Lin, Yunfeng <yu...@citi.com> wrote:

> Thanks! Is there a possible workaround for 0.5.6 before releasing 0.6.0?
>
>
>
> *From:* Jonathan Kelly [mailto:jonathakamzn@gmail.com]
> *Sent:* Tuesday, February 02, 2016 4:19 PM
> *To:* dev@zeppelin.incubator.apache.org; users
> *Subject:* Re: csv dependencies loaded in %spark but not %sql in spark
> 1.6/zeppelin 0.5.6
>
>
>
> Awesome, thank you! BTW, I know that the Zeppelin 0.5.6 release was only
> very recently, but do you happen to know yet when you plan on releasing
> 0.6.0?
>
>
>
> On Tue, Feb 2, 2016 at 1:07 PM mina lee <mi...@apache.org> wrote:
>
> This issue has been fixed few days ago in master branch.
>
> Here is the PR
> https://github.com/apache/incubator-zeppelin/pull/673
>
> And related issues filed in JIRA before
> https://issues.apache.org/jira/browse/ZEPPELIN-194
> https://issues.apache.org/jira/browse/ZEPPELIN-381
>
> With the latest master branch, we recommend you to load dependencies via
> interpreter setting menu instead of %dep interpreter.
>
> If you want to know how to set dependencies with latest master branch,
> please check doc
> <
> https://zeppelin.incubator.apache.org/docs/0.6.0-incubating-SNAPSHOT/manual/dependencymanagement.html
> >
> and
> let me know if it works.
>
> Cheers,
> Mina
>
> On Tue, Feb 2, 2016 at 12:50 PM, Lin, Yunfeng <yu...@citi.com>
> wrote:
>
> > I’ve created an issue in jira
> >
> >
> >
> > https://issues.apache.org/jira/browse/ZEPPELIN-648
> >
> >
> >
> > *From:* Benjamin Kim [mailto:bbuild11@gmail.com]
> > *Sent:* Tuesday, February 02, 2016 3:34 PM
> > *To:* users@zeppelin.incubator.apache.org
> > *Cc:* dev@zeppelin.incubator.apache.org
> > *Subject:* Re: csv dependencies loaded in %spark but not %sql in spark
> > 1.6/zeppelin 0.5.6
> >
> >
> >
> > Same here. I want to know the answer too.
> >
> >
> >
> >
> >
> > On Feb 2, 2016, at 12:32 PM, Jonathan Kelly <jo...@gmail.com>
> > wrote:
> >
> >
> >
> > Hey, I just ran into that same exact issue yesterday and wasn't sure if I
> > was doing something wrong or what. Glad to know it's not just me!
> > Unfortunately I have not yet had the time to look any deeper into it.
> Would
> > you mind filing a JIRA if there isn't already one?
> >
> >
> >
> > On Tue, Feb 2, 2016 at 12:29 PM Lin, Yunfeng <yu...@citi.com>
> wrote:
> >
> > Hi guys,
> >
> >
> >
> > I load spark-csv dependencies in %spark, but not in %sql using apache
> > zeppelin 0.5.6 with spark 1.6.0. Everything is working fine in zeppelin
> > 0.5.5 with spark 1.5 though
> >
> >
> >
> > Do you have similar problems?
> >
> >
> >
> > I am loading spark csv dependencies (
> > https://github.com/databricks/spark-csv
> > )
> >
> >
> >
> > Using:
> >
> > %dep
> >
> > z.load("PATH/commons-csv-1.1.jar")
> >
> > z.load("PATH/spark-csv_2.10-1.3.0.jar")
> >
> > z.load("PATH/univocity-parsers-1.5.1.jar")
> >
> > z.load("PATH/scala-library-2.10.5.jar")
> >
> >
> >
> > I am able to load a csv from hdfs using the data frame API in spark. It is
> > running perfectly fine.
> >
> > %spark
> >
> > val df = sqlContext.read
> >
> >     .format("com.databricks.spark.csv")
> >
> >     .option("header", "false") // Use finrst line of all files as header
> >
> >     .option("inferSchema", "true") // Automatically infer data types
> >
> >     .load("hdfs://sd-6f48-7fe6:8020/tmp/people.txt")   // this is a file
> > in HDFS
> >
> > df.registerTempTable("people")
> >
> > df.show()
> >
> >
> >
> > This also works:
> >
> > %spark
> >
> > val df2=sqlContext.sql("select * from people")
> >
> > df2.show()
> >
> >
> >
> > But this doesn’t work….
> >
> > %sql
> >
> > select * from people
> >
> >
> >
> > java.lang.ClassNotFoundException:
> > com.databricks.spark.csv.CsvRelation$$anonfun$1$$anonfun$2 at
> > java.net.URLClassLoader$1.run(URLClassLoader.java:366) at
> > java.net.URLClassLoader$1.run(URLClassLoader.java:355) at
> > java.security.AccessController.doPrivileged(Native Method) at
> > java.net.URLClassLoader.findClass(URLClassLoader.java:354) at
> > java.lang.ClassLoader.loadClass(ClassLoader.java:425) at
> > sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at
> > java.lang.ClassLoader.loadClass(ClassLoader.java:358) at
> > java.lang.Class.forName0(Native Method) at
> > java.lang.Class.forName(Class.java:270) at
> >
> org.apache.spark.util.InnerClosureFinder$$anon$4.visitMethodInsn(ClosureCleaner.scala:435)
> > at org.apache.xbean.asm5.ClassReader.a(Unknown Source) at
> > org.apache.xbean.asm5.ClassReader.b(Unknown Source) at
> > org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at
> > org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at
> >
> org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:84)
> > at
> >
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:187)
> > at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
> at
> > org.apache.spark.SparkContext.clean(SparkContext.scala:2055) at
> > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:707) at
> > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:706) at
> >
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> > at
> >
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
> > at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at
> > org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706) at
> > com.databricks.spark.csv.CsvRelation.tokenRdd(CsvRelation.scala:90) at
> > com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:104) at
> > com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:152) at
> >
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
> > at
> >
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
> > at
> >
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:274)
> > at
> >
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:273)
> > at
> >
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:352)
> > at
> >
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:269)
> > at
> >
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:60)
> > at
> >
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> > at
> >
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at
> >
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
> > at
> >
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
> > at
> >
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:349)
> > at
> >
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> > at
> >
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> >
> > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at
> >
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
> > at
> >
> org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
> > at
> >
> org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
> > at
> >
> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
> > at
> >
> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
> > at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2134) at
> > org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413) at
> > org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495) at
> > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606) at
> org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:297)
> > at
> >
> org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:144)
> > at
> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
> > at
> >
> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
> > at
> >
> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
> > at org.apache.zeppelin.scheduler.Job.run(Job.java:169) at
> > org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
> > at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> > at
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:745)
> >
> >
> >
> >
> >
> >
> >
> > I notice that in the source code of the spark interpreter (class:
> > org.apache.zeppelin.spark.ZeppelinContext) there is something different
> > between spark 1.5 and spark 1.6: Spark 1.5 uses SQLContext while Spark
> > 1.6 uses HiveContext. Unfortunately, no matter whether
> > zeppelin.spark.useHiveContext is set to true or false, %sql just can't
> > find the csv dependencies ...
> >
> >
> >
> >
> >
> > try {
> >   // Use reflection because of classname returned by queryExecution
> > changes from
> >   // Spark <1.5.2 org.apache.spark.sql.SQLContext$QueryExecution
> >   // Spark 1.6.0> org.apache.spark.sql.hive.HiveContext$QueryExecution
> >   Object qe = df.getClass().getMethod("queryExecution").invoke(df);
> >   Object a = qe.getClass().getMethod("analyzed").invoke(qe);
> >   scala.collection.Seq seq = (scala.collection.Seq)
> > a.getClass().getMethod("output").invoke(a);
> >
> >   columns = (List<Attribute>) scala.collection.JavaConverters
> >       .seqAsJavaListConverter(seq)
> >       .asJava();
> > } catch (NoSuchMethodException | SecurityException |
> > IllegalAccessException
> >     | IllegalArgumentException | InvocationTargetException e) {
> >   throw new InterpreterException(e);
> > }
> >
> >
> >
> > Yunfeng
> >
> >
> >
>
>

RE: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Posted by "Lin, Yunfeng " <yu...@citi.com>.
Thanks! Is there a possible workaround for 0.5.6 before releasing 0.6.0?

From: Jonathan Kelly [mailto:jonathakamzn@gmail.com]
Sent: Tuesday, February 02, 2016 4:19 PM
To: dev@zeppelin.incubator.apache.org; users
Subject: Re: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Awesome, thank you! BTW, I know that the Zeppelin 0.5.6 release was only very recently, but do you happen to know yet when you plan on releasing 0.6.0?

On Tue, Feb 2, 2016 at 1:07 PM mina lee <mi...@apache.org>> wrote:
This issue has been fixed few days ago in master branch.

Here is the PR
https://github.com/apache/incubator-zeppelin/pull/673

And related issues filed in JIRA before
https://issues.apache.org/jira/browse/ZEPPELIN-194
https://issues.apache.org/jira/browse/ZEPPELIN-381

With the latest master branch, we recommend you to load dependencies via
interpreter setting menu instead of %dep interpreter.

If you want to know how to set dependencies with latest master branch,
please check doc
<https://zeppelin.incubator.apache.org/docs/0.6.0-incubating-SNAPSHOT/manual/dependencymanagement.html>
and
let me know if it works.

Cheers,
Mina
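
(A minimal illustration of that route, assuming spark-csv is published under
Maven coordinates matching the jar names in this thread; transitive
dependencies such as commons-csv and univocity-parsers should then be pulled
in automatically:

  // artifact to add in the interpreter setting menu:
  //   com.databricks:spark-csv_2.10:1.3.0
  // or, equivalently, with the %dep interpreter:
  %dep
  z.load("com.databricks:spark-csv_2.10:1.3.0")

rather than listing the individual local jars.)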

On Tue, Feb 2, 2016 at 12:50 PM, Lin, Yunfeng <yu...@citi.com>> wrote:

> I’ve created an issue in jira
>
>
>
> https://issues.apache.org/jira/browse/ZEPPELIN-648
>
>
>
> *From:* Benjamin Kim [mailto:bbuild11@gmail.com<ma...@gmail.com>]
> *Sent:* Tuesday, February 02, 2016 3:34 PM
> *To:* users@zeppelin.incubator.apache.org<ma...@zeppelin.incubator.apache.org>
> *Cc:* dev@zeppelin.incubator.apache.org<ma...@zeppelin.incubator.apache.org>
> *Subject:* Re: csv dependencies loaded in %spark but not %sql in spark
> 1.6/zeppelin 0.5.6
>
>
>
> Same here. I want to know the answer too.
>
>
>
>
>
> On Feb 2, 2016, at 12:32 PM, Jonathan Kelly <jo...@gmail.com>>
> wrote:
>
>
>
> Hey, I just ran into that same exact issue yesterday and wasn't sure if I
> was doing something wrong or what. Glad to know it's not just me!
> Unfortunately I have not yet had the time to look any deeper into it. Would
> you mind filing a JIRA if there isn't already one?
>
>
>
> On Tue, Feb 2, 2016 at 12:29 PM Lin, Yunfeng <yu...@citi.com>> wrote:
>
> Hi guys,
>
>
>
> I load spark-csv dependencies in %spark, but not in %sql using apache
> zeppelin 0.5.6 with spark 1.6.0. Everything is working fine in zeppelin
> 0.5.5 with spark 1.5 though
>
>
>
> Do you have similar problems?
>
>
>
> I am loading spark csv dependencies (
> https://github.com/databricks/spark-csv
> )
>
>
>
> Using:
>
> %dep
>
> z.load("PATH/commons-csv-1.1.jar")
>
> z.load("PATH/spark-csv_2.10-1.3.0.jar")
>
> z.load("PATH/univocity-parsers-1.5.1.jar")
>
> z.load("PATH/scala-library-2.10.5.jar")
>
>
>
> I am able to load a csv from hdfs using the data frame API in spark. It is
> running perfectly fine.
>
> %spark
>
> val df = sqlContext.read
>
>     .format("com.databricks.spark.csv")
>
>     .option("header", "false") // Use finrst line of all files as header
>
>     .option("inferSchema", "true") // Automatically infer data types
>
>     .load("hdfs://sd-6f48-7fe6:8020/tmp/people.txt")   // this is a file
> in HDFS
>
> df.registerTempTable("people")
>
> df.show()
>
>
>
> This also works:
>
> %spark
>
> val df2=sqlContext.sql("select * from people")
>
> df2.show()
>
>
>
> But this doesn’t work….
>
> %sql
>
> select * from people
>
>
>
> java.lang.ClassNotFoundException:
> com.databricks.spark.csv.CsvRelation$$anonfun$1$$anonfun$2 at
> java.net.URLClassLoader$1.run(URLClassLoader.java:366) at
> java.net.URLClassLoader$1.run(URLClassLoader.java:355) at
> java.security.AccessController.doPrivileged(Native Method) at
> java.net.URLClassLoader.findClass(URLClassLoader.java:354) at
> java.lang.ClassLoader.loadClass(ClassLoader.java:425) at
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at
> java.lang.ClassLoader.loadClass(ClassLoader.java:358) at
> java.lang.Class.forName0(Native Method) at
> java.lang.Class.forName(Class.java:270) at
> org.apache.spark.util.InnerClosureFinder$$anon$4.visitMethodInsn(ClosureCleaner.scala:435)
> at org.apache.xbean.asm5.ClassReader.a(Unknown Source) at
> org.apache.xbean.asm5.ClassReader.b(Unknown Source) at
> org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at
> org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at
> org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:84)
> at
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:187)
> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122) at
> org.apache.spark.SparkContext.clean(SparkContext.scala:2055) at
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:707) at
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:706) at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at
> org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706) at
> com.databricks.spark.csv.CsvRelation.tokenRdd(CsvRelation.scala:90) at
> com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:104) at
> com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:152) at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:274)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:273)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:352)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:269)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:60)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
> at
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:349)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
> at
> org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
> at
> org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
> at
> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
> at
> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
> at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2134) at
> org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413) at
> org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495) at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:297)
> at
> org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:144)
> at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
> at
> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
> at org.apache.zeppelin.scheduler.Job.run(Job.java:169) at
> org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
>
>
>
>
> I notice that in the source code of the spark interpreter (class:
> org.apache.zeppelin.spark.ZeppelinContext) there is something different
> between spark 1.5 and spark 1.6: Spark 1.5 uses SQLContext while Spark
> 1.6 uses HiveContext. Unfortunately, no matter whether
> zeppelin.spark.useHiveContext is set to true or false, %sql just can't
> find the csv dependencies ...
>
>
>
>
>
> try {
>   // Use reflection because of classname returned by queryExecution
> changes from
>   // Spark <1.5.2 org.apache.spark.sql.SQLContext$QueryExecution
>   // Spark 1.6.0> org.apache.spark.sql.hive.HiveContext$QueryExecution
>   Object qe = df.getClass().getMethod("queryExecution").invoke(df);
>   Object a = qe.getClass().getMethod("analyzed").invoke(qe);
>   scala.collection.Seq seq = (scala.collection.Seq)
> a.getClass().getMethod("output").invoke(a);
>
>   columns = (List<Attribute>) scala.collection.JavaConverters
>       .seqAsJavaListConverter(seq)
>       .asJava();
> } catch (NoSuchMethodException | SecurityException |
> IllegalAccessException
>     | IllegalArgumentException | InvocationTargetException e) {
>   throw new InterpreterException(e);
> }
>
>
>
> Yunfeng
>
>
>

RE: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Posted by "Lin, Yunfeng " <yu...@citi.com>.
Thanks! Is there a possible workaround for 0.5.6 before releasing 0.6.0?

From: Jonathan Kelly [mailto:jonathakamzn@gmail.com]
Sent: Tuesday, February 02, 2016 4:19 PM
To: dev@zeppelin.incubator.apache.org; users
Subject: Re: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Awesome, thank you! BTW, I know that the Zeppelin 0.5.6 release was only very recently, but do you happen to know yet when you plan on releasing 0.6.0?

On Tue, Feb 2, 2016 at 1:07 PM mina lee <mi...@apache.org>> wrote:
This issue has been fixed few days ago in master branch.

Here is the PR
https://github.com/apache/incubator-zeppelin/pull/673<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_incubator-2Dzeppelin_pull_673&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXWa66OlJ_NWqk5P310M6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI5rpP5SoqI8JJ2UWEY&s=hkFOUihaS8xvGHC2psZm_eTCpaIbp9JvGoR0Y6XV2wc&e=>

And related issues filed in JIRA before
https://issues.apache.org/jira/browse/ZEPPELIN-194<https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_ZEPPELIN-2D194&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXWa66OlJ_NWqk5P310M6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI5rpP5SoqI8JJ2UWEY&s=AvJcgiZOLR_OmESYH5osZSMnYd9_A6DBRo9CxjYsxuY&e=>
https://issues.apache.org/jira/browse/ZEPPELIN-381<https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_ZEPPELIN-2D381&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXWa66OlJ_NWqk5P310M6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI5rpP5SoqI8JJ2UWEY&s=6iFF2WKrNwEJZ68upi94jfS_2wCVPZHeZE0aTzS4slc&e=>

With the latest master branch, we recommend you to load dependencies via
interpreter setting menu instead of %dep interpreter.

If you want to know how to set dependencies with latest master branch,
please check doc
<https://zeppelin.incubator.apache.org/docs/0.6.0-incubating-SNAPSHOT/manual/dependencymanagement.html<https://urldefense.proofpoint.com/v2/url?u=https-3A__zeppelin.incubator.apache.org_docs_0.6.0-2Dincubating-2DSNAPSHOT_manual_dependencymanagement.html&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXWa66OlJ_NWqk5P310M6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI5rpP5SoqI8JJ2UWEY&s=8vqDxxyOWAchE7NS0L2BLpav1-tIPMXDwE0ndsqln-E&e=>>
and
let me know if it works.

Cheers,
Mina

On Tue, Feb 2, 2016 at 12:50 PM, Lin, Yunfeng <yu...@citi.com>> wrote:

> I’ve created an issue in jira
>
>
>
> https://issues.apache.org/jira/browse/ZEPPELIN-648<https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_ZEPPELIN-2D648&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXWa66OlJ_NWqk5P310M6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI5rpP5SoqI8JJ2UWEY&s=hlbpHl6qhaZZD5ZWy6DeFySuZEZt3JysxHFYer2shG8&e=>
>
>
>
> *From:* Benjamin Kim [mailto:bbuild11@gmail.com<ma...@gmail.com>]
> *Sent:* Tuesday, February 02, 2016 3:34 PM
> *To:* users@zeppelin.incubator.apache.org<ma...@zeppelin.incubator.apache.org>
> *Cc:* dev@zeppelin.incubator.apache.org<ma...@zeppelin.incubator.apache.org>
> *Subject:* Re: csv dependencies loaded in %spark but not %sql in spark
> 1.6/zeppelin 0.5.6
>
>
>
> Same here. I want to know the answer too.
>
>
>
>
>
> On Feb 2, 2016, at 12:32 PM, Jonathan Kelly <jo...@gmail.com>>
> wrote:
>
>
>
> Hey, I just ran into that same exact issue yesterday and wasn't sure if I
> was doing something wrong or what. Glad to know it's not just me!
> Unfortunately I have not yet had the time to look any deeper into it. Would
> you mind filing a JIRA if there isn't already one?
>
>
>
> On Tue, Feb 2, 2016 at 12:29 PM Lin, Yunfeng <yu...@citi.com>> wrote:
>
> Hi guys,
>
>
>
> I load spark-csv dependencies in %spark, but not in %sql using apache
> zeppelin 0.5.6 with spark 1.6.0. Everything is working fine in zeppelin
> 0.5.5 with spark 1.5 through
>
>
>
> Do you have similar problems?
>
>
>
> I am loading spark csv dependencies (
> https://github.com/databricks/spark-csv<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_databricks_spark-2Dcsv&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXWa66OlJ_NWqk5P310M6mGfus8eDC5O4J0-nePFY&m=P05kinGmpfFiWwLExz9hZO76uI5rpP5SoqI8JJ2UWEY&s=gsqKqOCjZHR_BW221s_VLo9A7XweGQR50Me_YKEH0UM&e=>
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_databricks_spark-2Dcsv&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXWa66OlJ_NWqk5P310M6mGfus8eDC5O4J0-nePFY&m=dSXRZCZNlnU1tx9rtyX9UWfdjT0EPbafKr2NyIrXP-o&s=zUPPWKYhZiNUuIUWmlXXGF_94ImGHQ4qHpFCU0xSEzg&e=>
> )
>
>
>
> Using:
>
> %dep
>
> z.load(“PATH/commons-csv-1.1.jar”)
>
> z.load(“PATH/spark-csv_2.10-1.3.0.jar”)
>
> z.load(“PATH/univocity-parsers-1.5.1.jar:)
>
> z.load(“PATH/scala-library-2.10.5.jar”)
>
>
>
> I am able to load a csv from hdfs using data frame API in spark. It is
> running perfect fine.
>
> %spark
>
> val df = sqlContext.read
>
>     .format("com.databricks.spark.csv")
>
>     .option("header", "false") // Use finrst line of all files as header
>
>     .option("inferSchema", "true") // Automatically infer data types
>
>     .load("hdfs://sd-6f48-7fe6:8020/tmp/people.txt")   // this is a file
> in HDFS
>
> df.registerTempTable("people")
>
> df.show()
>
>
>
> This also work:
>
> %spark
>
> val df2=sqlContext.sql(“select * from people”)
>
> df2.show()
>
>
>
> But this doesn’t work….
>
> %sql
>
> select * from people
>
>
>
> java.lang.ClassNotFoundException:
> com.databricks.spark.csv.CsvRelation$$anonfun$1$$anonfun$2 at
> java.net.URLClassLoader$1.run(URLClassLoader.java:366) at
> java.net.URLClassLoader$1.run(URLClassLoader.java:355) at
> java.security.AccessController.doPrivileged(Native Method) at
> java.net.URLClassLoader.findClass(URLClassLoader.java:354) at
> java.lang.ClassLoader.loadClass(ClassLoader.java:425) at
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at
> java.lang.ClassLoader.loadClass(ClassLoader.java:358) at
> java.lang.Class.forName0(Native Method) at
> java.lang.Class.forName(Class.java:270) at
> org.apache.spark.util.InnerClosureFinder$$anon$4.visitMethodInsn(ClosureCleaner.scala:435)
> at org.apache.xbean.asm5.ClassReader.a(Unknown Source) at
> org.apache.xbean.asm5.ClassReader.b(Unknown Source) at
> org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at
> org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at
> org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:84)
> at
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:187)
> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122) at
> org.apache.spark.SparkContext.clean(SparkContext.scala:2055) at
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:707) at
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:706) at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at
> org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706) at
> com.databricks.spark.csv.CsvRelation.tokenRdd(CsvRelation.scala:90) at
> com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:104) at
> com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:152) at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:274)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:273)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:352)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:269)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:60)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
> at
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:349)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
> at
> org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
> at
> org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
> at
> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
> at
> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
> at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2134) at
> org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413) at
> org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495) at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:297)
> at
> org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:144)
> at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
> at
> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
> at org.apache.zeppelin.scheduler.Job.run(Job.java:169) at
> org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
>
>
>
>
> I notice that in the source code of spark interpreter (class:
> org.apache.zeppelin.spark.ZeppelinContext. There are something different
> between spark 1.5 and spark 1.6. Spark 1.5 is using SQLContext while Spark
> 1.6 is HiveContext. Unfortunately, no matter true or false are set in
> zeppelin.spark.useHiveContext, %sql just can’t find csv dependencies …
>
> try {
>   // Use reflection because the class name returned by queryExecution changes:
>   //   Spark < 1.5.2:  org.apache.spark.sql.SQLContext$QueryExecution
>   //   Spark 1.6.0+:   org.apache.spark.sql.hive.HiveContext$QueryExecution
>   Object qe = df.getClass().getMethod("queryExecution").invoke(df);
>   Object a = qe.getClass().getMethod("analyzed").invoke(qe);
>   scala.collection.Seq seq =
>       (scala.collection.Seq) a.getClass().getMethod("output").invoke(a);
>
>   columns = (List<Attribute>) scala.collection.JavaConverters
>       .seqAsJavaListConverter(seq)
>       .asJava();
> } catch (NoSuchMethodException | SecurityException | IllegalAccessException
>     | IllegalArgumentException | InvocationTargetException e) {
>   throw new InterpreterException(e);
> }
>
> Yunfeng
>
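
(The reflection snippet quoted above is, in effect, just walking the DataFrame's
query execution by method name. A rough Scala equivalent is sketched below purely
for illustration; it assumes the Spark 1.5/1.6 DataFrame API and is not the actual
Zeppelin code.)

import scala.collection.JavaConverters._
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.catalyst.expressions.Attribute

// Fetch the output attributes of the analyzed logical plan as a Java list --
// the value the reflection snippet above stores in `columns`.
def outputColumns(df: DataFrame): java.util.List[Attribute] =
  df.queryExecution.analyzed.output.asJava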

Re: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Posted by Jonathan Kelly <jo...@gmail.com>.
Awesome, thank you! BTW, I know the Zeppelin 0.5.6 release was only very
recent, but do you happen to know yet when you plan on releasing 0.6.0?

Re: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Posted by mina lee <mi...@apache.org>.
This issue was fixed a few days ago in the master branch.

Here is the PR
https://github.com/apache/incubator-zeppelin/pull/673

And here are the related issues filed in JIRA earlier:
https://issues.apache.org/jira/browse/ZEPPELIN-194
https://issues.apache.org/jira/browse/ZEPPELIN-381

With the latest master branch, we recommend loading dependencies via the
interpreter setting menu instead of the %dep interpreter.
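
For example, in the interpreter setting the spark interpreter can be given the
artifact by its Maven coordinates (e.g. com.databricks:spark-csv_2.10:1.3.0,
matching the spark-csv jar in the original post). On builds where %dep is still
in use, a minimal equivalent sketch looks like this (the coordinates are an
example; match them to your Scala/Spark build):

%dep
z.reset()                                       // drop any previously loaded artifacts
z.load("com.databricks:spark-csv_2.10:1.3.0")   // should pull commons-csv and univocity-parsers transitively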

If you want to know how to set dependencies with the latest master branch,
please check the doc
<https://zeppelin.incubator.apache.org/docs/0.6.0-incubating-SNAPSHOT/manual/dependencymanagement.html>
and let me know if it works.

Cheers,
Mina

RE: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Posted by "Lin, Yunfeng " <yu...@citi.com>.
I’ve created an issue in JIRA:

https://issues.apache.org/jira/browse/ZEPPELIN-648

From: Benjamin Kim [mailto:bbuild11@gmail.com]
Sent: Tuesday, February 02, 2016 3:34 PM
To: users@zeppelin.incubator.apache.org
Cc: dev@zeppelin.incubator.apache.org
Subject: Re: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Same here. I want to know the answer too.


On Feb 2, 2016, at 12:32 PM, Jonathan Kelly <jo...@gmail.com>> wrote:

Hey, I just ran into that same exact issue yesterday and wasn't sure if I was doing something wrong or what. Glad to know it's not just me! Unfortunately I have not yet had the time to look any deeper into it. Would you mind filing a JIRA if there isn't already one?

RE: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Posted by "Lin, Yunfeng " <yu...@citi.com>.
I’ve created an issue in jira

https://issues.apache.org/jira/browse/ZEPPELIN-648

From: Benjamin Kim [mailto:bbuild11@gmail.com]
Sent: Tuesday, February 02, 2016 3:34 PM
To: users@zeppelin.incubator.apache.org
Cc: dev@zeppelin.incubator.apache.org
Subject: Re: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Same here. I want to know the answer too.


On Feb 2, 2016, at 12:32 PM, Jonathan Kelly <jo...@gmail.com>> wrote:

Hey, I just ran into that same exact issue yesterday and wasn't sure if I was doing something wrong or what. Glad to know it's not just me! Unfortunately I have not yet had the time to look any deeper into it. Would you mind filing a JIRA if there isn't already one?

On Tue, Feb 2, 2016 at 12:29 PM Lin, Yunfeng <yu...@citi.com>> wrote:
Hi guys,

I load spark-csv dependencies in %spark, but not in %sql using apache zeppelin 0.5.6 with spark 1.6.0. Everything is working fine in zeppelin 0.5.5 with spark 1.5 through

Do you have similar problems?

I am loading spark csv dependencies (https://github.com/databricks/spark-csv<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_databricks_spark-2Dcsv&d=BQMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=b2BXWa66OlJ_NWqk5P310M6mGfus8eDC5O4J0-nePFY&m=dSXRZCZNlnU1tx9rtyX9UWfdjT0EPbafKr2NyIrXP-o&s=zUPPWKYhZiNUuIUWmlXXGF_94ImGHQ4qHpFCU0xSEzg&e=>)

Using:
%dep
z.load(“PATH/commons-csv-1.1.jar”)
z.load(“PATH/spark-csv_2.10-1.3.0.jar”)
z.load(“PATH/univocity-parsers-1.5.1.jar:)
z.load(“PATH/scala-library-2.10.5.jar”)

I am able to load a csv from hdfs using data frame API in spark. It is running perfect fine.
%spark
val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "false") // Use finrst line of all files as header
    .option("inferSchema", "true") // Automatically infer data types
    .load("hdfs://sd-6f48-7fe6:8020/tmp/people.txt")   // this is a file in HDFS
df.registerTempTable("people")
df.show()

This also work:
%spark
val df2=sqlContext.sql(“select * from people”)
df2.show()

But this doesn’t work….
%sql
select * from people

java.lang.ClassNotFoundException: com.databricks.spark.csv.CsvRelation$$anonfun$1$$anonfun$2
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.spark.util.InnerClosureFinder$$anon$4.visitMethodInsn(ClosureCleaner.scala:435)
    at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.b(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
    at org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:84)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:187)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:707)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:706)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706)
    at com.databricks.spark.csv.CsvRelation.tokenRdd(CsvRelation.scala:90)
    at com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:104)
    at com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:152)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:274)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:273)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:352)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:269)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:60)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:349)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
    at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2134)
    at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413)
    at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:297)
    at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:144)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:169)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
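
One workaround I am considering (an untested sketch on my side; the table name people_plain below is just something I made up) is to force the CSV scan to be planned inside %spark, where the %dep jars are visible, and register a plain RDD-backed table so that %sql never has to load the CsvRelation classes itself:

%spark
// Untested sketch: accessing df.rdd forces the CSV relation to be planned
// here, where the spark-csv classes are on the classpath. The new DataFrame
// is backed by a plain RDD of Rows, so the temp table should no longer
// reference com.databricks.spark.csv.CsvRelation at %sql planning time.
val plain = sqlContext.createDataFrame(df.rdd, df.schema)
plain.registerTempTable("people_plain")

%sql
select * from people_plain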



I notice that in the source code of the spark interpreter (class: org.apache.zeppelin.spark.ZeppelinContext) there is a difference between Spark 1.5 and Spark 1.6: with Spark 1.5 the interpreter uses SQLContext, while with Spark 1.6 it uses HiveContext. Unfortunately, no matter whether zeppelin.spark.useHiveContext is set to true or false, %sql just can't find the csv dependencies…


try {
  // Use reflection because of classname returned by queryExecution changes from
  // Spark <1.5.2 org.apache.spark.sql.SQLContext$QueryExecution
  // Spark 1.6.0> org.apache.spark.sql.hive.HiveContext$QueryExecution
  Object qe = df.getClass().getMethod("queryExecution").invoke(df);
  Object a = qe.getClass().getMethod("analyzed").invoke(qe);
  scala.collection.Seq seq = (scala.collection.Seq) a.getClass().getMethod("output").invoke(a);

  columns = (List<Attribute>) scala.collection.JavaConverters.seqAsJavaListConverter(seq)
                                                             .asJava();
} catch (NoSuchMethodException | SecurityException | IllegalAccessException
    | IllegalArgumentException | InvocationTargetException e) {
  throw new InterpreterException(e);
}
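
For what it's worth, a quick sanity check (not a fix) is to print the concrete class of the context that Zeppelin binds into the paragraphs; sqlContext is the variable the spark interpreter already exposes:

%spark
// Prints org.apache.spark.sql.SQLContext or
// org.apache.spark.sql.hive.HiveContext, depending on how the
// interpreter created the context.
println(sqlContext.getClass.getName)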

Yunfeng


Re: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Posted by Benjamin Kim <bb...@gmail.com>.
Same here. I want to know the answer too.


> On Feb 2, 2016, at 12:32 PM, Jonathan Kelly <jo...@gmail.com> wrote:
> 
> Hey, I just ran into that same exact issue yesterday and wasn't sure if I was doing something wrong or what. Glad to know it's not just me! Unfortunately I have not yet had the time to look any deeper into it. Would you mind filing a JIRA if there isn't already one?
> 
> On Tue, Feb 2, 2016 at 12:29 PM Lin, Yunfeng <yunfeng.lin@citi.com <ma...@citi.com>> wrote:
> Hi guys,
> 
>  
> 
> I load spark-csv dependencies in %spark, but not in %sql using apache zeppelin 0.5.6 with spark 1.6.0. Everything is working fine in zeppelin 0.5.5 with spark 1.5 through
> 
>  
> 
> Do you have similar problems?
> 
>  
> 
> I am loading spark csv dependencies (https://github.com/databricks/spark-csv <https://github.com/databricks/spark-csv>)
> 
>  
> 
> Using:
> 
> %dep
> 
> z.load(“PATH/commons-csv-1.1.jar”)
> 
> z.load(“PATH/spark-csv_2.10-1.3.0.jar”)
> 
> z.load(“PATH/univocity-parsers-1.5.1.jar:)
> 
> z.load(“PATH/scala-library-2.10.5.jar”)
> 
>  
> 
> I am able to load a csv from hdfs using data frame API in spark. It is running perfect fine.
> 
> %spark
> 
> val df = sqlContext.read
> 
>     .format("com.databricks.spark.csv")
> 
>     .option("header", "false") // Use finrst line of all files as header
> 
>     .option("inferSchema", "true") // Automatically infer data types
> 
>     .load("hdfs://sd-6f48-7fe6:8020/tmp/people.txt")   // this is a file in HDFS
> 
> df.registerTempTable("people")
> 
> df.show()
> 
>  
> 
> This also work:
> 
> %spark
> 
> val df2=sqlContext.sql(“select * from people”)
> 
> df2.show()
> 
>  
> 
> But this doesn’t work….
> 
> %sql
> 
> select * from people
> 
>  
> 
> java.lang.ClassNotFoundException: com.databricks.spark.csv.CsvRelation$$anonfun$1$$anonfun$2 at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:270) at org.apache.spark.util.InnerClosureFinder$$anon$4.visitMethodInsn(ClosureCleaner.scala:435) at org.apache.xbean.asm5.ClassReader.a(Unknown Source) at org.apache.xbean.asm5.ClassReader.b(Unknown Source) at org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:84) at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:187) at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122) at org.apache.spark.SparkContext.clean(SparkContext.scala:2055) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:707) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:706) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706) at com.databricks.spark.csv.CsvRelation.tokenRdd(CsvRelation.scala:90) at com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:104) at com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:152) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:274) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:273) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:352) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:269) at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:60) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:349) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> 
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47) at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45) at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52) at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52) at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2134) at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413) at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:297) at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:144) at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57) at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93) at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300) at org.apache.zeppelin.scheduler.Job.run(Job.java:169) at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)
> 
>  
> 
>  
> 
>  
> 
> I notice that in the source code of spark interpreter (class: org.apache.zeppelin.spark.ZeppelinContext. There are something different between spark 1.5 and spark 1.6. Spark 1.5 is using SQLContext while Spark 1.6 is HiveContext. Unfortunately, no matter true or false are set in zeppelin.spark.useHiveContext, %sql just can’t find csv dependencies …
> 
>  
> 
>  
> 
> try {
>   // Use reflection because of classname returned by queryExecution changes from
>   // Spark <1.5.2 org.apache.spark.sql.SQLContext$QueryExecution
>   // Spark 1.6.0> org.apache.spark.sql.hive.HiveContext$QueryExecution
>   Object qe = df.getClass().getMethod("queryExecution").invoke(df);
>   Object a = qe.getClass().getMethod("analyzed").invoke(qe);
>   scala.collection.Seq seq = (scala.collection.Seq) a.getClass().getMethod("output").invoke(a);
> 
>   columns = (List<Attribute>) scala.collection.JavaConverters.seqAsJavaListConverter(seq)
>                                                              .asJava();
> } catch (NoSuchMethodException | SecurityException | IllegalAccessException
>     | IllegalArgumentException | InvocationTargetException e) {
>   throw new InterpreterException(e);
> }
> 
>  
> 
> Yunfeng
> 


Re: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Posted by Jonathan Kelly <jo...@gmail.com>.
BTW, this sounds very similar to
https://issues.apache.org/jira/browse/ZEPPELIN-297, which affects %pyspark
and was fixed in Zeppelin 0.5.5.

On Tue, Feb 2, 2016 at 12:32 PM Jonathan Kelly <jo...@gmail.com>
wrote:

> Hey, I just ran into that same exact issue yesterday and wasn't sure if I
> was doing something wrong or what. Glad to know it's not just me!
> Unfortunately I have not yet had the time to look any deeper into it. Would
> you mind filing a JIRA if there isn't already one?
>
> On Tue, Feb 2, 2016 at 12:29 PM Lin, Yunfeng <yu...@citi.com> wrote:
>
>> Hi guys,
>>
>>
>>
>> I load spark-csv dependencies in %spark, but not in %sql using apache
>> zeppelin 0.5.6 with spark 1.6.0. Everything is working fine in zeppelin
>> 0.5.5 with spark 1.5 through
>>
>>
>>
>> Do you have similar problems?
>>
>>
>>
>> I am loading spark csv dependencies (
>> https://github.com/databricks/spark-csv)
>>
>>
>>
>> Using:
>>
>> %dep
>>
>> z.load(“PATH/commons-csv-1.1.jar”)
>>
>> z.load(“PATH/spark-csv_2.10-1.3.0.jar”)
>>
>> z.load(“PATH/univocity-parsers-1.5.1.jar:)
>>
>> z.load(“PATH/scala-library-2.10.5.jar”)
>>
>>
>>
>> I am able to load a csv from hdfs using data frame API in spark. It is
>> running perfect fine.
>>
>> %spark
>>
>> val df = sqlContext.read
>>
>>     .format("com.databricks.spark.csv")
>>
>>     .option("header", "false") // Use finrst line of all files as header
>>
>>     .option("inferSchema", "true") // Automatically infer data types
>>
>>     .load("hdfs://sd-6f48-7fe6:8020/tmp/people.txt")   // this is a file
>> in HDFS
>>
>> df.registerTempTable("people")
>>
>> df.show()
>>
>>
>>
>> This also work:
>>
>> %spark
>>
>> val df2=sqlContext.sql(“select * from people”)
>>
>> df2.show()
>>
>>
>>
>> But this doesn’t work….
>>
>> %sql
>>
>> select * from people
>>
>>
>>
>> java.lang.ClassNotFoundException:
>> com.databricks.spark.csv.CsvRelation$$anonfun$1$$anonfun$2 at
>> java.net.URLClassLoader$1.run(URLClassLoader.java:366) at
>> java.net.URLClassLoader$1.run(URLClassLoader.java:355) at
>> java.security.AccessController.doPrivileged(Native Method) at
>> java.net.URLClassLoader.findClass(URLClassLoader.java:354) at
>> java.lang.ClassLoader.loadClass(ClassLoader.java:425) at
>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at
>> java.lang.ClassLoader.loadClass(ClassLoader.java:358) at
>> java.lang.Class.forName0(Native Method) at
>> java.lang.Class.forName(Class.java:270) at
>> org.apache.spark.util.InnerClosureFinder$$anon$4.visitMethodInsn(ClosureCleaner.scala:435)
>> at org.apache.xbean.asm5.ClassReader.a(Unknown Source) at
>> org.apache.xbean.asm5.ClassReader.b(Unknown Source) at
>> org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at
>> org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at
>> org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:84)
>> at
>> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:187)
>> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122) at
>> org.apache.spark.SparkContext.clean(SparkContext.scala:2055) at
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:707) at
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:706) at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at
>> org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706) at
>> com.databricks.spark.csv.CsvRelation.tokenRdd(CsvRelation.scala:90) at
>> com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:104) at
>> com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:152) at
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
>> at
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
>> at
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:274)
>> at
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:273)
>> at
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:352)
>> at
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:269)
>> at
>> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:60)
>> at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>> at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>> at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>> at
>> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:349)
>> at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>> at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>
>> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>> at
>> org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
>> at
>> org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
>> at
>> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
>> at
>> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
>> at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2134) at
>> org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413) at
>> org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495) at
>> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:297)
>> at
>> org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:144)
>> at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
>> at
>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>> at
>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
>> at org.apache.zeppelin.scheduler.Job.run(Job.java:169) at
>> org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>>
>>
>>
>>
>>
>>
>>
>> I notice that in the source code of spark interpreter (class:
>> org.apache.zeppelin.spark.ZeppelinContext. There are something different
>> between spark 1.5 and spark 1.6. Spark 1.5 is using SQLContext while Spark
>> 1.6 is HiveContext. Unfortunately, no matter true or false are set in
>> zeppelin.spark.useHiveContext, %sql just can’t find csv dependencies …
>>
>>
>>
>>
>>
>> try {
>>   // Use reflection because of classname returned by queryExecution
>> changes from
>>   // Spark <1.5.2 org.apache.spark.sql.SQLContext$QueryExecution
>>   // Spark 1.6.0> org.apache.spark.sql.hive.HiveContext$QueryExecution
>>   Object qe = df.getClass().getMethod("queryExecution").invoke(df);
>>   Object a = qe.getClass().getMethod("analyzed").invoke(qe);
>>   scala.collection.Seq seq = (scala.collection.Seq)
>> a.getClass().getMethod("output").invoke(a);
>>
>>   columns = (List<Attribute>) scala.collection.JavaConverters.
>> *seqAsJavaListConverter*(seq)
>>                                                              .asJava();
>> } catch (NoSuchMethodException | SecurityException |
>> IllegalAccessException
>>     | IllegalArgumentException | InvocationTargetException e) {
>>   throw new InterpreterException(e);
>> }
>>
>>
>>
>> Yunfeng
>>
>


Re: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

Posted by Jonathan Kelly <jo...@gmail.com>.
Hey, I just ran into that same exact issue yesterday and wasn't sure if I
was doing something wrong or what. Glad to know it's not just me!
Unfortunately I have not yet had the time to look any deeper into it. Would
you mind filing a JIRA if there isn't already one?

On Tue, Feb 2, 2016 at 12:29 PM Lin, Yunfeng <yu...@citi.com> wrote:

> Hi guys,
>
>
>
> I load spark-csv dependencies in %spark, but not in %sql using apache
> zeppelin 0.5.6 with spark 1.6.0. Everything is working fine in zeppelin
> 0.5.5 with spark 1.5 through
>
>
>
> Do you have similar problems?
>
>
>
> I am loading spark csv dependencies (
> https://github.com/databricks/spark-csv)
>
>
>
> Using:
>
> %dep
>
> z.load(“PATH/commons-csv-1.1.jar”)
>
> z.load(“PATH/spark-csv_2.10-1.3.0.jar”)
>
> z.load(“PATH/univocity-parsers-1.5.1.jar:)
>
> z.load(“PATH/scala-library-2.10.5.jar”)
>
>
>
> I am able to load a csv from hdfs using data frame API in spark. It is
> running perfect fine.
>
> %spark
>
> val df = sqlContext.read
>
>     .format("com.databricks.spark.csv")
>
>     .option("header", "false") // Use finrst line of all files as header
>
>     .option("inferSchema", "true") // Automatically infer data types
>
>     .load("hdfs://sd-6f48-7fe6:8020/tmp/people.txt")   // this is a file
> in HDFS
>
> df.registerTempTable("people")
>
> df.show()
>
>
>
> This also work:
>
> %spark
>
> val df2=sqlContext.sql(“select * from people”)
>
> df2.show()
>
>
>
> But this doesn’t work….
>
> %sql
>
> select * from people
>
>
>
> java.lang.ClassNotFoundException:
> com.databricks.spark.csv.CsvRelation$$anonfun$1$$anonfun$2 at
> java.net.URLClassLoader$1.run(URLClassLoader.java:366) at
> java.net.URLClassLoader$1.run(URLClassLoader.java:355) at
> java.security.AccessController.doPrivileged(Native Method) at
> java.net.URLClassLoader.findClass(URLClassLoader.java:354) at
> java.lang.ClassLoader.loadClass(ClassLoader.java:425) at
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at
> java.lang.ClassLoader.loadClass(ClassLoader.java:358) at
> java.lang.Class.forName0(Native Method) at
> java.lang.Class.forName(Class.java:270) at
> org.apache.spark.util.InnerClosureFinder$$anon$4.visitMethodInsn(ClosureCleaner.scala:435)
> at org.apache.xbean.asm5.ClassReader.a(Unknown Source) at
> org.apache.xbean.asm5.ClassReader.b(Unknown Source) at
> org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at
> org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at
> org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:84)
> at
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:187)
> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122) at
> org.apache.spark.SparkContext.clean(SparkContext.scala:2055) at
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:707) at
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:706) at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at
> org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706) at
> com.databricks.spark.csv.CsvRelation.tokenRdd(CsvRelation.scala:90) at
> com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:104) at
> com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:152) at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:274)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:273)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:352)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:269)
> at
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:60)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
> at
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:349)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
> at
> org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
> at
> org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
> at
> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
> at
> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
> at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2134) at
> org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413) at
> org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495) at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:297)
> at
> org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:144)
> at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
> at
> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
> at org.apache.zeppelin.scheduler.Job.run(Job.java:169) at
> org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
>
>
>
>
> I notice that in the source code of spark interpreter (class:
> org.apache.zeppelin.spark.ZeppelinContext. There are something different
> between spark 1.5 and spark 1.6. Spark 1.5 is using SQLContext while Spark
> 1.6 is HiveContext. Unfortunately, no matter true or false are set in
> zeppelin.spark.useHiveContext, %sql just can’t find csv dependencies …
>
>
>
>
>
> try {
>   // Use reflection because of classname returned by queryExecution
> changes from
>   // Spark <1.5.2 org.apache.spark.sql.SQLContext$QueryExecution
>   // Spark 1.6.0> org.apache.spark.sql.hive.HiveContext$QueryExecution
>   Object qe = df.getClass().getMethod("queryExecution").invoke(df);
>   Object a = qe.getClass().getMethod("analyzed").invoke(qe);
>   scala.collection.Seq seq = (scala.collection.Seq)
> a.getClass().getMethod("output").invoke(a);
>
>   columns = (List<Attribute>) scala.collection.JavaConverters.
> *seqAsJavaListConverter*(seq)
>                                                              .asJava();
> } catch (NoSuchMethodException | SecurityException |
> IllegalAccessException
>     | IllegalArgumentException | InvocationTargetException e) {
>   throw new InterpreterException(e);
> }
>
>
>
> Yunfeng
>
