Posted to users@zeppelin.apache.org by Eugene <bl...@gmail.com> on 2015/09/09 13:30:28 UTC

Notebook note with %sql does not see registered tables in previous note

Hello,

In the promo video tutorial for Zeppelin (
http://www.youtube.com/watch?v=_PQbVH_aO5E&feature=youtu.be) there is a
demonstration of declaring a table 'bank' in one note and querying that
table in another note.
I tried doing this, and it did not work.

I have one notebook with two notes:

Note #1: // basically all data definition for second note
eventsDF.registerTempTable("events")

Note #2: // querying defined table
%sql select * from events limit 10


Note #1 executes okay, but Note #2 fails with:
org.apache.spark.sql.AnalysisException: no such table events;

How can I make my table available to Note #2?

Thanks!

--
Best regards,
Eugene.

Re: Notebook note with %sql does not see registered tables in previous note

Posted by Eugene <bl...@gmail.com>.
Found the solution!
Instead of creating my own context with 'val sqlContext = new
org.apache.spark.sql.SQLContext(sc)', I need to use the 'sqlContext'
instance that Zeppelin provides.

Based on
http://stackoverflow.com/questions/30819075/zeppelin-sqlcontext-registertemptable-issue
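To illustrate why this fixes it (my understanding, modeled with a hypothetical ToySqlContext class rather than the real Spark API): temp tables are registered per SQLContext instance, and %sql only queries the context Zeppelin provides, so a table registered on a hand-created context is invisible to it. A minimal sketch:

```scala
import scala.collection.mutable

// Hypothetical stand-in for a SQLContext's per-instance temp-table
// registry. Not the real Spark API; it only models why two separate
// contexts do not share registered tables.
class ToySqlContext {
  private val tables = mutable.Set.empty[String]
  def registerTempTable(name: String): Unit = tables += name
  def tableNames: Set[String] = tables.toSet
}

object ContextDemo {
  val provided = new ToySqlContext // the context %sql queries
  val mine     = new ToySqlContext // like new SQLContext(sc) in a paragraph

  def run(): (Boolean, Boolean) = {
    mine.registerTempTable("events")  // registered on the wrong context
    val visibleBefore = provided.tableNames.contains("events") // false

    provided.registerTempTable("events") // register on the shared context
    val visibleAfter = provided.tableNames.contains("events")  // true
    (visibleBefore, visibleAfter)
  }
}
```

Registering on the provided context is exactly what using Zeppelin's own 'sqlContext' accomplishes.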


Re: Notebook note with %sql does not see registered tables in previous note

Posted by Eugene <bl...@gmail.com>.
I use Spark 1.4.1 on an EMR cluster with YARN.

Sharing my code:

PARAGRAPH #1

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import scala.io.Source
@transient val suffixesStr = "hello\nworld"
@transient val suffList = suffixesStr.lines.filter(line =>
  !line.startsWith("//") && line.trim() != "")
val suffListRDD = sc.parallelize(suffList.toList)
val schemaString = "suffix"
val schemaItemsCount = schemaString.count(_ == ' ') + 1
// Keep only rows with the expected number of tab-separated fields
val eventsRDD = suffListRDD.filter(x =>
  "\\t".r.findAllIn(x).length + 1 == schemaItemsCount)
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType}
val schema = StructType(schemaString.split(" ").map(fieldName =>
  StructField(fieldName, StringType, true)))
// Convert records of the RDD (eventsRDD) to Rows,
// splitting while preserving empty tokens
val rowRDD = eventsRDD.map(_.split("\\t", -1)).map(p => Row(p(0)))
val eventsDF = sqlContext.createDataFrame(rowRDD, schema)
eventsDF.registerTempTable("events")
// Uncomment to verify the logic within the same paragraph:
//val cnt = sqlContext.sql("SELECT count(*) from events")
//cnt.take(10).foreach(println)
----------------OUTPUT---------------
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@36e407de
import scala.io.Source
suffixesStr: String = hello
world
suffList: Iterator[String] = non-empty iterator
suffListRDD: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[3] at parallelize at <console>:28
schemaString: String = suffix
schemaItemsCount: Int = 1
eventsRDD: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[4] at filter at <console>:36
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType}
schema: org.apache.spark.sql.types.StructType = StructType(StructField(suffix,StringType,true))
rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[6] at map at <console>:41
eventsDF: org.apache.spark.sql.DataFrame = [suffix: string]

PARAGRAPH #2
%sql SELECT count(*) from events
----------------OUTPUT---------------
org.apache.spark.sql.AnalysisException: no such table events; line 1 pos 21
at
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)



As you asked, I also checked the bank example. It works just fine:

PARAGRAPH #1
@transient val suffixesStr = "hello;world"
@transient val suffList = suffixesStr.lines.filter(line =>
  !line.startsWith("//") && line.trim() != "")
val bankText = sc.parallelize(suffList.toList)
case class Bank(f1: String, f2: String)
val bank = bankText.map(s => s.split(";")).map(s => Bank(s(0), s(1)))
bank.toDF().registerTempTable("bank")
----------------OUTPUT---------------
suffixesStr: String = hello;world
suffList: Iterator[String] = non-empty iterator
bankText: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[11] at parallelize at <console>:30
defined class Bank
bank: org.apache.spark.rdd.RDD[Bank] = MapPartitionsRDD[13] at map at <console>:34

PARAGRAPH #2
%sql select count(*) from bank
----------------OUTPUT---------------
1


What could be the root cause?


Re: Notebook note with %sql does not see registered tables in previous note

Posted by IT CTO <go...@gmail.com>.
I just tested this in my own notebook and had no problem doing so.
Can you share the code that comes before the line
eventsDF.registerTempTable("events")?
Can you run the Load Data Into Table paragraph in the Zeppelin Tutorial,
then try running the select from bank in another note and see if it works?

Generally speaking, since both notes run in the same JVM and share the same
Spark context, you shouldn't have a problem doing what you described.
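The shared-JVM point can be sketched in plain Scala (no Spark; the names are illustrative, not Zeppelin's actual internals): two blocks of code executed in the same JVM that go through the same singleton see each other's state, which is what paragraph-to-paragraph table sharing relies on.

```scala
import scala.collection.mutable

// Illustrative singleton standing in for the interpreter-shared context.
object SharedRegistry {
  private val tables = mutable.Set.empty[String]
  def register(name: String): Unit = tables += name
  def contains(name: String): Boolean = tables(name)
}

object ParagraphDemo {
  // "Paragraph 1": registers a table through the shared object.
  def paragraphOne(): Unit = SharedRegistry.register("bank")

  // "Paragraph 2": a separate block in the same JVM sees it.
  def paragraphTwo(): Boolean = SharedRegistry.contains("bank")
}
```

Creating a second SQLContext breaks this sharing, which is why the provided one should be reused.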

Thanks,
Eran

-- 
Eran | "You don't need eyes to see, you need vision" (Faithless)