Posted to users@zeppelin.apache.org by Serega Sheypak <se...@gmail.com> on 2017/05/02 11:57:47 UTC

Can't run a simple example with Scala and Spark SQL. Some non-obvious syntax error in SQL

Here is my sample notebook:
%spark
val linesText = sc.textFile("hdfs://cluster/user/me/lines.txt")

  case class Line(id:Long, firstField:String, secondField:String)

  val lines = linesText.map { line =>
    val splitted = line.split(" ")
    println("splitted => " + splitted)
    Line(splitted(0).toLong, splitted(1), splitted(2))
  }

  lines.toDF().registerTempTable("lines")

  %sql select firstField, secondField, count(1) from lines group by firstField, secondField order by firstField, secondField

1. I can see that a Spark job was started on my YARN cluster.
2. It failed.
The UI shows an exception, and I can't understand what I'm doing wrong:
%sql select firstField, secondField, count(1) from lines group by firstField, secondField order by firstField, secondField ^
3. There is suspicious output in the Zeppelin log:

 INFO [2017-05-02 11:50:02,846] ({pool-2-thread-8} SchedulerFactory.java[jobFinished]:137) - Job paragraph_1493724118696_868476558 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_session1712472970
ERROR [2017-05-02 11:50:18,809] ({qtp1286783232-166} NotebookServer.java[onMessage]:380) - Can't handle message
java.lang.NullPointerException
at org.apache.zeppelin.socket.NotebookServer.addNewParagraphIfLastParagraphIsExecuted(NotebookServer.java:1713)
at org.apache.zeppelin.socket.NotebookServer.persistAndExecuteSingleParagraph(NotebookServer.java:1741)
at org.apache.zeppelin.socket.NotebookServer.runAllParagraphs(NotebookServer.java:1641)
at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:291)
at org.apache.zeppelin.socket.NotebookSocket.onWebSocketText(NotebookSocket.java:59)
at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextMessage(JettyListenerEventDriver.java:128)
at org.eclipse.jetty.websocket.common.message.SimpleTextMessage.messageComplete(SimpleTextMessage.java:69)
at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.appendMessage(AbstractEventDriver.java:65)
at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextFrame(JettyListenerEventDriver.java:122)
at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.incomingFrame(AbstractEventDriver.java:161)
at org.eclipse.jetty.websocket.common.WebSocketSession.incomingFrame(WebSocketSession.java:309)
at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.incomingFrame(ExtensionStack.java:214)
at org.eclipse.jetty.websocket.common.Parser.notifyFrame(Parser.java:220)
at org.eclipse.jetty.websocket.common.Parser.parse(Parser.java:258)
at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.readParse(AbstractWebSocketConnection.java:632)
at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:480)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
 INFO [2017-05-02 11:50:18,811] ({pool-2-thread-14} SchedulerFactory.java[jobStarted]:131) - Job paragraph_1493724118696_868476558 started by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_session1712472970
 INFO [2017-05-02 11:50:18,812] ({pool-2-thread-14} Paragraph.java[jobRun]:363) - run paragraph 20170502-112158_458502255 using spark org.apache.zeppelin.interpreter.LazyOpenInterpreter@24ca4045
 WARN [2017-05-02 11:50:19,810] ({pool-2-thread-14} NotebookServer.java[afterStatusChange]:2162) - Job 20170502-112158_458502255 is finished, status: ERROR, exception: null, result: %text linesText: org.apache.spark.rdd.RDD[String] = hdfs://path/to/my/file.txt MapPartitionsRDD[21] at textFile at <console>:27
defined class Line
lines: org.apache.spark.rdd.RDD[Line] = MapPartitionsRDD[22] at map at <console>:31
warning: there was one deprecation warning; re-run with -deprecation for details
<console>:1: error: ';' expected but ',' found.
  %sql select firstField, secondField, count(1) from lines group by firstField, secondField order by firstField, secondField


4. The interpreter log is also confusing:
INFO [2017-05-02 11:41:52,706] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Warehouse location for Hive client (version 1.1.0) is file:/spark-warehouse
INFO [2017-05-02 11:41:52,983] ({pool-2-thread-2} PerfLogger.java[PerfLogBegin]:122) - <PERFLOG method=create_database from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
INFO [2017-05-02 11:41:52,984] ({pool-2-thread-2} HiveMetaStore.java[logInfo]:795) - 0: create_database: Database(name:default, description:default database, locationUri:file:/spark-warehouse, parameters:{})
INFO [2017-05-02 11:41:52,984] ({pool-2-thread-2} HiveMetaStore.java[logAuditEvent]:388) - ugi=zblenessy ip=unknown-ip-addr cmd=create_database: Database(name:default, description:default database, locationUri:file:/spark-warehouse, parameters:{})
ERROR [2017-05-02 11:41:52,992] ({pool-2-thread-2} RetryingHMSHandler.java[invokeInternal]:189) - AlreadyExistsException(message:Database default already exists)
 at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:944)

What should I do about the failed metastore DB creation? Is it fine?

I'm stuck. Can you give me some ideas, please?

Re: Can't run a simple example with Scala and Spark SQL. Some non-obvious syntax error in SQL

Posted by Serega Sheypak <se...@gmail.com>.
The plain Spark code on its own works fine:
val linesText = sc.textFile("hdfs://my/file/onhdfs.txt")

  case class Line(id:Long, firstField:String, secondField:String)

  val lines = linesText.map { line =>
    val splitted = line.split(" ")
    println("splitted => " + splitted)
    Line(splitted(0).toLong, splitted(1), splitted(2))
  }

lines.collect().foreach(println)

It prints the file contents to the UI; I only have trouble with the SQL part...
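
For reference, a minimal sketch of how the same aggregation could be run entirely inside the %spark paragraph, assuming the interpreter exposes the usual sqlContext (untested, just for illustration, not something taken from this thread):

%spark
// Register the DataFrame as a temp table, then run the query through
// sqlContext instead of the %sql interpreter.
lines.toDF().registerTempTable("lines")
val counted = sqlContext.sql(
  "select firstField, secondField, count(1) as cnt from lines " +
  "group by firstField, secondField order by firstField, secondField")
counted.show()  // prints the aggregated rows to the paragraph output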




RE: Can't run a simple example with Scala and Spark SQL. Some non-obvious syntax error in SQL

Posted by David Howell <da...@zipmoney.com.au>.
That looks to be related to Cloudera. I'm sorry I can't provide any useful advice, as I haven't used that distro. It's probably best to ask on a Cloudera forum or user group, or to contact Cloudera support.



Re: Can't run a simple example with Scala and Spark SQL. Some non-obvious syntax error in SQL

Posted by Serega Sheypak <se...@gmail.com>.
Ah... sorry, silly mistake on my part. You are right. I've moved
%sql select firstField, secondField, count(1) from lines group by firstField, secondField order by firstField, secondField

to the next paragraph and it works!

What does this mean? The zeppelin-interpreter-spark-spark interpreter log complains every second with:

WARN [2017-05-02 17:05:42,396] ({dispatcher-event-loop-16} ScriptBasedMapping.java[runResolveCommand]:254) - Exception running /etc/hadoop/conf.cloudera.yarn7/topology.py 17.134.172.218

java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn7/topology.py" (in directory "/"): error=2, No such file or directory

Does Spark really need it? What am I supposed to do?
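
Aside: the warning comes from Hadoop's rack-awareness ScriptBasedMapping rather than from Zeppelin or Spark SQL itself; the client configuration used by the interpreter apparently points at a topology script that does not exist on that host. A hypothetical check of which script the loaded Hadoop configuration references, assuming sc is the usual SparkContext the Spark interpreter provides:

%spark
// net.topology.script.file.name is the standard Hadoop property naming the
// rack-awareness script; if it points at a missing file, ScriptBasedMapping
// logs an IOException like the one above.
println(sc.hadoopConfiguration.get("net.topology.script.file.name"))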



RE: Can't run a simple example with Scala and Spark SQL. Some non-obvious syntax error in SQL

Posted by David Howell <da...@zipmoney.com.au>.
Hi Serega,
I see this in the error log “error: ';' expected but ',' found.”

Are you running the %sql in the same paragraph as the %spark? I don’t think that is supported. I think you have to move the %sql to a new paragraph; then you can run the Spark part and the SQL part separately.
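
For illustration, roughly the layout I mean, as two separate paragraphs (reusing the table and query from your post; the exact split is a sketch, not tested):

Paragraph 1:
%spark
// ... build the lines RDD and case class as in your snippet ...
lines.toDF().registerTempTable("lines")

Paragraph 2:
%sql select firstField, secondField, count(1) from lines group by firstField, secondField order by firstField, secondField

As a side note, the deprecation warning in your log most likely refers to registerTempTable; in Spark 2.x, createOrReplaceTempView is the non-deprecated equivalent.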
