Posted to dev@zeppelin.apache.org by "AV (JIRA)" <ji...@apache.org> on 2018/12/28 10:05:00 UTC
[jira] [Created] (ZEPPELIN-3927) Unstable State running Code
AV created ZEPPELIN-3927:
----------------------------
Summary: Unstable State running Code
Key: ZEPPELIN-3927
URL: https://issues.apache.org/jira/browse/ZEPPELIN-3927
Project: Zeppelin
Issue Type: Bug
Components: zeppelin-interpreter
Affects Versions: 0.9.0
Reporter: AV
Executing the tutorial notebook code produces weird results using Spark 2.4.0:
> import org.apache.commons.io.IOUtils
> import java.net.URL
> import java.nio.charset.Charset
>
>
> // Zeppelin creates and injects sc (SparkContext) and sqlContext (HiveContext or SqlContext)
> // So you don't need to create them manually
>
> // Remote Address
> val csvURL = "https://s3.amazonaws.com/apache-zeppelin/tutorial/bank/bank.csv";
>
> // Parallel processing
> val bankText = sc.parallelize( IOUtils.toString( new URL(csvURL), Charset.forName("UTF-8") ).toString().split("\n") )
>
> case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)
>
> val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
>   s => Bank(s(0).toInt,
>             s(1).replaceAll("\"", ""),
>             s(2).replaceAll("\"", ""),
>             s(3).replaceAll("\"", ""),
>             s(5).replaceAll("\"", "").toInt
>   )
> ).toDF()
>
> bank.registerTempTable("bank")
On the first run (after a Spark interpreter restart) everything works fine; the output is:
> warning: there was one deprecation warning; re-run with -deprecation for details
> import sqlContext.implicits._
> import org.apache.commons.io.IOUtils
> import java.net.URL
> import java.nio.charset.Charset
> csvURL: String = https://s3.amazonaws.com/apache-zeppelin/tutorial/bank/bank.csv
> bankText: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:28
> defined class Bank
> bank: org.apache.spark.sql.DataFrame = [age: int, job: string ... 3 more fields]
After the code has been executed once, any re-run fails:
> warning: there was one deprecation warning; re-run with -deprecation for details
> java.lang.IllegalArgumentException: URI is not absolute
> at java.net.URI.toURL(URI.java:1088)
> at org.apache.hadoop.fs.http.AbstractHttpFileSystem.open(AbstractHttpFileSystem.java:60)
> at org.apache.hadoop.fs.http.HttpsFileSystem.open(HttpsFileSystem.java:23)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
> at org.apache.hadoop.fs.FsUrlConnection.connect(FsUrlConnection.java:50)
> at org.apache.hadoop.fs.FsUrlConnection.getInputStream(FsUrlConnection.java:59)
> at java.net.URL.openStream(URL.java:1045)
> at org.apache.commons.io.IOUtils.toString(IOUtils.java:894) ... 39 elided
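The stack trace suggests (this is my reading, not confirmed) that Hadoop's `FsUrlStreamHandlerFactory` has been installed as the JVM-wide URL stream handler factory, so `new URL(csvURL).openStream()` is routed through Hadoop's `HttpsFileSystem` instead of the plain Java HTTPS handler, and that code path fails. A sketch of a workaround that avoids `java.net.URL` entirely, fetching the file with an external tool and reading the local copy (`/tmp/bank.csv` is a hypothetical scratch path):

```scala
// Workaround sketch (assumption: curl is available on the interpreter host).
// Downloading outside the JVM sidesteps the registered URL stream handler.
import sys.process._

val csvURL = "https://s3.amazonaws.com/apache-zeppelin/tutorial/bank/bank.csv"
val localPath = "/tmp/bank.csv"  // hypothetical scratch location

// Fetch once with curl instead of URL.openStream()
Seq("curl", "-sSL", "-o", localPath, csvURL).!

// Build the RDD from the local copy rather than from the remote URL
val bankText = sc.textFile(s"file://$localPath")
```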
In addition to the deprecation warning, the re-run reports:
> <console>:36: error: value toDF is not a member of org.apache.spark.rdd.RDD[Bank]
> possible cause: maybe a semicolon is missing before `value toDF'? ).toDF()
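The `toDF` error may be a follow-on symptom: `toDF` on an RDD of a case class is provided by the SQL implicits, so if the paragraph's earlier failure leaves the REPL without `import sqlContext.implicits._` in scope, the conversion disappears. A sketch of the failing part of the paragraph with the import made explicit, and with the deprecated `registerTempTable` (deprecated since Spark 2.0) replaced by its successor:

```scala
// Sketch: re-import the implicits in the same paragraph so toDF is in scope
// even after a previous paragraph failed.
import sqlContext.implicits._

val bank = bankText.map(_.split(";"))
  .filter(s => s(0) != "\"age\"")
  .map(s => Bank(s(0).toInt,
                 s(1).replaceAll("\"", ""),
                 s(2).replaceAll("\"", ""),
                 s(3).replaceAll("\"", ""),
                 s(5).replaceAll("\"", "").toInt))
  .toDF()

// registerTempTable is deprecated; this also silences the deprecation warning.
bank.createOrReplaceTempView("bank")
```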
Any ideas?
PS: I'm a bit curious why there are no other reports of this problem; building from source against the latest stable Spark/Hadoop releases seems natural to me.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)