Posted to dev@zeppelin.apache.org by "Ophir Cohen (JIRA)" <ji...@apache.org> on 2015/07/02 11:42:05 UTC

[jira] [Created] (ZEPPELIN-150) Registered UDFs does not work on Spark jobs initiated from Zeppelin

Ophir Cohen created ZEPPELIN-150:
------------------------------------

             Summary: Registered UDFs does not work on Spark jobs initiated from Zeppelin
                 Key: ZEPPELIN-150
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-150
             Project: Zeppelin
          Issue Type: Bug
          Components: Interpreters
    Affects Versions: 0.5.0
         Environment: - Zeppelin 0.5.0
- Spark 1.3.1 on top yarn cluster
- Hadoop 2.4
            Reporter: Ophir Cohen


When we try to use a registered UDF from Zeppelin, the job fails with _java.lang.ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext_
(see the full exception below).

h5. Steps to reproduce:
1. Create and register the UDF:
{code}
def getNum(): Int = {
    100
}
hc.udf.register("getNum",getNum _)
{code}

2. Use the UDF in a query on an existing table:
{code}
%sql select getNum() from filteredNc limit 1
{code}
This fails.

3. Run the same query directly on the HiveContext:
{code}
hc.sql("select getNum() from filteredNc limit 1").collect
{code}
This fails as well.

h5. A few insights:
1. In the Spark shell it works as expected.
2. The bug occurs only with RDDs/tables that originate from an external source (Hive/S3 Parquet files). Creating a new DataFrame and registering it works as expected.
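
For illustration, the working case in insight 2 looks roughly like the sketch below. It is not the exact code that was run: it assumes _hc_ is the HiveContext from the steps above and is declared as a val (so that its implicits can be imported), that _sc_ is the SparkContext provided by Zeppelin, and that _getNum_ has already been registered as in step 1. The names _localDf_ and _localNums_ are made up for this example.
{code}
// Assumption: hc is a val holding the HiveContext, sc is Zeppelin's SparkContext.
import hc.implicits._

// Build a small DataFrame from in-memory data instead of Hive/S3 (hypothetical names).
val localDf = sc.parallelize(Seq(1, 2, 3)).toDF("n")
localDf.registerTempTable("localNums")

// Same kind of query as in step 3, but against the locally created table.
hc.sql("select getNum() from localNums limit 1").collect
{code}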

The (almost) full exception:
{code}
 WARN [2015-06-28 08:43:53,850] ({task-result-getter-0} Logging.scala[logWarning]:71) - Lost task 0.2 in stage 23.0 (TID 1626, ip-10-216-204-246.ec2.internal): java.lang.NoClassDefFoundError: Lorg/apache/zeppelin/spark/ZeppelinContext;
    at java.lang.Class.getDeclaredFields0(Native Method)
    at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
    at java.lang.Class.getDeclaredField(Class.java:1951)
    at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)

<Many more ObjectStreamClass lines of the exception omitted>

Caused by: java.lang.ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
    at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 103 more
Caused by: java.lang.ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
    at java.lang.ClassLoader.findClass(ClassLoader.java:531)
    at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
    at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:64)
    ... 105 more
{code}
