Posted to users@zeppelin.apache.org by moon soo Lee <mo...@apache.org> on 2015/07/02 01:35:22 UTC

Re: UDFs in Zeppelin??

I really appreciate you sharing the problem.
Very interesting. Do you mind filing an issue on JIRA?

Best,
moon

On Tue, Jun 30, 2015 at 4:32 AM Ophir Cohen <op...@gmail.com> wrote:

> BTW, this isn't working as well:
>
>
>
> val sidNameDF = hc.sql("select sid, name from hive_table limit 10")
> val sidNameDF2 = hc.createDataFrame(sidNameDF.rdd, sidNameDF.schema)
> sidNameDF2.registerTempTable("tmp_sid_name2")
>
>
> On Tue, Jun 30, 2015 at 1:45 PM, Ophir Cohen <op...@gmail.com> wrote:
>
>> I've made some progress on this issue and I think it's a bug...
>>
>> Apparently, when trying to use registered UDFs on tables that come from
>> Hive, it throws the above exception (*ClassNotFoundException:
>> org.apache.zeppelin.spark.ZeppelinContext*).
>> When I create a new table and register it, UDFs work as expected.
>> See below for full details and an example.
>>
>> Can someone tell me whether this is the expected behavior or a bug?
>> BTW
>> I don't mind working on that bug if you can give me a pointer to the
>> right places.
>>
>> BTW2
>> Registering the SAME DataFrame as a temp table does not solve the
>> problem - only creating a new table from a new DataFrame does (see below).
>>
>>
>> *Detailed example*
>> 1. I have a table in Hive called '*hive_table*' with a string field called
>> *'name'* and an int field called *'sid'*
>>
>> 2. I registered a udf:
>> def getStr(str: String) = str + "_str"
>> hc.udf.register("getStr", getStr _)
>>
>> 3. Running the following on Zeppelin:
>> %sql select getStr(name), * from hive_table
>> yields the exception:
>> *ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext*
>>
>> 4. Creating new table, as follows:
>> case class SidName(sid: Int, name: String)
>> val sidNameList = hc.sql("select sid, name from hive_table limit 10")
>>   .collect().map(row => SidName(row.getInt(0), row.getString(1)))
>> val sidNameDF = hc.createDataFrame(sidNameList)
>> sidNameDF.registerTempTable("tmp_sid_name")
>>
>> 5. Query the new table in the same fashion:
>> %sql select getStr(name), * from tmp_sid_name
>>
>> This time I get the expected results!
>>
>>
>> On Mon, Jun 29, 2015 at 5:16 PM, Ophir Cohen <op...@gmail.com> wrote:
>>
>>> BTW
>>> The same query, on the same cluster but in the Spark shell, returns the
>>> expected results.
>>>
>>> On Mon, Jun 29, 2015 at 3:24 PM, Ophir Cohen <op...@gmail.com> wrote:
>>>
>>>> It looks like the Zeppelin jar is not distributed to the Spark nodes,
>>>> though I can't understand why it is needed for the UDF.
>>>>
>>>> On Mon, Jun 29, 2015 at 3:23 PM, Ophir Cohen <op...@gmail.com> wrote:
>>>>
>>>>> Thanks for the response,
>>>>> I'm not sure what you mean; that is exactly what I tried, and it failed.
>>>>> As I wrote above, 'hc' is just a different name for sqlc (which in turn
>>>>> is a different name for z.sqlContext).
>>>>>
>>>>> I get the same results.
>>>>>
>>>>>
>>>>> On Mon, Jun 29, 2015 at 2:12 PM, Mina Lee <mi...@nflabs.com> wrote:
>>>>>
>>>>>> Hi Ophir,
>>>>>>
>>>>>> Can you try below?
>>>>>>
>>>>>> def getNum(): Int = {
>>>>>>     100
>>>>>> }
>>>>>> sqlc.udf.register("getNum", getNum _)
>>>>>> sqlc.sql("select getNum() from filteredNc limit 1").show
>>>>>>
>>>>>> FYI, sqlContext (== sqlc) is created internally by Zeppelin
>>>>>> and is a HiveContext by default
>>>>>> (unless you changed useHiveContext to "false" in the interpreter
>>>>>> menu).
>>>>>>
>>>>>> Hope it helps.
>>>>>>
>>>>>> On Mon, Jun 29, 2015 at 7:55 PM, Ophir Cohen <op...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Guys?
>>>>>>> Somebody?
>>>>>>> Can it be that Zeppelin does not support UDFs?
>>>>>>>
>>>>>>> On Sun, Jun 28, 2015 at 11:53 AM, Ophir Cohen <op...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Guys,
>>>>>>>> One more problem I have encountered using Zeppelin.
>>>>>>>> Using Spark 1.3.1 on Yarn Hadoop 2.4
>>>>>>>>
>>>>>>>> I'm trying to create and use a UDF (hc == z.sqlContext ==
>>>>>>>> HiveContext):
>>>>>>>> 1. Create and register the UDF:
>>>>>>>> def getNum(): Int = {
>>>>>>>>     100
>>>>>>>> }
>>>>>>>>
>>>>>>>> hc.udf.register("getNum", getNum _)
>>>>>>>> 2. Then I try to use it on an existing table:
>>>>>>>> %sql select getNum() from filteredNc limit 1
>>>>>>>>
>>>>>>>> Or:
>>>>>>>> 3. Using hc directly:
>>>>>>>> hc.sql("select getNum() from filteredNc limit 1").collect
>>>>>>>>
>>>>>>>> Both of them fail with
>>>>>>>> *"java.lang.ClassNotFoundException:
>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext"*
>>>>>>>> (see below for the full exception).
>>>>>>>>
>>>>>>>> My questions are:
>>>>>>>> 1. Can it be that ZeppelinContext is not available on the Spark nodes?
>>>>>>>> 2. Why is ZeppelinContext needed anyway? Why is it relevant?
>>>>>>>>
>>>>>>>> The exception:
>>>>>>>>  WARN [2015-06-28 08:43:53,850] ({task-result-getter-0}
>>>>>>>> Logging.scala[logWarning]:71) - Lost task 0.2 in stage 23.0 (TID 1626,
>>>>>>>> ip-10-216-204-246.ec2.internal): java.lang.NoClassDefFoundError:
>>>>>>>> Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>>>>>     at java.lang.Class.getDeclaredFields0(Native Method)
>>>>>>>>     at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
>>>>>>>>     at java.lang.Class.getDeclaredField(Class.java:1951)
>>>>>>>>     at
>>>>>>>> java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
>>>>>>>>
>>>>>>>> <Many more of ObjectStreamClass lines of exception>
>>>>>>>>
>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>>     at
>>>>>>>> org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
>>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>>     ... 103 more
>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>>     at java.lang.ClassLoader.findClass(ClassLoader.java:531)
>>>>>>>>     at
>>>>>>>> org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>>     at
>>>>>>>> org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>>     at
>>>>>>>> org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>>>>>>>     at
>>>>>>>> org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:64)
>>>>>>>>     ... 105 more
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
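A note on question 2 in the quoted message above: the Spark REPL (which the Zeppelin Spark interpreter builds on) compiles each interpreted line into a wrapper class whose fields hold previously defined values, including Zeppelin's injected ZeppelinContext. A function defined in a notebook paragraph is a member of such a wrapper, so serializing the UDF closure to the executors can drag the wrapper - and with it a reference to ZeppelinContext - along. The sketch below is a possible workaround under that assumption (not a confirmed fix): keep the UDF body in a self-contained serializable object so the closure carries no notebook state.

```scala
// Assumption: the failure comes from the REPL line wrapper (which references
// ZeppelinContext) being captured in the serialized UDF closure.
// Keeping the function in a standalone object makes the closure self-contained.
object Udfs extends Serializable {
  def getStr(str: String): String = str + "_str"
}

hc.udf.register("getStr", Udfs.getStr _)
// %sql select getStr(name), * from hive_table
```

Whether this helps depends on how the interpreter wraps object definitions; it is offered as a diagnostic experiment rather than a guaranteed fix.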

Re: UDFs in Zeppelin??

Posted by Ophir Cohen <op...@gmail.com>.
Will do so soon.
10x


Re: UDFs in Zeppelin??

Posted by IT CTO <go...@gmail.com>.
I think you should add these notes to the JIRA issue, as it is not clear from
the issue itself. (Sorry that this is not helping solve the problem itself
:-))


Re: UDFs in Zeppelin??

Posted by Ophir Cohen <op...@gmail.com>.
It does not happen in local mode.
Actually, whenever everything runs in the same process it works great.
It looks like somehow the Zeppelin jar is not distributed to the nodes.
Still, it's strange, as registering the UDF and the UDF itself do not need
ZeppelinContext (at least not explicitly).

And yes, filteredNc is a local table; I just use it so I can call the UDF.
You can try it on any table.
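
Given the observation that the Zeppelin jar does not seem to reach the executors, one experiment is to ship the interpreter jar to them explicitly with SparkContext.addJar before registering the UDF. The jar path below is an assumption for illustration only; locate the real zeppelin-spark jar under your Zeppelin installation.

```scala
// Hypothetical workaround: explicitly distribute the Zeppelin Spark interpreter
// jar to the executors so ZeppelinContext can be resolved during closure
// deserialization. The path is an example; adjust it to your installation.
sc.addJar("/opt/zeppelin/interpreter/spark/zeppelin-spark-0.5.0.jar")
```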

On Thu, Jul 2, 2015 at 1:23 PM, IT CTO <go...@gmail.com> wrote:

> Does this happen on a local mode as well or just on external cluster?
> with regard to the repro - %sql select getNum() from filteredNc limit 1
> I guess, filterdNc is some table you have? cause when I tried it on my
> local machine I got :
> no such table filteredNc; line 1 pos 21
> Eran
>
> On Thu, Jul 2, 2015 at 12:44 PM Ophir Cohen <op...@gmail.com> wrote:
>
>> Thank you Moon.
>> Here is the link:
>> https://issues.apache.org/jira/browse/ZEPPELIN-150
>>
>> Please let me know how can I help further more.
>>
>> On Thu, Jul 2, 2015 at 2:35 AM, moon soo Lee <mo...@apache.org> wrote:
>>
>>> Really appreciate for sharing the problem.
>>> Very interesting. Do you mind file a issue on JIRA?
>>>
>>> Best,
>>> moon
>>>
>>> On Tue, Jun 30, 2015 at 4:32 AM Ophir Cohen <op...@gmail.com> wrote:
>>>
>>>> BTW, this isn't working as well:
>>>>
>>>>
>>>>
>>>> *val sidNameDF = hc.sql("select sid, name from hive_table limit 10")val
>>>> sidNameDF2 = hc.createDataFrame(sidNameDF.rdd, sidNameDF.schema)
>>>> sidNameDF2.registerTempTable("tmp_sid_name2")*
>>>>
>>>>
>>>> On Tue, Jun 30, 2015 at 1:45 PM, Ophir Cohen <op...@gmail.com> wrote:
>>>>
>>>>> I've made some progress in this issue and I think it's a bug...
>>>>>
>>>>> Apparently, when trying to use registered UDFs on tables that comes
>>>>> from Hive - it returns the above exception (*ClassNotFoundException:
>>>>> org.apache.zeppelin.spark.ZeppelinContext*).
>>>>> When create new table and register it - UDFs works as expected.
>>>>> You can see below to full details and example.
>>>>>
>>>>> Can someone tell if it's the expected behavior or a bug?
>>>>> BTW
>>>>> I don't mind to work on that bug - if you can give a pointer to the
>>>>> right places.
>>>>>
>>>>> BTW2
>>>>> Trying to register the SAME DataFrame as tempTable does not solve the
>>>>> problem - only creating new table out of new DataFrame (see below).
>>>>>
>>>>>
>>>>> *Detailed example*
>>>>> 1. I have table in Hive called '*hive_table*' with string field
>>>>> called *'name'* and int filed called *'sid'*
>>>>>
>>>>> 2. I registered a udf:
>>>>> *def getStr(str: String) = str + "_str"*
>>>>> *hc.udf.register("getStr", getStr _)*
>>>>>
>>>>> 3. Running the following on Zeppelin:
>>>>> *%sql select getStr(name), * from** hive_table*
>>>>> yields with excpetion:
>>>>> *ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext*
>>>>>
>>>>> 4. Creating new table, as follows:
>>>>> *case class SidName(sid: Int, name: String)*
>>>>> *val sidNameList = hc.sql("select sid, name from hive_table limit
>>>>> 10").collectAsList().map(row => new SidName(row.getInt(0),
>>>>> row.getString(1)))*
>>>>> *val sidNameDF = hc.createDataFrame(sidNameList)*
>>>>> *sidNameDF.registerTempTable("tmp_sid_name")*
>>>>>
>>>>> 5. Query the new table in the same fashion:
>>>>> *%sql select getStr(name), * from tmp_sid_name*
>>>>>
>>>>> This time I get the expected results!
>>>>>
>>>>>
>>>>> On Mon, Jun 29, 2015 at 5:16 PM, Ophir Cohen <op...@gmail.com> wrote:
>>>>>
>>>>>> BTW
>>>>>> The same query, on the same cluster but on Spark shell return the
>>>>>> expected results.
>>>>>>
>>>>>> On Mon, Jun 29, 2015 at 3:24 PM, Ophir Cohen <op...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> It looks that Zeppelin jar does not distributed to Spark nodes,
>>>>>>> though I can't understand why it needed for the UDF.
>>>>>>>
>>>>>>> On Mon, Jun 29, 2015 at 3:23 PM, Ophir Cohen <op...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks for the response,
>>>>>>>> I'm not sure what do you mean, it exactly what I tried and failed.
>>>>>>>> As I wrote above, 'hc' is actually different name to sqlc (that is
>>>>>>>> different name to z.sqlContext).
>>>>>>>>
>>>>>>>> I get the same results.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jun 29, 2015 at 2:12 PM, Mina Lee <mi...@nflabs.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Ophir,
>>>>>>>>>
>>>>>>>>> Can you try below?
>>>>>>>>>
>>>>>>>>> def getNum(): Int = {
>>>>>>>>>     100
>>>>>>>>> }
>>>>>>>>> sqlc.udf.register("getNum", getNum _)
>>>>>>>>> sqlc.sql("select getNum() from filteredNc limit 1").show
>>>>>>>>>
>>>>>>>>> FYI sqlContext(==sqlc) is internally created by Zeppelin
>>>>>>>>> and use hiveContext as sqlContext by default.
>>>>>>>>> (If you did not change useHiveContext to be "false" in interpreter
>>>>>>>>> menu.)
>>>>>>>>>
>>>>>>>>> Hope it helps.
>>>>>>>>>
>>>>>>>>> On Mon, Jun 29, 2015 at 7:55 PM, Ophir Cohen <op...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Guys?
>>>>>>>>>> Somebody?
>>>>>>>>>> Can it be that Zeppelin does not support UDFs?
>>>>>>>>>>
>>>>>>>>>> On Sun, Jun 28, 2015 at 11:53 AM, Ophir Cohen <op...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Guys,
>>>>>>>>>>> One more problem I have encountered using Zeppelin.
>>>>>>>>>>> Using Spark 1.3.1 on Yarn Hadoop 2.4
>>>>>>>>>>>
>>>>>>>>>>> I'm trying to create and use UDF (hc == z.sqlContext ==
>>>>>>>>>>> HiveContext):
>>>>>>>>>>> 1. Create and register the UDF:
>>>>>>>>>>> def getNum(): Int = {
>>>>>>>>>>>     100
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> hc.udf.register("getNum",getNum _)
>>>>>>>>>>> 2. And I try to use on exist table:
>>>>>>>>>>> %sql select getNum() from filteredNc limit 1
>>>>>>>>>>>
>>>>>>>>>>> Or:
>>>>>>>>>>> 3. Trying using direct hc:
>>>>>>>>>>> hc.sql("select getNum() from filteredNc limit 1").collect
>>>>>>>>>>>
>>>>>>>>>>> Both of them yield with
>>>>>>>>>>> *"java.lang.ClassNotFoundException:
>>>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext"*
>>>>>>>>>>> (see below the full exception).
>>>>>>>>>>>
>>>>>>>>>>> And my questions is:
>>>>>>>>>>> 1. Can it be that ZeppelinContext is not available on Spark
>>>>>>>>>>> nodes?
>>>>>>>>>>> 2. Why it need ZeppelinContext anyway? Why it's relevant?
>>>>>>>>>>>
>>>>>>>>>>> The exception:
>>>>>>>>>>>  WARN [2015-06-28 08:43:53,850] ({task-result-getter-0}
>>>>>>>>>>> Logging.scala[logWarning]:71) - Lost task 0.2 in stage 23.0 (TID 1626,
>>>>>>>>>>> ip-10-216-204-246.ec2.internal): java.lang.NoClassDefFoundError:
>>>>>>>>>>> Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>>>>>>>>     at java.lang.Class.getDeclaredFields0(Native Method)
>>>>>>>>>>>     at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
>>>>>>>>>>>     at java.lang.Class.getDeclaredField(Class.java:1951)
>>>>>>>>>>>     at
>>>>>>>>>>> java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
>>>>>>>>>>>
>>>>>>>>>>> <Many more of ObjectStreamClass lines of exception>
>>>>>>>>>>>
>>>>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>>>>>     at
>>>>>>>>>>> org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
>>>>>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>>>>>     ... 103 more
>>>>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>>>>>     at java.lang.ClassLoader.findClass(ClassLoader.java:531)
>>>>>>>>>>>     at
>>>>>>>>>>> org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>>>>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>>>>>     at
>>>>>>>>>>> org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>>>>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>>>>>     at
>>>>>>>>>>> org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>>>>>>>>>>     at
>>>>>>>>>>> org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:64)
>>>>>>>>>>>     ... 105 more

Re: UDFs in Zeppelin??

Posted by IT CTO <go...@gmail.com>.
Does this happen in local mode as well, or just on an external cluster?
With regard to the repro - %sql select getNum() from filteredNc limit 1 -
I guess filteredNc is some table you have? When I tried it on my
local machine I got:
no such table filteredNc; line 1 pos 21
Eran
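
[Editor's note: a plausible mechanism for the exception, sketched here but not verified against Zeppelin's internals. The Spark REPL compiles each interpreter line into a wrapper class whose fields hold everything in scope, including the ZeppelinContext. Eta-expanding a method defined on such a line (getNum _) yields a closure that references the wrapper instance, so serializing the UDF for the executors drags the wrapper - and ZeppelinContext - along. A plain-Scala analogy, where ReplLine is a made-up stand-in for the generated wrapper:]

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for the class the REPL generates per interpreter line: its fields
// include everything in scope, e.g. a ZeppelinContext-like object `z`.
class ReplLine {
  val z = new Object()      // not Serializable, like ZeppelinContext
  def getNum(): Int = 100   // the "UDF" defined on that line
}

object ClosureCaptureDemo {
  // Attempts Java serialization of `obj`; returns true if it failed.
  def failsToSerialize(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      false
    } catch {
      case _: NotSerializableException => true
    }

  def main(args: Array[String]): Unit = {
    val line = new ReplLine
    // `line.getNum _` eta-expands to a Function0 that captures `line`,
    // and through it the non-serializable `z`.
    val udf: () => Int = line.getNum _
    println(s"captured-closure serialization fails: ${failsToSerialize(udf)}")
  }
}
```

[On a real cluster the analogous failure surfaces as the deserialization error on the executor side, since the wrapper's class is not on the executors' classpath.]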

On Thu, Jul 2, 2015 at 12:44 PM Ophir Cohen <op...@gmail.com> wrote:

> Thank you Moon.
> Here is the link:
> https://issues.apache.org/jira/browse/ZEPPELIN-150
>
> Please let me know how I can help further.

Re: UDFs in Zeppelin??

Posted by Ophir Cohen <op...@gmail.com>.
Thank you Moon.
Here is the link:
https://issues.apache.org/jira/browse/ZEPPELIN-150

Please let me know how I can help further.
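
[Editor's note: one possible workaround sketch, untested against Zeppelin - the object name Udfs is made up for illustration. Defining the UDF body in a standalone top-level object means the eta-expanded function captures no interpreter state, so it serializes cleanly:]

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical container for UDF bodies, compiled outside any REPL line
// wrapper, so closures over its methods capture no interpreter state.
object Udfs {
  def getNum(): Int = 100
}

object WorkaroundDemo {
  def main(args: Array[String]): Unit = {
    val udf: () => Int = Udfs.getNum _
    // Serializes without error: nothing from the notebook scope is captured,
    // so no ZeppelinContext reference would travel to the executors.
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(udf)
    println("udf serialized; getNum() = " + udf())
  }
}
```

[In the notebook this would translate to something like hc.udf.register("getNum", Udfs.getNum _) - offered as a direction to try rather than a confirmed fix.]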
