Posted to dev@zeppelin.apache.org by tog <gu...@gmail.com> on 2015/10/30 23:59:52 UTC

Re: Access variables instantiated in different interpreter

Hi there

I am moving this discussion to the dev list.

I was able to reproduce the first video to a certain extent, except that
the view is not updated after a modification of the resource (the HTML
file in my case).

As far as Helium is concerned, I would be happy if you could share the
helium.conf file used to test your 3 apps located in
https://github.com/zeppelin-project/helium-packages.

I found the possibility of running external applications very interesting,
and this is, from my perspective, the breakthrough proposed here.

As far as the transfer of data across interpreters is concerned (as this
was the initial request), I was wondering if we could have something more
platform independent - Doan suggested thrift.

I believe that this would allow more polyglot applications. Currently
Zeppelin does offer a Thrift interface, but to my knowledge only Java
interpreter applications have shown up. Has anyone considered implementing a
plain Python or Julia interpreter?
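
To make the idea concrete, here is a rough sketch in Java of the kind of
language-neutral put/get contract such a shared resource pool could expose.
All names here are hypothetical (this is not Zeppelin's actual Thrift API);
an equivalent Thrift IDL could generate client bindings for Python, Julia,
etc. from the same contract:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical contract a shared resource pool could expose over an RPC
// layer such as Thrift. Values are raw bytes so that each interpreter can
// apply its own serialization (JSON, pickle, Java serialization, ...).
interface ResourceStore {
    void put(String key, byte[] value);
    byte[] get(String key);   // returns null if the key is absent
    boolean remove(String key);
}

// Trivial in-memory reference implementation; a real service would sit
// behind an RPC server and could optionally persist to local storage.
class InMemoryResourceStore implements ResourceStore {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    @Override
    public void put(String key, byte[] value) {
        data.put(key, value);
    }

    @Override
    public byte[] get(String key) {
        return data.get(key);
    }

    @Override
    public boolean remove(String key) {
        return data.remove(key) != null;
    }
}
```

One interpreter would serialize a variable and put() it; another, possibly
remote, interpreter would get() it and deserialize, which is what would
make such a scheme work across process and machine boundaries.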

Cheers
Guillaume


On 29 October 2015 at 20:01, DuyHai Doan <do...@gmail.com> wrote:

> Look like the
> https://github.com/Leemoonsoo/incubator-zeppelin/blob/helium/zeppelin-interpreter/src/main/java/org/apache/zeppelin/resource/ResourcePool.java
> is very similar to my idea :)
>
> On Thu, Oct 29, 2015 at 8:57 PM, DuyHai Doan <do...@gmail.com> wrote:
>
>> Thanks for the pointer on resource pool Moon.
>>
>> My first idea was:
>>
>> 1. create another Zeppelin infrastructure interpreter that will embed an
>> in-memory DB (Apache xxx)
>> 2. extend the Interpreter base class to create a connection pool and
>> offer base methods like put(String key, Object value) and get(String key).
>> 3. communication between each interpreter and the in-memory DB will be
>> done using Thrift
>> 4. optionally, the in-memory DB can persist data to local storage (or a
>> pluggable extension) for durability
>>
>> I don't know how the resource pool is implemented; I will take a look at it.
>>
>> On Thu, Oct 29, 2015 at 8:40 PM, moon soo Lee <mo...@apache.org> wrote:
>>
>>> Yes it is.
>>>
>>> The ResourcePool class in the PoC implements the sharing,
>>> but of course we can discuss a better design and implementation.
>>>
>>> Thanks,
>>> moon
>>>
>>> On Thu, 29 Oct 2015 at 7:06 PM tog <gu...@gmail.com> wrote:
>>>
>>>> Moon
>>>>
>>>> Is the PoC described in the wiki already implementing that sharing?
>>>>
>>>> Cheers
>>>> Guillaume
>>>> On Oct 29, 2015 5:26 PM, "moon soo Lee" <le...@gmail.com> wrote:
>>>>
>>>>> Actually, the Helium proposal (
>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Helium+proposal)
>>>>> includes a "Resource pool", which is basically a distributed map across
>>>>> interpreter processes. Please take a look if you're interested.
>>>>>
>>>>> Thanks,
>>>>> moon
>>>>> On Thu, 29 Oct 2015 at 6:17 PM tog <gu...@gmail.com> wrote:
>>>>>
>>>>>> Hi Doan,
>>>>>> What if not all interpreters share the same memory/machine? I
>>>>>> guess we should propose something that also works in the case of
>>>>>> remote interpreters.
>>>>>> But maybe this is already something you have in mind.
>>>>>>
>>>>>> Cheers
>>>>>> Guillaume
>>>>>>
>>>>>> On 29 October 2015 at 16:58, DuyHai Doan <do...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I am thinking about a new idea to propose a common in-memory shared
>>>>>>> storage so that all interpreters can pass around variables. I'll create a
>>>>>>> JIRA soon to submit the idea and the architecture
>>>>>>>
>>>>>>> On Thu, Oct 29, 2015 at 5:40 PM, moon soo Lee <mo...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> If your custom interpreter is in the same interpreter group,
>>>>>>>> 'spark', you can exchange data between SparkInterpreter and your custom
>>>>>>>> interpreter (because interpreters in the same group run in the same
>>>>>>>> process).
>>>>>>>>
>>>>>>>> But if your custom interpreter is in a different interpreter
>>>>>>>> group, then the only way at the moment is to persist data from SparkInterpreter
>>>>>>>> and read it back in your custom interpreter.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> moon
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Oct 29, 2015 at 11:07 AM Miyuru Dayarathna <
>>>>>>>> miyurud@yahoo.co.uk> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am trying to access the Spark data frame defined in the Zeppelin
>>>>>>>>> Tutorial notebook from a separate paragraph using a custom-written Zeppelin
>>>>>>>>> interpreter. To make it clearer, given below is the code snippet from the
>>>>>>>>> "Load data into table" paragraph of the Zeppelin Tutorial notebook. When
>>>>>>>>> it is run, the data frame called "bank" gets initialized in a Spark
>>>>>>>>> interpreter process. I want to use the bank data frame from my custom
>>>>>>>>> Zeppelin interpreter. Can you please let me know how to do this? Is there a
>>>>>>>>> Zeppelin API which gives me access to such variables from a
>>>>>>>>> different interpreter than the one where they were instantiated?
>>>>>>>>>
>>>>>>>>> //----------------------------------
>>>>>>>>>
>>>>>>>>> // bankText and the Bank case class come from an earlier paragraph
>>>>>>>>> // of the same tutorial notebook
>>>>>>>>> val bank = bankText.map(s => s.split(";")).filter(s => s(0) !=
>>>>>>>>> "\"age\"").map(
>>>>>>>>>     s => Bank(s(0).toInt,
>>>>>>>>>             s(1).replaceAll("\"", ""),
>>>>>>>>>             s(2).replaceAll("\"", ""),
>>>>>>>>>             s(3).replaceAll("\"", ""),
>>>>>>>>>             s(5).replaceAll("\"", "").toInt
>>>>>>>>>         )
>>>>>>>>> ).toDF()
>>>>>>>>>
>>>>>>>>> bank.registerTempTable("bank")
>>>>>>>>>
>>>>>>>>> //----------------------------------
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Miyuru
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> PGP KeyID: 2048R/EA31CFC9  subkeys.pgp.net
>>>>>>
>>>>>
>>
>


-- 
PGP KeyID: 2048R/EA31CFC9  subkeys.pgp.net