Posted to dev@zeppelin.apache.org by Vasiliy Morkovkin <mo...@phystech.edu> on 2019/03/04 23:05:08 UTC

Zeppelin in GSOC 2019

Hi everyone, I'm pursuing a bachelor's degree at the Moscow Institute of
Physics and Technology and am eager to contribute to Zeppelin as part of
GSoC 2019. I've become a real fan of Zeppelin over the past couple of
months, using it at my job. However, I have found only one ticket (a
front-end task) with the GSoC 2019 label in your Jira. Perhaps you have
ideas for new features or improvements in Zeppelin but not enough hands
to work on them. It would be wonderful if someone agreed to mentor
these ideas within GSoC :)
I have been working as a Scala developer (back-end) for 1.5 years.
I can also write Java or Python without any problems if necessary.
I am really fond of databases and high-load systems. I also have experience
with other great Apache projects such as Cassandra, Kafka and Spark.

Best regards, Basil Morkovkin.


Re: Zeppelin in GSOC 2019

Posted by Xun Liu <ne...@163.com>.
Hi Felix Cheung

Thank you for your suggestion.


> On March 11, 2019, at 5:47 AM, Felix Cheung <fe...@hotmail.com> wrote:
> 
> Hi Xun,
> 
> Thanks for your work. Could you change the title of the email? I think your request to review the design would get more attention that way.
> 
> 
> ________________________________
> From: Xun Liu <ne...@163.com>
> Sent: Sunday, March 10, 2019 12:03 AM
> To: Jongyoul Lee; moon@apache.org; Jeff Zhang; Vasiliy Morkovkin
> Cc: dev@zeppelin.apache.org
> Subject: Re: Zeppelin in GSOC 2019
> 
> Hello, everyone,
> 
> I have completed the Zeppelin workflow system design. Please review it; you can edit the document directly or leave comments.
> 
> JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018
> gdoc: https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit#
> 
> :-)
> 
>> On March 8, 2019, at 2:10 PM, Jeff Zhang <zj...@gmail.com> wrote:
>> 
>> Hi Liu,
>> 
>> See this link https://community.apache.org/gsoc.html
>> 
>> 
>> Xun Liu <ne...@163.com> wrote on Fri, Mar 8, 2019 at 1:58 PM:
>> 
>>> Hi, Jongyoul Lee, Морковкин
>>> 
>>> I looked up information about GSoC. Does the Zeppelin community still
>>> need to apply first?
>>> I don't know much about GSoC. Besides helping with the project, what
>>> other work does the mentor need to do?
>>> 
>>>> On March 8, 2019, at 10:01 AM, Xun Liu <ne...@163.com> wrote:
>>>> 
>>>> Hi, Морковкин
>>>> 
>>>> I am very happy to be your mentor for GSOC. :-)
>>>> I believe that by completing this work, I can also learn a lot.
>>>> 
>>>> Please watch https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>> 
>>>>> On March 8, 2019, at 12:08 AM, Морковкин, Василий Владимирович <morkovkin.vv@phystech.edu> wrote:
>>>>> 
>>>>> Hi! For fun, I've sketched a toy prototype of a workflow manager in Scala. It makes it easy to impose dependencies on the execution order of tasks. Check it out: https://scastie.scala-lang.org/aRJGberkQ4CWatyABCOJcQ . It reproduces the flow shown in the attached picture.
>>>>> Xun Liu, it would be great to clarify whether you agree to be a mentor specifically within GSoC, or outside of it? :)
>>>>> 
>>>>> ----------------------------------------
>>>>> Best regards, Basil Morkovkin
>>>>> 
>>>>> On Thu, Mar 7, 2019 at 11:32, Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>> 
>>>>> Thanks Liu for taking over this, I will help review the design.
>>>>> 
>>>>> Xun Liu <neliuxun@163.com> wrote on Thu, Mar 7, 2019 at 4:05 PM:
>>>>> Hi Vasiliy Morkovkin
>>>>> 
>>>>> Thank you very much for your willingness to implement the workflow feature.
>>>>> I will work with you on it as my highest priority.
>>>>> I am planning to first update the system design documentation for workflow at https://issues.apache.org/jira/browse/ZEPPELIN-4018 .
>>>>> Please add yourself as a watcher on ZEPPELIN-4018.
>>>>> This way you will get notifications about document updates in a timely manner.
>>>>> 
>>>>> We can discuss all questions in the ZEPPELIN-4018 JIRA comments.
>>>>> If you need to, you can email me at liuxun323@gmail.com , and I will reply as quickly as I can.
>>>>> Do you think this kind of cooperation is OK?
>>>>> 
>>>>> 
>>>>> @moon, @Jeff, @Jongyoul Lee, if you are interested, please help us improve our system design. Thanks!
>>>>> 
>>>>> :-)
>>>>> 
>>>>>> On March 7, 2019, at 6:04 AM, Морковкин, Василий Владимирович <morkovkin.vv@phystech.edu> wrote:
>>>>>> 
>>>>>> Thank you for such detailed feedback!
>>>>>> I am definitely interested in working on the workflow implementation with you, Xun Liu! Could you become a GSoC mentor for this task?
>>>>>> Some front-end work is not a problem at all.
>>>>>> I'm ready to work at least 30 hours per week in the summer. For now, I'd like to take on some smaller tasks to take a closer look at the existing codebase and get familiar with your development workflow. Do you have such tasks in mind?
>>>>>> 
>>>>>> On Wed, Mar 6, 2019 at 05:23, Xun Liu <neliuxun@163.com> wrote:
>>>>>> Hi Vasiliy Morkovkin
>>>>>> 
>>>>>> I have described my thoughts on workflow at https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>>>> 
>>>>>> Zeppelin has more than 20 interpreters,
>>>>>> so data analysts can use it for a wide variety of data development work.
>>>>>> A lot of that work is interdependent. For example,
>>>>>> developing machine learning algorithms relies on Spark to preprocess data, and so on.
>>>>>> 
>>>>>> Existing open source workflow software includes Azkaban and Airflow.
>>>>>> Azkaban is relatively simple and covers most scenarios, and our company is using it.
>>>>>> Airflow looks complicated, and I have not used it.
>>>>>> In fact, I have previously implemented a workflow for notes and paragraphs in Zeppelin via Azkaban:
>>>>>> https://youtu.be/2r6q-2Tq7hk?t=33
>>>>>> 
>>>>>> However, I think Zeppelin should have built-in workflow capabilities
>>>>>> instead of relying on external software to schedule notes, for the following reasons:
>>>>>> 1. We have now moved from the data processing era to the algorithm era.
>>>>>> Once Zeppelin has its own workflow, it will form a closed data loop.
>>>>>> 
>>>>>> 2. Zeppelin's powerful interactive processing capabilities help algorithm engineers work more productively.
>>>>>> Zeppelin should give algorithm engineers more direct control,
>>>>>> instead of having them hand the algorithm to other teams (or software) to run the workflow.
>>>>>> 
>>>>>> 3. Zeppelin knows more about the processing status of data than Azkaban and Airflow do,
>>>>>> so a built-in workflow will have better performance, user experience and control.
>>>>>> 
>>>>>> If you are interested in workflow (ZEPPELIN-4018),
>>>>>> I am willing to work with you to complete all the system design and code development work.
>>>>>> 
>>>>>> :-)
>>>>>> 
>>>>>>> On March 6, 2019, at 9:32 AM, Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi Basil,
>>>>>>> 
>>>>>>> Thanks for your interest in Zeppelin. Here are my comments on the tickets
>>>>>>> you are interested in.
>>>>>>> 
>>>>>>> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
>>>>>>> This involves two sides of work, frontend and backend.
>>>>>>> In the frontend, we should use Arrow JS to handle the table data, including
>>>>>>> displaying and processing it (such as aggregation).
>>>>>>> In the backend, we should use Arrow for each language and allow them to
>>>>>>> exchange data within the same process, and use Arrow IPC to exchange data
>>>>>>> across processes.
>>>>>>> Overall, this is a pretty large task. If you really want to do it, I would
>>>>>>> suggest that you take just a part of it.
>>>>>>> 
>>>>>>> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
>>>>>>> Regarding model serving, I don't have a clear picture of this. Others
>>>>>>> can comment on it.
>>>>>>> 
>>>>>>> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>>>>> Job scheduling is pretty important for Zeppelin; I would make this
>>>>>>> the highest priority among these tickets. Airflow is one
>>>>>>> option, but I am open to other solutions. First we need to figure out how
>>>>>>> users would schedule jobs in Zeppelin, then choose the right framework. It would
>>>>>>> also involve some frontend work.
>>>>>>> 
>>>>>>> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
>>>>>>> Spark 2.4.0 support is already there, but Scala 2.12 is not
>>>>>>> supported yet. It won't be a big project for GSoC, IMO.
>>>>>>> 
>>>>>>> 5. OLAP.
>>>>>>> Regarding OLAP, as long as the OLAP engine provides a JDBC interface,
>>>>>>> Zeppelin can support it very well. But we could create a specific interpreter
>>>>>>> for an OLAP engine if its native API performs better than JDBC. Another thing
>>>>>>> I can think of for improving OLAP is visualization. Although Zeppelin already
>>>>>>> supports some built-in visualizations, some are still
>>>>>>> missing. We could provide more.
>>>>>>> 
>>>>>>> 6. Auto-completions.
>>>>>>> We already support IPython[1] in Zeppelin, which provides almost the
>>>>>>> same auto-completion as Jupyter. But it lacks access to Python API
>>>>>>> docs, which is also pretty important for Python users, IMO. SQL is another
>>>>>>> popular language in Zeppelin, but it also doesn't provide a good
>>>>>>> code-completion experience; we can do better there as well.
>>>>>>> 
>>>>>>> 7. Notifications.
>>>>>>> I think notifications can be integrated into job scheduling. A notification
>>>>>>> can be sent when a job fails or succeeds.
>>>>>>> 
>>>>>>> 
>>>>>>> Let us know which JIRA you are most interested in, and also please consider
>>>>>>> how much time you can spend on this. Again, we very much appreciate your
>>>>>>> interest in Zeppelin and look forward to your contribution.
>>>>>>> 
>>>>>>> 
>>>>>>> [1] http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Морковкин, Василий Владимирович <morkovkin.vv@phystech.edu> wrote on Wed, Mar 6, 2019 at 7:41 AM:
>>>>>>> 
>>>>>>>> Thank you for your replies! I've checked the existing set of issues and found
>>>>>>>> several interesting ones:
>>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be a very nice
>>>>>>>> way to increase analytical processing performance using the Arrow project;
>>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models
>>>>>>>> independently of ZeppelinServer sounds quite intriguing too, although there is
>>>>>>>> much to think about;
>>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance,
>>>>>>>> https://airflow.apache.org/ seems to be useful for implementing complex
>>>>>>>> execution workflows.
>>>>>>>> Those tasks are broad and intriguing, and require complex architectural
>>>>>>>> solutions.
>>>>>>>> I've also probably found a ticket that is suitable for me to get
>>>>>>>> involved in the project:
>>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you think?
>>>>>>>> Are there any "low-hanging fruits"?
>>>>>>>> 
>>>>>>>> I also have several ideas of my own. Some of them might not be relevant due
>>>>>>>> to the vision of the project or other reasons. Just ideas:
>>>>>>>> - OLAP. As Zeppelin is a tool aimed at analytics, it seems quite
>>>>>>>> logical to add more integrations with existing OLAP solutions like Pinot,
>>>>>>>> ClickHouse and Druid. So far I've found integration only with Kylin;
>>>>>>>> - Better autocompletion. Jupyter offers not only a list of already
>>>>>>>> initialized variables, but also quick access to documentation. It's
>>>>>>>> convenient;
>>>>>>>> - Notifications. Some colleagues would appreciate a notification
>>>>>>>> service that sends you messages (via mail, a Slack bot or something else)
>>>>>>>> indicating that your long-running paragraphs have completed.
>>>>>>>> 
>>>>>>>> Feedback is much appreciated :)
>>>>>>>> 
>>>>>>>> It would be wonderful if someone agreed to sacrifice their time and become a
>>>>>>>> mentor in the GSoC program!
>>>>>>>> 
>>>>>>>> ----------------------------------------
>>>>>>>> Best regards, Basil Morkovkin.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jongyoul@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Hello,
>>>>>>>>> 
>>>>>>>>> I've confirmed that I can add more issues for GSoC. Can you explain what you
>>>>>>>>> would like to contribute to? I can add more issues.
>>>>>>>>> 
>>>>>>>>> JL
>>>>>>>>> 
>>>>>>>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <neliuxun@163.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi, Vasiliy Morkovkin
>>>>>>>>>> 
>>>>>>>>>> Welcome to the zeppelin community! :-)
>>>>>>>>>> 
>>>>>>>>>>> On March 5, 2019, at 11:49 AM, Jongyoul Lee <jongyoul@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for contacting Zeppelin with your interest.
>>>>>>>>>>> 
>>>>>>>>>>> I added FE topics for GSoC because FE is the most urgent issue I have
>>>>>>>>>>> in mind. We always encourage contributions to Zeppelin on a range of
>>>>>>>>>>> topics, including your own ideas.
>>>>>>>>>>> 
>>>>>>>>>>> Please describe your ideas in more detail.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks.
>>>>>>>>>>> JL
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <moon@apache.org> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>> Great to see your interest in the project. Thanks!
>>>>>>>>>>>> Looks like we need a volunteer mentor and some backend topics for
>>>>>>>>>>>> GSoC 2019.
>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> moon
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <morkovkin.vv@phystech.edu> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi everyone, I'm pursuing a bachelor's degree at the Moscow Institute of
>>>>>>>>>>>>> Physics and Technology and am eager to contribute to Zeppelin as part of
>>>>>>>>>>>>> GSoC 2019. I've become a real fan of Zeppelin over the past couple of
>>>>>>>>>>>>> months, using it at my job. However, I have found only one ticket (a
>>>>>>>>>>>>> front-end task) with the GSoC 2019 label in your Jira. Perhaps you have
>>>>>>>>>>>>> ideas for new features or improvements in Zeppelin but not enough hands
>>>>>>>>>>>>> to work on them. It would be wonderful if someone agreed to mentor
>>>>>>>>>>>>> these ideas within GSoC :)
>>>>>>>>>>>>> I have been working as a Scala developer (back-end) for 1.5 years.
>>>>>>>>>>>>> I can also write Java or Python without any problems if necessary.
>>>>>>>>>>>>> I am really fond of databases and high-load systems. I also have experience
>>>>>>>>>>>>> with other great Apache projects such as Cassandra, Kafka and Spark.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best regards, Basil Morkovkin.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>>>>>>> http://madeng.net
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>>>>> http://madeng.net
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Best Regards
>>>>>>> 
>>>>>>> Jeff Zhang
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best Regards
>>>>> 
>>>>> Jeff Zhang
>>>> 
>>> 
>>> 
>>> 
>> 
>> --
>> Best Regards
>> 
>> Jeff Zhang
> 



Re: Zeppelin in GSOC 2019

Posted by Felix Cheung <fe...@hotmail.com>.
Hi Xun,

Thanks for your work. Could you change the title of the email? I think your request to review the design would get more attention that way.


________________________________
From: Xun Liu <ne...@163.com>
Sent: Sunday, March 10, 2019 12:03 AM
To: Jongyoul Lee; moon@apache.org; Jeff Zhang; Vasiliy Morkovkin
Cc: dev@zeppelin.apache.org
Subject: Re: Zeppelin in GSOC 2019

Hello, everyone,

I have completed the Zeppelin workflow system design. Please review it; you can edit the document directly or leave comments.

JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018
gdoc: https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit#

:-)

> On March 8, 2019, at 2:10 PM, Jeff Zhang <zj...@gmail.com> wrote:
>
> Hi Liu,
>
> See this link https://community.apache.org/gsoc.html
>
>
> Xun Liu <ne...@163.com> wrote on Fri, Mar 8, 2019 at 1:58 PM:
>
>> Hi, Jongyoul Lee, Морковкин
>>
>> I looked up information about GSoC. Does the Zeppelin community still
>> need to apply first?
>> I don't know much about GSoC. Besides helping with the project, what
>> other work does the mentor need to do?
>>
>>> On March 8, 2019, at 10:01 AM, Xun Liu <ne...@163.com> wrote:
>>>
>>> Hi, Морковкин
>>>
>>> I am very happy to be your mentor for GSOC. :-)
>>> I believe that by completing this work, I can also learn a lot.
>>>
>>> Please watch https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>
>>>> On March 8, 2019, at 12:08 AM, Морковкин, Василий Владимирович <morkovkin.vv@phystech.edu> wrote:
>>>>
>>>> Hi! For fun, I've sketched a toy prototype of a workflow manager in Scala. It makes it easy to impose dependencies on the execution order of tasks. Check it out: https://scastie.scala-lang.org/aRJGberkQ4CWatyABCOJcQ . It reproduces the flow shown in the attached picture.
>>>> Xun Liu, it would be great to clarify whether you agree to be a mentor specifically within GSoC, or outside of it? :)
>>>>
>>>> ----------------------------------------
>>>> Best regards, Basil Morkovkin
>>>>
>>>> On Thu, Mar 7, 2019 at 11:32, Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>
>>>> Thanks Liu for taking over this, I will help review the design.
>>>>
>>>> Xun Liu <neliuxun@163.com> wrote on Thu, Mar 7, 2019 at 4:05 PM:
>>>> Hi Vasiliy Morkovkin
>>>>
>>>> Thank you very much for your willingness to implement the workflow feature.
>>>> I will work with you on it as my highest priority.
>>>> I am planning to first update the system design documentation for workflow at https://issues.apache.org/jira/browse/ZEPPELIN-4018 .
>>>> Please add yourself as a watcher on ZEPPELIN-4018.
>>>> This way you will get notifications about document updates in a timely manner.
>>>>
>>>> We can discuss all questions in the ZEPPELIN-4018 JIRA comments.
>>>> If you need to, you can email me at liuxun323@gmail.com , and I will reply as quickly as I can.
>>>> Do you think this kind of cooperation is OK?
>>>>
>>>>
>>>> @moon, @Jeff, @Jongyoul Lee, if you are interested, please help us improve our system design. Thanks!
>>>>
>>>> :-)
>>>>
>>>>> On March 7, 2019, at 6:04 AM, Морковкин, Василий Владимирович <morkovkin.vv@phystech.edu> wrote:
>>>>>
>>>>> Thank you for such detailed feedback!
>>>>> I am definitely interested in working on the workflow implementation with you, Xun Liu! Could you become a GSoC mentor for this task?
>>>>> Some front-end work is not a problem at all.
>>>>> I'm ready to work at least 30 hours per week in the summer. For now, I'd like to take on some smaller tasks to take a closer look at the existing codebase and get familiar with your development workflow. Do you have such tasks in mind?
>>>>>
>>>>> On Wed, Mar 6, 2019 at 05:23, Xun Liu <neliuxun@163.com> wrote:
>>>>> Hi Vasiliy Morkovkin
>>>>>
>>>>> I have described my thoughts on workflow at https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>>>
>>>>> Zeppelin has more than 20 interpreters,
>>>>> so data analysts can use it for a wide variety of data development work.
>>>>> A lot of that work is interdependent. For example,
>>>>> developing machine learning algorithms relies on Spark to preprocess data, and so on.
>>>>>
>>>>> Existing open source workflow software includes Azkaban and Airflow.
>>>>> Azkaban is relatively simple and covers most scenarios, and our company is using it.
>>>>> Airflow looks complicated, and I have not used it.
>>>>> In fact, I have previously implemented a workflow for notes and paragraphs in Zeppelin via Azkaban:
>>>>> https://youtu.be/2r6q-2Tq7hk?t=33
>>>>>
>>>>> However, I think Zeppelin should have built-in workflow capabilities
>>>>> instead of relying on external software to schedule notes, for the following reasons:
>>>>> 1. We have now moved from the data processing era to the algorithm era.
>>>>> Once Zeppelin has its own workflow, it will form a closed data loop.
>>>>>
>>>>> 2. Zeppelin's powerful interactive processing capabilities help algorithm engineers work more productively.
>>>>> Zeppelin should give algorithm engineers more direct control,
>>>>> instead of having them hand the algorithm to other teams (or software) to run the workflow.
>>>>>
>>>>> 3. Zeppelin knows more about the processing status of data than Azkaban and Airflow do,
>>>>> so a built-in workflow will have better performance, user experience and control.
>>>>>
>>>>> If you are interested in workflow (ZEPPELIN-4018),
>>>>> I am willing to work with you to complete all the system design and code development work.
>>>>>
>>>>> :-)
>>>>>
>>>>>> On March 6, 2019, at 9:32 AM, Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>>>
>>>>>> Hi Basil,
>>>>>>
>>>>>> Thanks for your interest in Zeppelin. Here are my comments on the tickets
>>>>>> you are interested in.
>>>>>>
>>>>>> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
>>>>>> This involves two sides of work, frontend and backend.
>>>>>> In the frontend, we should use Arrow JS to handle the table data, including
>>>>>> displaying and processing it (such as aggregation).
>>>>>> In the backend, we should use Arrow for each language and allow them to
>>>>>> exchange data within the same process, and use Arrow IPC to exchange data
>>>>>> across processes.
>>>>>> Overall, this is a pretty large task. If you really want to do it, I would
>>>>>> suggest that you take just a part of it.
>>>>>>
>>>>>> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
>>>>>> Regarding model serving, I don't have a clear picture of this. Others
>>>>>> can comment on it.
>>>>>>
>>>>>> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>>>> Job scheduling is pretty important for Zeppelin; I would make this
>>>>>> the highest priority among these tickets. Airflow is one
>>>>>> option, but I am open to other solutions. First we need to figure out how
>>>>>> users would schedule jobs in Zeppelin, then choose the right framework. It would
>>>>>> also involve some frontend work.
>>>>>>
>>>>>> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
>>>>>> Spark 2.4.0 support is already there, but Scala 2.12 is not
>>>>>> supported yet. It won't be a big project for GSoC, IMO.
>>>>>>
>>>>>> 5. OLAP.
>>>>>> Regarding OLAP, as long as the OLAP engine provides a JDBC interface,
>>>>>> Zeppelin can support it very well. But we could create a specific interpreter
>>>>>> for an OLAP engine if its native API performs better than JDBC. Another thing
>>>>>> I can think of for improving OLAP is visualization. Although Zeppelin already
>>>>>> supports some built-in visualizations, some are still
>>>>>> missing. We could provide more.
>>>>>>
>>>>>> 6. Auto-completions.
>>>>>> We already support IPython[1] in Zeppelin, which provides almost the
>>>>>> same auto-completion as Jupyter. But it lacks access to Python API
>>>>>> docs, which is also pretty important for Python users, IMO. SQL is another
>>>>>> popular language in Zeppelin, but it also doesn't provide a good
>>>>>> code-completion experience; we can do better there as well.
>>>>>>
>>>>>> 7. Notifications.
>>>>>> I think notifications can be integrated into job scheduling. A notification
>>>>>> can be sent when a job fails or succeeds.
>>>>>>
>>>>>>
>>>>>> Let us know which JIRA you are most interested in, and also please consider
>>>>>> how much time you can spend on this. Again, we very much appreciate your
>>>>>> interest in Zeppelin and look forward to your contribution.
>>>>>>
>>>>>>
>>>>>> [1] http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
>>>>>>
>>>>>>
>>>>>>
>>>>>> Морковкин, Василий Владимирович <morkovkin.vv@phystech.edu> wrote on Wed, Mar 6, 2019 at 7:41 AM:
>>>>>>
>>>>>>> Thank you for your replies! I've checked the existing set of issues and found
>>>>>>> several interesting ones:
>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be a very nice
>>>>>>> way to increase analytical processing performance using the Arrow project;
>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models
>>>>>>> independently of ZeppelinServer sounds quite intriguing too, although there is
>>>>>>> much to think about;
>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance,
>>>>>>> https://airflow.apache.org/ seems to be useful for implementing complex
>>>>>>> execution workflows.
>>>>>>> Those tasks are broad and intriguing, and require complex architectural
>>>>>>> solutions.
>>>>>>> I've also probably found a ticket that is suitable for me to get
>>>>>>> involved in the project:
>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you think?
>>>>>>> Are there any "low-hanging fruits"?
>>>>>>>
>>>>>>> And I have several ideas of my own. Some of them might not be relevant
>>>>>>> due to the vision of the project or other reasons. Just ideas:
>>>>>>> - OLAP. As Zeppelin is a tool aimed at analytics, it seems quite
>>>>>>> logical to add more integrations with existing OLAP solutions like
>>>>>>> Pinot, ClickHouse and Druid. Currently I've found integration only with
>>>>>>> Kylin;
>>>>>>> - Better autocompletion. Jupyter offers not only a list of already
>>>>>>> initialized variables, but also quick access to documentation. It's
>>>>>>> convenient;
>>>>>>> - Notifications. Some colleagues would appreciate a notification
>>>>>>> service that sends you messages (via mail, a Slack bot or something
>>>>>>> else) indicating that your long-running paragraphs have completed.
>>>>>>>
>>>>>>> Feedback is much appreciated :)
>>>>>>>
>>>>>>> It would be wonderful if someone agreed to sacrifice their time and
>>>>>>> become a mentor in the GSOC program!
>>>>>>>
>>>>>>> ----------------------------------------
>>>>>>> Best regards, Basil Morkovkin.
>>>>>>>
>>>>>>>
>>>>>>> Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jongyoul@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I've confirmed that I can add more issues for GSOC. Can you explain
>>>>>>>> what you would like to contribute to? I can add more issues.
>>>>>>>>
>>>>>>>> JL
>>>>>>>>
>>>>>>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <neliuxun@163.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, Vasiliy Morkovkin
>>>>>>>>>
>>>>>>>>> Welcome to the zeppelin community! :-)
>>>>>>>>>
>>>>>>>>>> On Mar 5, 2019, at 11:49 AM, Jongyoul Lee <jongyoul@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks for contacting Zeppelin with your interest.
>>>>>>>>>>
>>>>>>>>>> I added FE topics for GSOC because FE is the most urgent issue I
>>>>>>>>>> have thought about. We always encourage contributions to Zeppelin
>>>>>>>>>> on various topics, including your own ideas.
>>>>>>>>>>
>>>>>>>>>> Please describe your ideas in more detail.
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>> JL
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <moon@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Great to see your interest in the project. Thanks!
>>>>>>>>>>> Looks like we need volunteers for a mentor and some backend
>>>>>>>>>>> subject for GSoC 2019.
>>>>>>>>>>> Any ideas?
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> moon
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <morkovkin.vv@phystech.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi everyone, I'm pursuing bachelor degree at Moscow institute of
>>>>>>>>>>>> physics and technology and eager to contribute to Zeppelin in
>>>>>>>>>>>> context of GSOC 2019. I've become a real fan of Zeppelin over the
>>>>>>>>>>>> past couple of months, using it at my job. But I have found out
>>>>>>>>>>>> only one ticket (front-end task) with label of GSOC 2019 on your
>>>>>>>>>>>> Jira. Perhaps you may have any ideas for new features or
>>>>>>>>>>>> improvements in Zeppelin, but you don't have enough hands on them.
>>>>>>>>>>>> It would be wonderful if anyone agreed to mentor these ideas
>>>>>>>>>>>> within GSOC :)
>>>>>>>>>>>> Currently I am in a position of Scala developer (back-end) for 1.5
>>>>>>>>>>>> year. I also can write in Java or Python without any problems if
>>>>>>>>>>>> necessary. Really fond of databases and highload. Also I have
>>>>>>>>>>>> experience with some other great Apache projects like Cassandra,
>>>>>>>>>>>> Kafka and Spark.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards, Basil Morkovkin.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>>>>>> http://madeng.net
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>>>> http://madeng.net
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards
>>>>>>
>>>>>> Jeff Zhang
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>>
>>
>>
>>
>
> --
> Best Regards
>
> Jeff Zhang
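
The notification idea quoted above (a message once a long-running paragraph has finished) can be illustrated with a small sketch. This is only a hypothetical Python example under assumed names, not Zeppelin code: run_with_notification, record and the paragraph names are invented for illustration, and a real notifier would send mail or post to Slack instead of appending to a list.

```python
def run_with_notification(name, job, notifiers):
    """Run `job` and report its final status to every notifier.

    `job` is any zero-argument callable; each notifier is a callable
    taking (name, status, detail). All names here are hypothetical.
    """
    try:
        result = job()
        status, detail = "SUCCESS", repr(result)
    except Exception as exc:  # report the failure instead of hiding it
        result = None
        status, detail = "FAILED", str(exc)
    for notify in notifiers:
        notify(name, status, detail)
    return status, result


sent = []  # stands in for a mail or Slack channel


def record(name, status, detail):
    """Toy notifier: remember the message instead of sending it."""
    sent.append(f"{name}: {status} ({detail})")


def failing_paragraph():
    raise RuntimeError("interpreter timeout")


run_with_notification("paragraph_1", lambda: 42, [record])
run_with_notification("paragraph_2", failing_paragraph, [record])
print(sent)
```

Because notifiers are plain callables, the same wrapper could in principle be called by a scheduler after each job, which is roughly the integration with job scheduling suggested in this thread.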


Re: Zeppelin in GSOC 2019

Posted by Xun Liu <ne...@163.com>.
Hello, everyone, 

I have completed the Zeppelin workflow system design. Please review it; you can edit the document directly or leave comments.

JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018
gdoc: https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit#

:-)

> On Mar 8, 2019, at 2:10 PM, Jeff Zhang <zj...@gmail.com> wrote:
> 
> Hi Liu,
> 
> See this link https://community.apache.org/gsoc.html
> 
> 
> Xun Liu <ne...@163.com> wrote on Fri, Mar 8, 2019 at 1:58 PM:
> 
>> Hi, Jongyoul Lee, Морковкин
>> 
>> I looked up information about GSOC. Does the Zeppelin community still
>> need to apply first?
>> I don't know much about GSOC. Besides helping with the project, what
>> other work does a mentor need to do?
>> 
>>> On Mar 8, 2019, at 10:01 AM, Xun Liu <ne...@163.com> wrote:
>>> 
>>> Hi, Морковкин
>>> 
>>> I am very happy to be your mentor for GSOC. :-)
>>> I believe that by completing this work, I can also learn a lot.
>>> 
>>> Please watch https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>> 
>>>> On Mar 8, 2019, at 12:08 AM, Морковкин, Василий Владимирович <morkovkin.vv@phystech.edu> wrote:
>>>> 
>>>> Hi! For fun I've sketched a toy prototype of a workflow manager in Scala.
>>>> It makes it easy to impose dependencies on the execution order of tasks.
>>>> Check it out: https://scastie.scala-lang.org/aRJGberkQ4CWatyABCOJcQ . It
>>>> reproduces the flow shown in the attached picture.
>>>> Xun Liu, it would be great to clarify whether you agree to be a mentor
>>>> specifically within GSOC, or outside of it? :)
>>>> 
>>>> ----------------------------------------
>>>> Best regards, Basil Morkovkin
>>>> 
>>>> Thu, Mar 7, 2019 at 11:32, Jeff Zhang <zjffdu@gmail.com> wrote:
>>>> 
>>>> Thanks Liu for taking over this, I will help review the design.
>>>> 
>>>> Xun Liu <neliuxun@163.com> wrote on Thu, Mar 7, 2019 at 4:05 PM:
>>>> Hi Vasiliy Morkovkin
>>>> 
>>>> Thank you very much for your willingness to implement the workflow
>>>> feature.
>>>> I will work with you on it as my highest priority.
>>>> I am planning to first update the system design documentation for
>>>> workflow at https://issues.apache.org/jira/browse/ZEPPELIN-4018 .
>>>> Please add yourself as a Watcher on ZEPPELIN-4018.
>>>> This way you will get notifications about document updates in a timely
>>>> manner.
>>>> 
>>>> We can discuss all questions in the ZEPPELIN-4018 JIRA comments.
>>>> If you need to, you can email me at liuxun323@gmail.com and I will reply
>>>> as quickly as I can.
>>>> Do you think this kind of cooperation is OK?
>>>> 
>>>> 
>>>> @moon, @Jeff, @Jongyoul Lee, if you are interested, please help us
>>>> improve our system design. Thanks!
>>>> 
>>>> :-)
>>>> 
>>>>> On Mar 7, 2019, at 6:04 AM, Морковкин, Василий Владимирович <morkovkin.vv@phystech.edu> wrote:
>>>>> 
>>>>> Thank you for such detailed feedback!
>>>>> I am definitely interested in working on the workflow implementation
>>>>> with you, Xun Liu! Could you become a mentor in GSOC for this task?
>>>>> Some front-end work is not a problem at all.
>>>>> I'm ready to work at least 30 hours per week in the summer, while for now
>>>>> I'd like to take some smaller tasks to take a closer look at the existing
>>>>> codebase and to get familiar with your development workflow. Do you have
>>>>> such tasks in mind?
>>>>> 
>>>>> Wed, Mar 6, 2019 at 05:23, Xun Liu <neliuxun@163.com> wrote:
>>>>> Hi Vasiliy Morkovkin
>>>>> 
>>>>> I shared my thoughts on workflow at
>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>>> 
>>>>> Because there are more than 20 interpreters in Zeppelin,
>>>>> data analysts can use it for a variety of data development work,
>>>>> and a lot of that work is interdependent. For example,
>>>>> developing machine learning algorithms relies on Spark to preprocess
>>>>> data, and so on.
>>>>> 
>>>>> The open source workflow tools available now include Azkaban and Airflow.
>>>>> Azkaban is relatively simple and covers most scenarios, and our company
>>>>> is using it.
>>>>> Airflow looks complicated and I have not used it.
>>>>> In fact, I have previously implemented workflows for notes and
>>>>> paragraphs in Zeppelin via Azkaban.
>>>>> https://youtu.be/2r6q-2Tq7hk?t=33
>>>>> 
>>>>> However, I think Zeppelin should have built-in workflow capabilities
>>>>> instead of relying on external software to schedule notes, for the
>>>>> following reasons:
>>>>> 1. Now that we have moved from the data processing era to the algorithm
>>>>> era, Zeppelin with its own workflow will form a closed data loop.
>>>>> 
>>>>> 2. Zeppelin's powerful interactive processing capabilities help
>>>>> algorithm engineers improve their productivity.
>>>>> Zeppelin should give the algorithm engineer more direct control,
>>>>> instead of handing the algorithm to other teams (or software) to run the
>>>>> workflow.
>>>>> 
>>>>> 3. Zeppelin knows more about the processing status of data than
>>>>> Azkaban and Airflow,
>>>>> so a built-in workflow will have better performance, user experience
>>>>> and control.
>>>>> 
>>>>> If you are interested in workflow (ZEPPELIN-4018),
>>>>> I am willing to work with you to complete all system design and code
>>>>> development work.
>>>>> 
>>>>> :-)
>>>>> 
>>>>>> On Mar 6, 2019, at 9:32 AM, Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>>> 
>>>>>> Hi Basil,
>>>>>> 
>>>>>> Thanks for your interest in Zeppelin. Here are my comments on the
>>>>>> tickets you are interested in.
>>>>>> 
>>>>>> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
>>>>>>  This involves two sides of work, frontend and backend:
>>>>>>  In the frontend, we should use Arrow JS to handle the table data,
>>>>>> including displaying and processing it (such as aggregation).
>>>>>>  In the backend, we should use Arrow for each language and allow them to
>>>>>> exchange data within the same process, and use Arrow IPC to exchange
>>>>>> data across processes.
>>>>>> Overall, this is a pretty large task. If you really want to do it, I
>>>>>> would suggest you take just part of it.
>>>>>> 
>>>>>> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
>>>>>>  Regarding model serving, I don't have a clear picture of this. Others
>>>>>> can comment on it.
>>>>>> 
>>>>>> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>>>>  Job scheduling is pretty important for Zeppelin; I would make this
>>>>>> the highest priority among these tickets. Airflow is one
>>>>>> option, but I am open to other solutions. First we need to figure out
>>>>>> how users schedule jobs in Zeppelin, then choose the right framework. It
>>>>>> would also involve some frontend work.
>>>>>> 
>>>>>> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
>>>>>>  Spark 2.4.0 support is already there, but Scala 2.12 is not
>>>>>> supported yet. It won't be a big project for GSOC IMO.
>>>>>> 
>>>>>> 5. OLAP.
>>>>>>  Regarding OLAP, as long as the OLAP engine provides a JDBC interface,
>>>>>> Zeppelin can support it very well. But we could create a specific
>>>>>> interpreter for an OLAP engine if its native API performs better than
>>>>>> JDBC. Another thing I can think of to improve OLAP is visualization:
>>>>>> although Zeppelin already supports some built-in visualizations, some
>>>>>> are still missing. We could provide more.
>>>>>> 
>>>>>> 6. Auto-completion.
>>>>>> We already support IPython [1] in Zeppelin, which provides almost the
>>>>>> same auto-completion as Jupyter. But it lacks access to the Python API
>>>>>> docs, which is also pretty important for Python users IMO. SQL is
>>>>>> another popular language in Zeppelin, but it doesn't provide a good
>>>>>> code-completion experience either; we can do better there as well.
>>>>>> 
>>>>>> 7. Notifications.
>>>>>> I think notifications can be integrated into job scheduling. A
>>>>>> notification can be sent when a job fails or succeeds.
>>>>>> 
>>>>>> 
>>>>>> Let us know which JIRA you are most interested in, and also please
>>>>>> consider how much time you can spend on this. Again, we very much
>>>>>> appreciate your interest in Zeppelin and look forward to your
>>>>>> contribution.
>>>>>> 
>>>>>> 
>>>>>> [1] http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Best Regards
>>>>>> 
>>>>>> Jeff Zhang
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Best Regards
>>>> 
>>>> Jeff Zhang
>>> 
>> 
>> 
>> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang
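
The dependency handling discussed in the quoted messages (imposing an execution order on tasks, as in the toy workflow prototype) comes down to running tasks in topological order. Below is a minimal Python sketch under stated assumptions: it is neither the Scastie prototype nor the ZEPPELIN-4018 design, and execution_order and the task names are hypothetical.

```python
from collections import deque


def execution_order(dependencies):
    """Return task names in an order that respects the dependencies.

    `dependencies` maps each task to the tasks it must wait for.
    Raises ValueError if the dependencies contain a cycle.
    """
    # Collect every task, including ones that only appear as a prerequisite.
    tasks = set(dependencies)
    for prereqs in dependencies.values():
        tasks.update(prereqs)

    # Number of unfinished prerequisites per task, and the reverse edges.
    remaining = {t: len(set(dependencies.get(t, ()))) for t in tasks}
    dependents = {t: [] for t in tasks}
    for task, prereqs in dependencies.items():
        for prereq in set(prereqs):
            dependents[prereq].append(task)

    # Kahn's algorithm: repeatedly run a task with no pending prerequisites.
    ready = deque(sorted(t for t in tasks if remaining[t] == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for dependent in sorted(dependents[task]):
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                ready.append(dependent)

    if len(order) != len(tasks):
        raise ValueError("cyclic dependency between tasks")
    return order


# Hypothetical note paragraphs: load -> clean -> (train, report) -> notify
flow = {
    "clean": ["load"],
    "train": ["clean"],
    "report": ["clean"],
    "notify": ["train", "report"],
}
print(execution_order(flow))
```

With the example flow this prints ['load', 'clean', 'report', 'train', 'notify']; tasks that become ready at the same time are ordered alphabetically only to keep the output deterministic.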


Re: Zeppelin in GSOC 2019

Posted by Jeff Zhang <zj...@gmail.com>.
Hi Liu,

See this link https://community.apache.org/gsoc.html


Xun Liu <ne...@163.com> wrote on Fri, Mar 8, 2019 at 1:58 PM:

> Hi, Jongyoul Lee, Морковкин
>
> I looked up information about GSOC. Does the Zeppelin community still
> need to apply first?
> I don't know much about GSOC. Besides helping with the project, what
> other work does a mentor need to do?
>
> > On Mar 8, 2019, at 10:01 AM, Xun Liu <ne...@163.com> wrote:
> >
> > Hi, Морковкин
> >
> > I am very happy to be your mentor for GSOC. :-)
> > I believe that by completing this work, I can also learn a lot.
> >
> > Please watch https://issues.apache.org/jira/browse/ZEPPELIN-4018
> >
> >> On Mar 8, 2019, at 12:08 AM, Морковкин, Василий Владимирович <morkovkin.vv@phystech.edu> wrote:
> >>
> >> Hi! For fun I've sketched a toy prototype of a workflow manager in Scala.
> >> It makes it easy to impose dependencies on the execution order of tasks.
> >> Check it out: https://scastie.scala-lang.org/aRJGberkQ4CWatyABCOJcQ . It
> >> reproduces the flow shown in the attached picture.
> >> Xun Liu, it would be great to clarify whether you agree to be a mentor
> >> specifically within GSOC, or outside of it? :)
> >>
> >> ----------------------------------------
> >> Best regards, Basil Morkovkin
> >>
> >> Thu, Mar 7, 2019 at 11:32, Jeff Zhang <zjffdu@gmail.com> wrote:
> >>
> >> Thanks Liu for taking over this, I will help review the design.
> >>
> >> Xun Liu <neliuxun@163.com> wrote on Thu, Mar 7, 2019 at 4:05 PM:
> >> Hi Vasiliy Morkovkin
> >>
> >> Thank you very much for your willingness to implement the workflow
> >> feature.
> >> I will work with you on it as my highest priority.
> >> I am planning to first update the system design documentation for
> >> workflow at https://issues.apache.org/jira/browse/ZEPPELIN-4018 .
> >> Please add yourself as a Watcher on ZEPPELIN-4018.
> >> This way you will get notifications about document updates in a timely
> >> manner.
> >>
> >> We can discuss all questions in the ZEPPELIN-4018 JIRA comments.
> >> If you need to, you can email me at liuxun323@gmail.com and I will reply
> >> as quickly as I can.
> >> Do you think this kind of cooperation is OK?
> >>
> >>
> >> @moon, @Jeff, @Jongyoul Lee, if you are interested, please help us
> >> improve our system design. Thanks!
> >>
> >> :-)
> >>
> >>> On Mar 7, 2019, at 6:04 AM, Морковкин, Василий Владимирович <morkovkin.vv@phystech.edu> wrote:
> >>>
> >>> Thank you for such detailed feedback!
> >>> I am definitely interested in working on the workflow implementation
> >>> with you, Xun Liu! Could you become a mentor in GSOC for this task?
> >>> Some front-end work is not a problem at all.
> >>> I'm ready to work at least 30 hours per week in the summer, while for now
> >>> I'd like to take some smaller tasks to take a closer look at the existing
> >>> codebase and to get familiar with your development workflow. Do you have
> >>> such tasks in mind?
> >>>
> >>> Wed, Mar 6, 2019 at 05:23, Xun Liu <neliuxun@163.com> wrote:
> >>> Hi Vasiliy Morkovkin
> >>>
> >>> I shared my thoughts on workflow at
> >>> https://issues.apache.org/jira/browse/ZEPPELIN-4018
> >>>
> >>> Because there are more than 20 interpreters in Zeppelin,
> >>> data analysts can use it for a variety of data development work,
> >>> and a lot of that work is interdependent. For example,
> >>> developing machine learning algorithms relies on Spark to preprocess
> >>> data, and so on.
> >>>
> >>> The open source workflow tools available now include Azkaban and Airflow.
> >>> Azkaban is relatively simple and covers most scenarios, and our company
> >>> is using it.
> >>> Airflow looks complicated and I have not used it.
> >>> In fact, I have previously implemented workflows for notes and
> >>> paragraphs in Zeppelin via Azkaban.
> >>> https://youtu.be/2r6q-2Tq7hk?t=33
> >>>
> >>> However, I think Zeppelin should have built-in workflow capabilities
> >>> instead of relying on external software to schedule notes, for the
> >>> following reasons:
> >>> 1. Now that we have moved from the data processing era to the algorithm
> >>> era, Zeppelin with its own workflow will form a closed data loop.
> >>>
> >>> 2. Zeppelin's powerful interactive processing capabilities help
> >>> algorithm engineers improve their productivity.
> >>> Zeppelin should give the algorithm engineer more direct control,
> >>> instead of handing the algorithm to other teams (or software) to run the
> >>> workflow.
> >>>
> >>> 3. Zeppelin knows more about the processing status of data than
> >>> Azkaban and Airflow,
> >>> so a built-in workflow will have better performance, user experience
> >>> and control.
> >>>
> >>> If you are interested in workflow (ZEPPELIN-4018),
> >>> I am willing to work with you to complete all system design and code
> >>> development work.
> >>>
> >>> :-)
> >>>
> >>>> On Mar 6, 2019, at 9:32 AM, Jeff Zhang <zjffdu@gmail.com> wrote:
> >>>>
> >>>> Hi Basil,
> >>>>
> >>>> Thanks for your interest in Zeppelin. Here are my comments on the
> >>>> tickets you are interested in.
> >>>>
> >>>> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
> >>>>   This involves two sides of work, frontend and backend:
> >>>>   In the frontend, we should use Arrow JS to handle the table data,
> >>>> including displaying and processing it (such as aggregation).
> >>>>   In the backend, we should use Arrow for each language and allow them to
> >>>> exchange data within the same process, and use Arrow IPC to exchange
> >>>> data across processes.
> >>>>  Overall, this is a pretty large task. If you really want to do it, I
> >>>> would suggest you take just part of it.
> >>>>
> >>>> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
> >>>>   Regarding model serving, I don't have a clear picture of this. Others
> >>>> can comment on it.
> >>>>
> >>>> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
> >>>>   Job scheduling is pretty important for Zeppelin; I would make this
> >>>> the highest priority among these tickets. Airflow is one
> >>>> option, but I am open to other solutions. First we need to figure out
> >>>> how users schedule jobs in Zeppelin, then choose the right framework. It
> >>>> would also involve some frontend work.
> >>>>
> >>>> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
> >>>>   Spark 2.4.0 support is already there, but Scala 2.12 is not
> >>>> supported yet. It won't be a big project for GSOC IMO.
> >>>>
> >>>> 5. OLAP.
> >>>>   Regarding OLAP, as long as the OLAP engine provides a JDBC interface,
> >>>> Zeppelin can support it very well. But we could create a specific
> >>>> interpreter for an OLAP engine if its native API performs better than
> >>>> JDBC. Another thing I can think of to improve OLAP is visualization:
> >>>> although Zeppelin already supports some built-in visualizations, some
> >>>> are still missing. We could provide more.
> >>>>
> >>>> 6. Auto-completion.
> >>>>  We already support IPython [1] in Zeppelin, which provides almost the
> >>>> same auto-completion as Jupyter. But it lacks access to the Python API
> >>>> docs, which is also pretty important for Python users IMO. SQL is
> >>>> another popular language in Zeppelin, but it doesn't provide a good
> >>>> code-completion experience either; we can do better there as well.
> >>>>
> >>>> 7. Notifications.
> >>>>  I think notifications can be integrated into job scheduling. A
> >>>> notification can be sent when a job fails or succeeds.
> >>>>
> >>>>
> >>>> Let us know which JIRA you are most interested in, and also please
> >>>> consider how much time you can spend on this. Again, we very much
> >>>> appreciate your interest in Zeppelin and look forward to your
> >>>> contribution.
> >>>>
> >>>>
> >>>> [1] http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
> >>>>
> >>>>
> >>>>
> >>>> Морковкин, Василий Владимирович <morkovkin.vv@phystech.edu> wrote on Wed, Mar 6, 2019 at 7:41 AM:
> >>>>
> >>>>> Thank you for your replies! I've checked existing set of issues and
> found
> >>>>> several curious ones:
> >>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be very nice
> >>>>> way to increase analytical processing performance using Arrow project;
> >>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models
> >>>>> regardless of ZeppelinServer sounds quite intriguing too. Although there is
> >>>>> much to think about;
> >>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance
> >>>>> https://airflow.apache.org/ seems to be useful in implementing complex
> >>>>> execution workflows.
> >>>>> Those tasks are global and intriguing, requiring complex
> architectural
> >>>>> solutions.
> >>>>> Also I've probably found the ticket which is suitable for me to get
> >>>>> involved into the project:
> >>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you think?
> >>>>> Are there any "low hanging fruits"?
> >>>>>
> >>>>> And I have several ideas on my own. Some of them might be not
> relevant due
> >>>>> to the vision of the project or other reasons. Just ideas:
> >>>>> - OLAP. As Zeppelin is a tool aimed at analytics, it seems to be
> quite
> >>>>> logical to add more integrations with existing OLAP solutions like
> Pinot,
> >>>>> ClickHouse and Druid. Currently I've found integration only with
> Kylin;
> >>>>> - Better autocompletion. Jupyter offers not only a list of already
> >>>>> initialized variables, but also quick access to documentation. It's
> >>>>> convenient;
> >>>>> - Notifications. Some colleagues would have appreciated the
> notifications
> >>>>> service, which sends you messages (via mail, Slack bot or something
> else)
> >>>>> indicating that your long-running paragraphs has completed.
> >>>>>
> >>>>> Feedback is very appreciated :)
> >>>>>
> >>>>> It would be wonderful if someone agreed to sacrifice his time and
> become a
> >>>>> mentor in GSOC program!
> >>>>>
> >>>>> ----------------------------------------
> >>>>> Best regards, Basil Morkovkin.
> >>>>>
> >>>>>
> >>>>> On Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jongyoul@gmail.com> wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> I've confirmed I could add more issues for GSOC. Can you explain
> what you
> >>>>>> would like to contribute to? I can add more issues
> >>>>>>
> >>>>>> JL
> >>>>>>
> >>>>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <neliuxun@163.com> wrote:
> >>>>>>
> >>>>>>> Hi, Vasiliy Morkovkin
> >>>>>>>
> >>>>>>> Welcome to the zeppelin community! :-)
> >>>>>>>
> >>>>>>>> On Mar 5, 2019, at 11:49 AM, Jongyoul Lee <jongyoul@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Thanks for contacting Zeppelin with your interest.
> >>>>>>>>
> >>>>>>>> I added FE topics for GSOC because FE is the most urgent issue I
> have
> >>>>>>>> thought about. We always encourage to contribute Zeppelin with
> several
> >>>>>>>> topics including your idea.
> >>>>>>>>
> >>>>>>>> Please describe something more.
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>> JL
> >>>>>>>>
> >>>>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <moon@apache.org> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> Great to see your interest to project. Thanks!
> >>>>>>>>> Looks like we need volunteers for a mentor and some backend
> subject
> >>>>> for
> >>>>>>>>> GSoC2019.
> >>>>>>>>> Any ideas?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> moon
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <morkovkin.vv@phystech.edu> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi everyone, I'm pursuing bachelor degree at Moscow institute of
> >>>>>>> physics
> >>>>>>>>>> and technology and eager to contribute to Zeppelin in context of
> >>>>> GSOC
> >>>>>>>>>> 2019. I've become a real fan of Zeppelin over the past couple of
> >>>>>>> months,
> >>>>>>>>>> using it at my job. But I have found out only one ticket
> (front-end
> >>>>>>>>>> task) with label of GSOC 2019 on your Jira. Perhaps you may
> have any
> >>>>>>>>>> ideas for new features or improvements in Zeppelin, but you
> don't
> >>>>> have
> >>>>>>>>>> enough hands on them. It would be wonderful if anyone agreed to
> >>>>> mentor
> >>>>>>>>>> these ideas within GSOC :)
> >>>>>>>>>> Currently I am in a position of Scala developer (back-end) for
> 1.5
> >>>>>>> year.
> >>>>>>>>>> I also can write in Java or Python without any problems if
> >>>>> necessary.
> >>>>>>>>>> Really fond of databases and highload. Also I have experience
> with
> >>>>>>> some
> >>>>>>>>>> other great Apache projects like Cassandra, Kafka and Spark.
> >>>>>>>>>>
> >>>>>>>>>> Best regards, Basil Morkovkin.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> 이종열, Jongyoul Lee, 李宗烈
> >>>>>>>> http://madeng.net
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> 이종열, Jongyoul Lee, 李宗烈
> >>>>>> http://madeng.net
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best Regards
> >>>>
> >>>> Jeff Zhang
> >>>
> >>
> >>
> >>
> >> --
> >> Best Regards
> >>
> >> Jeff Zhang
> >
>
>
>

-- 
Best Regards

Jeff Zhang

Re: Zeppelin in GSOC 2019

Posted by Xun Liu <ne...@163.com>.
Hi, Jongyoul Lee, Морковкин

I looked up some information about GSoC. Is it still necessary for the Zeppelin community to apply first?
I don't know much about GSoC. Besides helping with the project, what other work does the mentor need to do?

> On Mar 8, 2019, at 10:01 AM, Xun Liu <ne...@163.com> wrote:
> 
> Hi, Морковкин
> 
> I am very happy to be your mentor for GSOC. :-)
> I believe that by completing this work, I can also learn a lot.
> 
> Please watch https://issues.apache.org/jira/browse/ZEPPELIN-4018
> 
>> On Mar 8, 2019, at 12:08 AM, Морковкин, Василий Владимирович <mo...@phystech.edu> wrote:
>> 
>> Hi! For fun I've sketched a toy-prototype of workflow manager in Scala. It makes it easy to impose dependencies on the execution order of tasks. Check this out: https://scastie.scala-lang.org/aRJGberkQ4CWatyABCOJcQ . It reproduces the flow which is shown in the attached picture.
>> Xun Liu, It would be great to clarify whether you agree to be a mentor exactly within GSOC, or without it? :)
>> 
>> ----------------------------------------
>> Best regards, Basil Morkovkin
>> 
>> On Thu, Mar 7, 2019 at 11:32, Jeff Zhang <zjffdu@gmail.com> wrote:
>> 
>> Thanks Liu for taking over this, I will help review the design.  
>> 
>> On Thu, Mar 7, 2019 at 4:05 PM, Xun Liu <neliuxun@163.com> wrote:
>> Hi Vasiliy Morkovkin
>> 
>> Thank you very much for your willingness to implement this feature of workflow.
>> I will work with you with the highest priority.
>> I am planning to update the system design documentation for workflow first at https://issues.apache.org/jira/browse/ZEPPELIN-4018 .
>> Please set the Watcher in ZEPPELIN-4018.
>> This way you can get notification messages for document updates in a timely manner.
>> 
>> We can communicate all the questions in the ZEPPELIN-4018 JIRA comments.
>> If you need it, you can email me at liuxun323@gmail.com , I will reply you the fastest.
>> Do you think this kind of cooperation is OK?
>> 
>> 
>> @moon, @Jeff, @Jongyoul Lee , If interested, Please help us improve our system design. Thanks!
>> 
>> :-)
>> 
>>> On Mar 7, 2019, at 6:04 AM, Морковкин, Василий Владимирович <morkovkin.vv@phystech.edu> wrote:
>>> 
>>> Thank you for such a detailed feedback!
>>> I am definitely interested to work on the workflow implementation with you Xun Liu! Could you become a mentor in GSOC with this task?
>>> Some front-end work is not a problem at all.
>>> I'm ready to work at least 30 hours per week in the summer, while now I'd like to take some smaller tasks to take a closer look at existing codebase and to get familiar with your development workflow. Do you have such tasks on mind?
>>> 
>>> On Wed, Mar 6, 2019 at 05:23, Xun Liu <neliuxun@163.com> wrote:
>>> Hi Vasiliy Morkovkin
>>> 
>>> I said my thoughts on workflow, https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>> 
>>> Because there are more than 20 interpreters in zeppelin, 
>>> Data analysts can be used to do a variety of data development,
>>> A lot of data development is interdependent. For example, 
>>> the development of machine learning algorithms requires relying on spark to preprocess data, and so on.
>>> 
>>> Now open source workflow software has Azkaban, airflow,
>>> Azkaban is relatively simple and has been used to meet most scenarios, and our company is using it.
>>> Airflow looks complicated and I have not used it.
>>> In fact, I have previously implemented workflow workflow for notes and paragraphs in zeppelin via azkaban.
>>> https://youtu.be/2r6q-2Tq7hk?t=33
>>> 
>>> However, I think zeppelin should have built-in workflow capabilities. 
>>> Instead of relying on external software to schedule notes in zeppelin for the following reasons:
>>> 1. Now that we have upgraded from the data processing era to the algorithm era,
>>> After zeppelin has its own workflow, it will form a data loop.
>>> 
>>> 2. zeppelin's powerful interactive processing capabilities help algorithm engineers improve productivity and work.
>>> Zeppelin should give the algorithm engineer more direct control.
>>> Instead of handing the algorithm to other teams(or software) to do the workflow.
>>> 
>>> 3. zeppelin knows more about the processing status of data than Azkaban and airflow.
>>> So the built-in workflow will have better performance, user experience and control.
>>> 
>>> If you are interested in workflow(ZEPPELIN-4018), 
>>> I am willing to work with you to complete all system design and code development work.
>>> 
>>> :-)
>>> 
>>>> On Mar 6, 2019, at 9:32 AM, Jeff Zhang <zjffdu@gmail.com> wrote:
>>>> 
>>>> Hi Basil,
>>>> 
>>>> Thanks for your interest in zeppelin, here's my comments about the tickets
>>>> you interested.
>>>> 
>>>> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
>>>>   This involves 2 sides of work: frontend and backend:
>>>>   In frontend, we should use arrow js to handle the table data, include
>>>> display it and processing it (such as aggregation)
>>>>   In backend, we should use arrow for each language, and allow them to
>>>> exchange data in the same process. And use arrow IPC to exchange data
>>>> across processes.
>>>>  Overall, this is a pretty large task. If you really want to do, I would
>>>> suggest you to just take part of it.
>>>> 
>>>> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
>>>>   Regarding model serving, I don't have clear picture about this. Others
>>>> can comment on this.
>>>> 
>>>> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>>   Job scheduling is pretty important for zeppelin, I would make this as
>>>> the highest priority for zeppelin among these tickets. airflow is one
>>>> option, but I am open to other solutions. First we need to figure out how
>>>> user schedule jobs in zeppelin, then choose the right framework. It would
>>>> also involves some frontend work
>>>> 
>>>> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
>>>>   Spark 2.4.0 supporting is already there, but scala 2.12 is not
>>>> supported yet. It won't be a big project for GSOC IMO.
>>>> 
>>>> 5. OLAP.
>>>>   Regarding OLAP, as long as the OLAP engine provide Jdbc interface,
>>>> Zeppelin can support it very well. But we could create specific interpreter
>>>> for OLAP engine if their native api perform better than jdbc. Another thing
>>>> I can think of improving OLAP is visualization, although Zeppelin already
>>>> support some built-in visualization, there's still some visualization
>>>> missing. We could provide more.
>>>> 
>>>> 6. Auto-completions.
>>>>  We have already support ipython[1]  in zeppelin which provide almost the
>>>> same auto-completion like jupyter. But it lacks for accessing python api
>>>> doc. This is also pretty important for python users IMO. SQL is another
>>>> popular language in Zeppelin, but it also doesn't provide good
>>>> code-completion experience, we can do better as well.
>>>> 
>>>> 7. Notifications.
>>>>  I think notification can be integrated into job scheduling. Notification
>>>> can be sent when job is failed/succeed.
>>>> 
>>>> 
>>>> Let us know which jira you are more interested, and also please consider
>>>> how much time you can spent on this. Again, we are very appreciated your
>>>> interest on zeppelin and look forward your contribution.
>>>> 
>>>> 
>>>> [1]
>>>> http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
>>>> 
>>>> 
>>>> 
>>>> On Wed, Mar 6, 2019 at 7:41 AM, Морковкин, Василий Владимирович <morkovkin.vv@phystech.edu> wrote:
>>>> 
>>>>> Thank you for your replies! I've checked existing set of issues and found
>>>>> several curious ones:
>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be very
>>>>> nice
>>>>> way to increase analytical processing performance using Arrow project;
>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models
>>>>> regardless of ZeppelinServer sounds quite intriguing too. Although there is
>>>>> much to think about;
>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance
>>>>> https://airflow.apache.org/ seems to be useful in implementing complex
>>>>> execution workflows.
>>>>> Those tasks are global and intriguing, requiring complex architectural
>>>>> solutions.
>>>>> Also I've probably found the ticket which is suitable for me to get
>>>>> involved into the project:
>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you think?
>>>>> Are there any "low hanging fruits"?
>>>>> 
>>>>> And I have several ideas on my own. Some of them might be not relevant due
>>>>> to the vision of the project or other reasons. Just ideas:
>>>>> - OLAP. As Zeppelin is a tool aimed at analytics, it seems to be quite
>>>>> logical to add more integrations with existing OLAP solutions like Pinot,
>>>>> ClickHouse and Druid. Currently I've found integration only with Kylin;
>>>>> - Better autocompletion. Jupyter offers not only a list of already
>>>>> initialized variables, but also quick access to documentation. It's
>>>>> convenient;
>>>>> - Notifications. Some colleagues would have appreciated the notifications
>>>>> service, which sends you messages (via mail, Slack bot or something else)
>>>>> indicating that your long-running paragraphs has completed.
>>>>> 
>>>>> Feedback is very appreciated :)
>>>>> 
>>>>> It would be wonderful if someone agreed to sacrifice his time and become a
>>>>> mentor in GSOC program!
>>>>> 
>>>>> ----------------------------------------
>>>>> Best regards, Basil Morkovkin.
>>>>> 
>>>>> 
>>>>> On Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jongyoul@gmail.com> wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> I've confirmed I could add more issues for GSOC. Can you explain what you
>>>>>> would like to contribute to? I can add more issues
>>>>>> 
>>>>>> JL
>>>>>> 
>>>>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <neliuxun@163.com> wrote:
>>>>>> 
>>>>>>> Hi, Vasiliy Morkovkin
>>>>>>> 
>>>>>>> Welcome to the zeppelin community! :-)
>>>>>>> 
>>>>>>>> On Mar 5, 2019, at 11:49 AM, Jongyoul Lee <jongyoul@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Thanks for contacting Zeppelin with your interest.
>>>>>>>> 
>>>>>>>> I added FE topics for GSOC because FE is the most urgent issue I have
>>>>>>>> thought about. We always encourage to contribute Zeppelin with several
>>>>>>>> topics including your idea.
>>>>>>>> 
>>>>>>>> Please describe something more.
>>>>>>>> 
>>>>>>>> Thanks.
>>>>>>>> JL
>>>>>>>> 
>>>>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <moon@apache.org> wrote:
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> Great to see your interest to project. Thanks!
>>>>>>>>> Looks like we need volunteers for a mentor and some backend subject
>>>>> for
>>>>>>>>> GSoC2019.
>>>>>>>>> Any ideas?
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> moon
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <morkovkin.vv@phystech.edu> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi everyone, I'm pursuing bachelor degree at Moscow institute of
>>>>>>> physics
>>>>>>>>>> and technology and eager to contribute to Zeppelin in context of
>>>>> GSOC
>>>>>>>>>> 2019. I've become a real fan of Zeppelin over the past couple of
>>>>>>> months,
>>>>>>>>>> using it at my job. But I have found out only one ticket (front-end
>>>>>>>>>> task) with label of GSOC 2019 on your Jira. Perhaps you may have any
>>>>>>>>>> ideas for new features or improvements in Zeppelin, but you don't
>>>>> have
>>>>>>>>>> enough hands on them. It would be wonderful if anyone agreed to
>>>>> mentor
>>>>>>>>>> these ideas within GSOC :)
>>>>>>>>>> Currently I am in a position of Scala developer (back-end) for 1.5
>>>>>>> year.
>>>>>>>>>> I also can write in Java or Python without any problems if
>>>>> necessary.
>>>>>>>>>> Really fond of databases and highload. Also I have experience with
>>>>>>> some
>>>>>>>>>> other great Apache projects like Cassandra, Kafka and Spark.
>>>>>>>>>> 
>>>>>>>>>> Best regards, Basil Morkovkin.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>>>> http://madeng.net
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>> http://madeng.net
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Best Regards
>>>> 
>>>> Jeff Zhang
>>> 
>> 
>> 
>> 
>> -- 
>> Best Regards
>> 
>> Jeff Zhang
> 



Re: Zeppelin in GSOC 2019

Posted by Xun Liu <ne...@163.com>.
Hi, Морковкин

I am very happy to be your mentor for GSOC. :-)
I believe that by completing this work, I can also learn a lot.

Please watch https://issues.apache.org/jira/browse/ZEPPELIN-4018

> On Mar 8, 2019, at 12:08 AM, Морковкин, Василий Владимирович <mo...@phystech.edu> wrote:
> 
> Hi! For fun I've sketched a toy-prototype of workflow manager in Scala. It makes it easy to impose dependencies on the execution order of tasks. Check this out: https://scastie.scala-lang.org/aRJGberkQ4CWatyABCOJcQ . It reproduces the flow which is shown in the attached picture.
> Xun Liu, It would be great to clarify whether you agree to be a mentor exactly within GSOC, or without it? :)
> 
> ----------------------------------------
> Best regards, Basil Morkovkin
> 
> On Thu, Mar 7, 2019 at 11:32, Jeff Zhang <zjffdu@gmail.com> wrote:
> 
> Thanks Liu for taking over this, I will help review the design.  
> 
> On Thu, Mar 7, 2019 at 4:05 PM, Xun Liu <neliuxun@163.com> wrote:
> Hi Vasiliy Morkovkin
> 
> Thank you very much for your willingness to implement this feature of workflow.
> I will work with you with the highest priority.
> I am planning to update the system design documentation for workflow first at https://issues.apache.org/jira/browse/ZEPPELIN-4018 .
> Please set the Watcher in ZEPPELIN-4018.
> This way you can get notification messages for document updates in a timely manner.
> 
> We can communicate all the questions in the ZEPPELIN-4018 JIRA comments.
> If you need it, you can email me at liuxun323@gmail.com , I will reply you the fastest.
> Do you think this kind of cooperation is OK?
> 
> 
> @moon, @Jeff, @Jongyoul Lee , If interested, Please help us improve our system design. Thanks!
> 
> :-)
> 
> > On Mar 7, 2019, at 6:04 AM, Морковкин, Василий Владимирович <morkovkin.vv@phystech.edu> wrote:
> > 
> > Thank you for such a detailed feedback!
> > I am definitely interested to work on the workflow implementation with you Xun Liu! Could you become a mentor in GSOC with this task?
> > Some front-end work is not a problem at all.
> > I'm ready to work at least 30 hours per week in the summer, while now I'd like to take some smaller tasks to take a closer look at existing codebase and to get familiar with your development workflow. Do you have such tasks on mind?
> > 
> > On Wed, Mar 6, 2019 at 05:23, Xun Liu <neliuxun@163.com> wrote:
> > Hi Vasiliy Morkovkin
> > 
> > I said my thoughts on workflow, https://issues.apache.org/jira/browse/ZEPPELIN-4018
> > 
> > Because there are more than 20 interpreters in zeppelin, 
> > Data analysts can be used to do a variety of data development,
> > A lot of data development is interdependent. For example, 
> > the development of machine learning algorithms requires relying on spark to preprocess data, and so on.
> > 
> > Now open source workflow software has Azkaban, airflow,
> > Azkaban is relatively simple and has been used to meet most scenarios, and our company is using it.
> > Airflow looks complicated and I have not used it.
> > In fact, I have previously implemented workflow workflow for notes and paragraphs in zeppelin via azkaban.
> > https://youtu.be/2r6q-2Tq7hk?t=33
> > 
> > However, I think zeppelin should have built-in workflow capabilities. 
> > Instead of relying on external software to schedule notes in zeppelin for the following reasons:
> > 1. Now that we have upgraded from the data processing era to the algorithm era,
> > After zeppelin has its own workflow, it will form a data loop.
> > 
> > 2. zeppelin's powerful interactive processing capabilities help algorithm engineers improve productivity and work.
> > Zeppelin should give the algorithm engineer more direct control.
> > Instead of handing the algorithm to other teams(or software) to do the workflow.
> > 
> > 3. zeppelin knows more about the processing status of data than Azkaban and airflow.
> > So the built-in workflow will have better performance, user experience and control.
> > 
> > If you are interested in workflow(ZEPPELIN-4018), 
> > I am willing to work with you to complete all system design and code development work.
> > 
> > :-)
> > 
> >> On Mar 6, 2019, at 9:32 AM, Jeff Zhang <zjffdu@gmail.com> wrote:
> >> 
> >> Hi Basil,
> >> 
> >> Thanks for your interest in zeppelin, here's my comments about the tickets
> >> you interested.
> >> 
> >> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
> >>    This involves 2 sides of work: frontend and backend:
> >>    In frontend, we should use arrow js to handle the table data, include
> >> display it and processing it (such as aggregation)
> >>    In backend, we should use arrow for each language, and allow them to
> >> exchange data in the same process. And use arrow IPC to exchange data
> >> across processes.
> >>   Overall, this is a pretty large task. If you really want to do, I would
> >> suggest you to just take part of it.
> >> 
> >> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
> >>    Regarding model serving, I don't have clear picture about this. Others
> >> can comment on this.
> >> 
> >> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
> >>    Job scheduling is pretty important for zeppelin, I would make this as
> >> the highest priority for zeppelin among these tickets. airflow is one
> >> option, but I am open to other solutions. First we need to figure out how
> >> user schedule jobs in zeppelin, then choose the right framework. It would
> >> also involves some frontend work
> >> 
> >> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
> >>    Spark 2.4.0 supporting is already there, but scala 2.12 is not
> >> supported yet. It won't be a big project for GSOC IMO.
> >> 
> >> 5. OLAP.
> >>    Regarding OLAP, as long as the OLAP engine provide Jdbc interface,
> >> Zeppelin can support it very well. But we could create specific interpreter
> >> for OLAP engine if their native api perform better than jdbc. Another thing
> >> I can think of improving OLAP is visualization, although Zeppelin already
> >> support some built-in visualization, there's still some visualization
> >> missing. We could provide more.
> >> 
> >> 6. Auto-completions.
> >>   We have already support ipython[1]  in zeppelin which provide almost the
> >> same auto-completion like jupyter. But it lacks for accessing python api
> >> doc. This is also pretty important for python users IMO. SQL is another
> >> popular language in Zeppelin, but it also doesn't provide good
> >> code-completion experience, we can do better as well.
> >> 
> >> 7. Notifications.
> >>   I think notification can be integrated into job scheduling. Notification
> >> can be sent when job is failed/succeed.
> >> 
> >> 
> >> Let us know which jira you are more interested, and also please consider
> >> how much time you can spent on this. Again, we are very appreciated your
> >> interest on zeppelin and look forward your contribution.
> >> 
> >> 
> >> [1]
> >> http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
> >> 
> >> 
> >> 
> >> On Wed, Mar 6, 2019 at 7:41 AM, Морковкин, Василий Владимирович <morkovkin.vv@phystech.edu> wrote:
> >> 
> >>> Thank you for your replies! I've checked existing set of issues and found
> >>> several curious ones:
> >>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be very
> >>> nice
> >>> way to increase analytical processing performance using Arrow project;
> >>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models
> >>> regardless of ZeppelinServer sounds quite intriguing too. Although there is
> >>> much to think about;
> >>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance
> >>> https://airflow.apache.org/ seems to be useful in implementing complex
> >>> execution workflows.
> >>> Those tasks are global and intriguing, requiring complex architectural
> >>> solutions.
> >>> Also I've probably found the ticket which is suitable for me to get
> >>> involved into the project:
> >>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857 <https://issues.apache.org/jira/browse/ZEPPELIN-3857> <https://issues.apache.org/jira/browse/ZEPPELIN-3857 <https://issues.apache.org/jira/browse/ZEPPELIN-3857>>. What do you think?
> >>> Are there any "low hanging fruits"?
> >>> 
> >>> And I have several ideas of my own. Some of them might not be relevant due
> >>> to the vision of the project or other reasons. Just ideas:
> >>> - OLAP. As Zeppelin is a tool aimed at analytics, it seems quite
> >>> logical to add more integrations with existing OLAP solutions like Pinot,
> >>> ClickHouse and Druid. So far I've found integration only with Kylin;
> >>> - Better autocompletion. Jupyter offers not only a list of already
> >>> initialized variables, but also quick access to documentation. It's
> >>> convenient;
> >>> - Notifications. Some colleagues would appreciate a notification
> >>> service that sends you messages (via mail, Slack bot or something else)
> >>> indicating that your long-running paragraphs have completed.
> >>> 
> >>> Feedback is much appreciated :)
> >>> 
> >>> It would be wonderful if someone agreed to give up some of their time and
> >>> become a mentor in the GSOC program!
> >>> 
> >>> ----------------------------------------
> >>> Best regards, Basil Morkovkin.
> >>> 
> >>> 
> >>> Tue, 5 Mar 2019 at 11:48, Jongyoul Lee <jongyoul@gmail.com>:
> >>> 
> >>>> Hello,
> >>>> 
> >>>> I've confirmed that I can add more issues for GSOC. Can you explain what you
> >>>> would like to contribute to? I can add more issues.
> >>>> 
> >>>> JL
> >>>> 
> >>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <neliuxun@163.com> wrote:
> >>>> 
> >>>>> Hi, Vasiliy Morkovkin
> >>>>> 
> >>>>> Welcome to the zeppelin community! :-)
> >>>>> 
> >>>>>> On Mar 5, 2019, at 11:49 AM, Jongyoul Lee <jongyoul@gmail.com> wrote:
> >>>>>> 
> >>>>>> Thanks for contacting Zeppelin with your interest.
> >>>>>> 
> >>>>>> I added FE topics for GSOC because FE is the most urgent area I could
> >>>>>> think of. We always encourage contributions to Zeppelin on a range of
> >>>>>> topics, including your own ideas.
> >>>>>> 
> >>>>>> Please tell us a bit more.
> >>>>>> 
> >>>>>> Thanks.
> >>>>>> JL
> >>>>>> 
> >>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <moon@apache.org> wrote:
> >>>>>> 
> >>>>>>> Hi,
> >>>>>>> 
> >>>>>>> Great to see your interest in the project. Thanks!
> >>>>>>> Looks like we need volunteer mentors and some backend topics for
> >>>>>>> GSoC 2019.
> >>>>>>> Any ideas?
> >>>>>>> 
> >>>>>>> Best,
> >>>>>>> moon
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <morkovkin.vv@phystech.edu> wrote:
> >>>>>>> 
> >>>>>>>> Hi everyone, I'm pursuing bachelor degree at Moscow institute of physics
> >>>>>>>> and technology and eager to contribute to Zeppelin in context of GSOC
> >>>>>>>> 2019. I've become a real fan of Zeppelin over the past couple of months,
> >>>>>>>> using it at my job. But I have found out only one ticket (front-end
> >>>>>>>> task) with label of GSOC 2019 on your Jira. Perhaps you may have any
> >>>>>>>> ideas for new features or improvements in Zeppelin, but you don't have
> >>>>>>>> enough hands on them. It would be wonderful if anyone agreed to mentor
> >>>>>>>> these ideas within GSOC :)
> >>>>>>>> Currently I am in a position of Scala developer (back-end) for 1.5 year.
> >>>>>>>> I also can write in Java or Python without any problems if necessary.
> >>>>>>>> Really fond of databases and highload. Also I have experience with some
> >>>>>>>> other great Apache projects like Cassandra, Kafka and Spark.
> >>>>>>>> 
> >>>>>>>> Best regards, Basil Morkovkin.
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> --
> >>>>>> 이종열, Jongyoul Lee, 李宗烈
> >>>>>> http://madeng.net
> >>>>> 
> >>>>> 
> >>>> 
> >>>> --
> >>>> 이종열, Jongyoul Lee, 李宗烈
> >>>> http://madeng.net
> >>>> 
> >>> 
> >> 
> >> 
> >> -- 
> >> Best Regards
> >> 
> >> Jeff Zhang
> > 
> 
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang


Re: Zeppelin in GSOC 2019

Posted by Морковкин, Василий Владимирович <mo...@phystech.edu>.
Hi! For fun, I've sketched a toy prototype of a workflow manager in Scala. It
makes it easy to impose dependencies on the execution order of tasks. Check
it out: https://scastie.scala-lang.org/aRJGberkQ4CWatyABCOJcQ . It
reproduces the flow shown in the attached picture.
Xun Liu, could you clarify whether you agree to be a mentor
specifically within GSOC, or outside of it? :)
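
For readers who cannot open the link, here is a minimal sketch of what "imposing dependencies on the execution order of tasks" can look like. It is not the code behind the Scastie link and not the ZEPPELIN-4018 design; it is written in Python rather than Scala, and the `Workflow` class and the task names are hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Workflow:
    """Toy workflow (hypothetical): named tasks plus 'runs after' dependencies."""

    def __init__(self):
        self._actions = {}      # task name -> callable taking no arguments
        self._depends_on = {}   # task name -> set of prerequisite task names

    def task(self, name, action, after=()):
        """Register a task that must run after every task listed in `after`."""
        self._actions[name] = action
        self._depends_on[name] = set(after)
        return self  # allow chained registration

    def run(self):
        """Run every task once, prerequisites first; return the order used."""
        required = {dep for deps in self._depends_on.values() for dep in deps}
        unknown = required - set(self._actions)
        if unknown:
            raise ValueError(f"unknown prerequisite(s): {sorted(unknown)}")
        # static_order() raises graphlib.CycleError on circular dependencies.
        order = list(TopologicalSorter(self._depends_on).static_order())
        for name in order:
            self._actions[name]()
        return order


if __name__ == "__main__":
    log = []
    flow = (
        Workflow()
        .task("report", lambda: log.append("report"), after=["clean", "train"])
        .task("train", lambda: log.append("train"), after=["clean"])
        .task("clean", lambda: log.append("clean"), after=["load"])
        .task("load", lambda: log.append("load"))
    )
    flow.run()
    print(log)  # tasks appear with prerequisites first, whatever the registration order
```

A topological sort is only one way to derive the order; a real scheduler for notes and paragraphs would also need to handle failures, retries and parallel branches, which this sketch leaves out.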

----------------------------------------
Best regards, Basil Morkovkin

Thu, 7 Mar 2019 at 11:32, Jeff Zhang <zj...@gmail.com>:

>
> Thanks, Liu, for taking this over. I will help review the design.

Re: Zeppelin in GSOC 2019

Posted by Jeff Zhang <zj...@gmail.com>.
Thanks, Liu, for taking this over. I will help review the design.


-- 
Best Regards

Jeff Zhang

Re: Zeppelin in GSOC 2019

Posted by Xun Liu <ne...@163.com>.
Hi Vasiliy Morkovkin

Thank you very much for your willingness to implement the workflow feature.
I will work with you on it as my highest priority.
I plan to first update the system design documentation for workflow at https://issues.apache.org/jira/browse/ZEPPELIN-4018 .
Please add yourself as a watcher on ZEPPELIN-4018.
That way you will be notified promptly when the document is updated.

We can discuss all questions in the ZEPPELIN-4018 JIRA comments.
If you need to, you can email me at liuxun323@gmail.com ; I will reply as fast as I can.
Do you think this way of working together is OK?


@moon, @Jeff, @Jongyoul Lee, if you are interested, please help us improve our system design. Thanks!

:-)

> On Mar 7, 2019, at 6:04 AM, Морковкин, Василий Владимирович <mo...@phystech.edu> wrote:
> 
> Thank you for such detailed feedback!
> I am definitely interested in working on the workflow implementation with you, Xun Liu! Could you become a GSOC mentor for this task?
> Some front-end work is not a problem at all.
> I'm ready to work at least 30 hours per week in the summer. For now, I'd like to take on some smaller tasks to get a closer look at the existing codebase and become familiar with your development workflow. Do you have such tasks in mind?
> 
> Wed, 6 Mar 2019 at 05:23, Xun Liu <neliuxun@163.com>:
> Hi Vasiliy Morkovkin
> 
> I have shared my thoughts on workflow at https://issues.apache.org/jira/browse/ZEPPELIN-4018
> 
> Zeppelin has more than 20 interpreters,
> so data analysts can use it for a wide variety of data development work.
> Much of that work is interdependent. For example,
> developing machine learning algorithms relies on Spark to preprocess the data, and so on.
> 
> Now open source workflow software has Azkaban, airflow,
> Azkaban is relatively simple and has been used to meet most scenarios, and our company is using it.
> Airflow looks complicated and I have not used it.
> In fact, I have previously implemented workflow workflow for notes and paragraphs in zeppelin via azkaban.
> https://youtu.be/2r6q-2Tq7hk?t=33
> 
> However, I think zeppelin should have built-in workflow capabilities. 
> Instead of relying on external software to schedule notes in zeppelin for the following reasons:
> 1. Now that we have upgraded from the data processing era to the algorithm era,
> After zeppelin has its own workflow, it will form a data loop.
> 
> 2. zeppelin's powerful interactive processing capabilities help algorithm engineers improve productivity and work.
> Zeppelin should give the algorithm engineer more direct control.
> Instead of handing the algorithm to other teams(or software) to do the workflow.
> 
> 3. zeppelin knows more about the processing status of data than Azkaban and airflow.
> So the built-in workflow will have better performance, user experience and control.
> 
> If you are interested in workflow(ZEPPELIN-4018), 
> I am willing to work with you to complete all system design and code development work.
> 
> :-)
> 
>> On March 6, 2019, at 9:32 AM, Jeff Zhang <zjffdu@gmail.com> wrote:
>> 
>> Hi Basil,
>> 
>> Thanks for your interest in zeppelin, here's my comments about the tickets
>> you interested.
>> 
>> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
>>    This involves 2 sides of work: frontend and backend:
>>    In frontend, we should use arrow js to handle the table data, include
>> display it and processing it (such as aggregation)
>>    In backend, we should use arrow for each language, and allow them to
>> exchange data in the same process. And use arrow IPC to exchange data
>> across processes.
>>   Overall, this is a pretty large task. If you really want to do, I would
>> suggest you to just take part of it.
>> 
>> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
>>    Regarding model serving, I don't have clear picture about this. Others
>> can comment on this.
>> 
>> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>    Job scheduling is pretty important for zeppelin, I would make this as
>> the highest priority for zeppelin among these tickets. airflow is one
>> option, but I am open to other solutions. First we need to figure out how
>> user schedule jobs in zeppelin, then choose the right framework. It would
>> also involves some frontend work
>> 
>> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
>>    Spark 2.4.0 supporting is already there, but scala 2.12 is not
>> supported yet. It won't be a big project for GSOC IMO.
>> 
>> 5. OLAP.
>>    Regarding OLAP, as long as the OLAP engine provide Jdbc interface,
>> Zeppelin can support it very well. But we could create specific interpreter
>> for OLAP engine if their native api perform better than jdbc. Another thing
>> I can think of improving OLAP is visualization, although Zeppelin already
>> support some built-in visualization, there's still some visualization
>> missing. We could provide more.
>> 
>> 6. Auto-completions.
>>   We have already support ipython[1]  in zeppelin which provide almost the
>> same auto-completion like jupyter. But it lacks for accessing python api
>> doc. This is also pretty important for python users IMO. SQL is another
>> popular language in Zeppelin, but it also doesn't provide good
>> code-completion experience, we can do better as well.
>> 
>> 7. Notifications.
>>   I think notification can be integrated into job scheduling. Notification
>> can be sent when job is failed/succeed.
>> 
>> 
>> Let us know which jira you are more interested, and also please consider
>> how much time you can spent on this. Again, we are very appreciated your
>> interest on zeppelin and look forward your contribution.
>> 
>> 
>> [1]
>> http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
>> 
>> 
>> 
>> On Wed, Mar 6, 2019 at 7:41 AM, Морковкин, Василий Владимирович <morkovkin.vv@phystech.edu> wrote:
>> 
>>> Thank you for your replies! I've checked existing set of issues and found
>>> several curious ones:
>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be very
>>> nice
>>> way to increase analytical processing performance using Arrow project;
>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models
>>> regardless of ZeppelinServer sounds quite intriguing too. Although there is
>>> much to think about;
>>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance
>>> https://airflow.apache.org/ seems to be useful in implementing complex
>>> execution workflows.
>>> Those tasks are global and intriguing, requiring complex architectural
>>> solutions.
>>> Also I've probably found the ticket which is suitable for me to get
>>> involved into the project:
>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you think?
>>> Are there any "low hanging fruits"?
>>> 
>>> And I have several ideas on my own. Some of them might be not relevant due
>>> to the vision of the project or other reasons. Just ideas:
>>> - OLAP. As Zeppelin is a tool aimed at analytics, it seems to be quite
>>> logical to add more integrations with existing OLAP solutions like Pinot,
>>> ClickHouse and Druid. Currently I've found integration only with Kylin;
>>> - Better autocompletion. Jupyter offers not only a list of already
>>> initialized variables, but also quick access to documentation. It's
>>> convenient;
>>> - Notifications. Some colleagues would have appreciated the notifications
>>> service, which sends you messages (via mail, Slack bot or something else)
>>> indicating that your long-running paragraphs has completed.
>>> 
>>> Feedback is very appreciated :)
>>> 
>>> It would be wonderful if someone agreed to sacrifice his time and become a
>>> mentor in GSOC program!
>>> 
>>> ----------------------------------------
>>> Best regards, Basil Morkovkin.
>>> 
>>> 
>>> On Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jongyoul@gmail.com> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I've confirmed I could add more issues for GSOC. Can you explain what you
>>>> would like to contribute to? I can add more issues
>>>> 
>>>> JL
>>>> 
>>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <neliuxun@163.com> wrote:
>>>> 
>>>>> Hi, Vasiliy Morkovkin
>>>>> 
>>>>> Welcome to the zeppelin community! :-)
>>>>> 
>>>>>> On March 5, 2019, at 11:49 AM, Jongyoul Lee <jongyoul@gmail.com> wrote:
>>>>>> 
>>>>>> Thanks for contacting Zeppelin with your interest.
>>>>>> 
>>>>>> I added FE topics for GSOC because FE is the most urgent issue I have
>>>>>> thought about. We always encourage to contribute Zeppelin with several
>>>>>> topics including your idea.
>>>>>> 
>>>>>> Please describe something more.
>>>>>> 
>>>>>> Thanks.
>>>>>> JL
>>>>>> 
>>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <moon@apache.org> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Great to see your interest to project. Thanks!
>>>>>>> Looks like we need volunteers for a mentor and some backend subject for
>>>>>>> GSoC2019.
>>>>>>> Any ideas?
>>>>>>> 
>>>>>>> Best,
>>>>>>> moon
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <
>>>>>>> morkovkin.vv@phystech.edu>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi everyone, I'm pursuing bachelor degree at Moscow institute of physics
>>>>>>>> and technology and eager to contribute to Zeppelin in context of GSOC
>>>>>>>> 2019. I've become a real fan of Zeppelin over the past couple of months,
>>>>>>>> using it at my job. But I have found out only one ticket (front-end
>>>>>>>> task) with label of GSOC 2019 on your Jira. Perhaps you may have any
>>>>>>>> ideas for new features or improvements in Zeppelin, but you don't have
>>>>>>>> enough hands on them. It would be wonderful if anyone agreed to mentor
>>>>>>>> these ideas within GSOC :)
>>>>>>>> Currently I am in a position of Scala developer (back-end) for 1.5 year.
>>>>>>>> I also can write in Java or Python without any problems if necessary.
>>>>>>>> Really fond of databases and highload. Also I have experience with some
>>>>>>>> other great Apache projects like Cassandra, Kafka and Spark.
>>>>>>>> 
>>>>>>>> Best regards, Basil Morkovkin.
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>> http://madeng.net
>>>>> 
>>>>> 
>>>> 
>>>> --
>>>> 이종열, Jongyoul Lee, 李宗烈
>>>> http://madeng.net
>>>> 
>>> 
>> 
>> 
>> -- 
>> Best Regards
>> 
>> Jeff Zhang
> 


Re: Zeppelin in GSOC 2019

Posted by Морковкин, Василий Владимирович <mo...@phystech.edu>.
Thank you for such detailed feedback!
I am definitely interested in working on the workflow implementation with you,
Xun Liu! Could you become a GSOC mentor for this task?
Some front-end work is not a problem at all.
I'm ready to work at least 30 hours per week in the summer. For now, I'd
like to take on some smaller tasks to get a closer look at the existing codebase
and to become familiar with your development workflow. Do you have such tasks
in mind?

On Wed, Mar 6, 2019 at 05:23, Xun Liu <ne...@163.com> wrote:

> Hi Vasiliy Morkovkin
>
> I said my thoughts on workflow,
> https://issues.apache.org/jira/browse/ZEPPELIN-4018
>
> Because there are more than 20 interpreters in zeppelin,
> Data analysts can be used to do a variety of data development,
> A lot of data development is interdependent. For example,
> the development of machine learning algorithms requires relying on spark
> to preprocess data, and so on.
>
> Now open source workflow software has Azkaban, airflow,
> Azkaban is relatively simple and has been used to meet most scenarios, and
> our company is using it.
> Airflow looks complicated and I have not used it.
> In fact, I have previously implemented workflow workflow for notes and
> paragraphs in zeppelin via azkaban.
> https://youtu.be/2r6q-2Tq7hk?t=33
>
> However, I think zeppelin should have built-in workflow capabilities.
> Instead of relying on external software to schedule notes in zeppelin for
> the following reasons:
> 1. Now that we have upgraded from the data processing era to the algorithm
> era,
> After zeppelin has its own workflow, it will form a data loop.
>
> 2. zeppelin's powerful interactive processing capabilities help algorithm
> engineers improve productivity and work.
> Zeppelin should give the algorithm engineer more direct control.
> Instead of handing the algorithm to other teams(or software) to do the
> workflow.
>
> 3. zeppelin knows more about the processing status of data than Azkaban
> and airflow.
> So the built-in workflow will have better performance, user experience and
> control.
>
> If you are interested in workflow(ZEPPELIN-4018),
> I am willing to work with you to complete all system design and code
> development work.
>
> :-)
>
> On March 6, 2019, at 9:32 AM, Jeff Zhang <zj...@gmail.com> wrote:
>
> Hi Basil,
>
> Thanks for your interest in zeppelin, here's my comments about the tickets
> you interested.
>
> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
>    This involves 2 sides of work: frontend and backend:
>    In frontend, we should use arrow js to handle the table data, include
> display it and processing it (such as aggregation)
>    In backend, we should use arrow for each language, and allow them to
> exchange data in the same process. And use arrow IPC to exchange data
> across processes.
>   Overall, this is a pretty large task. If you really want to do, I would
> suggest you to just take part of it.
>
> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
>    Regarding model serving, I don't have clear picture about this. Others
> can comment on this.
>
> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
>    Job scheduling is pretty important for zeppelin, I would make this as
> the highest priority for zeppelin among these tickets. airflow is one
> option, but I am open to other solutions. First we need to figure out how
> user schedule jobs in zeppelin, then choose the right framework. It would
> also involves some frontend work
>
> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
>    Spark 2.4.0 supporting is already there, but scala 2.12 is not
> supported yet. It won't be a big project for GSOC IMO.
>
> 5. OLAP.
>    Regarding OLAP, as long as the OLAP engine provide Jdbc interface,
> Zeppelin can support it very well. But we could create specific interpreter
> for OLAP engine if their native api perform better than jdbc. Another thing
> I can think of improving OLAP is visualization, although Zeppelin already
> support some built-in visualization, there's still some visualization
> missing. We could provide more.
>
> 6. Auto-completions.
>   We have already support ipython[1]  in zeppelin which provide almost the
> same auto-completion like jupyter. But it lacks for accessing python api
> doc. This is also pretty important for python users IMO. SQL is another
> popular language in Zeppelin, but it also doesn't provide good
> code-completion experience, we can do better as well.
>
> 7. Notifications.
>   I think notification can be integrated into job scheduling. Notification
> can be sent when job is failed/succeed.
>
>
> Let us know which jira you are more interested, and also please consider
> how much time you can spent on this. Again, we are very appreciated your
> interest on zeppelin and look forward your contribution.
>
>
> [1]
>
> http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
>
>
>
> On Wed, Mar 6, 2019 at 7:41 AM, Морковкин, Василий Владимирович <mo...@phystech.edu> wrote:
>
> Thank you for your replies! I've checked existing set of issues and found
> several curious ones:
> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be very
> nice
> way to increase analytical processing performance using Arrow project;
> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models
> regardless of ZeppelinServer sounds quite intriguing too. Although there is
> much to think about;
> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance
> https://airflow.apache.org/ seems to be useful in implementing complex
> execution workflows.
> Those tasks are global and intriguing, requiring complex architectural
> solutions.
> Also I've probably found the ticket which is suitable for me to get
> involved into the project:
> - https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you think?
> Are there any "low hanging fruits"?
>
> And I have several ideas on my own. Some of them might be not relevant due
> to the vision of the project or other reasons. Just ideas:
> - OLAP. As Zeppelin is a tool aimed at analytics, it seems to be quite
> logical to add more integrations with existing OLAP solutions like Pinot,
> ClickHouse and Druid. Currently I've found integration only with Kylin;
> - Better autocompletion. Jupyter offers not only a list of already
> initialized variables, but also quick access to documentation. It's
> convenient;
> - Notifications. Some colleagues would have appreciated the notifications
> service, which sends you messages (via mail, Slack bot or something else)
> indicating that your long-running paragraphs has completed.
>
> Feedback is very appreciated :)
>
> It would be wonderful if someone agreed to sacrifice his time and become a
> mentor in GSOC program!
>
> ----------------------------------------
> Best regards, Basil Morkovkin.
>
>
> On Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jo...@gmail.com> wrote:
>
> Hello,
>
> I've confirmed I could add more issues for GSOC. Can you explain what you
> would like to contribute to? I can add more issues
>
> JL
>
> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <ne...@163.com> wrote:
>
> Hi, Vasiliy Morkovkin
>
> Welcome to the zeppelin community! :-)
>
> On March 5, 2019, at 11:49 AM, Jongyoul Lee <jo...@gmail.com> wrote:
>
> Thanks for contacting Zeppelin with your interest.
>
> I added FE topics for GSOC because FE is the most urgent issue I have
> thought about. We always encourage to contribute Zeppelin with several
> topics including your idea.
>
> Please describe something more.
>
> Thanks.
> JL
>
> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <mo...@apache.org> wrote:
>
> Hi,
>
> Great to see your interest to project. Thanks!
> Looks like we need volunteers for a mentor and some backend subject for
> GSoC2019.
> Any ideas?
>
> Best,
> moon
>
>
>
>
> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <
> morkovkin.vv@phystech.edu>
> wrote:
>
> Hi everyone, I'm pursuing bachelor degree at Moscow institute of physics
> and technology and eager to contribute to Zeppelin in context of GSOC
> 2019. I've become a real fan of Zeppelin over the past couple of months,
> using it at my job. But I have found out only one ticket (front-end
> task) with label of GSOC 2019 on your Jira. Perhaps you may have any
> ideas for new features or improvements in Zeppelin, but you don't have
> enough hands on them. It would be wonderful if anyone agreed to mentor
> these ideas within GSOC :)
> Currently I am in a position of Scala developer (back-end) for 1.5 year.
> I also can write in Java or Python without any problems if necessary.
> Really fond of databases and highload. Also I have experience with some
> other great Apache projects like Cassandra, Kafka and Spark.
>
> Best regards, Basil Morkovkin.
>
>
>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>
>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>
>
>

Re: Zeppelin in GSOC 2019

Posted by Xun Liu <ne...@163.com>.
Hi Vasiliy Morkovkin,

I have written up my thoughts on workflow at https://issues.apache.org/jira/browse/ZEPPELIN-4018

Because Zeppelin has more than 20 interpreters,
data analysts can use it for many kinds of data development,
and much of that work is interdependent. For example,
developing a machine learning algorithm relies on Spark to preprocess the data, and so on.
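
To make the interdependence above concrete, here is a minimal sketch in Python. It is not part of the ZEPPELIN-4018 design: the note names are hypothetical, and it uses the standard library's graphlib (Python 3.9+) rather than any Zeppelin API to derive a run order in which each note runs only after the notes it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical notes; each one maps to the set of notes it depends on.
note_dependencies = {
    "spark_preprocess": set(),
    "train_model": {"spark_preprocess"},
    "evaluate_model": {"train_model"},
    "publish_report": {"evaluate_model", "spark_preprocess"},
}


def execution_order(dependencies):
    """Return the notes in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(note_dependencies)
print(order)
```

A built-in workflow would presumably need something along these lines to decide when a note may run; how ZEPPELIN-4018 actually models dependencies is left to the design document.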

The open source workflow tools available today include Azkaban and Airflow.
Azkaban is relatively simple and covers most scenarios; our company uses it.
Airflow looks complicated, and I have not used it.
In fact, I have previously implemented a workflow for notes and paragraphs in Zeppelin via Azkaban:
https://youtu.be/2r6q-2Tq7hk?t=33

However, I think Zeppelin should have built-in workflow capabilities
instead of relying on external software to schedule notes, for the following reasons:
1. We have moved from the data processing era to the algorithm era.
Once Zeppelin has its own workflow, it will form a closed data loop.

2. Zeppelin's powerful interactive processing capabilities help algorithm engineers work more productively.
Zeppelin should give algorithm engineers more direct control,
instead of having them hand the algorithm to other teams (or software) to run the workflow.

3. Zeppelin knows more about the processing status of the data than Azkaban or Airflow does,
so a built-in workflow will offer better performance, user experience, and control.

If you are interested in workflow (ZEPPELIN-4018),
I am willing to work with you to complete all of the system design and code development.

:-)

> On March 6, 2019, at 9:32 AM, Jeff Zhang <zj...@gmail.com> wrote:
> 
> Hi Basil,
> 
> Thanks for your interest in zeppelin, here's my comments about the tickets
> you interested.
> 
> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
>    This involves 2 sides of work: frontend and backend:
>    In frontend, we should use arrow js to handle the table data, include
> display it and processing it (such as aggregation)
>    In backend, we should use arrow for each language, and allow them to
> exchange data in the same process. And use arrow IPC to exchange data
> across processes.
>   Overall, this is a pretty large task. If you really want to do, I would
> suggest you to just take part of it.
> 
> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
>    Regarding model serving, I don't have clear picture about this. Others
> can comment on this.
> 
> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
>    Job scheduling is pretty important for zeppelin, I would make this as
> the highest priority for zeppelin among these tickets. airflow is one
> option, but I am open to other solutions. First we need to figure out how
> user schedule jobs in zeppelin, then choose the right framework. It would
> also involves some frontend work
> 
> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
>    Spark 2.4.0 supporting is already there, but scala 2.12 is not
> supported yet. It won't be a big project for GSOC IMO.
> 
> 5. OLAP.
>    Regarding OLAP, as long as the OLAP engine provide Jdbc interface,
> Zeppelin can support it very well. But we could create specific interpreter
> for OLAP engine if their native api perform better than jdbc. Another thing
> I can think of improving OLAP is visualization, although Zeppelin already
> support some built-in visualization, there's still some visualization
> missing. We could provide more.
> 
> 6. Auto-completions.
>   We have already support ipython[1]  in zeppelin which provide almost the
> same auto-completion like jupyter. But it lacks for accessing python api
> doc. This is also pretty important for python users IMO. SQL is another
> popular language in Zeppelin, but it also doesn't provide good
> code-completion experience, we can do better as well.
> 
> 7. Notifications.
>   I think notification can be integrated into job scheduling. Notification
> can be sent when job is failed/succeed.
> 
> 
> Let us know which jira you are more interested, and also please consider
> how much time you can spent on this. Again, we are very appreciated your
> interest on zeppelin and look forward your contribution.
> 
> 
> [1]
> http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
> 
> 
> 
> On Wed, Mar 6, 2019 at 7:41 AM, Морковкин, Василий Владимирович <mo...@phystech.edu> wrote:
> 
>> Thank you for your replies! I've checked existing set of issues and found
>> several curious ones:
>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be very
>> nice
>> way to increase analytical processing performance using Arrow project;
>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models
>> regardless of ZeppelinServer sounds quite intriguing too. Although there is
>> much to think about;
>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance
>> https://airflow.apache.org/ seems to be useful in implementing complex
>> execution workflows.
>> Those tasks are global and intriguing, requiring complex architectural
>> solutions.
>> Also I've probably found the ticket which is suitable for me to get
>> involved into the project:
>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you think?
>> Are there any "low hanging fruits"?
>> 
>> And I have several ideas on my own. Some of them might be not relevant due
>> to the vision of the project or other reasons. Just ideas:
>> - OLAP. As Zeppelin is a tool aimed at analytics, it seems to be quite
>> logical to add more integrations with existing OLAP solutions like Pinot,
>> ClickHouse and Druid. Currently I've found integration only with Kylin;
>> - Better autocompletion. Jupyter offers not only a list of already
>> initialized variables, but also quick access to documentation. It's
>> convenient;
>> - Notifications. Some colleagues would have appreciated the notifications
>> service, which sends you messages (via mail, Slack bot or something else)
>> indicating that your long-running paragraphs has completed.
>> 
>> Feedback is very appreciated :)
>> 
>> It would be wonderful if someone agreed to sacrifice his time and become a
>> mentor in GSOC program!
>> 
>> ----------------------------------------
>> Best regards, Basil Morkovkin.
>> 
>> 
>> On Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jo...@gmail.com> wrote:
>> 
>>> Hello,
>>> 
>>> I've confirmed I could add more issues for GSOC. Can you explain what you
>>> would like to contribute to? I can add more issues
>>> 
>>> JL
>>> 
>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <ne...@163.com> wrote:
>>> 
>>>> Hi, Vasiliy Morkovkin
>>>> 
>>>> Welcome to the zeppelin community! :-)
>>>> 
>>>>> On March 5, 2019, at 11:49 AM, Jongyoul Lee <jo...@gmail.com> wrote:
>>>>> 
>>>>> Thanks for contacting Zeppelin with your interest.
>>>>> 
>>>>> I added FE topics for GSOC because FE is the most urgent issue I have
>>>>> thought about. We always encourage to contribute Zeppelin with several
>>>>> topics including your idea.
>>>>> 
>>>>> Please describe something more.
>>>>> 
>>>>> Thanks.
>>>>> JL
>>>>> 
>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <mo...@apache.org> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Great to see your interest to project. Thanks!
>>>>>> Looks like we need volunteers for a mentor and some backend subject for
>>>>>> GSoC2019.
>>>>>> Any ideas?
>>>>>> 
>>>>>> Best,
>>>>>> moon
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <
>>>>>> morkovkin.vv@phystech.edu>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi everyone, I'm pursuing bachelor degree at Moscow institute of physics
>>>>>>> and technology and eager to contribute to Zeppelin in context of GSOC
>>>>>>> 2019. I've become a real fan of Zeppelin over the past couple of months,
>>>>>>> using it at my job. But I have found out only one ticket (front-end
>>>>>>> task) with label of GSOC 2019 on your Jira. Perhaps you may have any
>>>>>>> ideas for new features or improvements in Zeppelin, but you don't have
>>>>>>> enough hands on them. It would be wonderful if anyone agreed to mentor
>>>>>>> these ideas within GSOC :)
>>>>>>> Currently I am in a position of Scala developer (back-end) for 1.5 year.
>>>>>>> I also can write in Java or Python without any problems if necessary.
>>>>>>> Really fond of databases and highload. Also I have experience with some
>>>>>>> other great Apache projects like Cassandra, Kafka and Spark.
>>>>>>> 
>>>>>>> Best regards, Basil Morkovkin.
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>> http://madeng.net
>>>> 
>>>> 
>>> 
>>> --
>>> 이종열, Jongyoul Lee, 李宗烈
>>> http://madeng.net
>>> 
>> 
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang


Re: Zeppelin in GSOC 2019

Posted by Jeff Zhang <zj...@gmail.com>.
Hi Basil,

Thanks for your interest in Zeppelin. Here are my comments on the tickets
you are interested in.

1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
    This involves two sides of work, frontend and backend.
    In the frontend, we should use Arrow JS to handle the table data, including
displaying and processing it (such as aggregation).
    In the backend, we should use Arrow for each language and allow them to
exchange data within the same process, and use Arrow IPC to exchange data
across processes.
    Overall, this is a pretty large task. If you really want to do it, I would
suggest you take on just part of it.

2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
    Regarding model serving, I don't have a clear picture of this. Others
can comment on it.

3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
    Job scheduling is pretty important for Zeppelin; I would make this
the highest priority among these tickets. Airflow is one
option, but I am open to other solutions. First we need to figure out how
users schedule jobs in Zeppelin, then choose the right framework. It would
also involve some frontend work.

4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
    Spark 2.4.0 support is already there, but Scala 2.12 is not
supported yet. It won't be a big project for GSOC, IMO.

5. OLAP.
    Regarding OLAP, as long as the OLAP engine provides a JDBC interface,
Zeppelin can support it very well. But we could create a specific interpreter
for an OLAP engine if its native API performs better than JDBC. Another way
I can think of to improve OLAP is visualization: although Zeppelin already
supports some built-in visualizations, some are still
missing. We could provide more.

6. Auto-completions.
    We already support IPython[1] in Zeppelin, which provides almost the
same auto-completion as Jupyter. But it lacks access to the Python API
docs, which is also pretty important for Python users, IMO. SQL is another
popular language in Zeppelin, but it doesn't provide a good
code-completion experience either; we can do better there as well.

7. Notifications.
    I think notifications can be integrated into job scheduling. A notification
can be sent when a job fails or succeeds.
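
Item 7 above could be sketched roughly as follows. This is only an illustration under stated assumptions: run_with_notification, JobResult, and record_message are hypothetical names, not Zeppelin APIs, and the notifier here just records messages in a list where a real one might send mail or a Slack message.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class JobResult:
    job_name: str
    succeeded: bool
    detail: str


def run_with_notification(
    job_name: str,
    job: Callable[[], None],
    notify: Callable[[JobResult], None],
) -> JobResult:
    """Run job() and report success or failure through notify()."""
    try:
        job()
        result = JobResult(job_name, True, "finished")
    except Exception as exc:  # any failure becomes a "failed" notification
        result = JobResult(job_name, False, str(exc))
    notify(result)
    return result


# Hypothetical notifier: collects messages instead of sending them anywhere.
sent: List[str] = []


def record_message(result: JobResult) -> None:
    status = "succeeded" if result.succeeded else "failed"
    sent.append(f"{result.job_name} {status}: {result.detail}")


def failing_job() -> None:
    raise RuntimeError("interpreter timeout")


run_with_notification("nightly_report", lambda: None, record_message)
run_with_notification("train_model", failing_job, record_message)
print(sent)
```

Hooking the notification into the scheduler's completion path, as in this sketch, means a long-running job reports both outcomes without the user having to poll for it.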


Let us know which JIRA you are most interested in, and please also consider
how much time you can spend on this. Again, we very much appreciate your
interest in Zeppelin and look forward to your contribution.


[1]
http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support



On Wed, Mar 6, 2019 at 7:41 AM, Морковкин, Василий Владимирович <mo...@phystech.edu> wrote:

> Thank you for your replies! I've checked existing set of issues and found
> several curious ones:
> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be very
> nice
> way to increase analytical processing performance using Arrow project;
> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models
> regardless of ZeppelinServer sounds quite intriguing too. Although there is
> much to think about;
> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance
> https://airflow.apache.org/ seems to be useful in implementing complex
> execution workflows.
> Those tasks are global and intriguing, requiring complex architectural
> solutions.
> Also I've probably found the ticket which is suitable for me to get
> involved into the project:
> - https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you think?
> Are there any "low hanging fruits"?
>
> And I have several ideas on my own. Some of them might be not relevant due
> to the vision of the project or other reasons. Just ideas:
> - OLAP. As Zeppelin is a tool aimed at analytics, it seems to be quite
> logical to add more integrations with existing OLAP solutions like Pinot,
> ClickHouse and Druid. Currently I've found integration only with Kylin;
> - Better autocompletion. Jupyter offers not only a list of already
> initialized variables, but also quick access to documentation. It's
> convenient;
> - Notifications. Some colleagues would have appreciated the notifications
> service, which sends you messages (via mail, Slack bot or something else)
> indicating that your long-running paragraphs has completed.
>
> Feedback is very appreciated :)
>
> It would be wonderful if someone agreed to sacrifice his time and become a
> mentor in GSOC program!
>
> ----------------------------------------
> Best regards, Basil Morkovkin.
>
>
> On Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jo...@gmail.com> wrote:
>
> > Hello,
> >
> > I've confirmed I could add more issues for GSOC. Can you explain what you
> > would like to contribute to? I can add more issues
> >
> > JL
> >
> > On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <ne...@163.com> wrote:
> >
> >> Hi, Vasiliy Morkovkin
> >>
> >> Welcome to the zeppelin community! :-)
> >>
> >> > On Mar 5, 2019, at 11:49 AM, Jongyoul Lee <jo...@gmail.com> wrote:
> >> >
> >> > Thanks for contacting Zeppelin with your interest.
> >> >
> >> > I added FE topics for GSOC because FE is the most urgent issue I have
> >> > thought about. We always encourage to contribute Zeppelin with several
> >> > topics including your idea.
> >> >
> >> > Please describe something more.
> >> >
> >> > Thanks.
> >> > JL
> >> >
> >> > On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <mo...@apache.org> wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> Great to see your interest to project. Thanks!
> >> >> Looks like we need volunteers for a mentor and some backend subject for
> >> >> GSoC2019.
> >> >> Any ideas?
> >> >>
> >> >> Best,
> >> >> moon
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <
> >> >> morkovkin.vv@phystech.edu>
> >> >> wrote:
> >> >>
> >> >>> Hi everyone, I'm pursuing bachelor degree at Moscow institute of physics
> >> >>> and technology and eager to contribute to Zeppelin in context of GSOC
> >> >>> 2019. I've become a real fan of Zeppelin over the past couple of months,
> >> >>> using it at my job. But I have found out only one ticket (front-end
> >> >>> task) with label of GSOC 2019 on your Jira. Perhaps you may have any
> >> >>> ideas for new features or improvements in Zeppelin, but you don't have
> >> >>> enough hands on them. It would be wonderful if anyone agreed to mentor
> >> >>> these ideas within GSOC :)
> >> >>> Currently I am in a position of Scala developer (back-end) for 1.5 year.
> >> >>> I also can write in Java or Python without any problems if necessary.
> >> >>> Really fond of databases and highload. Also I have experience with some
> >> >>> other great Apache projects like Cassandra, Kafka and Spark.
> >> >>>
> >> >>> Best regards, Basil Morkovkin.
> >> >>>
> >> >>>
> >> >>
> >> >
> >> >
> >> > --
> >> > 이종열, Jongyoul Lee, 李宗烈
> >> > http://madeng.net
> >>
> >>
> >
> > --
> > 이종열, Jongyoul Lee, 李宗烈
> > http://madeng.net
> >
>


-- 
Best Regards

Jeff Zhang

Re: Zeppelin in GSOC 2019

Posted by Морковкин, Василий Владимирович <mo...@phystech.edu>.
Thank you for your replies! I've checked the existing set of issues and found
several interesting ones:
- https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be a very nice
way to increase analytical processing performance using the Arrow project;
- https://issues.apache.org/jira/browse/ZEPPELIN-3994: deploying models
regardless of ZeppelinServer sounds quite intriguing too, although there is
much to think about;
- https://issues.apache.org/jira/browse/ZEPPELIN-4018: at first glance,
https://airflow.apache.org/ seems to be useful for implementing complex
execution workflows.
These tasks are large-scale and intriguing, and require complex architectural
solutions.
I've also found a ticket that is probably suitable for me to get
involved in the project:
- https://issues.apache.org/jira/browse/ZEPPELIN-3857. What do you think?
Is there any other "low-hanging fruit"?

I also have several ideas of my own. Some of them might not be relevant due
to the vision of the project or other reasons. Just ideas:
- OLAP. As Zeppelin is a tool aimed at analytics, it seems quite
logical to add more integrations with existing OLAP solutions like Pinot,
ClickHouse and Druid. So far I've found integration only with Kylin;
- Better autocompletion. Jupyter offers not only a list of already
initialized variables, but also quick access to documentation. It's
convenient;
- Notifications. Some colleagues would appreciate a notification
service that sends you messages (via mail, a Slack bot or something else)
indicating that your long-running paragraphs have completed.

Feedback is much appreciated :)

It would be wonderful if someone agreed to sacrifice their time and become a
mentor in the GSOC program!

----------------------------------------
Best regards, Basil Morkovkin.


On Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jo...@gmail.com> wrote:

> Hello,
>
> I've confirmed I could add more issues for GSOC. Can you explain what you
> would like to contribute to? I can add more issues
>
> JL
>
> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <ne...@163.com> wrote:
>
>> Hi, Vasiliy Morkovkin
>>
>> Welcome to the zeppelin community! :-)
>>
>> > On Mar 5, 2019, at 11:49 AM, Jongyoul Lee <jo...@gmail.com> wrote:
>> >
>> > Thanks for contacting Zeppelin with your interest.
>> >
>> > I added FE topics for GSOC because FE is the most urgent issue I have
>> > thought about. We always encourage to contribute Zeppelin with several
>> > topics including your idea.
>> >
>> > Please describe something more.
>> >
>> > Thanks.
>> > JL
>> >
>> > On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <mo...@apache.org> wrote:
>> >
>> >> Hi,
>> >>
>> >> Great to see your interest to project. Thanks!
>> >> Looks like we need volunteers for a mentor and some backend subject for
>> >> GSoC2019.
>> >> Any ideas?
>> >>
>> >> Best,
>> >> moon
>> >>
>> >>
>> >>
>> >>
>> >> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <
>> >> morkovkin.vv@phystech.edu>
>> >> wrote:
>> >>
>> >>> Hi everyone, I'm pursuing bachelor degree at Moscow institute of physics
>> >>> and technology and eager to contribute to Zeppelin in context of GSOC
>> >>> 2019. I've become a real fan of Zeppelin over the past couple of months,
>> >>> using it at my job. But I have found out only one ticket (front-end
>> >>> task) with label of GSOC 2019 on your Jira. Perhaps you may have any
>> >>> ideas for new features or improvements in Zeppelin, but you don't have
>> >>> enough hands on them. It would be wonderful if anyone agreed to mentor
>> >>> these ideas within GSOC :)
>> >>> Currently I am in a position of Scala developer (back-end) for 1.5 year.
>> >>> I also can write in Java or Python without any problems if necessary.
>> >>> Really fond of databases and highload. Also I have experience with some
>> >>> other great Apache projects like Cassandra, Kafka and Spark.
>> >>>
>> >>> Best regards, Basil Morkovkin.
>> >>>
>> >>>
>> >>
>> >
>> >
>> > --
>> > 이종열, Jongyoul Lee, 李宗烈
>> > http://madeng.net
>>
>>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>

Re: Zeppelin in GSOC 2019

Posted by Jongyoul Lee <jo...@gmail.com>.
Hello,

I've confirmed that I can add more issues for GSOC. Can you explain what you
would like to contribute to? I can add more issues.

JL

On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <ne...@163.com> wrote:

> Hi, Vasiliy Morkovkin
>
> Welcome to the zeppelin community! :-)
>
> > On Mar 5, 2019, at 11:49 AM, Jongyoul Lee <jo...@gmail.com> wrote:
> >
> > Thanks for contacting Zeppelin with your interest.
> >
> > I added FE topics for GSOC because FE is the most urgent issue I have
> > thought about. We always encourage to contribute Zeppelin with several
> > topics including your idea.
> >
> > Please describe something more.
> >
> > Thanks.
> > JL
> >
> > On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <mo...@apache.org> wrote:
> >
> >> Hi,
> >>
> >> Great to see your interest to project. Thanks!
> >> Looks like we need volunteers for a mentor and some backend subject for
> >> GSoC2019.
> >> Any ideas?
> >>
> >> Best,
> >> moon
> >>
> >>
> >>
> >>
> >> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <
> >> morkovkin.vv@phystech.edu>
> >> wrote:
> >>
> >>> Hi everyone, I'm pursuing bachelor degree at Moscow institute of physics
> >>> and technology and eager to contribute to Zeppelin in context of GSOC
> >>> 2019. I've become a real fan of Zeppelin over the past couple of months,
> >>> using it at my job. But I have found out only one ticket (front-end
> >>> task) with label of GSOC 2019 on your Jira. Perhaps you may have any
> >>> ideas for new features or improvements in Zeppelin, but you don't have
> >>> enough hands on them. It would be wonderful if anyone agreed to mentor
> >>> these ideas within GSOC :)
> >>> Currently I am in a position of Scala developer (back-end) for 1.5 year.
> >>> I also can write in Java or Python without any problems if necessary.
> >>> Really fond of databases and highload. Also I have experience with some
> >>> other great Apache projects like Cassandra, Kafka and Spark.
> >>>
> >>> Best regards, Basil Morkovkin.
> >>>
> >>>
> >>
> >
> >
> > --
> > 이종열, Jongyoul Lee, 李宗烈
> > http://madeng.net
>
>

-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net

Re: Zeppelin in GSOC 2019

Posted by Xun Liu <ne...@163.com>.
Hi, Vasiliy Morkovkin 

Welcome to the Zeppelin community! :-)

> On Mar 5, 2019, at 11:49 AM, Jongyoul Lee <jo...@gmail.com> wrote:
> 
> Thanks for contacting Zeppelin with your interest.
> 
> I added FE topics for GSOC because FE is the most urgent issue I have
> thought about. We always encourage to contribute Zeppelin with several
> topics including your idea.
> 
> Please describe something more.
> 
> Thanks.
> JL
> 
> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <mo...@apache.org> wrote:
> 
>> Hi,
>> 
>> Great to see your interest to project. Thanks!
>> Looks like we need volunteers for a mentor and some backend subject for
>> GSoC2019.
>> Any ideas?
>> 
>> Best,
>> moon
>> 
>> 
>> 
>> 
>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <
>> morkovkin.vv@phystech.edu>
>> wrote:
>> 
>>> Hi everyone, I'm pursuing bachelor degree at Moscow institute of physics
>>> and technology and eager to contribute to Zeppelin in context of GSOC
>>> 2019. I've become a real fan of Zeppelin over the past couple of months,
>>> using it at my job. But I have found out only one ticket (front-end
>>> task) with label of GSOC 2019 on your Jira. Perhaps you may have any
>>> ideas for new features or improvements in Zeppelin, but you don't have
>>> enough hands on them. It would be wonderful if anyone agreed to mentor
>>> these ideas within GSOC :)
>>> Currently I am in a position of Scala developer (back-end) for 1.5 year.
>>> I also can write in Java or Python without any problems if necessary.
>>> Really fond of databases and highload. Also I have experience with some
>>> other great Apache projects like Cassandra, Kafka and Spark.
>>> 
>>> Best regards, Basil Morkovkin.
>>> 
>>> 
>> 
> 
> 
> -- 
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net


Re: Zeppelin in GSOC 2019

Posted by Jongyoul Lee <jo...@gmail.com>.
Thanks for contacting Zeppelin and for your interest.

I added FE (front-end) topics for GSOC because FE is the most urgent issue I
can think of. We always encourage contributions to Zeppelin on a variety of
topics, including your own ideas.

Please describe your ideas in more detail.

Thanks.
JL

On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <mo...@apache.org> wrote:

> Hi,
>
> Great to see your interest to project. Thanks!
> Looks like we need volunteers for a mentor and some backend subject for
> GSoC2019.
> Any ideas?
>
> Best,
> moon
>
>
>
>
> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <
> morkovkin.vv@phystech.edu>
> wrote:
>
> > Hi everyone, I'm pursuing bachelor degree at Moscow institute of physics
> > and technology and eager to contribute to Zeppelin in context of GSOC
> > 2019. I've become a real fan of Zeppelin over the past couple of months,
> > using it at my job. But I have found out only one ticket (front-end
> > task) with label of GSOC 2019 on your Jira. Perhaps you may have any
> > ideas for new features or improvements in Zeppelin, but you don't have
> > enough hands on them. It would be wonderful if anyone agreed to mentor
> > these ideas within GSOC :)
> > Currently I am in a position of Scala developer (back-end) for 1.5 year.
> > I also can write in Java or Python without any problems if necessary.
> > Really fond of databases and highload. Also I have experience with some
> > other great Apache projects like Cassandra, Kafka and Spark.
> >
> > Best regards, Basil Morkovkin.
> >
> >
>


-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net

Re: Zeppelin in GSOC 2019

Posted by moon soo Lee <mo...@apache.org>.
Hi,

Great to see your interest in the project. Thanks!
Looks like we need volunteer mentors and some backend topics for
GSoC 2019.
Any ideas?

Best,
moon




On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <mo...@phystech.edu>
wrote:

> Hi everyone, I'm pursuing bachelor degree at Moscow institute of physics
> and technology and eager to contribute to Zeppelin in context of GSOC
> 2019. I've become a real fan of Zeppelin over the past couple of months,
> using it at my job. But I have found out only one ticket (front-end
> task) with label of GSOC 2019 on your Jira. Perhaps you may have any
> ideas for new features or improvements in Zeppelin, but you don't have
> enough hands on them. It would be wonderful if anyone agreed to mentor
> these ideas within GSOC :)
> Currently I am in a position of Scala developer (back-end) for 1.5 year.
> I also can write in Java or Python without any problems if necessary.
> Really fond of databases and highload. Also I have experience with some
> other great Apache projects like Cassandra, Kafka and Spark.
>
> Best regards, Basil Morkovkin.
>
>