Posted to dev@zeppelin.apache.org by moon soo Lee <mo...@apache.org> on 2016/02/27 21:48:43 UTC

[DISCUSS] Update Roadmap

Hi Zeppelin users and developers,

The roadmap we have published at
https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
is almost 9 months old, and it no longer reflects where the community is
heading. It's time to update it.

Based on the mailing list, JIRA issues, pull requests, feedback from users,
conferences and meetings, I could summarize the major interests of users and
developers in 7 categories: Enterprise ready, Usability improvement,
Pluggability, Documentation, Backend integration, Notebook storage, and
Visualization.

I have listed related subjects under each category.

   - Enterprise ready
      - Authentication
         - Shiro authentication ZEPPELIN-548
         <https://issues.apache.org/jira/browse/ZEPPELIN-548>
      - Authorization
         - Notebook authorization PR-681
         <https://github.com/apache/incubator-zeppelin/pull/681>
      - Security
      - Multi-tenancy
      - Stability
   - Usability Improvement
      - UX improvement
      - Better Table data support
         - Download data as csv, etc PR-725
         <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
         <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
         <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
         <https://github.com/apache/incubator-zeppelin/pull/89>
         - Featureful table data display (pagination, etc)
   - Pluggability ZEPPELIN-533
   <https://issues.apache.org/jira/browse/ZEPPELIN-533>
      - Pluggable visualization
      - Dynamic Interpreter, notebook, visualization loading
      - Repository and registry for pluggable components
   - Improve documentation
      - Improve contents and readability
      - more tutorials, examples
   - Interpreter
      - Generic JDBC Interpreter
      - (spark)R Interpreter
      - Cluster manager for interpreter (Proposal
      <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
      )
      - more interpreters
   - Notebook storage
      - Versioning ZEPPELIN-540
      <http://issues.apache.org/jira/browse/ZEPPELIN-540>
      - more notebook storages
   - Visualization
      - More visualizations PR-152
      <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
      <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
      <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
      <https://github.com/apache/incubator-zeppelin/pull/321>
      - Customize graph (show/hide label, color, etc)


It will help anyone quickly grasp the overall interests of the project and
its direction. Based on this roadmap, we can discuss and redefine the scope
of the next release, 0.6.0, and its schedule.

What do you think? Any feedback would be appreciated.

Thanks,
moon

Re: [DISCUSS] Update Roadmap

Posted by DuyHai Doan <do...@gmail.com>.

It's a great update, Moon.

On Monday I'll give a talk at Voxxed Days Vienna about Zeppelin, and your email
will be helpful for giving some hints about the future of Zeppelin.




Re: [DISCUSS] Update Roadmap

Posted by Shabeel Syed <sh...@gmail.com>.

Hi Moon,

       Here are some of my requirements:

   1. Can we achieve better memory management for notebooks? I'm also
   facing a similar OOM issue, like the one Dafeng mentioned in another
   discussion. I'm using the iframe view of a paragraph; can we load that
   code and its results into memory only when requested? I think this is
   one area to focus on.
   2. In the table/graph view, can we include the features below along with
   pagination?

                a) Search, similar to
https://docs.angularjs.org/api/ng/filter/filter
                b) Sorting of columns, possibly with custom sorting algorithms?

    Also, is there any idea of when these suggested improvements might reach GA?
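
For illustration only, the search and sort behaviour requested in item 2 could look roughly like the sketch below. It is written in Python rather than in Zeppelin's own front-end code, and the names `search_rows` and `sort_rows` and the sample rows are hypothetical, not part of any Zeppelin API. It only shows the idea: a substring search across all cells, and a column sort that accepts a caller-supplied ordering.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def search_rows(rows: List[Row], text: str) -> List[Row]:
    """Keep rows in which any cell contains `text` (case-insensitive)."""
    needle = text.lower()
    return [r for r in rows if any(needle in str(v).lower() for v in r.values())]


def sort_rows(
    rows: List[Row],
    column: str,
    key: Optional[Callable[[Any], Any]] = None,
    descending: bool = False,
) -> List[Row]:
    """Sort by one column; `key` lets the caller plug in a custom ordering."""
    key_fn = key if key is not None else (lambda value: value)
    return sorted(rows, key=lambda r: key_fn(r[column]), reverse=descending)


# Hypothetical table data; the numbers are stored as strings on purpose.
rows = [
    {"name": "spark", "runs": "10"},
    {"name": "jdbc", "runs": "9"},
    {"name": "sparkr", "runs": "120"},
]

matches = search_rows(rows, "spark")
# Custom ordering: compare the "runs" column numerically instead of as text.
by_runs = sort_rows(matches, "runs", key=int, descending=True)
```

Passing `key=int` is what makes this a custom sort: without it the string values would be compared as text, so "120" would sort before "9".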


Regards
Shabeel

On Mon, Feb 29, 2016 at 1:51 PM, Vinayak Agrawal <vinayakagrawal88@gmail.com> wrote:

> Moon,
> The new roadmap looks very promising. I am very happy to see security in
> the list.
> I have some suggestions regarding Enterprise Ready features:
>
> 1. Job Scheduler - Can this be improved?
> Currently the scheduler can be used with a cron expression or a pre-set
> time. But in an enterprise solution, a notebook might be one piece of the
> workflow. Can we look towards the functionality of scheduling notebooks
> based on other notebooks finishing their jobs successfully?
> This requirement would arise in any ETL workflow, where all the downstream
> users wait for the ETL notebook to finish successfully. Only after that can
> other business-oriented notebooks be executed.
>
> 2. Importing a notebook - Is there a current requirement or future plan to
> implement a feature that allows import-notebook-from-github? This would
> allow users to share notebooks seamlessly.
>
> Thanks
> Vinayak
>
> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org> wrote:
>
>> Zhong Wang,
>> Right, folder support would be quite useful. Thanks for the opinion.
>> I hope I can finish the work in pr-190
>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>
>> Sourav,
>> Regarding concurrent running, Zeppelin itself doesn't limit running
>> paragraphs/queries concurrently. Each interpreter can implement its own
>> scheduling policy. For example, the SparkSQL interpreter and ShellInterpreter
>> can already run paragraphs/queries concurrently.
>>
>> SparkInterpreter is implemented with a FIFO scheduler because of the nature
>> of the Scala compiler. That's why users cannot run multiple paragraphs
>> concurrently when they work with SparkInterpreter.
>> But as Zhong Wang mentioned, pr-703 gives each notebook a separate Scala
>> compiler, so paragraphs can run concurrently as long as they're in
>> different notebooks.
>> Thanks for the feedback!
>>
>> Best,
>> moon
>>
>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
>> wrote:
>>
>>> Sourav: I think this newly merged PR can help you
>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>
>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>> sourav.mazumder00@gmail.com> wrote:
>>>
>>>> Hi Moon,
>>>>
>>>> This looks great.
>>>>
>>>> My only suggestion would be to include a PR/feature - Support for
>>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>>
>>>> Right now if more than one user tries to run paragraphs in multiple
>>>> notebooks concurrently through a single Zeppelin instance (and single
>>>> interpreter instance) the performance is very slow. It is obvious that the
>>>> queue gets built up within the zeppelin process and interpreter process in
>>>> that scenario as the time taken to move the status from start to pending
>>>> and pending to running is very high compared to the actual running time of
>>>> a paragraph.
>>>>
>>>> Without this, multi-tenancy support would be meaningless, as no one
>>>> can practically use it in a situation where multiple users are trying to
>>>> connect to the same instance of Zeppelin (and the related interpreter). A
>>>> possible solution would be to spawn a separate instance of the same
>>>> interpreter at every notebook/user level.
>>>>
>>>> Regards,
>>>> Sourav
>>>>
>>>>
>
>
> --
> Vinayak Agrawal
>
>
> "To Strive, To Seek, To Find and Not to Yield!"
> ~Lord Alfred Tennyson
>

Re: [DISCUSS] Update Roadmap

Posted by John Omernik <jo...@omernik.com>.

Jeff, in regard to #5: as a close follower of Apache Drill, I would highly
recommend HOCON for config. HOCON, at least as it's implemented in
Drill, is a great way to visualize configs, and it still gives administrators
the ability to reference environment variables from within the HOCON. This
way, people could hard-code values in the HOCON or, if they want, use their
own env variables from Zeppelin-env. For me, as a user and
administrator, this is powerful when using something like Mesos and
Marathon to deploy Zeppelin instances. Huge +1 for HOCON.
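
To make the hard-code-or-env idea concrete: in HOCON this is commonly written as a default followed by an optional override, for example `port = 8080` and then `port = ${?ZEPPELIN_PORT}`. The Python sketch below only mimics that one behaviour. It is not a HOCON parser, it checks environment variables only (real HOCON looks in the config itself first), and the `resolve` function, the `server.port` key and the use of `ZEPPELIN_PORT` here are illustrative assumptions.

```python
import os
import re

# Matches a value of the form ${?NAME}, HOCON's optional-substitution syntax.
_OPTIONAL_REF = re.compile(r"^\$\{\?([A-Za-z_][A-Za-z0-9_]*)\}$")


def resolve(assignments, env=None):
    """Apply (key, value) assignments in order; later ones win.

    An assignment whose value is ${?NAME} is applied only when NAME is set
    in `env`; otherwise it is skipped and the earlier value stands.
    """
    env = os.environ if env is None else env
    resolved = {}
    for key, value in assignments:
        match = _OPTIONAL_REF.match(value) if isinstance(value, str) else None
        if match is None:
            resolved[key] = value  # plain hard-coded value
        elif match.group(1) in env:
            resolved[key] = env[match.group(1)]  # env var overrides the default
        # else: variable is unset, so this assignment is ignored
    return resolved


# Hypothetical config: a hard-coded default, then an optional env override.
assignments = [
    ("server.port", "8080"),
    ("server.port", "${?ZEPPELIN_PORT}"),
]

default_config = resolve(assignments, env={})
overridden = resolve(assignments, env={"ZEPPELIN_PORT": "9090"})
```

With an empty environment the hard-coded value is kept; with the variable set, the deployment (for example a Marathon app definition) decides the value without the file being edited.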

John


On Thursday, April 7, 2016, Jeff Steinmetz <je...@gmail.com>
wrote:

> >On Monday, February 29, 2016 8:04:09 AM EDT Prasad Wagle wrote:
> >> This is a great list.
> >>
> >> In the enterprise ready section, what do you think about adding "High
> >> Availability and Disaster Recovery"? We can start with updating the
> >> documentation with best practices and scripts for a cold standby solution
> >> and work towards an active-active
> >> <https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_availability_cold_warm_hot?lang=en>
> >> solution.
> >>
> >> Another suggestion is to store meta-data for notes like creator, last
> >> updated (time and user) and number of views. We can show this information
> >> in the top level page in a table format with the ability to sort by any
> >> column.
> >>
> >> On Mon, Feb 29, 2016 at 7:15 AM, Benjamin Kim <bbuild11@gmail.com> wrote:
> >> > I concur with this suggestion. In the enterprise, management would like
> >> > to see scheduled runs tracked, monitored, and given SLA constraints for
> >> > the mission critical. Alerts and notifications are crucial for DevOps to
> >> > respond with error clarification within it. If the Zeppelin notebooks can
> >> > be executed by a third party scheduling application, such as Oozie, then
> >> > this requirement can be satisfied if there are no immediate plans for a
> >> > built-in one.
> >> >
> >> > On Feb 29, 2016, at 1:17 AM, Eran Witkon <eranwitkon@gmail.com> wrote:
> >> >
> >> > @Vinayak Agrawal I would suggest adding the ability to connect Zeppelin
> >> > to existing scheduling/workflow tools such as
> >> > https://oozie.apache.org/. This requires better hooks and status
> >> > reporting, but doesn't make Zeppelin an ETL/scheduler tool by itself.
> >> >
> >
> >
>
>


Re: [DISCUSS] Update Roadmap

Posted by Jeff Steinmetz <je...@gmail.com>.

Comments:

Regarding #2: Language support.  It would be great to see more Scala (once up to speed with Scala I never wanted to look back at Java).
Regarding #3: Drop old Spark support.  Seems like low-hanging fruit, low impact & high reward.
Regarding #5: Configuration files.  We could take a cue from other great open source (Apache license) projects, like Elasticsearch, and migrate to .yml files instead of verbose XML files, leaving environment variables for per-machine settings & global settings related to the Java runtime, JVM memory configs, and directory paths such as [FOO]_HOME.
An alternative to .yml is HOCON.  The Play Framework and Spark Job Server make use of easy-to-read HOCON-style files; HOCON is a JSON superset.
https://github.com/typesafehub/config/blob/master/HOCON.md

Typesafe licenses their entire config library under the Apache license, and the library is plain Java with no dependencies:
https://github.com/typesafehub/config


Regarding #6: Excluding the more esoteric interpreters by default seems reasonable

Addition:  Create a common installer that also bundles a service manager upstart script for Debian or CentOS (not sure about Windows).  Install via Debian package with a simple `dpkg -i` command.
Addition:  Build tools.  Does anybody have history with Gradle?  Is a switch from Maven to Gradle worth it?  I admit I am not an XML fan, and I realize this is not a simple task.  Gradle may make it easier to organize the builds if interpreters ever became plugins.  Each plugin could have its own build.gradle file.

"Improve documentation" is always a big yes.


Regards,
Jeff Steinmetz








On 4/6/16, 7:32 PM, "Amos Elberg" <am...@gmail.com> wrote:

>A few suggestions for the roadmap:
>
>1. Increase unit test coverage.  I suggest we set thresholds -- say, 70% for 
>0.6, 85% for 0.7, and aim for 95% before 1.0.
>
>2. Language support.  Right now, interpreters essentially have to be written 
>in Java, or at least have java wrappers.  This is because the current design 
>has each interpreter class call a `static class` method when the class is 
>loaded, to register the Interpreter with zeppelin.  In the long term, using 
>static class methods will inevitably be a source of architectural problems.  
>(People have been saying that the feature should be removed entirely from Java 
>since 1998.)  In the short term, if we fix this, then it would be easy for 
>people to write interpreters in other jvm languages, such as Scala, Clojure, 
>Python (by Jython), Elixir (by whatever the Elixir jvm converter is called), 
>Groovy, etc.  
>
>3. Remove Spark-under-zeppelin-home.  Many, many, many of our issues, 
>including many CI issues, trace back to the old system of installing Spark 
>under Zeppelin-home.  This is essentially a legacy thing from when Zeppelin 
>was a PR submitted as an add-on to Spark.  Right now, it doesn't buy us 
>anything -- but it does complicate the build process, create dependency 
>conflicts, and lead to user support issues.  
>
>I suggest we deprecate this ASAP, and remove it entirely before 0.7, or 0.8 at 
>the latest.  
>
>4. Drop support for Spark before 1.3, or better yet before 1.4.  Jeff 
>Steinmetz suggested this the other day.  It would simplify CI and the build 
>process, as well as maintenance as Spark heads toward 2.0.  I can't imagine 
>more than a tiny number of people who use zeppelin are using it with Spark 
>1.2, or even 1.3. 
>
>5.  Reform the configuration system.  Right now, Zeppelin configuration is set 
>in:  
>	- ZeppelinConfiguration.java (developers must edit)
>	- The xml configuration (administrator must edit)
>	- The env configuration file (administrator must edit)
>	- Multiple json files such as interpreter.json (edited through the 
>interface)
>
>The result is kind of a mish-mash, and it creates user support issues when 
>people enter conflicting configurations or configurations in the wrong place.
>
>It's also a developer issue because we haven't defined what takes precedence 
>over what.  
>
>I suggest we introduce a part of the architecture which acts as an arbitrator 
>for all configuration issues -- when any class needs to access or change 
>configuration, it can go through one place.  Then we can figure out how we 
>want to present configuration to the users. 
>
>6.  Disable most interpreters other than Spark-related (and MD) by default.   
>At this point, we've proliferated so many interpreters, that it complicates 
>the build cycle and, well, just isn't necessary.
>
>On Monday, February 29, 2016 8:04:09 AM EDT Prasad Wagle wrote:
>> This is a great list.
>> 
>> In the enterprise ready section, what do you think about adding "High
>> Availability and Disaster Recovery"? We can start with updating the
>> documentation with best practices and scripts for a cold standby solution
>> and work towards active-active
>> <https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_a
>> vailability_cold_warm_hot?lang=en> solution.
>> 
>> Another suggestion is to store meta-data for notes like creator, last
>> updated (time and user) and number of views. We can show this information
>> in the top level page in a table format with ability to sort by any column.
>> 
>> On Mon, Feb 29, 2016 at 7:15 AM, Benjamin Kim <bb...@gmail.com> wrote:
>> > I concur with this suggestion. In the enterprise, management would like to
>> > see scheduled runs to be tracked, monitored, and given SLA constraints for
>> > the mission critical. Alerts and notifications are crucial for DevOps to
>> > respond with error clarification within it. If the Zeppelin notebooks can
>> > be executed by a third party scheduling application, such as Oozie, then
>> > this requirement can be satisfied if there are no immediate plans for a
>> > built-in one.
>> > 
>> > On Feb 29, 2016, at 1:17 AM, Eran Witkon <er...@gmail.com> wrote:
>> > 
>> > @Vinayak Agrawal I would suggest adding the ability to connect zeppelin
>> > to existing scheduling tools\workflow tools such as
>> > https://oozie.apache.org/. This requires better hooks and status
>> > reporting but doesn't make Zeppelin an ETL\scheduler tool by itself.
>> > 
>> > 
>> > On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>> > 
>> > vinayakagrawal88@gmail.com> wrote:
>> >> Moon,
>> >> The new roadmap looks very promising. I am very happy to see security in
>> >> the list.
>> >> I have some suggestions regarding Enterprise Ready features:
>> >> 
>> >> 1. Job Scheduler - Can this be improved?
>> >> Currently the scheduler can be used with Cron expression or a pre-set
>> >> time. But in an enterprise solution, a notebook might be one piece of the
>> >> workflow. Can we look towards the functionality of scheduling notebook's
>> >> based on other notebooks finishing their job successfully?
>> >> This requirement would arise in any ETL workflow, where all the
>> >> downstream users wait for the ETL notebook to finish successfully. Only
>> >> after that, other business oriented notebooks can be executed.
>> >> 
>> >> 2. Importing a notebook - Is there a current requirement or future plan
>> >> to implement a feature that allows import-notebook-from-github? This
>> >> would
>> >> allow users to share notebooks seamlessly.
>> >> 
>> >> Thanks
>> >> Vinayak
>> >> 
>> >> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org> wrote:
>> >>> Zhong Wang,
>> >>> Right, Folder support would be quite useful. Thanks for the opinion.
>> >> 
>> >> Hope i can finish the work pr-190
>> >> 
>> >>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>> >>> 
>> >>> 
>> >>> Sourav,
>> >>> Regarding concurrent running, Zeppelin doesn't have limitation of run
>> >>> paragraph/query concurrently. Interpreter can implement its own
>> >>> scheduling
>> >>> policy. For example, SparkSQL interpreter and ShellInterpreter can
>> >>> already
>> >>> run paragraph/query concurrently.
>> >>> 
>> >>> SparkInterpreter is implemented with FIFO scheduler considering nature
>> >>> of scala compiler. That's why user can not run multiple paragraph
>> >>> concurrently when they work with SparkInterpreter.
>> >>> But as Zhong Wang mentioned, pr-703 enables each notebook will have
>> >>> separate scala compiler so paragraphs run concurrently, while they're in
>> >>> different notebooks.
>> >>> Thanks for the feedback!
>> >>> 
>> >>> Best,
>> >>> moon
>> >> 
>> >> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
>> >> 
>> >>> wrote:
>> >> Sourav: I think this newly merged PR can help you
>> >> 
>> >>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-1855
>> >>>> 82537
>> >>>> 
>> >>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>> >>> 
>> >>>> sourav.mazumder00@gmail.com> wrote:
>> >>> Hi Moon,
>> >>> 
>> >>>>> This looks great.
>> >>>>> 
>> >>>>> My only suggestion would be to include a PR/feature - Support for
>> >>>>> Running Concurrent paragraphs/queries in Zeppelin.
>> >>>>> 
>> >>>>> Right now if more than one user tries to run paragraphs in multiple
>> >>>>> notebooks concurrently through a single Zeppelin instance (and single
>> >>>>> interpreter instance) the performance is very slow. It is obvious that
>> >>>>> the
>> >>>>> queue gets built up within the zeppelin process and interpreter
>> >>>>> process in
>> >>>>> that scenario as the time taken to move the status from start to
>> >>>>> pending
>> >>>>> and pending to running is very high compared to the actual running
>> >>>>> time of
>> >>>>> a paragraph.
>> >>>>> 
>> >>>>> Without this the multi tenancy support would be meaningless as no one
>> >>>>> can practically use it in a situation where multiple users are trying
>> >>>>> to
>> >>>>> connect to the same instance of Zeppelin (and the related
>> >>>>> interpreter). A
>> >>>>> possible solution would be to spawn separate instance of the same
>> >>>>> interpreter at every notebook/user level.
>> >>>>> 
>> >>>>> Regards,
>> >>>>> Sourav
>> >>>> 
>> >>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org> wrote:
>> >>>> 
>> >>>> Hi Zeppelin users and developers,
>> >>>> 
>> >>>>>> The roadmap we have published at
>> >>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>> >>>>>> is almost 9 month old, and it doesn't reflect where the community
>> >>>>>> goes anymore. It's time to update.
>> >>>>>> 
>> >>>>>> Based on mailing list, jira issues, pullrequests, feedbacks from
>> >>>>>> users, conferences and meetings, I could summarize the major interest
>> >>>>>> of
>> >>>>>> users and developers in 7 categories. Enterprise ready, Usability
>> >>>>>> improvement, Pluggability, Documentation, Backend integration,
>> >>>>>> Notebook
>> >>>>>> storage, and Visualization.
>> >>>>>> 
>> >>>>>> And i could list related subjects under each categories.
>> >>>>>> 
>> >>>>>>    - Enterprise ready
>> >>>>>>    
>> >>>>>>       - Authentication
>> >>>>>>       
>> >>>>>>          - Shiro authentication ZEPPELIN-548
>> >>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>> >>>>>>       
>> >>>>>>       - Authorization
>> >>>>>>       
>> >>>>>>          - Notebook authorization PR-681
>> >>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>> >>>>>>       
>> >>>>>>       - Security
>> >>>>>>       - Multi-tenancy
>> >>>>>>       - Stability
>> >>>>>>    
>> >>>>>>    - Usability Improvement
>> >>>>>>    
>> >>>>>>    
>> >>>>>>    - UX improvement
>> >>>>>>    
>> >>>>>>       - Better Table data support
>> >>>>>>    
>> >>>>>>    - Download data as csv, etc PR-725
>> >>>>>>    
>> >>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>,
>> >>>>>>          PR-714
>> >>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>,
>> >>>>>>          PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>,
>> >>>>>>          PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
>> >>>>>>    
>> >>>>>>    - Featureful table data display (pagination, etc)
>> >>>>>>    
>> >>>>>>    
>> >>>>>>    - Pluggability ZEPPELIN-533
>> >>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>> >>>>>>    
>> >>>>>>       - Pluggable visualization
>> >>>>>>    
>> >>>>>>    - Dynamic Interpreter, notebook, visualization loading
>> >>>>>>    
>> >>>>>>    
>> >>>>>>    - Repository and registry for pluggable components
>> >>>>>>    
>> >>>>>>    
>> >>>>>>    - Improve documentation
>> >>>>>>    
>> >>>>>>       - Improve contents and readability
>> >>>>>>       - more tutorials, examples
>> >>>>>>    
>> >>>>>>    - Interpreter
>> >>>>>>    
>> >>>>>>       - Generic JDBC Interpreter
>> >>>>>>       - (spark)R Interpreter
>> >>>>>>       - Cluster manager for interpreter (Proposal
>> >>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+M
>> >>>>>>       anager+Proposal> )
>> >>>>>>       - more interpreters
>> >>>>>>    
>> >>>>>>    - Notebook storage
>> >>>>>>    
>> >>>>>>       - Versioning ZEPPELIN-540
>> >>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>> >>>>>>       - more notebook storages
>> >>>>>>    
>> >>>>>>    - Visualization
>> >>>>>>    
>> >>>>>>    
>> >>>>>>    - More visualizations PR-152
>> >>>>>>    
>> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>> >>>>>>    
>> >>>>>>    - Customize graph (show/hide label, color, etc)
>> >>>>>> 
>> >>>>>> It will help anyone quickly get overall interest of project and the
>> >>>>>> direction. And based on this roadmap, we can discuss and re-define
>> >>>>>> the next
>> >>>>>> release 0.6.0 scope and its schedule.
>> >>>>>> 
>> >>>>>> What do you think? Any feedback would be appreciated.
>> >>>>>> 
>> >>>>>> Thanks,
>> >>>>>> moon
>> >> 
>> >> --
>> >> Vinayak Agrawal
>> >> 
>> >> 
>> >> "To Strive, To Seek, To Find and Not to Yield!"
>> >> ~Lord Alfred Tennyson
>
>

Re: [DISCUSS] Update Roadmap

Posted by Amos Elberg <am...@gmail.com>.

A few suggestions for the roadmap:

1. Increase unit test coverage.  I suggest we set thresholds -- say, 70% for 
0.6, 85% for 0.7, and aim for 95% before 1.0.

2. Language support.  Right now, interpreters essentially have to be written 
in Java, or at least have Java wrappers.  This is because the current design 
has each interpreter class call a static method when the class is 
loaded, to register the interpreter with Zeppelin.  In the long term, using 
static methods will inevitably be a source of architectural problems.  
(People have been saying that the feature should be removed entirely from Java 
since 1998.)  In the short term, if we fix this, then it would be easy for 
people to write interpreters in other JVM languages, such as Scala, Clojure, 
Python (by Jython), Elixir (by whatever the Elixir JVM converter is called), 
Groovy, etc.
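
As a rough illustration of point 2, one way to avoid registration from static 
initializers is an explicit registry that is handed interpreter factories at 
startup.  This is only a sketch under assumptions: SimpleInterpreter, 
InterpreterRegistry, and RegistryDemo are hypothetical names, not Zeppelin's 
actual classes, and a real interpreter would carry far more state.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical minimal interpreter contract; not Zeppelin's actual Interpreter class.
interface SimpleInterpreter {
    String interpret(String code);
}

// Hypothetical registry that receives factories explicitly, instead of each
// interpreter class registering itself from a static initializer.
final class InterpreterRegistry {
    private final Map<String, Supplier<SimpleInterpreter>> factories = new LinkedHashMap<>();

    // Register a factory under a name; duplicates are rejected so conflicts surface early.
    void register(String name, Supplier<SimpleInterpreter> factory) {
        if (factories.putIfAbsent(name, factory) != null) {
            throw new IllegalArgumentException("Interpreter already registered: " + name);
        }
    }

    // Create a fresh interpreter instance, or return empty if the name is unknown.
    Optional<SimpleInterpreter> create(String name) {
        Supplier<SimpleInterpreter> factory = factories.get(name);
        return factory == null ? Optional.empty() : Optional.of(factory.get());
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        InterpreterRegistry registry = new InterpreterRegistry();
        // Registration is an ordinary method call made by whoever wires the system together.
        registry.register("echo", () -> code -> "echo: " + code);
        registry.register("upper", () -> code -> code.toUpperCase());

        System.out.println(registry.create("echo").get().interpret("hello"));
        System.out.println(registry.create("upper").get().interpret("hello"));
        System.out.println(registry.create("missing").isPresent());
    }
}
```

Because registration here is a plain call on a plain object, code written in 
another JVM language could, in principle, register an interpreter by supplying 
a factory.  The JDK's java.util.ServiceLoader is another standard mechanism 
that could play a similar role.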

3. Remove Spark-under-zeppelin-home.  Many, many, many of our issues, 
including many CI issues, trace back to the old system of installing Spark 
under Zeppelin-home.  This is essentially a legacy thing from when Zeppelin 
was a PR submitted as an add-on to Spark.  Right now, it doesn't buy us 
anything -- but it does complicate the build process, create dependency 
conflicts, and lead to user support issues.  

I suggest we deprecate this ASAP, and remove it entirely before 0.7, or 0.8 at 
the latest.  

4. Drop support for Spark before 1.3, or better yet before 1.4.  Jeff 
Steinmetz suggested this the other day.  It would simplify CI and the build 
process, as well as maintenance as Spark heads toward 2.0.  I can't imagine 
more than a tiny number of people who use Zeppelin are using it with Spark 
1.2, or even 1.3. 

5.  Reform the configuration system.  Right now, Zeppelin configuration is set 
in:
	- ZeppelinConfiguration.java (developers must edit)
	- The XML configuration (administrator must edit)
	- The env configuration file (administrator must edit)
	- Multiple JSON files such as interpreter.json (edited through the interface)

The result is kind of a mish-mash, and it creates user support issues when 
people enter conflicting configurations or configurations in the wrong place.

It's also a developer issue because we haven't defined what takes precedence 
over what.  

I suggest we introduce a part of the architecture which acts as an arbitrator 
for all configuration issues -- when any class needs to access or change 
configuration, it can go through one place.  Then we can figure out how we 
want to present configuration to the users.
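
The arbitrator idea in point 5 could look roughly like the sketch below: every 
read goes through one object that consults configuration sources in a fixed, 
documented order.  ConfigArbiter, ConfigDemo, and the source names are 
hypothetical, and the precedence shown (env over XML) is only an assumed 
example, not a decision anyone has made.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical single access point for configuration. Sources are consulted in
// the order they were added, so precedence is explicit rather than implicit.
final class ConfigArbiter {
    // One named layer of settings, e.g. "env" or "xml".
    private static final class Source {
        final String name;
        final Map<String, String> values;

        Source(String name, Map<String, String> values) {
            this.name = name;
            this.values = values;
        }
    }

    private final List<Source> sources = new ArrayList<>();

    // Sources added earlier take precedence over sources added later.
    ConfigArbiter addSource(String name, Map<String, String> values) {
        sources.add(new Source(name, values));
        return this;
    }

    // Return the value from the highest-precedence source that defines the key.
    Optional<String> get(String key) {
        for (Source s : sources) {
            String v = s.values.get(key);
            if (v != null) {
                return Optional.of(v);
            }
        }
        return Optional.empty();
    }

    // Report which source supplied a key, which helps when diagnosing conflicts.
    Optional<String> origin(String key) {
        for (Source s : sources) {
            if (s.values.get(key) != null) {
                return Optional.of(s.name);
            }
        }
        return Optional.empty();
    }
}

class ConfigDemo {
    public static void main(String[] args) {
        // Illustrative values only; the same key is set in two places on purpose.
        Map<String, String> env = new HashMap<>();
        env.put("zeppelin.server.port", "9090");
        Map<String, String> xml = new HashMap<>();
        xml.put("zeppelin.server.port", "8080");
        xml.put("zeppelin.notebook.dir", "notebook");

        ConfigArbiter config = new ConfigArbiter()
                .addSource("env", env)   // assumed to win over the XML file
                .addSource("xml", xml);

        System.out.println(config.get("zeppelin.server.port") + " from " + config.origin("zeppelin.server.port"));
        System.out.println(config.get("zeppelin.notebook.dir") + " from " + config.origin("zeppelin.notebook.dir"));
    }
}
```

With one lookup path like this, a conflicting setting is resolved the same way 
everywhere, and origin() gives a quick answer when a user asks why a value 
they set did not take effect.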

6.  Disable most interpreters other than Spark-related (and MD) by default.   
At this point, we've proliferated so many interpreters that it complicates 
the build cycle and, well, just isn't necessary.

On Monday, February 29, 2016 8:04:09 AM EDT Prasad Wagle wrote:
> This is a great list.
> 
> In the enterprise ready section, what do you think about adding "High
> Availability and Disaster Recovery"? We can start with updating the
> documentation with best practices and scripts for a cold standby solution
> and work towards active-active
> <https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_a
> vailability_cold_warm_hot?lang=en> solution.
> 
> Another suggestion is to store meta-data for notes like creator, last
> updated (time and user) and number of views. We can show this information
> in the top level page in a table format with ability to sort by any column.
> 
> On Mon, Feb 29, 2016 at 7:15 AM, Benjamin Kim <bb...@gmail.com> wrote:
> > I concur with this suggestion. In the enterprise, management would like to
> > see scheduled runs to be tracked, monitored, and given SLA constraints for
> > the mission critical. Alerts and notifications are crucial for DevOps to
> > respond with error clarification within it. If the Zeppelin notebooks can
> > be executed by a third party scheduling application, such as Oozie, then
> > this requirement can be satisfied if there are no immediate plans for a
> > built-in one.
> > 
> > On Feb 29, 2016, at 1:17 AM, Eran Witkon <er...@gmail.com> wrote:
> > 
> > @Vinayak Agrawal I would suggest adding the ability to connect zeppelin
> > to existing scheduling tools\workflow tools such as
> > https://oozie.apache.org/. This requires better hooks and status
> > reporting but doesn't make Zeppelin an ETL\scheduler tool by itself.
> > 
> > 
> > On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
> > 
> > vinayakagrawal88@gmail.com> wrote:
> >> Moon,
> >> The new roadmap looks very promising. I am very happy to see security in
> >> the list.
> >> I have some suggestions regarding Enterprise Ready features:
> >> 
> >> 1. Job Scheduler - Can this be improved?
> >> Currently the scheduler can be used with Cron expression or a pre-set
> >> time. But in an enterprise solution, a notebook might be one piece of the
> >> workflow. Can we look towards the functionality of scheduling notebook's
> >> based on other notebooks finishing their job successfully?
> >> This requirement would arise in any ETL workflow, where all the
> >> downstream users wait for the ETL notebook to finish successfully. Only
> >> after that, other business oriented notebooks can be executed.
> >> 
> >> 2. Importing a notebook - Is there a current requirement or future plan
> >> to implement a feature that allows import-notebook-from-github? This
> >> would
> >> allow users to share notebooks seamlessly.
> >> 
> >> Thanks
> >> Vinayak
> >> 
> >> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org> wrote:
> >>> Zhong Wang,
> >>> Right, Folder support would be quite useful. Thanks for the opinion.
> >> 
> >> Hope i can finish the work pr-190
> >> 
> >>> <https://github.com/apache/incubator-zeppelin/pull/190>.
> >>> 
> >>> 
> >>> Sourav,
> >>> Regarding concurrent running, Zeppelin doesn't have limitation of run
> >>> paragraph/query concurrently. Interpreter can implement its own
> >>> scheduling
> >>> policy. For example, SparkSQL interpreter and ShellInterpreter can
> >>> already
> >>> run paragraph/query concurrently.
> >>> 
> >>> SparkInterpreter is implemented with FIFO scheduler considering nature
> >>> of scala compiler. That's why user can not run multiple paragraph
> >>> concurrently when they work with SparkInterpreter.
> >>> But as Zhong Wang mentioned, pr-703 enables each notebook will have
> >>> separate scala compiler so paragraphs run concurrently, while they're in
> >>> different notebooks.
> >>> Thanks for the feedback!
> >>> 
> >>> Best,
> >>> moon
> >> 
> >> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
> >> 
> >>> wrote:
> >> Sourav: I think this newly merged PR can help you
> >> 
> >>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-1855
> >>>> 82537
> >>>> 
> >>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
> >>> 
> >>>> sourav.mazumder00@gmail.com> wrote:
> >>> Hi Moon,
> >>> 
> >>>>> This looks great.
> >>>>> 
> >>>>> My only suggestion would be to include a PR/feature - Support for
> >>>>> Running Concurrent paragraphs/queries in Zeppelin.
> >>>>> 
> >>>>> Right now if more than one user tries to run paragraphs in multiple
> >>>>> notebooks concurrently through a single Zeppelin instance (and single
> >>>>> interpreter instance) the performance is very slow. It is obvious that
> >>>>> the
> >>>>> queue gets built up within the zeppelin process and interpreter
> >>>>> process in
> >>>>> that scenario as the time taken to move the status from start to
> >>>>> pending
> >>>>> and pending to running is very high compared to the actual running
> >>>>> time of
> >>>>> a paragraph.
> >>>>> 
> >>>>> Without this the multi tenancy support would be meaningless as no one
> >>>>> can practically use it in a situation where multiple users are trying
> >>>>> to
> >>>>> connect to the same instance of Zeppelin (and the related
> >>>>> interpreter). A
> >>>>> possible solution would be to spawn separate instance of the same
> >>>>> interpreter at every notebook/user level.
> >>>>> 
> >>>>> Regards,
> >>>>> Sourav
> >>>> 
> >>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org> wrote:
> >>>> 
> >>>> Hi Zeppelin users and developers,
> >>>> 
> >>>>>> The roadmap we have published at
> >>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
> >>>>>> is almost 9 month old, and it doesn't reflect where the community
> >>>>>> goes anymore. It's time to update.
> >>>>>> 
> >>>>>> Based on mailing list, jira issues, pullrequests, feedbacks from
> >>>>>> users, conferences and meetings, I could summarize the major interest
> >>>>>> of
> >>>>>> users and developers in 7 categories. Enterprise ready, Usability
> >>>>>> improvement, Pluggability, Documentation, Backend integration,
> >>>>>> Notebook
> >>>>>> storage, and Visualization.
> >>>>>> 
> >>>>>> And i could list related subjects under each categories.
> >>>>>> 
> >>>>>>    - Enterprise ready
> >>>>>>    
> >>>>>>       - Authentication
> >>>>>>       
> >>>>>>          - Shiro authentication ZEPPELIN-548
> >>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
> >>>>>>       
> >>>>>>       - Authorization
> >>>>>>       
> >>>>>>          - Notebook authorization PR-681
> >>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
> >>>>>>       
> >>>>>>       - Security
> >>>>>>       - Multi-tenancy
> >>>>>>       - Stability
> >>>>>>    
> >>>>>>    - Usability Improvement
> >>>>>>    
> >>>>>>    
> >>>>>>    - UX improvement
> >>>>>>    
> >>>>>>       - Better Table data support
> >>>>>>    
> >>>>>>    - Download data as csv, etc PR-725
> >>>>>>    
> >>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>,
> >>>>>>          PR-714
> >>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>,
> >>>>>>          PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>,
> >>>>>>          PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
> >>>>>>    
> >>>>>>    - Featureful table data display (pagination, etc)
> >>>>>>    
> >>>>>>    
> >>>>>>    - Pluggability ZEPPELIN-533
> >>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
> >>>>>>    
> >>>>>>       - Pluggable visualization
> >>>>>>    
> >>>>>>    - Dynamic Interpreter, notebook, visualization loading
> >>>>>>    
> >>>>>>    
> >>>>>>    - Repository and registry for pluggable components
> >>>>>>    
> >>>>>>    
> >>>>>>    - Improve documentation
> >>>>>>    
> >>>>>>       - Improve contents and readability
> >>>>>>       - more tutorials, examples
> >>>>>>    
> >>>>>>    - Interpreter
> >>>>>>    
> >>>>>>       - Generic JDBC Interpreter
> >>>>>>       - (spark)R Interpreter
> >>>>>>       - Cluster manager for interpreter (Proposal
> >>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+M
> >>>>>>       anager+Proposal> )
> >>>>>>       - more interpreters
> >>>>>>    
> >>>>>>    - Notebook storage
> >>>>>>    
> >>>>>>       - Versioning ZEPPELIN-540
> >>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
> >>>>>>       - more notebook storages
> >>>>>>    
> >>>>>>    - Visualization
> >>>>>>    
> >>>>>>    
> >>>>>>    - More visualizations PR-152
> >>>>>>    
> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
> >>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
> >>>>>>    
> >>>>>>    - Customize graph (show/hide label, color, etc)
> >>>>>> 
> >>>>>> It will help anyone quickly get overall interest of project and the
> >>>>>> direction. And based on this roadmap, we can discuss and re-define
> >>>>>> the next
> >>>>>> release 0.6.0 scope and its schedule.
> >>>>>> 
> >>>>>> What do you think? Any feedback would be appreciated.
> >>>>>> 
> >>>>>> Thanks,
> >>>>>> moon
> >> 
> >> --
> >> Vinayak Agrawal
> >> 
> >> 
> >> "To Strive, To Seek, To Find and Not to Yield!"
> >> ~Lord Alfred Tennyson

Re: [DISCUSS] Update Roadmap

Posted by Prasad Wagle <pr...@gmail.com>.

This is a great list.

In the enterprise ready section, what do you think about adding "High
Availability and Disaster Recovery"? We can start with updating the
documentation with best practices and scripts for a cold standby solution
and work towards active-active
<https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_availability_cold_warm_hot?lang=en>
 solution.

Another suggestion is to store metadata for notes, such as creator, last
updated (time and user), and number of views. We can show this information
on the top-level page in a table format with the ability to sort by any column.

On Mon, Feb 29, 2016 at 7:15 AM, Benjamin Kim <bb...@gmail.com> wrote:

> I concur with this suggestion. In the enterprise, management would like to
> see scheduled runs to be tracked, monitored, and given SLA constraints for
> the mission critical. Alerts and notifications are crucial for DevOps to
> respond with error clarification within it. If the Zeppelin notebooks can
> be executed by a third party scheduling application, such as Oozie, then
> this requirement can be satisfied if there are no immediate plans for a
> built-in one.
>
> On Feb 29, 2016, at 1:17 AM, Eran Witkon <er...@gmail.com> wrote:
>
> @Vinayak Agrawal I would suggest adding the ability to connect zeppelin
> to existing scheduling tools\workflow tools such as
> https://oozie.apache.org/. This requires better hooks and status
> reporting but doesn't make Zeppelin an ETL\scheduler tool by itself.
>
>
> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
> vinayakagrawal88@gmail.com> wrote:
>
>> Moon,
>> The new roadmap looks very promising. I am very happy to see security in
>> the list.
>> I have some suggestions regarding Enterprise Ready features:
>>
>> 1. Job Scheduler - Can this be improved?
>> Currently the scheduler can be used with Cron expression or a pre-set
>> time. But in an enterprise solution, a notebook might be one piece of the
>> workflow. Can we look towards the functionality of scheduling notebook's
>> based on other notebooks finishing their job successfully?
>> This requirement would arise in any ETL workflow, where all the
>> downstream users wait for the ETL notebook to finish successfully. Only
>> after that, other business oriented notebooks can be executed.
>>
>> 2. Importing a notebook - Is there a current requirement or future plan
>> to implement a feature that allows import-notebook-from-github? This would
>> allow users to share notebooks seamlessly.
>>
>> Thanks
>> Vinayak
>>
>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org> wrote:
>>
>>> Zhong Wang,
>>> Right, Folder support would be quite useful. Thanks for the opinion.
>>>
>> Hope i can finish the work pr-190
>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>
>>
>>> Sourav,
>>> Regarding concurrent running, Zeppelin doesn't limit running
>>> paragraphs/queries concurrently. An interpreter can implement its own scheduling
>>> policy. For example, the SparkSQL interpreter and ShellInterpreter can already
>>> run paragraphs/queries concurrently.
>>>
>>> SparkInterpreter is implemented with a FIFO scheduler, considering the nature
>>> of the Scala compiler. That's why users cannot run multiple paragraphs
>>> concurrently when they work with SparkInterpreter.
>>> But as Zhong Wang mentioned, pr-703 gives each notebook a
>>> separate Scala compiler, so paragraphs can run concurrently when they're in
>>> different notebooks.
>>> Thanks for the feedback!
>>>
>>> Best,
>>> moon
>>>
>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
>>> wrote:
>>>
>> Sourav: I think this newly merged PR can help you
>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>
>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>> sourav.mazumder00@gmail.com> wrote:
>>>>
>>> Hi Moon,
>>>>>
>>>>> This looks great.
>>>>>
>>>>> My only suggestion would be to include a PR/feature - Support for
>>>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>>>
>>>>> Right now if more than one user tries to run paragraphs in multiple
>>>>> notebooks concurrently through a single Zeppelin instance (and single
>>>>> interpreter instance) the performance is very slow. It is obvious that the
>>>>> queue gets built up within the zeppelin process and interpreter process in
>>>>> that scenario as the time taken to move the status from start to pending
>>>>> and pending to running is very high compared to the actual running time of
>>>>> a paragraph.
>>>>>
>>>>> Without this the multi tenancy support would be meaningless as no one
>>>>> can practically use it in a situation where multiple users are trying to
>>>>> connect to the same instance of Zeppelin (and the related interpreter). A
>>>>> possible solution would be to spawn separate instance of the same
>>>>> interpreter at every notebook/user level.
>>>>>
>>>>> Regards,
>>>>> Sourav
>>>>>
>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org> wrote:
>>>>>
>>>> Hi Zeppelin users and developers,
>>>>>>
>>>>>> The roadmap we have published at
>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>> is almost 9 month old, and it doesn't reflect where the community
>>>>>> goes anymore. It's time to update.
>>>>>>
>>>>>> Based on mailing list, jira issues, pullrequests, feedbacks from
>>>>>> users, conferences and meetings, I could summarize the major interest of
>>>>>> users and developers in 7 categories. Enterprise ready, Usability
>>>>>> improvement, Pluggability, Documentation, Backend integration, Notebook
>>>>>> storage, and Visualization.
>>>>>>
>>>>>> And i could list related subjects under each categories.
>>>>>>
>>>>>
>>>>>>    - Enterprise ready
>>>>>>       - Authentication
>>>>>>          - Shiro authentication ZEPPELIN-548
>>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>>       - Authorization
>>>>>>          - Notebook authorization PR-681
>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>>       - Security
>>>>>>       - Multi-tenancy
>>>>>>       - Stability
>>>>>>    - Usability Improvement
>>>>>>
>>>>>>
>>>>>>    - UX improvement
>>>>>>       - Better Table data support
>>>>>>
>>>>>>
>>>>>>    - Download data as csv, etc PR-725
>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>,
>>>>>>          PR-714
>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>,
>>>>>>          PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>,
>>>>>>          PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>>
>>>>>>
>>>>>>    - Featureful table data display (pagination, etc)
>>>>>>
>>>>>>
>>>>>>    - Pluggability ZEPPELIN-533
>>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>>       - Pluggable visualization
>>>>>>
>>>>>>
>>>>>>    - Dynamic Interpreter, notebook, visualization loading
>>>>>>
>>>>>>
>>>>>>    - Repository and registry for pluggable components
>>>>>>
>>>>>>
>>>>>>    - Improve documentation
>>>>>>       - Improve contents and readability
>>>>>>       - more tutorials, examples
>>>>>>    - Interpreter
>>>>>>       - Generic JDBC Interpreter
>>>>>>       - (spark)R Interpreter
>>>>>>       - Cluster manager for interpreter (Proposal
>>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>>>>>       )
>>>>>>       - more interpreters
>>>>>>    - Notebook storage
>>>>>>       - Versioning ZEPPELIN-540
>>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>>       - more notebook storages
>>>>>>    - Visualization
>>>>>>
>>>>>>
>>>>>>    - More visualizations PR-152
>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>>
>>>>>>
>>>>>>    - Customize graph (show/hide label, color, etc)
>>>>>>
>>>>>> It will help anyone quickly get overall interest of project and the
>>>>>> direction. And based on this roadmap, we can discuss and re-define the next
>>>>>> release 0.6.0 scope and its schedule.
>>>>>>
>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>
>>>>>> Thanks,
>>>>>> moon
>>>>>>
>>>>>>
>>
>>
>> --
>> Vinayak Agrawal
>>
>>
>> "To Strive, To Seek, To Find and Not to Yield!"
>> ~Lord Alfred Tennyson
>>
>
>

Re: [DISCUSS] Update Roadmap

Posted by Prasad Wagle <pr...@gmail.com>.

This is a great list.

In the enterprise ready section, what do you think about adding "High
Availability and Disaster Recovery"? We can start by updating the
documentation with best practices and scripts for a cold standby solution
and work towards an active-active
<https://www.ibm.com/developerworks/community/blogs/RohitShetty/entry/high_availability_cold_warm_hot?lang=en>
solution.

Another suggestion is to store metadata for notes, such as creator, last
update (time and user), and number of views. We can show this information
on the top-level page in a table that can be sorted by any column.
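
A minimal sketch of the note metadata idea, assuming a simple in-memory
record. The class name, field names, and sample notes are all hypothetical and
are not part of Zeppelin; the point is only that one record per note is enough
to sort the listing by any column.

```python
from dataclasses import dataclass
from datetime import datetime
from operator import attrgetter


@dataclass
class NoteMeta:
    # All field names here are hypothetical, chosen for illustration only.
    name: str
    creator: str
    last_updated: datetime
    last_updated_by: str
    views: int


def sort_notes(notes, column, descending=False):
    """Return a new list of notes ordered by the given metadata column."""
    return sorted(notes, key=attrgetter(column), reverse=descending)


# Made-up sample data standing in for the notes on the top-level page.
notes = [
    NoteMeta("etl-daily", "alice", datetime(2016, 2, 27), "bob", 42),
    NoteMeta("sales-report", "carol", datetime(2016, 2, 29), "carol", 7),
    NoteMeta("ad-hoc", "bob", datetime(2016, 2, 28), "alice", 19),
]

most_viewed = sort_notes(notes, "views", descending=True)
recently_updated = sort_notes(notes, "last_updated", descending=True)
```

The same `sort_notes` call would serve every column header in the table, since
the column name is passed in as data.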

On Mon, Feb 29, 2016 at 7:15 AM, Benjamin Kim <bb...@gmail.com> wrote:

> I concur with this suggestion. In the enterprise, management would like to
> see scheduled runs to be tracked, monitored, and given SLA constraints for
> the mission critical. Alerts and notifications are crucial for DevOps to
> respond with error clarification within it. If the Zeppelin notebooks can
> be executed by a third party scheduling application, such as Oozie, then
> this requirement can be satisfied if there are no immediate plans for a
> built-in one.
>
> On Feb 29, 2016, at 1:17 AM, Eran Witkon <er...@gmail.com> wrote:
>
> @Vinayak Agrawal I would suggest adding the ability to connect Zeppelin
> to existing scheduling/workflow tools such as
> https://oozie.apache.org/. This requires better hooks and status
> reporting, but doesn't make Zeppelin an ETL/scheduler tool by itself.
>
>
> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
> vinayakagrawal88@gmail.com> wrote:
>
>> Moon,
>> The new roadmap looks very promising. I am very happy to see security in
>> the list.
>> I have some suggestions regarding Enterprise Ready features:
>>
>> 1. Job Scheduler - Can this be improved?
>> Currently the scheduler can be used with Cron expression or a pre-set
>> time. But in an enterprise solution, a notebook might be one piece of the
>> workflow. Can we look towards the functionality of scheduling notebook's
>> based on other notebooks finishing their job successfully?
>> This requirement would arise in any ETL workflow, where all the
>> downstream users wait for the ETL notebook to finish successfully. Only
>> after that, other business oriented notebooks can be executed.
>>
>> 2. Importing a notebook - Is there a current requirement or future plan
>> to implement a feature that allows import-notebook-from-github? This would
>> allow users to share notebooks seamlessly.
>>
>> Thanks
>> Vinayak
>>
>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org> wrote:
>>
>>> Zhong Wang,
>>> Right, Folder support would be quite useful. Thanks for the opinion.
>>>
>> Hope i can finish the work pr-190
>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>
>>
>>> Sourav,
>>> Regarding concurrent running, Zeppelin doesn't have limitation of run
>>> paragraph/query concurrently. Interpreter can implement it's own scheduling
>>> policy. For example, SparkSQL interpreter and ShellInterpreter can already
>>> run paragraph/query concurrently.
>>>
>>> SparkInterpreter is implemented with FIFO scheduler considering nature
>>> of scala compiler. That's why user can not run multiple paragraph
>>> concurrently when they work with SparkInterpreter.
>>> But as Zhong Wang mentioned, pr-703 enables each notebook will have
>>> separate scala compiler so paragraphs run concurrently, while they're in
>>> different notebooks.
>>> Thanks for the feedback!
>>>
>>> Best,
>>> moon
>>>
>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
>>> wrote:
>>>
>> Sourav: I think this newly merged PR can help you
>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>
>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>> sourav.mazumder00@gmail.com> wrote:
>>>>
>>> Hi Moon,
>>>>>
>>>>> This looks great.
>>>>>
>>>>> My only suggestion would be to include a PR/feature - Support for
>>>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>>>
>>>>> Right now if more than one user tries to run paragraphs in multiple
>>>>> notebooks concurrently through a single Zeppelin instance (and single
>>>>> interpreter instance) the performance is very slow. It is obvious that the
>>>>> queue gets built up within the zeppelin process and interpreter process in
>>>>> that scenario as the time taken to move the status from start to pending
>>>>> and pending to running is very high compared to the actual running time of
>>>>> a paragraph.
>>>>>
>>>>> Without this the multi tenancy support would be meaningless as no one
>>>>> can practically use it in a situation where multiple users are trying to
>>>>> connect to the same instance of Zeppelin (and the related interpreter). A
>>>>> possible solution would be to spawn separate instance of the same
>>>>> interpreter at every notebook/user level.
>>>>>
>>>>> Regards,
>>>>> Sourav
>>>>>
>>
>>
>> --
>> Vinayak Agrawal
>>
>>
>> "To Strive, To Seek, To Find and Not to Yield!"
>> ~Lord Alfred Tennyson
>>
>
>

Re: [DISCUSS] Update Roadmap

Posted by Benjamin Kim <bb...@gmail.com>.

FYI, here are some points I took from Databricks' Cloud product as they pertain to notebooks and job scheduling.

   - Job Scheduling
      - email alerts
      - all code and results are shown for each job run
      - notebooks can reference and run other notebooks, creating basic workflows
      - job runs can be drilled down into and their metrics visualized
   - Notebooks
      - comments can be added alongside the code
      - data can be read directly using mounts, for instance from S3
      - revision history is included and can sync to a GitHub repo
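
The "notebooks run other notebooks" point can be sketched as a small
dependency-ordered runner. This is only an illustration under stated
assumptions: the notebook names are made up, and `run_notebook` is a
hypothetical callable standing in for whatever actually executes a notebook.
It is not an API of Zeppelin or of Databricks.

```python
from graphlib import TopologicalSorter


def run_workflow(dependencies, run_notebook):
    """Run notebooks in dependency order; skip any whose upstream did not succeed.

    dependencies maps a notebook name to the set of notebooks it waits for.
    run_notebook is a caller-supplied callable returning True on success.
    """
    status = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, ())
        if all(status.get(dep) == "success" for dep in upstream):
            status[name] = "success" if run_notebook(name) else "failed"
        else:
            status[name] = "skipped"
    return status


# Hypothetical notebook names; the lambda stands in for a real notebook run
# and pretends that "churn_report" fails.
deps = {
    "etl": set(),
    "sales_report": {"etl"},
    "churn_report": {"etl"},
    "dashboard": {"sales_report", "churn_report"},
}
result = run_workflow(deps, lambda name: name != "churn_report")
```

Here the dashboard notebook is skipped because one of its upstream notebooks
failed, which is the behaviour an ETL-style workflow would need.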

Hope this is helpful.

Cheers,
Ben


> On Feb 29, 2016, at 10:37 AM, Guilherme Silveira <gu...@gmail.com> wrote:
> 
> I agree. Job scheduling should be a core feature.
> 
> On Feb 29, 2016 at 12:15, "Benjamin Kim" <bbuild11@gmail.com <ma...@gmail.com>> wrote:
> I concur with this suggestion. In the enterprise, management would like to see scheduled runs to be tracked, monitored, and given SLA constraints for the mission critical. Alerts and notifications are crucial for DevOps to respond with error clarification within it. If the Zeppelin notebooks can be executed by a third party scheduling application, such as Oozie, then this requirement can be satisfied if there are no immediate plans for a built-in one.
> 
>> On Feb 29, 2016, at 1:17 AM, Eran Witkon <eranwitkon@gmail.com <ma...@gmail.com>> wrote:
>> 
>> @Vinayak Agrawal I would suggest adding the ability to connect Zeppelin to existing scheduling/workflow tools such as https://oozie.apache.org/ <https://oozie.apache.org/>. This requires better hooks and status reporting, but doesn't make Zeppelin an ETL/scheduler tool by itself.
>> 
>> 
>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <vinayakagrawal88@gmail.com <ma...@gmail.com>> wrote:
>> Moon,
>> The new roadmap looks very promising. I am very happy to see security in the list.
>> I have some suggestions regarding Enterprise Ready features:
>> 
>> 1. Job Scheduler - Can this be improved? 
>> Currently the scheduler can be used with Cron expression or a pre-set time. But in an enterprise solution, a notebook might be one piece of the workflow. Can we look towards the functionality of scheduling notebook's based on other notebooks finishing their job successfully?
>> This requirement would arise in any ETL workflow, where all the downstream users wait for the ETL notebook to finish successfully. Only after that, other business oriented notebooks can be executed.  
>> 
>> 2. Importing a notebook - Is there a current requirement or future plan to implement a feature that allows import-notebook-from-github? This would allow users to share notebooks seamlessly. 
>> 
>> Thanks 
>> Vinayak
>> 
>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <moon@apache.org <ma...@apache.org>> wrote:
>> Zhong Wang, 
>> Right, Folder support would be quite useful. Thanks for the opinion. 
>> Hope i can finish the work pr-190 <https://github.com/apache/incubator-zeppelin/pull/190>.
>> 
>> Sourav,
>> Regarding concurrent running, Zeppelin doesn't have limitation of run paragraph/query concurrently. Interpreter can implement it's own scheduling policy. For example, SparkSQL interpreter and ShellInterpreter can already run paragraph/query concurrently.
>> 
>> SparkInterpreter is implemented with FIFO scheduler considering nature of scala compiler. That's why user can not run multiple paragraph concurrently when they work with SparkInterpreter.
>> But as Zhong Wang mentioned, pr-703 enables each notebook will have separate scala compiler so paragraphs run concurrently, while they're in different notebooks.
>> Thanks for the feedback!
>> 
>> Best,
>> moon
>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wangzhong.neu@gmail.com <ma...@gmail.com>> wrote:
>> Sourav: I think this newly merged PR can help you https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537 <https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537>
>> 
>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <sourav.mazumder00@gmail.com <ma...@gmail.com>> wrote:
>> Hi Moon,
>> 
>> This looks great.
>> 
>> My only suggestion would be to include a PR/feature - Support for Running Concurrent paragraphs/queries in Zeppelin. 
>> 
>> Right now if more than one user tries to run paragraphs in multiple notebooks concurrently through a single Zeppelin instance (and single interpreter instance) the performance is very slow. It is obvious that the queue gets built up within the zeppelin process and interpreter process in that scenario as the time taken to move the status from start to pending and pending to running is very high compared to the actual running time of a paragraph.
>> 
>> Without this the multi tenancy support would be meaningless as no one can practically use it in a situation where multiple users are trying to connect to the same instance of Zeppelin (and the related interpreter). A possible solution would be to spawn separate instance of the same interpreter at every notebook/user level.
>> 
>> Regards,
>> Sourav
>> 
>> 
>> -- 
>> Vinayak Agrawal
>> 
>> 
>> "To Strive, To Seek, To Find and Not to Yield!" 
>> ~Lord Alfred Tennyson
>

Re: [DISCUSS] Update Roadmap

Posted by Guilherme Silveira <gu...@gmail.com>.

I agree. Job scheduling should be a core feature.

On Feb 29, 2016 at 12:15, "Benjamin Kim" <bb...@gmail.com> wrote:

> I concur with this suggestion. In the enterprise, management would like to
> see scheduled runs to be tracked, monitored, and given SLA constraints for
> the mission critical. Alerts and notifications are crucial for DevOps to
> respond with error clarification within it. If the Zeppelin notebooks can
> be executed by a third party scheduling application, such as Oozie, then
> this requirement can be satisfied if there are no immediate plans for a
> built-in one.
>
> On Feb 29, 2016, at 1:17 AM, Eran Witkon <er...@gmail.com> wrote:
>
> @Vinayak Agrawal I would suggest adding the ability to connect Zeppelin
> to existing scheduling/workflow tools such as
> https://oozie.apache.org/. This requires better hooks and status
> reporting, but doesn't make Zeppelin an ETL/scheduler tool by itself.
>
>
> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
> vinayakagrawal88@gmail.com> wrote:
>
>> Moon,
>> The new roadmap looks very promising. I am very happy to see security in
>> the list.
>> I have some suggestions regarding Enterprise Ready features:
>>
>> 1. Job Scheduler - Can this be improved?
>> Currently the scheduler can be used with Cron expression or a pre-set
>> time. But in an enterprise solution, a notebook might be one piece of the
>> workflow. Can we look towards the functionality of scheduling notebook's
>> based on other notebooks finishing their job successfully?
>> This requirement would arise in any ETL workflow, where all the
>> downstream users wait for the ETL notebook to finish successfully. Only
>> after that, other business oriented notebooks can be executed.
>>
>> 2. Importing a notebook - Is there a current requirement or future plan
>> to implement a feature that allows import-notebook-from-github? This would
>> allow users to share notebooks seamlessly.
>>
>> Thanks
>> Vinayak
>>
>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org> wrote:
>>
>>> Zhong Wang,
>>> Right, Folder support would be quite useful. Thanks for the opinion.
>>>
>> Hope i can finish the work pr-190
>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>
>>
>>> Sourav,
>>> Regarding concurrent running, Zeppelin doesn't have limitation of run
>>> paragraph/query concurrently. Interpreter can implement it's own scheduling
>>> policy. For example, SparkSQL interpreter and ShellInterpreter can already
>>> run paragraph/query concurrently.
>>>
>>> SparkInterpreter is implemented with FIFO scheduler considering nature
>>> of scala compiler. That's why user can not run multiple paragraph
>>> concurrently when they work with SparkInterpreter.
>>> But as Zhong Wang mentioned, pr-703 enables each notebook will have
>>> separate scala compiler so paragraphs run concurrently, while they're in
>>> different notebooks.
>>> Thanks for the feedback!
>>>
>>> Best,
>>> moon
>>>
>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
>>> wrote:
>>>
>> Sourav: I think this newly merged PR can help you
>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>
>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>> sourav.mazumder00@gmail.com> wrote:
>>>>
>>> Hi Moon,
>>>>>
>>>>> This looks great.
>>>>>
>>>>> My only suggestion would be to include a PR/feature - Support for
>>>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>>>
>>>>> Right now if more than one user tries to run paragraphs in multiple
>>>>> notebooks concurrently through a single Zeppelin instance (and single
>>>>> interpreter instance) the performance is very slow. It is obvious that the
>>>>> queue gets built up within the zeppelin process and interpreter process in
>>>>> that scenario as the time taken to move the status from start to pending
>>>>> and pending to running is very high compared to the actual running time of
>>>>> a paragraph.
>>>>>
>>>>> Without this the multi tenancy support would be meaningless as no one
>>>>> can practically use it in a situation where multiple users are trying to
>>>>> connect to the same instance of Zeppelin (and the related interpreter). A
>>>>> possible solution would be to spawn separate instance of the same
>>>>> interpreter at every notebook/user level.
>>>>>
>>>>> Regards,
>>>>> Sourav
>>>>>
>>
>>
>> --
>> Vinayak Agrawal
>>
>>
>> "To Strive, To Seek, To Find and Not to Yield!"
>> ~Lord Alfred Tennyson
>>
>
>

Re: [DISCUSS] Update Roadmap

Posted by Benjamin Kim <bb...@gmail.com>.

I concur with this suggestion. In the enterprise, management would like scheduled runs to be tracked and monitored, with SLA constraints on the mission-critical ones. Alerts and notifications that include a clear description of the error are crucial for DevOps to respond. If Zeppelin notebooks can be executed by a third-party scheduling application, such as Oozie, then this requirement can be satisfied even if there are no immediate plans for a built-in one.
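
As a rough sketch of the tracking and alerting described above, assuming run
records are available from whichever scheduler executes the notebooks. The
`Run` record, its fields, and the SLA table are hypothetical and do not
correspond to any Zeppelin or Oozie API.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical record of one scheduled notebook run.
    notebook: str
    status: str          # "success" or "failed"
    duration_s: float
    error: str = ""


def find_alerts(runs, sla_seconds):
    """Return alert messages for failed runs and runs that exceeded their SLA."""
    alerts = []
    for run in runs:
        limit = sla_seconds.get(run.notebook)
        if run.status == "failed":
            alerts.append(f"{run.notebook}: failed ({run.error})")
        elif limit is not None and run.duration_s > limit:
            alerts.append(
                f"{run.notebook}: took {run.duration_s:.0f}s, SLA is {limit:.0f}s"
            )
    return alerts


# Made-up run history; only notebooks listed in the SLA table are checked
# for duration, while failures are always reported.
runs = [
    Run("etl", "success", 1250.0),
    Run("sales_report", "failed", 12.0, error="table not found"),
    Run("dashboard", "success", 90.0),
]
alerts = find_alerts(runs, {"etl": 900, "dashboard": 300})
```

Each alert carries the error text or the measured duration, so whoever receives
the notification can act on it without opening the notebook first.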

> On Feb 29, 2016, at 1:17 AM, Eran Witkon <er...@gmail.com> wrote:
> 
> @Vinayak Agrawal I would suggest adding the ability to connect Zeppelin to existing scheduling/workflow tools such as https://oozie.apache.org/ <https://oozie.apache.org/>. This requires better hooks and status reporting, but doesn't make Zeppelin an ETL/scheduler tool by itself.
> 
> 
> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <vinayakagrawal88@gmail.com <ma...@gmail.com>> wrote:
> Moon,
> The new roadmap looks very promising. I am very happy to see security in the list.
> I have some suggestions regarding Enterprise Ready features:
> 
> 1. Job Scheduler - Can this be improved? 
> Currently the scheduler can be used with Cron expression or a pre-set time. But in an enterprise solution, a notebook might be one piece of the workflow. Can we look towards the functionality of scheduling notebook's based on other notebooks finishing their job successfully?
> This requirement would arise in any ETL workflow, where all the downstream users wait for the ETL notebook to finish successfully. Only after that, other business oriented notebooks can be executed.  
> 
> 2. Importing a notebook - Is there a current requirement or future plan to implement a feature that allows import-notebook-from-github? This would allow users to share notebooks seamlessly. 
> 
> Thanks 
> Vinayak
> 
> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <moon@apache.org <ma...@apache.org>> wrote:
> Zhong Wang, 
> Right, Folder support would be quite useful. Thanks for the opinion. 
> Hope i can finish the work pr-190 <https://github.com/apache/incubator-zeppelin/pull/190>.
> 
> Sourav,
> Regarding concurrent running, Zeppelin doesn't have limitation of run paragraph/query concurrently. Interpreter can implement it's own scheduling policy. For example, SparkSQL interpreter and ShellInterpreter can already run paragraph/query concurrently.
> 
> SparkInterpreter is implemented with FIFO scheduler considering nature of scala compiler. That's why user can not run multiple paragraph concurrently when they work with SparkInterpreter.
> But as Zhong Wang mentioned, pr-703 enables each notebook will have separate scala compiler so paragraphs run concurrently, while they're in different notebooks.
> Thanks for the feedback!
> 
> Best,
> moon
> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wangzhong.neu@gmail.com <ma...@gmail.com>> wrote:
> Sourav: I think this newly merged PR can help you https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537 <https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537>
> 
> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <sourav.mazumder00@gmail.com <ma...@gmail.com>> wrote:
> Hi Moon,
> 
> This looks great.
> 
> My only suggestion would be to include a PR/feature - Support for Running Concurrent paragraphs/queries in Zeppelin. 
> 
> Right now if more than one user tries to run paragraphs in multiple notebooks concurrently through a single Zeppelin instance (and single interpreter instance) the performance is very slow. It is obvious that the queue gets built up within the zeppelin process and interpreter process in that scenario as the time taken to move the status from start to pending and pending to running is very high compared to the actual running time of a paragraph.
> 
> Without this the multi tenancy support would be meaningless as no one can practically use it in a situation where multiple users are trying to connect to the same instance of Zeppelin (and the related interpreter). A possible solution would be to spawn separate instance of the same interpreter at every notebook/user level.
> 
> Regards,
> Sourav
> 
> 
> -- 
> Vinayak Agrawal
> 
> 
> "To Strive, To Seek, To Find and Not to Yield!" 
> ~Lord Alfred Tennyson

Re: [DISCUSS] Update Roadmap

Posted by Benjamin Kim <bb...@gmail.com>.

I concur with this suggestion. In the enterprise, management would like scheduled runs to be tracked and monitored, with SLA constraints on the mission-critical ones. Alerts and notifications that include a clear description of the error are crucial for DevOps to respond. If Zeppelin notebooks can be executed by a third-party scheduling application, such as Oozie, then this requirement can be satisfied even if there are no immediate plans for a built-in one.

> On Feb 29, 2016, at 1:17 AM, Eran Witkon <er...@gmail.com> wrote:
> 
> @Vinayak Agrawal I would suggest adding the ability to connect Zeppelin to existing scheduling/workflow tools such as https://oozie.apache.org/ <https://oozie.apache.org/>. This requires better hooks and status reporting but doesn't make Zeppelin an ETL/scheduler tool by itself.
> 
> 
> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <vinayakagrawal88@gmail.com <ma...@gmail.com>> wrote:
> Moon,
> The new roadmap looks very promising. I am very happy to see security in the list.
> I have some suggestions regarding Enterprise Ready features:
> 
> 1. Job Scheduler - Can this be improved? 
> Currently the scheduler can be used with a cron expression or a pre-set time. But in an enterprise solution, a notebook might be one piece of a workflow. Can we look at scheduling notebooks based on other notebooks finishing their jobs successfully?
> This requirement would arise in any ETL workflow, where all the downstream users wait for the ETL notebook to finish successfully. Only after that, other business oriented notebooks can be executed.  
> 
> 2. Importing a notebook - Is there a current requirement or future plan to implement a feature that allows import-notebook-from-github? This would allow users to share notebooks seamlessly. 
> 
> Thanks 
> Vinayak
> 
> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <moon@apache.org <ma...@apache.org>> wrote:
> Zhong Wang, 
> Right, Folder support would be quite useful. Thanks for the opinion. 
> Hope i can finish the work pr-190 <https://github.com/apache/incubator-zeppelin/pull/190>.
> 
> Sourav,
> Regarding concurrent running, Zeppelin itself doesn't limit running paragraphs/queries concurrently. Each interpreter can implement its own scheduling policy. For example, the SparkSQL interpreter and ShellInterpreter can already run paragraphs/queries concurrently.
> 
> SparkInterpreter is implemented with a FIFO scheduler because of the nature of the Scala compiler. That's why users cannot run multiple paragraphs concurrently when they work with SparkInterpreter.
> But as Zhong Wang mentioned, pr-703 gives each notebook a separate Scala compiler, so paragraphs can run concurrently as long as they're in different notebooks.
> Thanks for the feedback!
> 
> Best,
> moon
> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wangzhong.neu@gmail.com <ma...@gmail.com>> wrote:
> Sourav: I think this newly merged PR can help you https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537 <https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537>
> 
> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <sourav.mazumder00@gmail.com <ma...@gmail.com>> wrote:
> Hi Moon,
> 
> This looks great.
> 
> My only suggestion would be to include a PR/feature - Support for Running Concurrent paragraphs/queries in Zeppelin. 
> 
> Right now if more than one user tries to run paragraphs in multiple notebooks concurrently through a single Zeppelin instance (and single interpreter instance) the performance is very slow. It is obvious that the queue gets built up within the zeppelin process and interpreter process in that scenario as the time taken to move the status from start to pending and pending to running is very high compared to the actual running time of a paragraph.
> 
> Without this the multi tenancy support would be meaningless as no one can practically use it in a situation where multiple users are trying to connect to the same instance of Zeppelin (and the related interpreter). A possible solution would be to spawn separate instance of the same interpreter at every notebook/user level.
> 
> Regards,
> Sourav
> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <moon@apache.org <ma...@apache.org>> wrote:
> Hi Zeppelin users and developers,
> 
> The roadmap we have published at
> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap <https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap>
> is almost 9 month old, and it doesn't reflect where the community goes anymore. It's time to update.
> 
> Based on mailing list, jira issues, pullrequests, feedbacks from users, conferences and meetings, I could summarize the major interest of users and developers in 7 categories. Enterprise ready, Usability improvement, Pluggability, Documentation, Backend integration, Notebook storage, and Visualization.
> 
> And i could list related subjects under each categories.
> Enterprise ready
> Authentication 
> Shiro authentication ZEPPELIN-548 <https://issues.apache.org/jira/browse/ZEPPELIN-548>
> Authorization 
> Notebook authorization PR-681 <https://github.com/apache/incubator-zeppelin/pull/681>
> Security
> Multi-tenancy
> Stability
> Usability Improvement
> UX improvement
> Better Table data support
> Download data as csv, etc PR-725 <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714 <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
> Featureful table data display (pagination, etc)
> Pluggability ZEPPELIN-533 <https://issues.apache.org/jira/browse/ZEPPELIN-533>
> Pluggable visualization
> Dynamic Interpreter, notebook, visualization loading
> Repository and registry for pluggable components
> Improve documentation
> Improve contents and readability
> more tutorials, examples
> Interpreter
> Generic JDBC Interpreter
> (spark)R Interpreter
> Cluster manager for interpreter (Proposal <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
> more interpreters
> Notebook storage
> Versioning ZEPPELIN-540 <http://issues.apache.org/jira/browse/ZEPPELIN-540>
> more notebook storages
> Visualization
> More visualizations PR-152 <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728 <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336 <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321 <https://github.com/apache/incubator-zeppelin/pull/321>
> Customize graph (show/hide label, color, etc)
> It will help anyone quickly grasp the overall interests and direction of the project. Based on this roadmap, we can discuss and redefine the scope and schedule of the next release, 0.6.0.
> 
> What do you think? Any feedback would be appreciated.
> 
> Thanks,
> moon
> 
> 
> 
> 
> -- 
> Vinayak Agrawal
> 
> 
> "To Strive, To Seek, To Find and Not to Yield!" 
> ~Lord Alfred Tennyson
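
The dependency-driven scheduling asked for in the quoted suggestion (run downstream notebooks only after the ETL notebook succeeds) can be sketched independently of any particular scheduler. The notebook names and the run_notebook callback below are hypothetical; in practice the callback would wrap whatever mechanism actually executes a notebook. The sketch relies on graphlib from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_in_dependency_order(dependencies, run_notebook):
    """Run notebooks so that each one starts only after its upstreams succeeded.

    dependencies maps a notebook name to the set of notebooks it waits for.
    run_notebook(name) is a caller-supplied function returning True on success.
    Returns (succeeded, blocked), where blocked holds notebooks that failed
    or were skipped because an upstream notebook did not succeed.
    """
    succeeded, blocked = [], []
    # static_order() yields every notebook after all of its upstreams.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, ())
        if all(u in succeeded for u in upstream) and run_notebook(name):
            succeeded.append(name)
        else:
            blocked.append(name)
    return succeeded, blocked
```

For example, with {"report": {"etl"}, "dashboard": {"etl"}, "etl": set()}, the "etl" notebook runs first, and "report" and "dashboard" are skipped if it fails.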

Re: [DISCUSS] Update Roadmap

Posted by Sherif Akoush <sh...@gmail.com>.

Hi all,

We have been using Zeppelin for the past few months and it has been
very useful. Well done guys.

We would like to see:
-Being able to customize charts, for example add axis label, change
number format,...
-Being able to export charts to an image easily (currently we use svg
export browser plugin).
-Being able to share notebooks in read access mode while still being
able to change the chart layout and type in/choose parameters from the
dynamic forms (currently it is only allowed for write access mode).

Regards,
Sherif

On Tue, Mar 22, 2016 at 12:38 AM, Nikolay Voronchikhin
<nv...@gmail.com> wrote:
> Hi Zeppelin Users and Developers,
>
> Do you know if MapR will be adding Zeppelin to its roadmap for the next
> version after MapR 5.1?
>
> We see in Hue 3.9 that it provides notebooks for R Shell, Python Shell,
> PySpark, SparkR, Hive SQL, Impala SQL, and Spark SQL, but no Drill SQL
> notebook.
> We are looking for an Apache Project that focuses on a Drill Notebook UI
> that performs better than the Drill Web Console UI itself.
>
> Sincerely,
> Nikolay Voronchikhin
> Big Data/Data Warehouse/Data Science/Data Platforms Engineer at Cisco
> https://www.linkedin.com/in/nvoronchikhin
> E-mail: nvoronchikhin@gmail.com
> Mobile: 951-288-2778
>
>
> On Mon, Mar 21, 2016 at 2:44 PM, rohit choudhary <rc...@gmail.com> wrote:
>>
>> Dear All,
>>
>> I think direction setting is important for Enterprise readiness. I have a
>> little bit of an overview of Ambari Views, which is very similar in nature
>> to Zeppelin. Please let me explain:
>>
>> Hive View - interacts with Hive
>> Pig View - interacts with Pig
>> Workflow Designer - interacts with Oozie
>>
>> We have a very similar architecture in Zeppelin where we interact with
>> these systems through Interpreters. The usage will also be similar, as both
>> will interact with Hadoop clusters or, in some cases, Spark with YARN on HDFS.
>> Our priorities should include:
>>
>> - Design & implement for multi-tenancy
>> - Auditability from Data/State and Lineage perspective
>> - Ability to share Notebooks/Data/State across users, preferably through
>> SparkContext sharing
>> - Security between Zeppelin and the other systems, not limited to Spark
>> through Kerberos. (@Rick +1)
>>
>> I will share an initial draft of the thoughts I have in mind, in the next
>> couple of days.
>>
>> Thanks,
>> Rohit.
>>
>>
>>
>> On Thu, Mar 3, 2016 at 7:54 AM, moon soo Lee <mo...@apache.org> wrote:
>>>
>>> Shabeel, thanks for the feedback about rest api and custom id. that might
>>> help avoid multiple rest api calls.
>>>
>>> Thanks everyone for the valuable feedback. Looks like we're all going in the
>>> same direction. I have updated the wiki.
>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>> Please take a look.
>>>
>>> I'm sure many details are missing from this roadmap. To be clear, something
>>> not being on the roadmap doesn't mean the community is not working on it or
>>> that it can't be included in Zeppelin. The roadmap represents community
>>> interest and overall direction.
>>> We're not changing the roadmap every day, but that doesn't mean it is set
>>> in stone and can never be changed. We can improve it continuously.
>>>
>>> Please feel free to fork this mail thread for any further discussion
>>> on a specific subject (e.g. job scheduling).
>>>
>>> Thanks,
>>> moon
>>>
>>> On Wed, Mar 2, 2016 at 12:31 AM Shabeel Syed <sh...@gmail.com>
>>> wrote:
>>>>
>>>> Also we need better rest api support for creating and fetching the
>>>> notebooks and paragraphs.
>>>> for example if I can set custom defined notebookid and paragraphid , we
>>>> can avoid multiple rest api calls.
>>>>
>>>>
>>>> http://localhost:8080/#/notebook/<notebookid>/paragraph/<paragraphid>?asIframe
>>>> should return an error if the notebook or paragraph does not exist.
>>>>
>>>> and while creating notebook or paragraph I should be able to mention my
>>>> custom ids.
>>>>
>>>> Regards
>>>> Shabeel
>>>>
>>>> On Wed, Mar 2, 2016 at 11:55 AM, Zhong Wang <wa...@gmail.com>
>>>> wrote:
>>>>>
>>>>> +1 on @rick. quality is really important... I am still encountering
>>>>> bugs consistently
>>>>>
>>>>> On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV
>>>>> <te...@gmail.com> wrote:
>>>>>>
>>>>>> +1 on @rick
>>>>>>
>>>>>> On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bb...@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> I see in the Enterprise section that multi-tenancy will be included,
>>>>>>> will this have user impersonation too? In this way, the user executing will
>>>>>>> be the user owning the process.
>>>>>>>
>>>>>>> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <sh...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Hi Tamas,
>>>>>>>    Pluggable external visualization is really a GREAT feature to
>>>>>>> have. I'm looking forward to this :)
>>>>>>>
>>>>>>> Regards
>>>>>>> Shabeel
>>>>>>>
>>>>>>> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi
>>>>>>> <ta...@odigeo.com> wrote:
>>>>>>>>
>>>>>>>> Hey,
>>>>>>>>
>>>>>>>> Really promising roadmap.
>>>>>>>>
>>>>>>>> I'd only push more visualization options. I agree built in
>>>>>>>> visualization is needed with limited charting options but I think we also
>>>>>>>> need somehow 'inject' external js visualizations also.
>>>>>>>>
>>>>>>>>
>>>>>>>> For scheduling Zeppelin notebooks  we use
>>>>>>>> https://github.com/airbnb/airflow through the job rest api. It's an
>>>>>>>> enterprise ready and very robust solution right now.
>>>>>>>>
>>>>>>>> Tamas
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1 March 2016 at 09:12, Eran Witkon <er...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> One point to clarify: I don't want to suggest Oozie specifically. I
>>>>>>>>> want us to think about which features we develop ourselves and which ones
>>>>>>>>> we get by integrating external, preferably Apache, technology. We don't
>>>>>>>>> think about building our own storage services, so why build our own scheduler?
>>>>>>>>> Eran
>>>>>>>>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <mo...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>>>>>>>>> Now I can see a lot of demands around enterprise level job
>>>>>>>>>> scheduling. Either external or built-in, I completely agree having
>>>>>>>>>> enterprise level job scheduling support on the roadmap.
>>>>>>>>>> ZEPPELIN-137, ZEPPELIN-531 are related issues i can find in our
>>>>>>>>>> JIRA.
>>>>>>>>>>
>>>>>>>>>> @Vinayak
>>>>>>>>>> Regarding importing notebook from github, Zeppelin has pluggable
>>>>>>>>>> notebook storage layer (see related package). So, github notebook sync can
>>>>>>>>>> be implemented easily.
>>>>>>>>>>
>>>>>>>>>> @Shabeel
>>>>>>>>>> Right, we need better memory management to prevent such OOM errors.
>>>>>>>>>> And i think table is one of the most frequently used way of
>>>>>>>>>> displaying data. So definitely, we'll need more features like filter, sort,
>>>>>>>>>> etc.
>>>>>>>>>> After this roadmap discussion, discussion for the next release
>>>>>>>>>> will follow. Then we'll get idea when those features will be available.
>>>>>>>>>>
>>>>>>>>>> @Prasad
>>>>>>>>>> Thanks for mentioning HA and DR. They're really important subject
>>>>>>>>>> for enterprise use. Definitely Zeppelin will need to address them.
>>>>>>>>>> And displaying meta information of notebook on top level page is
>>>>>>>>>> good idea.
>>>>>>>>>>
>>>>>>>>>> It's really great to hear many opinions and ideas.
>>>>>>>>>> And thanks @Rick for sharing valuable view to Zeppelin project.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> moon
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <ra...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> For one, I know that there is rudimentary scheduling built into
>>>>>>>>>>> Zeppelin already (at least I fixed a bug in the test for a scheduling
>>>>>>>>>>> feature a few months ago).
>>>>>>>>>>> But another point is that Zeppelin should also focus on quality,
>>>>>>>>>>> reproducibility and portability.
>>>>>>>>>>> Although this doesn't offer exciting new features, it would make
>>>>>>>>>>> development much easier.
>>>>>>>>>>>
>>>>>>>>>>> Cross-platform testability, Tests that pass when run
>>>>>>>>>>> sequentially, compatibility with Firefox, and many more open issues that
>>>>>>>>>>> make it so much harder to enhance Zeppelin and add features should be
>>>>>>>>>>> addressed soon, preferably before more features are added. Already Zeppelin
>>>>>>>>>>> is suffering - in my opinion - from quite a lot of feature creep, and we
>>>>>>>>>>> should avoid putting in the kitchen sink, at the cost of quality and
>>>>>>>>>>> maintainability. Instead modularity (ZEPPELIN-533 in particular) should be
>>>>>>>>>>> targeted.
>>>>>>>>>>>
>>>>>>>>>>> Oozie, in my opinion, is a dead end - it may de-facto still be in
>>>>>>>>>>> use on many clusters, but it's not getting the love it needs, and I wouldn't
>>>>>>>>>>> bet on it, when it comes to integrating scheduling. Instead, any external
>>>>>>>>>>> tool should be able to use the REST-API to trigger executions, if you want
>>>>>>>>>>> external scheduling.
>>>>>>>>>>>
>>>>>>>>>>> So, in conclusion, if we take Moon's list as a list of descending
>>>>>>>>>>> priorities, I fully agree, under the condition that code quality is included
>>>>>>>>>>> as a subset of enterprise-readiness. Auth* is paramount (Kerberos SPNEGO SSO
>>>>>>>>>>> support is what we really want) with user and group rights assignment on the
>>>>>>>>>>> notebook level. We probably also need Knox-integration (ODP-Members looking
>>>>>>>>>>> at integrating Zeppelin should consider contributing this), and integration
>>>>>>>>>>> of something like Spree (https://github.com/hammerlab/spree) to be able to
>>>>>>>>>>> profile jobs.
>>>>>>>>>>>
>>>>>>>>>>> I'm hopeful that soon I can resume contributing some
>>>>>>>>>>> quality-oriented code, to drive this "necessary evil" forward ;)
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder
>>>>>>>>>>> <so...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>>>>>>>>>>
>>>>>>>>>>>> Rather one should be able to call it from any scheduler
>>>>>>>>>>>> typically used in enterprise level. May be support for BPML.
>>>>>>>>>>>>
>>>>>>>>>>>> I believe the existing ability to call/execute a Zeppelin
>>>>>>>>>>>> Notebook or a specific paragraph within a notebook using REST API should
>>>>>>>>>>>> take care of this requirement to some extent.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Sourav
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal
>>>>>>>>>>>> <vi...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> @Eran Witkon,
>>>>>>>>>>>>> Thanks for the suggestion, Eran. I concur with your thought.
>>>>>>>>>>>>> If Zeppelin can be integrated with Oozie, that would be
>>>>>>>>>>>>> wonderful. Users will also be able to leverage their Oozie skills.
>>>>>>>>>>>>> This would be promising for now.
>>>>>>>>>>>>> However, in the future Hadoop might not necessarily be
>>>>>>>>>>>>> installed in a Spark cluster, and Oozie (since it installs with the Hadoop
>>>>>>>>>>>>> distribution) might not be available.
>>>>>>>>>>>>> So perhaps we should give some thought to this feature for the
>>>>>>>>>>>>> future. Should it depend on Oozie, or should Zeppelin have its own
>>>>>>>>>>>>> scheduling?
>>>>>>>>>>>>>
>>>>>>>>>>>>> As Benjamin has reiterated, the Databricks notebook has this as a core
>>>>>>>>>>>>> notebook feature.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, would anybody give any suggestions regarding "sync with
>>>>>>>>>>>>> github" feature?
>>>>>>>>>>>>> -Exporting notebook to Github
>>>>>>>>>>>>> -Importing notebook from Github
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> Vinayak
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon
>>>>>>>>>>>>> <er...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @Vinayak Agrawal I would suggest adding the ability to connect
>>>>>>>>>>>>>> Zeppelin to existing scheduling/workflow tools such as
>>>>>>>>>>>>>> https://oozie.apache.org/. This requires better hooks and status reporting
>>>>>>>>>>>>>> but doesn't make Zeppelin an ETL/scheduler tool by itself.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal
>>>>>>>>>>>>>> <vi...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Moon,
>>>>>>>>>>>>>>> The new roadmap looks very promising. I am very happy to see
>>>>>>>>>>>>>>> security in the list.
>>>>>>>>>>>>>>> I have some suggestions regarding Enterprise Ready features:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. Job Scheduler - Can this be improved?
>>>>>>>>>>>>>>> Currently the scheduler can be used with a cron expression or a
>>>>>>>>>>>>>>> pre-set time. But in an enterprise solution, a notebook might be one piece
>>>>>>>>>>>>>>> of a workflow. Can we look at scheduling
>>>>>>>>>>>>>>> notebooks based on other notebooks finishing their jobs successfully?
>>>>>>>>>>>>>>> This requirement would arise in any ETL workflow, where all
>>>>>>>>>>>>>>> the downstream users wait for the ETL notebook to finish successfully. Only
>>>>>>>>>>>>>>> after that, other business oriented notebooks can be executed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. Importing a notebook - Is there a current requirement or
>>>>>>>>>>>>>>> future plan to implement a feature that allows import-notebook-from-github?
>>>>>>>>>>>>>>> This would allow users to share notebooks seamlessly.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> Vinayak
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee
>>>>>>>>>>>>>>> <mo...@apache.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Zhong Wang,
>>>>>>>>>>>>>>>> Right, Folder support would be quite useful. Thanks for the
>>>>>>>>>>>>>>>> opinion.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hope i can finish the work pr-190.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sourav,
>>>>>>>>>>>>>>>> Regarding concurrent running, Zeppelin itself doesn't limit
>>>>>>>>>>>>>>>> running paragraphs/queries concurrently. Each interpreter can implement
>>>>>>>>>>>>>>>> its own scheduling policy. For example, the SparkSQL interpreter and
>>>>>>>>>>>>>>>> ShellInterpreter can already run paragraphs/queries concurrently.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> SparkInterpreter is implemented with a FIFO scheduler
>>>>>>>>>>>>>>>> because of the nature of the Scala compiler. That's why users cannot run
>>>>>>>>>>>>>>>> multiple paragraphs concurrently when they work with SparkInterpreter.
>>>>>>>>>>>>>>>> But as Zhong Wang mentioned, pr-703 gives each notebook
>>>>>>>>>>>>>>>> a separate Scala compiler, so paragraphs can run concurrently as long as
>>>>>>>>>>>>>>>> they're in different notebooks.
>>>>>>>>>>>>>>>> Thanks for the feedback!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> moon
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang
>>>>>>>>>>>>>>>> <wa...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sourav: I think this newly merged PR can help you
>>>>>>>>>>>>>>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder
>>>>>>>>>>>>>>>>> <so...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Moon,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This looks great.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> My only suggestion would be to include a PR/feature -
>>>>>>>>>>>>>>>>>> Support for Running Concurrent paragraphs/queries in Zeppelin.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Right now if more than one user tries to run paragraphs in
>>>>>>>>>>>>>>>>>> multiple notebooks concurrently through a single Zeppelin instance (and
>>>>>>>>>>>>>>>>>> single interpreter instance) the performance is very slow. It is obvious
>>>>>>>>>>>>>>>>>> that the queue gets built up within the zeppelin process and interpreter
>>>>>>>>>>>>>>>>>> process in that scenario as the time taken to move the status from start to
>>>>>>>>>>>>>>>>>> pending and pending to running is very high compared to the actual running
>>>>>>>>>>>>>>>>>> time of a paragraph.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Without this the multi tenancy support would be
>>>>>>>>>>>>>>>>>> meaningless as no one can practically use it in a situation where multiple
>>>>>>>>>>>>>>>>>> users are trying to connect to the same instance of Zeppelin (and the
>>>>>>>>>>>>>>>>>> related interpreter). A possible solution would be to spawn separate
>>>>>>>>>>>>>>>>>> instance of the same interpreter at every notebook/user level.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>> Sourav
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee
>>>>>>>>>>>>>>>>>> <mo...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Zeppelin users and developers,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The roadmap we have published at
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>>>>>>>>>>>>>>> is almost 9 month old, and it doesn't reflect where the
>>>>>>>>>>>>>>>>>>> community goes anymore. It's time to update.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Based on mailing list, jira issues, pullrequests,
>>>>>>>>>>>>>>>>>>> feedbacks from users, conferences and meetings, I could summarize the major
>>>>>>>>>>>>>>>>>>> interest of users and developers in 7 categories. Enterprise ready,
>>>>>>>>>>>>>>>>>>> Usability improvement, Pluggability, Documentation, Backend integration,
>>>>>>>>>>>>>>>>>>> Notebook storage, and Visualization.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> And i could list related subjects under each categories.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Enterprise ready
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Authentication
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Shiro authentication ZEPPELIN-548
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Authorization
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Notebook authorization PR-681
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Security
>>>>>>>>>>>>>>>>>>> Multi-tenancy
>>>>>>>>>>>>>>>>>>> Stability
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Usability Improvement
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> UX improvement
>>>>>>>>>>>>>>>>>>> Better Table data support
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Download data as csv, etc PR-725, PR-714, PR-6, PR-89
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Featureful table data display (pagination, etc)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Pluggability ZEPPELIN-533
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Pluggable visualization
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Dynamic Interpreter, notebook, visualization loading
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Repository and registry for pluggable components
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Improve documentation
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Improve contents and readability
>>>>>>>>>>>>>>>>>>> more tutorials, examples
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Interpreter
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Generic JDBC Interpreter
>>>>>>>>>>>>>>>>>>> (spark)R Interpreter
>>>>>>>>>>>>>>>>>>> Cluster manager for interpreter (Proposal)
>>>>>>>>>>>>>>>>>>> more interpreters
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Notebook storage
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Versioning ZEPPELIN-540
>>>>>>>>>>>>>>>>>>> more notebook storages
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Visualization
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> More visualizations PR-152, PR-728, PR-336, PR-321
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Customize graph (show/hide label, color, etc)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It will help anyone quickly grasp the overall interests and
>>>>>>>>>>>>>>>>>>> direction of the project. Based on this roadmap, we can discuss and
>>>>>>>>>>>>>>>>>>> redefine the scope and schedule of the next release, 0.6.0.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> moon
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Vinayak Agrawal
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>>>>>>>>> ~Lord Alfred Tennyson
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Vinayak Agrawal
>>>>>>>>>>>>> Big Data Analytics
>>>>>>>>>>>>> IBM
>>>>>>>>>>>>>
>>>>>>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>>>>>>> ~Lord Alfred Tennyson
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
>

Re: [DISCUSS] Update Roadmap

Posted by rohit choudhary <rc...@gmail.com>.

Hi Moon,

Yes, I intend to contribute code for the Livy integration and, subsequently,
for securing the rest of the interpreters as well. I look forward to your
suggestions.

Thanks,
Rohit.

On Wed, Mar 30, 2016 at 4:06 AM, moon soo Lee <mo...@apache.org> wrote:

> Hi Rohit,
>
> I read the documentation attached and it looks very promising and spark
> interpreter based on Livy is interesting!
> Thanks for sharing the document.
>
> Do you have any plan to contribute code?
>
> Thanks,
> moon
>
>
> On Tue, Mar 29, 2016 at 5:50 AM rohit choudhary <rc...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I've submitted a design approach for Multi-tenancy and Security for
>> Zeppelin - https://issues.apache.org/jira/browse/ZEPPELIN-773.
>>
>> I look forward to reviews and suggestions on the topic.
>>
>> Thanks,
>> Rohit.
>>
>> On Sat, Mar 26, 2016 at 10:04 PM, moon soo Lee <mo...@apache.org> wrote:
>>
>>> There is a discussion thread on the release policy.
>>> https://s.apache.org/3JCm please check this thread, too.
>>>
>>> Thanks,
>>> moon
>>>
>>>
>>> On Thu, Mar 24, 2016 at 12:02 PM Guilherme Silveira <
>>> guilhermecgsspam@gmail.com> wrote:
>>>
>>>> Is there a predefined release interval, let's say 6 months or 1
>>>> year, between one version and another?
>>>> On Mar 23, 2016 4:10 PM, "Joel Van Veluwen" <
>>>> Joel.VanVeluwen@quantium.com.au> wrote:
>>>>
>>>>> Hi Nikolay,
>>>>>
>>>>>
>>>>>
>>>>> I raised this with MapR and there don’t appear to be any plans to add
>>>>> Zeppelin to 5.1
>>>>>
>>>>>
>>>>>
>>>>> https://community.mapr.com/message/40332
>>>>>
>>>>>
>>>>>
>>>>> We are deploying it manually and everything is pretty stable – but it
>>>>> will vary depending on your environment.
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>>
>>>>>
>>>>> Joel Van Veluwen
>>>>>
>>>>>
>>>>>
>>>>> *From:* Nikolay Voronchikhin [mailto:nvoronchikhin@gmail.com]
>>>>> *Sent:* Tuesday, 22 March 2016 11:39 AM
>>>>> *To:* users@zeppelin.incubator.apache.org
>>>>> *Subject:* Re: [DISCUSS] Update Roadmap
>>>>>
>>>>>
>>>>>
>>>>> Hi Zeppelin Users and Developers,
>>>>>
>>>>>
>>>>>
>>>>> Do you know if MapR will be adding Zeppelin to its roadmap for the
>>>>> next version after MapR 5.1?
>>>>>
>>>>>
>>>>>
>>>>> We see in Hue 3.9 that it provides notebooks for R Shell, Python
>>>>> Shell, PySpark, SparkR, Hive SQL, Impala SQL, and Spark SQL, but no Drill
>>>>> SQL notebook.
>>>>>
>>>>> We are looking for an Apache Project that focuses on a Drill Notebook
>>>>> UI that performs better than the Drill Web Console UI itself.
>>>>>
>>>>>
>>>>>
>>>>> Sincerely,
>>>>>
>>>>> *Nikolay Voronchikhin*
>>>>>
>>>>> *Big Data/Data Warehouse/Data Science/Data Platforms Engineer at Cisco*
>>>>>
>>>>> *https://www.linkedin.com/in/nvoronchikhin
>>>>> <https://www.linkedin.com/in/nvoronchikhin>*
>>>>>
>>>>> *E-mail: nvoronchikhin@gmail.com <nv...@gmail.com>*
>>>>>
>>>>> *Mobile: 951-288-2778 <951-288-2778>*
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Mar 21, 2016 at 2:44 PM, rohit choudhary <rc...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Dear All,
>>>>>
>>>>>
>>>>>
>>>>> I think direction setting is important for Enterprise readiness. I
>>>>> have a little bit of an overview of Ambari Views, which is very similar in
>>>>> nature to Zeppelin. Please let me explain:
>>>>>
>>>>>
>>>>>
>>>>> Hive View - interacts with Hive
>>>>>
>>>>> Pig View - interacts with Pig
>>>>>
>>>>> Workflow Designer - interacts with Oozie
>>>>>
>>>>>
>>>>>
>>>>> We have a very similar architecture in Zeppelin where we interact with
>>>>> these systems through Interpreters. The usage will also be similar, as both
>>>>> will interact with Hadoop clusters or, in some cases, Spark with YARN on
>>>>> HDFS. Our priorities should include:
>>>>>
>>>>>
>>>>>
>>>>> - Design & implement for multi-tenancy
>>>>>
>>>>> - Auditability from Data/State and Lineage perspective
>>>>>
>>>>> - Ability to share Notebooks/Data/State across users, preferably
>>>>> through SparkContext sharing
>>>>>
>>>>> - Security between Zeppelin and the other systems, not limited to
>>>>> Spark through Kerberos. (@Rick +1)
>>>>>
>>>>>
>>>>>
>>>>> I will share an initial draft of the thoughts I have in mind, in the
>>>>> next couple of days.
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Rohit.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Mar 3, 2016 at 7:54 AM, moon soo Lee <mo...@apache.org> wrote:
>>>>>
>>>>> Shabeel, thanks for the feedback about the REST API and custom IDs. That
>>>>> might help avoid multiple REST API calls.
>>>>>
>>>>>
>>>>>
>>>>> Thanks everyone for the valuable feedback. Looks like we're all going in
>>>>> the same direction. I have updated the wiki.
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>
>>>>> Please take a look.
>>>>>
>>>>>
>>>>>
>>>>> I'm sure there are many missing details in this roadmap. I must say that
>>>>> something not being on this roadmap doesn't mean the community is not
>>>>> working on it or that it can't be included in Zeppelin. The roadmap
>>>>> represents community interest and overall direction.
>>>>>
>>>>> We're not changing the roadmap every day, but that doesn't mean the roadmap
>>>>> is set in stone and will never be changed. We can improve it continuously.
>>>>>
>>>>>
>>>>>
>>>>> Please feel free to fork this mail thread for any further discussion on a
>>>>> specific subject (e.g. job scheduling).
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> moon
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 2, 2016 at 12:31 AM Shabeel Syed <sh...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Also, we need better REST API support for creating and fetching
>>>>> notebooks and paragraphs.
>>>>>
>>>>> For example, if I could set a custom-defined notebookid and paragraphid,
>>>>> we could avoid multiple REST API calls.
>>>>>
>>>>>
>>>>>
>>>>> http://localhost:8080/#/notebook/<notebookid>/paragraph/<paragraphid>?asIframe
>>>>>
>>>>> should return an error if the notebook or paragraph does not exist.
>>>>>
>>>>>
>>>>>
>>>>> And while creating a notebook or paragraph, I should be able to specify
>>>>> my custom IDs.
>>>>>
>>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>> Shabeel
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 2, 2016 at 11:55 AM, Zhong Wang <wa...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> +1 on @rick. quality is really important... I am still encountering
>>>>> bugs consistently
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV <
>>>>> tejasrivastav@gmail.com> wrote:
>>>>>
>>>>> +1 on @rick
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bb...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> I see in the Enterprise section that multi-tenancy will be included,
>>>>> will this have user impersonation too? In this way, the user executing will
>>>>> be the user owning the process.
>>>>>
>>>>>
>>>>>
>>>>> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <sh...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> +1
>>>>>
>>>>>
>>>>>
>>>>> Hi Tamas,
>>>>>
>>>>>    Pluggable external visualization is really a GREAT feature to have.
>>>>> I'm looking forward to this :)
>>>>>
>>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>> Shabeel
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <
>>>>> tamas.szuromi@odigeo.com> wrote:
>>>>>
>>>>> Hey,
>>>>>
>>>>>
>>>>>
>>>>> Really promising roadmap.
>>>>>
>>>>>
>>>>>
>>>>> I'd only push for more visualization options. I agree built-in
>>>>> visualization with limited charting options is needed, but I think we also
>>>>> need to somehow 'inject' external JS visualizations.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> For scheduling Zeppelin notebooks we use
>>>>> https://github.com/airbnb/airflow <https://github.com/airbnb/airflow>
>>>>> through the job REST API. It's an enterprise-ready and very robust
>>>>> solution right now.
>>>>>
>>>>>
>>>>>
>>>>> *Tamas*
>>>>>
>>>>>
>>>>>
>>>>> On 1 March 2016 at 09:12, Eran Witkon <er...@gmail.com> wrote:
>>>>>
>>>>> One point to clarify: I don't want to suggest Oozie specifically. I want
>>>>> us to think about which features we develop ourselves and which ones we
>>>>> integrate from external, preferably Apache, technology. We don't think
>>>>> about building our own storage services, so why build our own scheduler?
>>>>> Eran
>>>>>
>>>>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <mo...@apache.org> wrote:
>>>>>
>>>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>>>>
>>>>> Now I can see a lot of demand around enterprise-level job scheduling.
>>>>> Whether external or built-in, I completely agree with having
>>>>> enterprise-level job scheduling support on the roadmap.
>>>>>
>>>>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and
>>>>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
>>>>> related issues I can find in our JIRA.
>>>>>
>>>>>
>>>>>
>>>>> @Vinayak
>>>>>
>>>>> Regarding importing notebooks from GitHub, Zeppelin has a pluggable
>>>>> notebook storage layer (see related package
>>>>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
>>>>> So, GitHub notebook sync can be implemented easily.
>>>>>
>>>>>
>>>>>
>>>>> @Shabeel
>>>>>
>>>>> Right, we need better memory management to prevent such OOM.
>>>>>
>>>>> And I think a table is one of the most frequently used ways of displaying
>>>>> data. So we'll definitely need more features like filter, sort, etc.
>>>>>
>>>>> After this roadmap discussion, discussion for the next release will
>>>>> follow. Then we'll get an idea of when those features will be available.
>>>>>
>>>>>
>>>>>
>>>>> @Prasad
>>>>>
>>>>> Thanks for mentioning HA and DR. They're really important subjects for
>>>>> enterprise use. Zeppelin will definitely need to address them.
>>>>>
>>>>> And displaying meta information of a notebook on the top-level page is a
>>>>> good idea.
>>>>>
>>>>>
>>>>>
>>>>> It's really great to hear many opinions and ideas.
>>>>>
>>>>> And thanks @Rick for sharing a valuable view of the Zeppelin project.
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> moon
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <ra...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> For one, I know that there is rudimentary scheduling built into
>>>>> Zeppelin already (at least I fixed a bug in the test for a scheduling
>>>>> feature a few months ago).
>>>>>
>>>>> But another point is that Zeppelin should also focus on quality,
>>>>> reproducibility and portability.
>>>>>
>>>>> Although this doesn't offer exciting new features, it would make
>>>>> development much easier.
>>>>>
>>>>> Cross-platform testability, Tests that pass when run sequentially,
>>>>> compatibility with Firefox, and many more open issues that make it so much
>>>>> harder to enhance Zeppelin and add features should be addressed soon,
>>>>> preferably before more features are added. Already Zeppelin is suffering -
>>>>> in my opinion - from quite a lot of feature creep, and we should avoid
>>>>> putting in the kitchen sink, at the cost of quality and maintainability.
>>>>> Instead modularity (ZEPPELIN-533 in particular) should be targeted.
>>>>>
>>>>> Oozie, in my opinion, is a dead end - it may de-facto still be in use
>>>>> on many clusters, but it's not getting the love it needs, and I wouldn't
>>>>> bet on it, when it comes to integrating scheduling. Instead, any external
>>>>> tool should be able to use the REST-API to trigger executions, if you want
>>>>> external scheduling.
>>>>>
>>>>> So, in conclusion, if we take Moon's list as a list of descending
>>>>> priorities, I fully agree, under the condition that code quality is
>>>>> included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
>>>>> SPNEGO SSO support is what we really want) with user and group rights
>>>>> assignment on the notebook level. We probably also need Knox-integration
>>>>> (ODP-Members looking at integrating Zeppelin should consider contributing
>>>>> this), and integration of something like Spree (
>>>>> https://github.com/hammerlab/spree) to be able to profile jobs.
>>>>>
>>>>> I'm hopeful that soon I can resume contributing some quality-oriented
>>>>> code, to drive this "necessary evil" forward ;)
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>>>>> sourav.mazumder00@gmail.com> wrote:
>>>>>
>>>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>>>
>>>>> Rather, one should be able to call it from any scheduler typically used
>>>>> at the enterprise level. Maybe with support for BPML.
>>>>>
>>>>> I believe the existing ability to call/execute a Zeppelin Notebook or
>>>>> a specific paragraph within a notebook using REST API should take care of
>>>>> this requirement to some extent.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Sourav
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>>>>> vinayakagrawal88@gmail.com> wrote:
>>>>>
>>>>> @Eran Witkon,
>>>>>
>>>>> Thanks for the suggestion, Eran. I concur with your thought.
>>>>>
>>>>> If Zeppelin can be integrated with Oozie, that would be wonderful.
>>>>> Users will also be able to leverage their Oozie skills.
>>>>>
>>>>> This would be promising for now.
>>>>>
>>>>> However, in the future Hadoop might not necessarily be installed in a
>>>>> Spark cluster, and Oozie (since it installs with the Hadoop distribution)
>>>>> might not be available.
>>>>>
>>>>> So perhaps we should give some thought to this feature for the future.
>>>>> Should it depend on Oozie, or should Zeppelin have its own scheduling?
>>>>>
>>>>> As Benjamin has reiterated, the Databricks notebook has this as a core
>>>>> notebook feature.
>>>>>
>>>>>
>>>>>
>>>>> Also, would anybody give any suggestions regarding "sync with github"
>>>>> feature?
>>>>>
>>>>> -Exporting notebook to Github
>>>>>
>>>>> -Importing notebook from Github
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Vinayak
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <er...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> @*Vinayak Agrawal *I would suggest adding the ability to connect
>>>>> Zeppelin to existing scheduling/workflow tools such as
>>>>> https://oozie.apache.org/. This requires better hooks and status
>>>>> reporting, but doesn't make Zeppelin an ETL/scheduler tool by itself.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>>>>> vinayakagrawal88@gmail.com> wrote:
>>>>>
>>>>> Moon,
>>>>>
>>>>> The new roadmap looks very promising. I am very happy to see security
>>>>> in the list.
>>>>> I have some suggestions regarding Enterprise Ready features:
>>>>>
>>>>>
>>>>> 1. Job Scheduler - Can this be improved?
>>>>>
>>>>> Currently the scheduler can be used with a Cron expression or a pre-set
>>>>> time. But in an enterprise solution, a notebook might be one piece of the
>>>>> workflow. Can we look towards the functionality of scheduling notebooks
>>>>> based on other notebooks finishing their jobs successfully?
>>>>>
>>>>> This requirement would arise in any ETL workflow, where all the
>>>>> downstream users wait for the ETL notebook to finish successfully. Only
>>>>> after that, other business oriented notebooks can be executed.
>>>>>
>>>>> 2. Importing a notebook - Is there a current requirement or future
>>>>> plan to implement a feature that allows import-notebook-from-github? This
>>>>> would allow users to share notebooks seamlessly.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Vinayak
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org>
>>>>> wrote:
>>>>>
>>>>> Zhong Wang,
>>>>>
>>>>> Right, Folder support would be quite useful. Thanks for the opinion.
>>>>>
>>>>> Hope I can finish the work on PR-190
>>>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>>>
>>>>>
>>>>>
>>>>> Sourav,
>>>>>
>>>>> Regarding concurrent running, Zeppelin itself doesn't prevent paragraphs
>>>>> or queries from running concurrently. Each interpreter can implement its
>>>>> own scheduling policy. For example, the SparkSQL interpreter and
>>>>> ShellInterpreter can already run paragraphs/queries concurrently.
>>>>>
>>>>>
>>>>>
>>>>> SparkInterpreter is implemented with a FIFO scheduler because of the
>>>>> nature of the Scala compiler. That's why users cannot run multiple
>>>>> paragraphs concurrently when they work with SparkInterpreter.
>>>>>
>>>>> But as Zhong Wang mentioned, PR-703 gives each notebook a separate
>>>>> Scala compiler, so paragraphs can run concurrently as long as they're in
>>>>> different notebooks.
>>>>>
>>>>> Thanks for the feedback!
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> moon
>>>>>
>>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Sourav: I think this newly merged PR can help you
>>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>>> sourav.mazumder00@gmail.com> wrote:
>>>>>
>>>>> Hi Moon,
>>>>>
>>>>> This looks great.
>>>>>
>>>>> My only suggestion would be to include a PR/feature - Support for
>>>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>>>
>>>>> Right now if more than one user tries to run paragraphs in multiple
>>>>> notebooks concurrently through a single Zeppelin instance (and single
>>>>> interpreter instance) the performance is very slow. It is obvious that the
>>>>> queue gets built up within the zeppelin process and interpreter process in
>>>>> that scenario as the time taken to move the status from start to pending
>>>>> and pending to running is very high compared to the actual running time of
>>>>> a paragraph.
>>>>>
>>>>> Without this the multi tenancy support would be meaningless as no one
>>>>> can practically use it in a situation where multiple users are trying to
>>>>> connect to the same instance of Zeppelin (and the related interpreter). A
>>>>> possible solution would be to spawn separate instance of the same
>>>>> interpreter at every notebook/user level.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Sourav
>>>>>
>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org>
>>>>> wrote:
>>>>>
>>>>> Hi Zeppelin users and developers,
>>>>>
>>>>>
>>>>>
>>>>> The roadmap we have published at
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>
>>>>> is almost 9 months old, and it no longer reflects where the community is
>>>>> going. It's time to update.
>>>>>
>>>>>
>>>>>
>>>>> Based on the mailing list, JIRA issues, pull requests, feedback from
>>>>> users, conferences and meetings, I could summarize the major interests of
>>>>> users and developers in 7 categories: Enterprise ready, Usability
>>>>> improvement, Pluggability, Documentation, Backend integration, Notebook
>>>>> storage, and Visualization.
>>>>>
>>>>>
>>>>>
>>>>> And I could list related subjects under each category.
>>>>>
>>>>>
>>>>>    - Enterprise ready
>>>>>       - Authentication
>>>>>          - Shiro authentication ZEPPELIN-548
>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>       - Authorization
>>>>>          - Notebook authorization PR-681
>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>       - Security
>>>>>       - Multi-tenancy
>>>>>       - Stability
>>>>>    - Usability Improvement
>>>>>       - UX improvement
>>>>>       - Better Table data support
>>>>>          - Download data as csv, etc PR-725
>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>>>>>          <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>>>>>          <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>          - Featureful table data display (pagination, etc)
>>>>>    - Pluggability ZEPPELIN-533
>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>       - Pluggable visualization
>>>>>       - Dynamic Interpreter, notebook, visualization loading
>>>>>       - Repository and registry for pluggable components
>>>>>    - Improve documentation
>>>>>       - Improve contents and readability
>>>>>       - more tutorials, examples
>>>>>    - Interpreter
>>>>>       - Generic JDBC Interpreter
>>>>>       - (spark)R Interpreter
>>>>>       - Cluster manager for interpreter (Proposal
>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>>>>       )
>>>>>       - more interpreters
>>>>>    - Notebook storage
>>>>>       - Versioning ZEPPELIN-540
>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>       - more notebook storages
>>>>>    - Visualization
>>>>>       - More visualizations PR-152
>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>       - Customize graph (show/hide label, color, etc)
>>>>>
>>>>> It will help anyone quickly grasp the overall interests and direction of
>>>>> the project. And based on this roadmap, we can discuss and re-define the
>>>>> scope of the next release, 0.6.0, and its schedule.
>>>>>
>>>>>
>>>>>
>>>>> What do you think? Any feedback would be appreciated.
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> moon
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Vinayak Agrawal
>>>>>
>>>>>
>>>>>
>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>
>>>>> ~Lord Alfred Tennyson
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Vinayak Agrawal
>>>>>
>>>>> Big Data Analytics
>>>>>
>>>>> IBM
>>>>>
>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>
>>>>> ~Lord Alfred Tennyson
>>>>>
>>>>>

Re: [DISCUSS] Update Roadmap

Posted by moon soo Lee <mo...@apache.org>.

Hi Rohit,

I read the attached document and it looks very promising. The Spark
interpreter based on Livy is interesting!
Thanks for sharing the document.

Do you have any plan to contribute code?

Thanks,
moon


On Tue, Mar 29, 2016 at 5:50 AM rohit choudhary <rc...@gmail.com> wrote:

> Hi All,
>
> I've submitted a design approach for Multi-tenancy and Security for
> Zeppelin - https://issues.apache.org/jira/browse/ZEPPELIN-773.
>
> Looking forward to reviews and suggestions on the topic.
>
> Thanks,
> Rohit.
>
> On Sat, Mar 26, 2016 at 10:04 PM, moon soo Lee <mo...@apache.org> wrote:
>
>> There is a discussion thread on Release Policy:
>> https://s.apache.org/3JCm
>> Please check this thread, too.
>>
>> Thanks,
>> moon
>>
>>
>> On Thu, Mar 24, 2016 at 12:02 PM Guilherme Silveira <
>> guilhermecgsspam@gmail.com> wrote:
>>
>>> Is there a predefined release interval, let's say 6 months or 1 year,
>>> between one version and another?
>>> On 23 Mar 2016 4:10 PM, "Joel Van Veluwen" <
>>> Joel.VanVeluwen@quantium.com.au> wrote:
>>>
>>>> Hi Nikolay,
>>>>
>>>>
>>>>
>>>> I raised this with MapR and there don’t appear to be plans to add
>>>> Zeppelin to 5.1.
>>>>
>>>>
>>>>
>>>> https://community.mapr.com/message/40332
>>>>
>>>>
>>>>
>>>> We are deploying it manually and everything is pretty stable – but it
>>>> will vary depending on your environment.
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>>
>>>>
>>>> Joel Van Veluwen
>>>> *QUANTIUM*
>>>> Level 25, 8 Chifley
>>>> 8-12 Chifley Square
>>>> Sydney NSW 2000
>>>>
>>>> T: +61 2 8224 8981
>>>> M: +61 403 153 265
>>>> F: +61 2 9292 6444
>>>>
>>>> W: quantium.com.au <http://www.quantium.com.au>
>>>> ------------------------------
>>>>
>>>> linkedin.com/company/quantium
>>>> <http://www.linkedin.com/company/quantium>
>>>> facebook.com/QuantiumAustralia
>>>> <http://www.facebook.com/QuantiumAustralia>
>>>> twitter.com/QuantiumAU <http://www.twitter.com/QuantiumAU>
>>>>
>>>> The contents of this email, including attachments, may be confidential
>>>> information. If you are not the intended recipient, any use, disclosure or
>>>> copying of the information is unauthorised. If you have received this email
>>>> in error, we would be grateful if you would notify us immediately by email
>>>> reply, phone (+ 61 2 9292 6400) or fax (+ 61 2 9292 6444) and delete
>>>> the message from your system.
>>>>
>>>>
>>>>
>>>> *From:* Nikolay Voronchikhin [mailto:nvoronchikhin@gmail.com]
>>>> *Sent:* Tuesday, 22 March 2016 11:39 AM
>>>> *To:* users@zeppelin.incubator.apache.org
>>>> *Subject:* Re: [DISCUSS] Update Roadmap
>>>>
>>>>
>>>>
>>>> Hi Zeppelin Users and Developers,
>>>>
>>>>
>>>>
>>>> Do you know if MapR will be adding Zeppelin to its roadmap for the next
>>>> version after MapR 5.1?
>>>>
>>>>
>>>>
>>>> We see in Hue 3.9 that it provides notebooks for R Shell, Python Shell,
>>>> PySpark, SparkR, Hive SQL, Impala SQL, and Spark SQL, but no Drill SQL
>>>> notebook.
>>>>
>>>> We are looking for an Apache Project that focuses on a Drill Notebook
>>>> UI that performs better than the Drill Web Console UI itself.
>>>>
>>>>
>>>>
>>>> Sincerely,
>>>>
>>>> *Nikolay Voronchikhin*
>>>>
>>>> *Big Data/Data Warehouse/Data Science/Data Platforms Engineer at Cisco*
>>>>
>>>> *https://www.linkedin.com/in/nvoronchikhin
>>>> <https://www.linkedin.com/in/nvoronchikhin>*
>>>>
>>>> *E-mail: nvoronchikhin@gmail.com <nv...@gmail.com>*
>>>>
>>>> *Mobile: 951-288-2778 <951-288-2778>*
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Mar 21, 2016 at 2:44 PM, rohit choudhary <rc...@gmail.com>
>>>> wrote:
>>>>
>>>> Dear All,
>>>>
>>>>
>>>>
>>>> I think direction setting is important for Enterprise readiness. I have
>>>> a little bit of an overview of Ambari Views, which is very similar in
>>>> nature to Zeppelin. Please let me explain:
>>>>
>>>>
>>>>
>>>> Hive View - interacts with Hive
>>>>
>>>> Pig View - interacts with Pig
>>>>
>>>> Workflow Designer - interacts with Oozie
>>>>
>>>>
>>>>
>>>> We have a very similar architecture in Zeppelin where we interact with
>>>> these systems through Interpreters. The usage will also be similar, as both
>>>> with interact with Hadoop clusters or in some cases Spark with Yarn on
>>>> HDFS. Our priorities should include:
>>>>
>>>>
>>>>
>>>> - Design & implement for multi-tenancy
>>>>
>>>> - Auditability from Data/State and Lineage perspective
>>>>
>>>> - Ability to share Notebooks/Data/State across users, preferably
>>>> through SparkContext sharing
>>>>
>>>> - Security between Zeppelin and the other systems, not limited to Spark
>>>> through Kerberos. (@Rick +1)
>>>>
>>>>
>>>>
>>>> I will share an initial draft of the thoughts I have in mind, in the
>>>> next couple of days.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Rohit.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Mar 3, 2016 at 7:54 AM, moon soo Lee <mo...@apache.org> wrote:
>>>>
>>>> Shabeel, thanks for the feedback about rest api and custom id. that
>>>> might help avoid multiple rest api calls.
>>>>
>>>>
>>>>
>>>> Thanks everyone for valuable feedback. Looks like all we're going to
>>>> the same direction. I have updated wiki.
>>>>
>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>
>>>> Please take a look.
>>>>
>>>>
>>>>
>>>> I'm sure there're many missing details in this roadmap. I must say
>>>> something not on this roadmap doesn't mean community is not working on or
>>>> can't be included in the Zeppelin. Roadmap represents more like community
>>>> interest and overall direction.
>>>>
>>>> We're not changing roadmap everyday, but that doesn't mean roadmap is
>>>> set in stone and never be changed. We can improve it continuously.
>>>>
>>>>
>>>>
>>>> Please feel free to fork the this mail thread for any further
>>>> discussion on specific subject. (e.g. job scheduling)
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> moon
>>>>
>>>>
>>>>
>>>> On Wed, Mar 2, 2016 at 12:31 AM Shabeel Syed <sh...@gmail.com>
>>>> wrote:
>>>>
>>>> Also we need better rest api support for creating and fetching the
>>>> notebooks and paragraphs.
>>>>
>>>> for example if I can set custom defined notebookid and paragraphid , we
>>>> can avoid multiple rest api calls.
>>>>
>>>>
>>>>
>>>> http://localhost:8080/#/notebook/
>>>> <notebookid>/paragraph/<paragraphid>?asIframe
>>>>
>>>> should return me error if notebook or paragraph deos not exists.
>>>>
>>>>
>>>>
>>>> and while creating notebook or paragraph I should be able to mention my
>>>> custom ids.
>>>>
>>>>
>>>>
>>>> Regards
>>>>
>>>> Shabeel
>>>>
>>>>
>>>>
>>>> On Wed, Mar 2, 2016 at 11:55 AM, Zhong Wang <wa...@gmail.com>
>>>> wrote:
>>>>
>>>> +1 on @rick. quality is really important... I am still encountering
>>>> bugs consistently
>>>>
>>>>
>>>>
>>>> On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV <
>>>> tejasrivastav@gmail.com> wrote:
>>>>
>>>> +1 on @rick
>>>>
>>>>
>>>>
>>>> On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bb...@gmail.com>
>>>> wrote:
>>>>
>>>> I see in the Enterprise section that multi-tenancy will be included,
>>>> will this have user impersonation too? In this way, the user executing will
>>>> be the user owning the process.
>>>>
>>>>
>>>>
>>>> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <sh...@gmail.com>
>>>> wrote:
>>>>
>>>>
>>>>
>>>> +1
>>>>
>>>>
>>>>
>>>> Hi Tamas,
>>>>
>>>>    Pluggable external visualization is really a GREAT feature to have.
>>>> I'm looking forward to this :)
>>>>
>>>>
>>>>
>>>> Regards
>>>>
>>>> Shabeel
>>>>
>>>>
>>>>
>>>> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <ta...@odigeo.com>
>>>> wrote:
>>>>
>>>> Hey,
>>>>
>>>>
>>>>
>>>> Really promising roadmap.
>>>>
>>>>
>>>>
>>>> I'd only push more visualization options. I agree built in
>>>> visualization is needed with limited charting options but I think we also
>>>> need somehow 'inject' external js visualizations also.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> For scheduling Zeppelin notebooks  we use
>>>>  https://github.com/airbnb/airflow <https://github.com/airbnb/airflow> through
>>>> the job rest api. It's an enterprise ready and very robust solution
>>>> right now.
>>>>
>>>>
>>>>
>>>> *Tamas*
>>>>
>>>>
>>>>
>>>> On 1 March 2016 at 09:12, Eran Witkon <er...@gmail.com> wrote:
>>>>
>>>> One point to clarify, I don't want to suggest Oozie in specific, I want
>>>> to think about which features we develop and which ones we integrate
>>>> external, preferred Apache, technology? We don't think about building our
>>>> own storage services so why build our own scheduler?
>>>> Eran
>>>>
>>>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <mo...@apache.org> wrote:
>>>>
>>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>>>
>>>> Now I can see a lot of demands around enterprise level job scheduling.
>>>> Either external or built-in, I completely agree having enterprise level job
>>>> scheduling support on the roadmap.
>>>>
>>>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137>,
>>>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
>>>> related issues i can find in our JIRA.
>>>>
>>>>
>>>>
>>>> @Vinayak
>>>>
>>>> Regarding importing notebook from github, Zeppelin has pluggable
>>>> notebook storage layer (see related package
>>>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
>>>> So, github notebook sync can be implemented easily.
>>>>
>>>>
>>>>
>>>> @Shabeel
>>>>
>>>> Right, we need better manage management to prevent such OOM.
>>>>
>>>> And i think table is one of the most frequently used way of displaying
>>>> data. So definitely, we'll need more features like filter, sort, etc.
>>>>
>>>> After this roadmap discussion, discussion for the next release will
>>>> follow. Then we'll get idea when those features will be available.
>>>>
>>>>
>>>>
>>>> @Prasad
>>>>
>>>> Thanks for mentioning HA and DR. They're really important subject for
>>>> enterprise use. Definitely Zeppelin will need to address them.
>>>>
>>>> And displaying meta information of notebook on top level page is good
>>>> idea.
>>>>
>>>>
>>>>
>>>> It's really great to hear many opinions and ideas.
>>>>
>>>> And thanks @Rick for sharing valuable view to Zeppelin project.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> moon
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <ra...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> For one, I know that there is rudimentary scheduling built into
>>>> Zeppelin already (at least I fixed a bug in the test for a scheduling
>>>> feature a few months ago).
>>>>
>>>> But another point is, that Zeppelin should also focus on quality,
>>>> reproduceability and portability.
>>>>
>>>> Although this doesn't offer exciting new features, it would make
>>>> development much easier.
>>>>
>>>> Cross-platform testability, Tests that pass when run sequentially,
>>>> compatibility with Firefox, and many more open issues that make it so much
>>>> harder to enhance Zeppelin and add features should be addressed soon,
>>>> preferably before more features are added. Already Zeppelin is suffering -
>>>> in my opinion - from quite a lot of feature creep, and we should avoid
>>>> putting in the kitchen sink, at the cost of quality and maintainability.
>>>> Instead modularity (ZEPPELIN-533 in particular) should be targeted.
>>>>
>>>> Oozie, in my opinion, is a dead end - it may de-facto still be in use
>>>> on many clusters, but it's not getting the love it needs, and I wouldn't
>>>> bet on it, when it comes to integrating scheduling. Instead, any external
>>>> tool should be able to use the REST-API to trigger executions, if you want
>>>> external scheduling.
>>>>
>>>> So, in conclusion, if we take Moon's list as a list of descending
>>>> priorities, I fully agree, under the condition that code quality is
>>>> included as a subset of enterprise-readyness. Auth* is paramount (Kerberos
>>>> SPNEGO SSO support is what we really want) with user and group rights
>>>> assignment on the notebook level. We probably also need Knox-integration
>>>> (ODP-Members looking at integrating Zeppelin should consider contributing
>>>> this), and integration of something like Spree (
>>>> https://github.com/hammerlab/spree) to be able to profile jobs.
>>>>
>>>> I'm hopeful that soon I can resume contributing some quality-oriented
>>>> code, to drive this "necessary evil" forward ;)
>>>>
>>>>
>>>>
>>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>>>> sourav.mazumder00@gmail.com> wrote:
>>>>
>>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>>
>>>> Rather one should be able to call it from any scheduler typically used
>>>> in enterprise level. May be support for BPML.
>>>>
>>>> I believe the existing ability to call/execute a Zeppelin Notebook or a
>>>> specific paragraph within a notebook using REST API should take care of
>>>> this requirement to some extent.
>>>>
>>>> Regards,
>>>>
>>>> Sourav
>>>>
>>>>
>>>>
>>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>>>> vinayakagrawal88@gmail.com> wrote:
>>>>
>>>> @Eran Witkon,
>>>>
>>>> Thanks for the suggestion Eran. I concur with your thought.
>>>>
>>>> If Zepplin can be integrated with oozie, that would be wonderful. Users
>>>> will also be able to leverage their Oozie skills.
>>>>
>>>> This would be promising for now.
>>>>
>>>> However, in the future Hadoop might not necessarily be installed in
>>>> Spark Cluster and Oozie (since its installs with Hadoop Distribution) might
>>>> not be available.
>>>>
>>>> So perhaps we should give a thought about this feature for the future.
>>>> Should it depend on oozie or should Zeppelin have its owns scheduling?
>>>>
>>>> As Benjamin has iterated, Databrick notebook has this as a core
>>>> notebook feature.
>>>>
>>>>
>>>>
>>>> Also, would anybody give any suggestions regarding "sync with github"
>>>> feature?
>>>>
>>>> -Exporting notebook to Github
>>>>
>>>> -Importing notebook from Github
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Vinayak
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <er...@gmail.com>
>>>> wrote:
>>>>
>>>> @*Vinayak Agrawal* I would suggest adding the ability to connect
>>>> Zeppelin to existing scheduling/workflow tools such as
>>>> https://oozie.apache.org/. This requires better hooks and status
>>>> reporting, but doesn't make Zeppelin an ETL/scheduler tool by itself.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>>>> vinayakagrawal88@gmail.com> wrote:
>>>>
>>>> Moon,
>>>>
>>>> The new roadmap looks very promising. I am very happy to see security
>>>> in the list.
>>>> I have some suggestions regarding Enterprise Ready features:
>>>>
>>>>
>>>> 1. Job Scheduler - Can this be improved?
>>>>
>>>> Currently the scheduler can be used with a cron expression or a pre-set
>>>> time. But in an enterprise solution, a notebook might be one piece of a
>>>> workflow. Can we look at scheduling notebooks based on other notebooks
>>>> finishing their jobs successfully?
>>>>
>>>> This requirement would arise in any ETL workflow, where all the
>>>> downstream users wait for the ETL notebook to finish successfully. Only
>>>> after that, other business oriented notebooks can be executed.
>>>>
>>>> 2. Importing a notebook - Is there a current requirement or future plan
>>>> to implement a feature that allows import-notebook-from-github? This would
>>>> allow users to share notebooks seamlessly.
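
The dependency-based scheduling asked for in point 1 can be sketched as follows. This is only an illustration of the idea, not something Zeppelin provides: run_workflow, the deps mapping and the run_note callback are all hypothetical names, and the callback stands in for whatever actually executes a notebook (for example a REST call).

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, Set


def run_workflow(deps: Dict[str, Set[str]],
                 run_note: Callable[[str], bool]) -> Dict[str, str]:
    """Run notes in dependency order.

    deps maps a note id to the set of note ids it depends on.
    run_note(note_id) returns True on success (hypothetical callback).
    A note runs only if every upstream note succeeded; otherwise it is skipped.
    """
    status: Dict[str, str] = {}
    # static_order() yields each note after all of its dependencies.
    for note in TopologicalSorter(deps).static_order():
        upstream = deps.get(note, set())
        if all(status.get(u) == "success" for u in upstream):
            status[note] = "success" if run_note(note) else "failed"
        else:
            status[note] = "skipped"
    return status


# Example: the business notebooks wait for the ETL notebook.
deps = {"etl": set(), "report": {"etl"}, "dashboard": {"report"}}
print(run_workflow(deps, lambda note: True))
```

Keeping the executor as a callback means the same ordering logic works whether notebooks are triggered by a built-in scheduler or by an external tool.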
>>>>
>>>> Thanks
>>>>
>>>> Vinayak
>>>>
>>>>
>>>>
>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org> wrote:
>>>>
>>>> Zhong Wang,
>>>>
>>>> Right, Folder support would be quite useful. Thanks for the opinion.
>>>>
>>>> I hope I can finish the work on pr-190
>>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>>
>>>>
>>>>
>>>> Sourav,
>>>>
>>>> Regarding concurrent running, Zeppelin itself doesn't prevent
>>>> paragraphs/queries from running concurrently. Each interpreter can
>>>> implement its own scheduling policy. For example, the SparkSQL interpreter
>>>> and ShellInterpreter can already run paragraphs/queries concurrently.
>>>>
>>>>
>>>>
>>>> SparkInterpreter is implemented with a FIFO scheduler because of the
>>>> nature of the Scala compiler. That's why users cannot run multiple
>>>> paragraphs concurrently when they work with SparkInterpreter.
>>>>
>>>> But as Zhong Wang mentioned, pr-703 gives each notebook a separate
>>>> Scala compiler, so paragraphs can run concurrently as long as they're in
>>>> different notebooks.
>>>>
>>>> Thanks for the feedback!
>>>>
>>>>
>>>>
>>>> Best,
>>>>
>>>> moon
>>>>
>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
>>>> wrote:
>>>>
>>>> Sourav: I think this newly merged PR can help you
>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>
>>>>
>>>>
>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>> sourav.mazumder00@gmail.com> wrote:
>>>>
>>>> Hi Moon,
>>>>
>>>> This looks great.
>>>>
>>>> My only suggestion would be to include a PR/feature - Support for
>>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>>
>>>> Right now if more than one user tries to run paragraphs in multiple
>>>> notebooks concurrently through a single Zeppelin instance (and single
>>>> interpreter instance) the performance is very slow. It is obvious that the
>>>> queue gets built up within the zeppelin process and interpreter process in
>>>> that scenario as the time taken to move the status from start to pending
>>>> and pending to running is very high compared to the actual running time of
>>>> a paragraph.
>>>>
>>>> Without this the multi tenancy support would be meaningless as no one
>>>> can practically use it in a situation where multiple users are trying to
>>>> connect to the same instance of Zeppelin (and the related interpreter). A
>>>> possible solution would be to spawn separate instance of the same
>>>> interpreter at every notebook/user level.
>>>>
>>>> Regards,
>>>>
>>>> Sourav
>>>>
>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org> wrote:
>>>>
>>>> Hi Zeppelin users and developers,
>>>>
>>>>
>>>>
>>>> The roadmap we have published at
>>>>
>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>
>>>> is almost 9 months old, and it no longer reflects where the community is
>>>> going. It's time to update it.
>>>>
>>>>
>>>>
>>>> Based on the mailing list, JIRA issues, pull requests, feedback from users,
>>>> conferences and meetings, I could summarize the major interests of users
>>>> and developers in 7 categories: Enterprise ready, Usability improvement,
>>>> Pluggability, Documentation, Backend integration, Notebook storage, and
>>>> Visualization.
>>>>
>>>>
>>>>
>>>> And I could list related subjects under each category.
>>>>
>>>>
>>>>    - Enterprise ready
>>>>       - Authentication
>>>>          - Shiro authentication ZEPPELIN-548
>>>>            <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>       - Authorization
>>>>          - Notebook authorization PR-681
>>>>            <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>       - Security
>>>>       - Multi-tenancy
>>>>       - Stability
>>>>    - Usability Improvement
>>>>       - UX improvement
>>>>       - Better Table data support
>>>>          - Download data as csv, etc PR-725
>>>>            <https://github.com/apache/incubator-zeppelin/pull/725>,
>>>>            PR-714 <https://github.com/apache/incubator-zeppelin/pull/714>,
>>>>            PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>,
>>>>            PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>          - Featureful table data display (pagination, etc)
>>>>    - Pluggability ZEPPELIN-533
>>>>      <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>       - Pluggable visualization
>>>>       - Dynamic Interpreter, notebook, visualization loading
>>>>       - Repository and registry for pluggable components
>>>>    - Improve documentation
>>>>       - Improve contents and readability
>>>>       - more tutorials, examples
>>>>    - Interpreter
>>>>       - Generic JDBC Interpreter
>>>>       - (spark)R Interpreter
>>>>       - Cluster manager for interpreter (Proposal
>>>>         <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>>>         )
>>>>       - more interpreters
>>>>    - Notebook storage
>>>>       - Versioning ZEPPELIN-540
>>>>         <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>       - more notebook storages
>>>>    - Visualization
>>>>       - More visualizations PR-152
>>>>         <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>>>         <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>>>         <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>>>         <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>       - Customize graph (show/hide label, color, etc)
>>>>
>>>> It will help anyone quickly grasp the overall interests and direction of
>>>> the project. And based on this roadmap, we can discuss and re-define the
>>>> scope of the next release, 0.6.0, and its schedule.
>>>>
>>>>
>>>>
>>>> What do you think? Any feedback would be appreciated.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> moon
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Vinayak Agrawal
>>>>
>>>>
>>>>
>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>
>>>> ~Lord Alfred Tennyson
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Vinayak Agrawal
>>>>
>>>> Big Data Analytics
>>>>
>>>> IBM
>>>>
>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>
>>>> ~Lord Alfred Tennyson

Re: [DISCUSS] Update Roadmap

Posted by rohit choudhary <rc...@gmail.com>.

Hi All,

I've submitted a design approach for Multi-tenancy and Security for
Zeppelin - https://issues.apache.org/jira/browse/ZEPPELIN-773.

I look forward to your reviews and suggestions on the topic.

Thanks,
Rohit.

On Sat, Mar 26, 2016 at 10:04 PM, moon soo Lee <mo...@apache.org> wrote:

> There is a discussion thread for Release Policy.
> https://s.apache.org/3JCm Please check that thread, too.
>
> Thanks,
> moon
>
>
> On Thu, Mar 24, 2016 at 12:02 PM Guilherme Silveira <
> guilhermecgsspam@gmail.com> wrote:
>
>> Is there a predefined release interval, let's say 6 months or 1 year,
>> between one version and another?
>> On Mar 23, 2016 4:10 PM, "Joel Van Veluwen" <
>> Joel.VanVeluwen@quantium.com.au> wrote:
>>
>>> Hi Nikolay,
>>>
>>>
>>>
>>> I raised this with MapR and there don’t appear to be any plans to add
>>> Zeppelin to 5.1.
>>>
>>>
>>>
>>> https://community.mapr.com/message/40332
>>>
>>>
>>>
>>> We are deploying it manually and everything is pretty stable – but it
>>> will vary depending on your environment.
>>>
>>>
>>>
>>> Cheers,
>>>
>>>
>>>
>>> Joel Van Veluwen
>>> *QUANTIUM*
>>>
>>>
>>>
>>> *From:* Nikolay Voronchikhin [mailto:nvoronchikhin@gmail.com]
>>> *Sent:* Tuesday, 22 March 2016 11:39 AM
>>> *To:* users@zeppelin.incubator.apache.org
>>> *Subject:* Re: [DISCUSS] Update Roadmap
>>>
>>>
>>>
>>> Hi Zeppelin Users and Developers,
>>>
>>>
>>>
>>> Do you know if MapR will be adding Zeppelin to its roadmap for the next
>>> version after MapR 5.1?
>>>
>>>
>>>
>>> We see in Hue 3.9 that it provides notebooks for R Shell, Python Shell,
>>> PySpark, SparkR, Hive SQL, Impala SQL, and Spark SQL, but no Drill SQL
>>> notebook.
>>>
>>> We are looking for an Apache Project that focuses on a Drill Notebook UI
>>> that performs better than the Drill Web Console UI itself.
>>>
>>>
>>>
>>> Sincerely,
>>>
>>> *Nikolay Voronchikhin*
>>>
>>> *Big Data/Data Warehouse/Data Science/Data Platforms Engineer at Cisco*
>>>
>>> *https://www.linkedin.com/in/nvoronchikhin
>>> <https://www.linkedin.com/in/nvoronchikhin>*
>>>
>>> *E-mail: nvoronchikhin@gmail.com <nv...@gmail.com>*
>>>
>>> *Mobile: 951-288-2778 <951-288-2778>*
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Mar 21, 2016 at 2:44 PM, rohit choudhary <rc...@gmail.com>
>>> wrote:
>>>
>>> Dear All,
>>>
>>>
>>>
>>> I think direction setting is important for Enterprise readiness. I have
>>> a little bit of an overview of Ambari Views, which is very similar in
>>> nature to Zeppelin. Please let me explain:
>>>
>>>
>>>
>>> Hive View - interacts with Hive
>>>
>>> Pig View - interacts with Pig
>>>
>>> Workflow Designer - interacts with Oozie
>>>
>>>
>>>
>>> We have a very similar architecture in Zeppelin where we interact with
>>> these systems through Interpreters. The usage will also be similar, as both
>>> will interact with Hadoop clusters or, in some cases, Spark with YARN on
>>> HDFS. Our priorities should include:
>>>
>>>
>>>
>>> - Design & implement for multi-tenancy
>>>
>>> - Auditability from Data/State and Lineage perspective
>>>
>>> - Ability to share Notebooks/Data/State across users, preferably through
>>> SparkContext sharing
>>>
>>> - Security between Zeppelin and the other systems, not limited to Spark
>>> through Kerberos. (@Rick +1)
>>>
>>>
>>>
>>> I will share an initial draft of the thoughts I have in mind, in the
>>> next couple of days.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Rohit.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Mar 3, 2016 at 7:54 AM, moon soo Lee <mo...@apache.org> wrote:
>>>
>>> Shabeel, thanks for the feedback about the REST API and custom IDs. That
>>> might help avoid multiple REST API calls.
>>>
>>>
>>>
>>> Thanks everyone for the valuable feedback. It looks like we're all going in
>>> the same direction. I have updated the wiki.
>>>
>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>
>>> Please take a look.
>>>
>>>
>>>
>>> I'm sure there are many missing details in this roadmap. I must say that
>>> something not being on this roadmap doesn't mean the community is not
>>> working on it or that it can't be included in Zeppelin. The roadmap
>>> represents community interest and overall direction.
>>>
>>> We're not changing the roadmap every day, but that doesn't mean the roadmap
>>> is set in stone and can never be changed. We can improve it continuously.
>>>
>>>
>>>
>>> Please feel free to fork this mail thread for any further discussion
>>> on a specific subject (e.g. job scheduling).
>>>
>>>
>>>
>>> Thanks,
>>>
>>> moon
>>>
>>>
>>>
>>> On Wed, Mar 2, 2016 at 12:31 AM Shabeel Syed <sh...@gmail.com>
>>> wrote:
>>>
>>> Also, we need better REST API support for creating and fetching
>>> notebooks and paragraphs.
>>>
>>> For example, if I can set a custom-defined notebookid and paragraphid, we
>>> can avoid multiple REST API calls.
>>>
>>>
>>>
>>> http://localhost:8080/#/notebook/
>>> <notebookid>/paragraph/<paragraphid>?asIframe
>>>
>>> should return an error if the notebook or paragraph does not exist.
>>>
>>>
>>>
>>> And while creating a notebook or paragraph, I should be able to specify my
>>> custom IDs.
>>>
>>>
>>>
>>> Regards
>>>
>>> Shabeel
>>>
>>>
>>>
>>> On Wed, Mar 2, 2016 at 11:55 AM, Zhong Wang <wa...@gmail.com>
>>> wrote:
>>>
>>> +1 on @rick. quality is really important... I am still encountering bugs
>>> consistently
>>>
>>>
>>>
>>> On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV <te...@gmail.com>
>>> wrote:
>>>
>>> +1 on @rick
>>>
>>>
>>>
>>> On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bb...@gmail.com> wrote:
>>>
>>> I see in the Enterprise section that multi-tenancy will be included,
>>> will this have user impersonation too? In this way, the user executing will
>>> be the user owning the process.
>>>
>>>
>>>
>>> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <sh...@gmail.com> wrote:
>>>
>>>
>>>
>>> +1
>>>
>>>
>>>
>>> Hi Tamas,
>>>
>>>    Pluggable external visualization is really a GREAT feature to have.
>>> I'm looking forward to this :)
>>>
>>>
>>>
>>> Regards
>>>
>>> Shabeel
>>>
>>>
>>>
>>> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <ta...@odigeo.com>
>>> wrote:
>>>
>>> Hey,
>>>
>>>
>>>
>>> Really promising roadmap.
>>>
>>>
>>>
>>> I'd only push for more visualization options. I agree that built-in
>>> visualization with limited charting options is needed, but I think we also
>>> need some way to 'inject' external JS visualizations.
>>>
>>>
>>>
>>>
>>>
>>> For scheduling Zeppelin notebooks we use
>>> https://github.com/airbnb/airflow through
>>> the job REST API. It's an enterprise-ready and very robust solution
>>> right now.
>>>
>>>
>>>
>>> *Tamas*
>>>
>>>
>>>
>>> On 1 March 2016 at 09:12, Eran Witkon <er...@gmail.com> wrote:
>>>
>>> One point to clarify: I'm not suggesting Oozie specifically. I want us
>>> to think about which features we develop ourselves and for which ones we
>>> integrate external (preferably Apache) technology. We don't think about
>>> building our own storage services, so why build our own scheduler?
>>> Eran
>>>
>>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <mo...@apache.org> wrote:
>>>
>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>>
>>> Now I can see a lot of demand for enterprise-level job scheduling.
>>> Whether external or built-in, I completely agree with having
>>> enterprise-level job scheduling support on the roadmap.
>>>
>>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and
>>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
>>> the related issues I can find in our JIRA.
>>>
>>>
>>>
>>> @Vinayak
>>>
>>> Regarding importing notebook from github, Zeppelin has pluggable
>>> notebook storage layer (see related package
>>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
>>> So, github notebook sync can be implemented easily.
>>>
>>>
>>>
>>> @Shabeel
>>>
>>> Right, we need better memory management to prevent such OOM.
>>>
>>> And I think a table is one of the most frequently used ways of displaying
>>> data. So we'll definitely need more features like filter, sort, etc.
>>>
>>> After this roadmap discussion, a discussion of the next release will
>>> follow. Then we'll get an idea of when those features will be available.
>>>
>>>
>>>
>>> @Prasad
>>>
>>> Thanks for mentioning HA and DR. They're really important subjects for
>>> enterprise use. Zeppelin will definitely need to address them.
>>>
>>> And displaying meta information about a notebook on the top-level page is
>>> a good idea.
>>>
>>>
>>>
>>> It's really great to hear many opinions and ideas.
>>>
>>> And thanks @Rick for sharing valuable view to Zeppelin project.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> moon
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <ra...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> For one, I know that there is rudimentary scheduling built into Zeppelin
>>> already (at least I fixed a bug in the test for a scheduling feature a few
>>> months ago).
>>>
>>> But another point is that Zeppelin should also focus on quality,
>>> reproducibility and portability.
>>>
>>> Although this doesn't offer exciting new features, it would make
>>> development much easier.
>>>
>>> Cross-platform testability, Tests that pass when run sequentially,
>>> compatibility with Firefox, and many more open issues that make it so much
>>> harder to enhance Zeppelin and add features should be addressed soon,
>>> preferably before more features are added. Already Zeppelin is suffering -
>>> in my opinion - from quite a lot of feature creep, and we should avoid
>>> putting in the kitchen sink, at the cost of quality and maintainability.
>>> Instead modularity (ZEPPELIN-533 in particular) should be targeted.
>>>
>>> Oozie, in my opinion, is a dead end - it may de-facto still be in use on
>>> many clusters, but it's not getting the love it needs, and I wouldn't bet
>>> on it, when it comes to integrating scheduling. Instead, any external tool
>>> should be able to use the REST-API to trigger executions, if you want
>>> external scheduling.
>>>
>>> So, in conclusion, if we take Moon's list as a list of descending
>>> priorities, I fully agree, under the condition that code quality is
>>> included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
>>> SPNEGO SSO support is what we really want) with user and group rights
>>> assignment on the notebook level. We probably also need Knox-integration
>>> (ODP-Members looking at integrating Zeppelin should consider contributing
>>> this), and integration of something like Spree (
>>> https://github.com/hammerlab/spree) to be able to profile jobs.
>>>
>>> I'm hopeful that soon I can resume contributing some quality-oriented
>>> code, to drive this "necessary evil" forward ;)
>>>
>>>
>>>
>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>>> sourav.mazumder00@gmail.com> wrote:
>>>
>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>
>>> Rather, one should be able to call it from any scheduler typically used
>>> at the enterprise level. Maybe with support for BPML.
>>>
>>> I believe the existing ability to call/execute a Zeppelin Notebook or a
>>> specific paragraph within a notebook using REST API should take care of
>>> this requirement to some extent.
>>>
>>> Regards,
>>>
>>> Sourav
>>>
>>>
>>>
>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>>> vinayakagrawal88@gmail.com> wrote:
>>>
>>> @Eran Witkon,
>>>
>>> Thanks for the suggestion Eran. I concur with your thought.
>>>
>>> If Zeppelin can be integrated with Oozie, that would be wonderful. Users
>>> will also be able to leverage their Oozie skills.
>>>
>>> This would be promising for now.
>>>
>>> However, in the future Hadoop might not necessarily be installed on a
>>> Spark cluster, and Oozie (since it is installed with the Hadoop
>>> distribution) might not be available.
>>>
>>> So perhaps we should give some thought to this feature for the future.
>>> Should it depend on Oozie, or should Zeppelin have its own scheduling?
>>>
>>> As Benjamin has pointed out, the Databricks notebook has this as a core
>>> notebook feature.
>>>
>>>
>>>
>>> Also, would anybody give any suggestions regarding "sync with github"
>>> feature?
>>>
>>> -Exporting notebook to Github
>>>
>>> -Importing notebook from Github
>>>
>>>
>>>
>>> Thanks
>>>
>>> Vinayak
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <er...@gmail.com>
>>> wrote:
>>>
>>> @*Vinayak Agrawal* I would suggest adding the ability to connect
>>> Zeppelin to existing scheduling/workflow tools such as
>>> https://oozie.apache.org/. This requires better hooks and status
>>> reporting, but doesn't make Zeppelin an ETL/scheduler tool by itself.
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>>> vinayakagrawal88@gmail.com> wrote:
>>>
>>> Moon,
>>>
>>> The new roadmap looks very promising. I am very happy to see security in
>>> the list.
>>> I have some suggestions regarding Enterprise Ready features:
>>>
>>>
>>> 1. Job Scheduler - Can this be improved?
>>>
>>> Currently the scheduler can be used with a cron expression or a pre-set
>>> time. But in an enterprise solution, a notebook might be one piece of a
>>> workflow. Can we look at scheduling notebooks based on other notebooks
>>> finishing their jobs successfully?
>>>
>>> This requirement would arise in any ETL workflow, where all the
>>> downstream users wait for the ETL notebook to finish successfully. Only
>>> after that, other business oriented notebooks can be executed.
>>>
>>> 2. Importing a notebook - Is there a current requirement or future plan
>>> to implement a feature that allows import-notebook-from-github? This would
>>> allow users to share notebooks seamlessly.
>>>
>>> Thanks
>>>
>>> Vinayak
>>>
>>>
>>>
>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org> wrote:
>>>
>>> Zhong Wang,
>>>
>>> Right, Folder support would be quite useful. Thanks for the opinion.
>>>
>>> I hope I can finish the work on pr-190
>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>
>>>
>>>
>>> Sourav,
>>>
>>> Regarding concurrent running, Zeppelin itself doesn't prevent
>>> paragraphs/queries from running concurrently. Each interpreter can
>>> implement its own scheduling policy. For example, the SparkSQL interpreter
>>> and ShellInterpreter can already run paragraphs/queries concurrently.
>>>
>>>
>>>
>>> SparkInterpreter is implemented with a FIFO scheduler because of the
>>> nature of the Scala compiler. That's why users cannot run multiple
>>> paragraphs concurrently when they work with SparkInterpreter.
>>>
>>> But as Zhong Wang mentioned, pr-703 gives each notebook a separate
>>> Scala compiler, so paragraphs can run concurrently as long as they're in
>>> different notebooks.
>>>
>>> Thanks for the feedback!
>>>
>>>
>>>
>>> Best,
>>>
>>> moon
>>>
>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
>>> wrote:
>>>
>>> Sourav: I think this newly merged PR can help you
>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>
>>>
>>>
>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>> sourav.mazumder00@gmail.com> wrote:
>>>
>>> Hi Moon,
>>>
>>> This looks great.
>>>
>>> My only suggestion would be to include a PR/feature - Support for
>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>
>>> Right now if more than one user tries to run paragraphs in multiple
>>> notebooks concurrently through a single Zeppelin instance (and single
>>> interpreter instance) the performance is very slow. It is obvious that the
>>> queue gets built up within the zeppelin process and interpreter process in
>>> that scenario as the time taken to move the status from start to pending
>>> and pending to running is very high compared to the actual running time of
>>> a paragraph.
>>>
>>> Without this the multi tenancy support would be meaningless as no one
>>> can practically use it in a situation where multiple users are trying to
>>> connect to the same instance of Zeppelin (and the related interpreter). A
>>> possible solution would be to spawn separate instance of the same
>>> interpreter at every notebook/user level.
>>>
>>> Regards,
>>>
>>> Sourav
>>>
>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org> wrote:
>>>
>>> Hi Zeppelin users and developers,
>>>
>>>
>>>
>>> The roadmap we have published at
>>>
>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>
>>> is almost 9 months old, and it no longer reflects where the community is
>>> going. It's time to update it.
>>>
>>>
>>>
>>> Based on the mailing list, JIRA issues, pull requests, feedback from users,
>>> conferences and meetings, I could summarize the major interests of users
>>> and developers in 7 categories: Enterprise ready, Usability improvement,
>>> Pluggability, Documentation, Backend integration, Notebook storage, and
>>> Visualization.
>>>
>>>
>>>
>>> And I could list related subjects under each category.
>>>
>>>
>>>    - Enterprise ready
>>>       - Authentication
>>>          - Shiro authentication ZEPPELIN-548
>>>            <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>       - Authorization
>>>          - Notebook authorization PR-681
>>>            <https://github.com/apache/incubator-zeppelin/pull/681>
>>>       - Security
>>>       - Multi-tenancy
>>>       - Stability
>>>    - Usability Improvement
>>>       - UX improvement
>>>       - Better Table data support
>>>          - Download data as csv, etc PR-725
>>>            <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>>>            <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>>>            <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>>>            <https://github.com/apache/incubator-zeppelin/pull/89>
>>>          - Featureful table data display (pagination, etc)
>>>    - Pluggability ZEPPELIN-533
>>>      <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>       - Pluggable visualization
>>>       - Dynamic Interpreter, notebook, visualization loading
>>>       - Repository and registry for pluggable components
>>>    - Improve documentation
>>>       - Improve contents and readability
>>>       - more tutorials, examples
>>>    - Interpreter
>>>       - Generic JDBC Interpreter
>>>       - (spark)R Interpreter
>>>       - Cluster manager for interpreter (Proposal
>>>         <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>>         )
>>>       - more interpreters
>>>    - Notebook storage
>>>       - Versioning ZEPPELIN-540
>>>         <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>       - more notebook storages
>>>    - Visualization
>>>       - More visualizations PR-152
>>>         <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>>         <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>>         <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>>         <https://github.com/apache/incubator-zeppelin/pull/321>
>>>       - Customize graph (show/hide label, color, etc)
>>>
>>> It will help anyone quickly grasp the overall interests and direction of
>>> the project. And based on this roadmap, we can discuss and re-define the
>>> scope of the next release, 0.6.0, and its schedule.
>>>
>>>
>>>
>>> What do you think? Any feedback would be appreciated.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> moon
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Vinayak Agrawal
>>>
>>>
>>>
>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>
>>> ~Lord Alfred Tennyson
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Vinayak Agrawal
>>>
>>> Big Data Analytics
>>>
>>> IBM
>>>
>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>
>>> ~Lord Alfred Tennyson

Re: [DISCUSS] Update Roadmap

Posted by moon soo Lee <mo...@apache.org>.

There is a discussion thread for Release Policy.
https://s.apache.org/3JCm Please check that thread, too.

Thanks,
moon

On Thu, Mar 24, 2016 at 12:02 PM Guilherme Silveira <
guilhermecgsspam@gmail.com> wrote:

> Is there a predefined release interval, let's say 6 months or 1 year,
> between one version and another?
> On Mar 23, 2016 4:10 PM, "Joel Van Veluwen" <
> Joel.VanVeluwen@quantium.com.au> wrote:
>
>> Hi Nikolay,
>>
>>
>>
>> I raised this with MapR and there don’t appear to be any plans to add
>> Zeppelin to 5.1.
>>
>>
>>
>> https://community.mapr.com/message/40332
>>
>>
>>
>> We are deploying it manually and everything is pretty stable – but it
>> will vary depending on your environment.
>>
>>
>>
>> Cheers,
>>
>>
>>
>> Joel Van Veluwen
>> *QUANTIUM*
>>
>>
>>
>> *From:* Nikolay Voronchikhin [mailto:nvoronchikhin@gmail.com]
>> *Sent:* Tuesday, 22 March 2016 11:39 AM
>> *To:* users@zeppelin.incubator.apache.org
>> *Subject:* Re: [DISCUSS] Update Roadmap
>>
>>
>>
>> Hi Zeppelin Users and Developers,
>>
>>
>>
>> Do you know if MapR will be adding Zeppelin to its roadmap for the next
>> version after MapR 5.1?
>>
>>
>>
>> We see in Hue 3.9 that it provides notebooks for R Shell, Python Shell,
>> PySpark, SparkR, Hive SQL, Impala SQL, and Spark SQL, but no Drill SQL
>> notebook.
>>
>> We are looking for an Apache Project that focuses on a Drill Notebook UI
>> that performs better than the Drill Web Console UI itself.
>>
>>
>>
>> Sincerely,
>>
>> *Nikolay Voronchikhin*
>>
>> *Big Data/Data Warehouse/Data Science/Data Platforms Engineer at Cisco*
>>
>> *https://www.linkedin.com/in/nvoronchikhin
>> <https://www.linkedin.com/in/nvoronchikhin>*
>>
>> *E-mail: nvoronchikhin@gmail.com <nv...@gmail.com>*
>>
>> *Mobile: 951-288-2778 <951-288-2778>*
>>
>>
>>
>>
>>
>> On Mon, Mar 21, 2016 at 2:44 PM, rohit choudhary <rc...@gmail.com>
>> wrote:
>>
>> Dear All,
>>
>>
>>
>> I think direction setting is important for Enterprise readiness. I have a
>> little bit of an overview of Ambari Views, which is very similar in nature
>> to Zeppelin. Please let me explain:
>>
>>
>>
>> Hive View - interacts with Hive
>>
>> Pig View - interacts with Pig
>>
>> Workflow Designer - interacts with Oozie
>>
>>
>>
>> We have a very similar architecture in Zeppelin where we interact with
>> these systems through Interpreters. The usage will also be similar, as both
>> will interact with Hadoop clusters or in some cases Spark with Yarn on
>> HDFS. Our priorities should include:
>>
>>
>>
>> - Design & implement for multi-tenancy
>>
>> - Auditability from Data/State and Lineage perspective
>>
>> - Ability to share Notebooks/Data/State across users, preferably through
>> SparkContext sharing
>>
>> - Security between Zeppelin and the other systems, not limited to Spark
>> through Kerberos. (@Rick +1)
>>
>>
>>
>> I will share an initial draft of the thoughts I have in mind, in the next
>> couple of days.
>>
>>
>>
>> Thanks,
>>
>> Rohit.
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Mar 3, 2016 at 7:54 AM, moon soo Lee <mo...@apache.org> wrote:
>>
>> Shabeel, thanks for the feedback about rest api and custom id. that might
>> help avoid multiple rest api calls.
>>
>>
>>
>> Thanks everyone for valuable feedback. Looks like we're all going in the
>> same direction. I have updated the wiki.
>>
>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>
>> Please take a look.
>>
>>
>>
>> I'm sure there are many missing details in this roadmap. I must say that
>> something not being on this roadmap doesn't mean the community is not
>> working on it or that it can't be included in Zeppelin. The roadmap
>> represents community interest and overall direction.
>>
>> We're not changing the roadmap every day, but that doesn't mean the roadmap
>> is set in stone and will never be changed. We can improve it continuously.
>>
>>
>>
>> Please feel free to fork this mail thread for any further discussion
>> on a specific subject (e.g. job scheduling).
>>
>>
>>
>> Thanks,
>>
>> moon
>>
>>
>>
>> On Wed, Mar 2, 2016 at 12:31 AM Shabeel Syed <sh...@gmail.com>
>> wrote:
>>
>> Also we need better rest api support for creating and fetching the
>> notebooks and paragraphs.
>>
>> For example, if I can set a custom-defined notebookid and paragraphid, we
>> can avoid multiple rest api calls.
>>
>>
>>
>> http://localhost:8080/#/notebook/
>> <notebookid>/paragraph/<paragraphid>?asIframe
>>
>> should return an error if the notebook or paragraph does not exist.
>>
>>
>>
>> and while creating notebook or paragraph I should be able to mention my
>> custom ids.
>>
>>
>>
>> Regards
>>
>> Shabeel
>>
>>
>>
>> On Wed, Mar 2, 2016 at 11:55 AM, Zhong Wang <wa...@gmail.com>
>> wrote:
>>
>> +1 on @rick. quality is really important... I am still encountering bugs
>> consistently
>>
>>
>>
>> On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV <te...@gmail.com>
>> wrote:
>>
>> +1 on @rick
>>
>>
>>
>> On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bb...@gmail.com> wrote:
>>
>> I see in the Enterprise section that multi-tenancy will be included, will
>> this have user impersonation too? In this way, the user executing will be
>> the user owning the process.
>>
>>
>>
>> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <sh...@gmail.com> wrote:
>>
>>
>>
>> +1
>>
>>
>>
>> Hi Tamas,
>>
>>    Pluggable external visualization is really a GREAT feature to have.
>> I'm looking forward to this :)
>>
>>
>>
>> Regards
>>
>> Shabeel
>>
>>
>>
>> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <ta...@odigeo.com>
>> wrote:
>>
>> Hey,
>>
>>
>>
>> Really promising roadmap.
>>
>>
>>
>> I'd only push more visualization options. I agree built in visualization
>> is needed with limited charting options but I think we also need somehow
>> 'inject' external js visualizations also.
>>
>>
>>
>>
>>
>> For scheduling Zeppelin notebooks  we use
>>  https://github.com/airbnb/airflow <https://github.com/airbnb/airflow> through
>> the job rest api. It's an enterprise ready and very robust solution
>> right now.
>>
>>
>>
>> *Tamas*
>>
>>
>>
>> On 1 March 2016 at 09:12, Eran Witkon <er...@gmail.com> wrote:
>>
>> One point to clarify, I don't want to suggest Oozie in specific, I want
>> to think about which features we develop and which ones we integrate
>> external, preferred Apache, technology? We don't think about building our
>> own storage services so why build our own scheduler?
>> Eran
>>
>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <mo...@apache.org> wrote:
>>
>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>
>> Now I can see a lot of demands around enterprise level job scheduling.
>> Either external or built-in, I completely agree having enterprise level job
>> scheduling support on the roadmap.
>>
>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137>,
>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
>> related issues i can find in our JIRA.
>>
>>
>>
>> @Vinayak
>>
>> Regarding importing notebook from github, Zeppelin has pluggable notebook
>> storage layer (see related package
>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
>> So, github notebook sync can be implemented easily.
>>
>>
>>
>> @Shabeel
>>
>> Right, we need better memory management to prevent such OOM.
>>
>> And i think table is one of the most frequently used way of displaying
>> data. So definitely, we'll need more features like filter, sort, etc.
>>
>> After this roadmap discussion, discussion for the next release will
>> follow. Then we'll get idea when those features will be available.
>>
>>
>>
>> @Prasad
>>
>> Thanks for mentioning HA and DR. They're really important subject for
>> enterprise use. Definitely Zeppelin will need to address them.
>>
>> And displaying meta information of notebook on top level page is good
>> idea.
>>
>>
>>
>> It's really great to hear many opinions and ideas.
>>
>> And thanks @Rick for sharing valuable view to Zeppelin project.
>>
>>
>>
>> Thanks,
>>
>> moon
>>
>>
>>
>>
>>
>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <ra...@gmail.com> wrote:
>>
>> Hi,
>>
>> For one, I know that there is rudimentary scheduling built into Zeppelin
>> already (at least I fixed a bug in the test for a scheduling feature a few
>> months ago).
>>
>> But another point is, that Zeppelin should also focus on quality,
>> reproducibility and portability.
>>
>> Although this doesn't offer exciting new features, it would make
>> development much easier.
>>
>> Cross-platform testability, Tests that pass when run sequentially,
>> compatibility with Firefox, and many more open issues that make it so much
>> harder to enhance Zeppelin and add features should be addressed soon,
>> preferably before more features are added. Already Zeppelin is suffering -
>> in my opinion - from quite a lot of feature creep, and we should avoid
>> putting in the kitchen sink, at the cost of quality and maintainability.
>> Instead modularity (ZEPPELIN-533 in particular) should be targeted.
>>
>> Oozie, in my opinion, is a dead end - it may de-facto still be in use on
>> many clusters, but it's not getting the love it needs, and I wouldn't bet
>> on it, when it comes to integrating scheduling. Instead, any external tool
>> should be able to use the REST-API to trigger executions, if you want
>> external scheduling.
>>
>> So, in conclusion, if we take Moon's list as a list of descending
>> priorities, I fully agree, under the condition that code quality is
>> included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
>> SPNEGO SSO support is what we really want) with user and group rights
>> assignment on the notebook level. We probably also need Knox-integration
>> (ODP-Members looking at integrating Zeppelin should consider contributing
>> this), and integration of something like Spree (
>> https://github.com/hammerlab/spree) to be able to profile jobs.
>>
>> I'm hopeful that soon I can resume contributing some quality-oriented
>> code, to drive this "necessary evil" forward ;)
>>
>>
>>
>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>> sourav.mazumder00@gmail.com> wrote:
>>
>> I do agree with Vinayak. It need not be coupled with Oozie.
>>
>> Rather one should be able to call it from any scheduler typically used in
>> enterprise level. May be support for BPML.
>>
>> I believe the existing ability to call/execute a Zeppelin Notebook or a
>> specific paragraph within a notebook using REST API should take care of
>> this requirement to some extent.
>>
>> Regards,
>>
>> Sourav
>>
>>
>>
>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>> vinayakagrawal88@gmail.com> wrote:
>>
>> @Eran Witkon,
>>
>> Thanks for the suggestion Eran. I concur with your thought.
>>
>> If Zeppelin can be integrated with Oozie, that would be wonderful. Users
>> will also be able to leverage their Oozie skills.
>>
>> This would be promising for now.
>>
>> However, in the future Hadoop might not necessarily be installed in a Spark
>> cluster, and Oozie (since it installs with the Hadoop distribution) might
>> not be available.
>>
>> So perhaps we should give some thought to this feature for the future.
>> Should it depend on Oozie or should Zeppelin have its own scheduling?
>>
>> As Benjamin has iterated, the Databricks notebook has this as a core
>> notebook feature.
>>
>>
>>
>> Also, would anybody give any suggestions regarding "sync with github"
>> feature?
>>
>> -Exporting notebook to Github
>>
>> -Importing notebook from Github
>>
>>
>>
>> Thanks
>>
>> Vinayak
>>
>>
>>
>>
>>
>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <er...@gmail.com>
>> wrote:
>>
>> @*Vinayak Agrawal *I would suggest adding the ability to connect
>> Zeppelin to existing scheduling tools\workflow tools such as
>> https://oozie.apache.org/. This requires better hooks and status
>> reporting but doesn't make Zeppelin an ETL\scheduler tool by itself.
>>
>>
>>
>>
>>
>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>> vinayakagrawal88@gmail.com> wrote:
>>
>> Moon,
>>
>> The new roadmap looks very promising. I am very happy to see security in
>> the list.
>> I have some suggestions regarding Enterprise Ready features:
>>
>>
>> 1. Job Scheduler - Can this be improved?
>>
>> Currently the scheduler can be used with Cron expression or a pre-set
>> time. But in an enterprise solution, a notebook might be one piece of the
>> workflow. Can we look towards the functionality of scheduling notebooks
>> based on other notebooks finishing their job successfully?
>>
>> This requirement would arise in any ETL workflow, where all the
>> downstream users wait for the ETL notebook to finish successfully. Only
>> after that, other business oriented notebooks can be executed.
>>
>> 2. Importing a notebook - Is there a current requirement or future plan
>> to implement a feature that allows import-notebook-from-github? This would
>> allow users to share notebooks seamlessly.
>>
>> Thanks
>>
>> Vinayak
>>
>>
>>
>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org> wrote:
>>
>> Zhong Wang,
>>
>> Right, Folder support would be quite useful. Thanks for the opinion.
>>
>> Hope i can finish the work pr-190
>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>
>>
>>
>> Sourav,
>>
>> Regarding concurrent running, Zeppelin doesn't have a limitation on running
>> paragraphs/queries concurrently. An interpreter can implement its own
>> scheduling policy. For example, SparkSQL interpreter and ShellInterpreter
>> can already run paragraphs/queries concurrently.
>>
>>
>>
>> SparkInterpreter is implemented with a FIFO scheduler, considering the
>> nature of the scala compiler. That's why users cannot run multiple
>> paragraphs concurrently when they work with SparkInterpreter.
>>
>> But as Zhong Wang mentioned, pr-703 enables each notebook to have a
>> separate scala compiler, so paragraphs can run concurrently when they're in
>> different notebooks.
>>
>> Thanks for the feedback!
>>
>>
>>
>> Best,
>>
>> moon
>>
>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
>> wrote:
>>
>> Sourav: I think this newly merged PR can help you
>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>
>>
>>
>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>> sourav.mazumder00@gmail.com> wrote:
>>
>> Hi Moon,
>>
>> This looks great.
>>
>> My only suggestion would be to include a PR/feature - Support for Running
>> Concurrent paragraphs/queries in Zeppelin.
>>
>> Right now if more than one user tries to run paragraphs in multiple
>> notebooks concurrently through a single Zeppelin instance (and single
>> interpreter instance) the performance is very slow. It is obvious that the
>> queue gets built up within the zeppelin process and interpreter process in
>> that scenario as the time taken to move the status from start to pending
>> and pending to running is very high compared to the actual running time of
>> a paragraph.
>>
>> Without this the multi tenancy support would be meaningless as no one can
>> practically use it in a situation where multiple users are trying to
>> connect to the same instance of Zeppelin (and the related interpreter). A
>> possible solution would be to spawn separate instance of the same
>> interpreter at every notebook/user level.
>>
>> Regards,
>>
>> Sourav
>>
>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org> wrote:
>>
>> Hi Zeppelin users and developers,
>>
>>
>>
>> The roadmap we have published at
>>
>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>
>> is almost 9 months old, and it doesn't reflect where the community goes
>> anymore. It's time to update.
>>
>>
>>
>> Based on mailing list, jira issues, pullrequests, feedbacks from users,
>> conferences and meetings, I could summarize the major interest of users and
>> developers in 7 categories. Enterprise ready, Usability improvement,
>> Pluggability, Documentation, Backend integration, Notebook storage, and
>> Visualization.
>>
>>
>>
>> And I could list related subjects under each category.
>>
>>
>>    - Enterprise ready
>>       - Authentication
>>          - Shiro authentication ZEPPELIN-548
>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>       - Authorization
>>          - Notebook authorization PR-681
>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>       - Security
>>       - Multi-tenancy
>>       - Stability
>>    - Usability Improvement
>>       - UX improvement
>>       - Better Table data support
>>          - Download data as csv, etc PR-725
>>          <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>>          <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>>          <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>>          <https://github.com/apache/incubator-zeppelin/pull/89>
>>          - Featureful table data display (pagination, etc)
>>    - Pluggability ZEPPELIN-533
>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>       - Pluggable visualization
>>       - Dynamic Interpreter, notebook, visualization loading
>>       - Repository and registry for pluggable components
>>    - Improve documentation
>>       - Improve contents and readability
>>       - more tutorials, examples
>>    - Interpreter
>>       - Generic JDBC Interpreter
>>       - (spark)R Interpreter
>>       - Cluster manager for interpreter (Proposal
>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>       )
>>       - more interpreters
>>    - Notebook storage
>>       - Versioning ZEPPELIN-540
>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>       - more notebook storages
>>    - Visualization
>>       - More visualizations PR-152
>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>       - Customize graph (show/hide label, color, etc)
>>
>> It will help anyone quickly get the overall interest and direction of the
>> project. And based on this roadmap, we can discuss and re-define the scope
>> of the next release, 0.6.0, and its schedule.
>>
>>
>>
>> What do you think? Any feedback would be appreciated.
>>
>>
>>
>> Thanks,
>>
>> moon
>>
>>
>>
>>
>>
>>
>> --
>>
>> Vinayak Agrawal
>>
>>
>>
>> "To Strive, To Seek, To Find and Not to Yield!"
>>
>> ~Lord Alfred Tennyson
>>
>>
>>
>>
>> --
>>
>> Vinayak Agrawal
>>
>> Big Data Analytics
>>
>> IBM
>>
>> "To Strive, To Seek, To Find and Not to Yield!"
>>
>> ~Lord Alfred Tennyson
>>
>

RE: [DISCUSS] Update Roadmap

Posted by Guilherme Silveira <gu...@gmail.com>.

Is there a predefined release interval, let's say 6 months or 1 year,
between one version and another?
On Mar 23, 2016 4:10 PM, "Joel Van Veluwen" <
Joel.VanVeluwen@quantium.com.au> wrote:

> Hi Nikolay,
>
>
>
> I raised this with MapR and there doesn’t appear to be plans to add
> Zeppelin to 5.1
>
>
>
> https://community.mapr.com/message/40332
>
>
>
> We are deploying it manually and everything is pretty stable – but it will
> vary depending on your environment.
>
>
>
> Cheers,
>
>
>
> Joel Van Veluwen
> *QUANTIUM*
> Level 25, 8 Chifley
> 8-12 Chifley Square
> Sydney NSW 2000
>
> T: +61 2 8224 8981
> M: +61 403 153 265
> F: +61 2 9292 6444
>
> W: quantium.com.au <http://www.quantium.com.au>
> ------------------------------
>
> linkedin.com/company/quantium <http://www.linkedin.com/company/quantium>
> facebook.com/QuantiumAustralia <http://www.facebook.com/QuantiumAustralia>
> twitter.com/QuantiumAU <http://www.twitter.com/QuantiumAU>
>
> The contents of this email, including attachments, may be confidential
> information. If you are not the intended recipient, any use, disclosure or
> copying of the information is unauthorised. If you have received this email
> in error, we would be grateful if you would notify us immediately by email
> reply, phone (+ 61 2 9292 6400) or fax (+ 61 2 9292 6444) and delete the
> message from your system.
>
>
>
> *From:* Nikolay Voronchikhin [mailto:nvoronchikhin@gmail.com]
> *Sent:* Tuesday, 22 March 2016 11:39 AM
> *To:* users@zeppelin.incubator.apache.org
> *Subject:* Re: [DISCUSS] Update Roadmap
>
>
>
> Hi Zeppelin Users and Developers,
>
>
>
> Do you know if MapR will be adding Zeppelin to its roadmap for the next
> version after MapR 5.1?
>
>
>
> We see in Hue 3.9 that it provides notebooks for R Shell, Python Shell,
> PySpark, SparkR, Hive SQL, Impala SQL, and Spark SQL, but no Drill SQL
> notebook.
>
> We are looking for an Apache Project that focuses on a Drill Notebook UI
> that performs better than the Drill Web Console UI itself.
>
>
>
> Sincerely,
>
> *Nikolay Voronchikhin*
>
> *Big Data/Data Warehouse/Data Science/Data Platforms Engineer at Cisco*
>
> *https://www.linkedin.com/in/nvoronchikhin
> <https://www.linkedin.com/in/nvoronchikhin>*
>
> *E-mail: nvoronchikhin@gmail.com <nv...@gmail.com>*
>
> *Mobile: 951-288-2778 <951-288-2778>*
>
>
>
>
>
> On Mon, Mar 21, 2016 at 2:44 PM, rohit choudhary <rc...@gmail.com>
> wrote:
>
> Dear All,
>
>
>
> I think direction setting is important for Enterprise readiness. I have a
> little bit of an overview of Ambari Views, which is very similar in nature
> to Zeppelin. Please let me explain:
>
>
>
> Hive View - interacts with Hive
>
> Pig View - interacts with Pig
>
> Workflow Designer - interacts with Oozie
>
>
>
> We have a very similar architecture in Zeppelin where we interact with
> these systems through Interpreters. The usage will also be similar, as both
> will interact with Hadoop clusters or in some cases Spark with Yarn on
> HDFS. Our priorities should include:
>
>
>
> - Design & implement for multi-tenancy
>
> - Auditability from Data/State and Lineage perspective
>
> - Ability to share Notebooks/Data/State across users, preferably through
> SparkContext sharing
>
> - Security between Zeppelin and the other systems, not limited to Spark
> through Kerberos. (@Rick +1)
>
>
>
> I will share an initial draft of the thoughts I have in mind, in the next
> couple of days.
>
>
>
> Thanks,
>
> Rohit.
>
>
>
>
>
>
>
> On Thu, Mar 3, 2016 at 7:54 AM, moon soo Lee <mo...@apache.org> wrote:
>
> Shabeel, thanks for the feedback about rest api and custom id. that might
> help avoid multiple rest api calls.
>
>
>
> Thanks everyone for valuable feedback. Looks like we're all going in the
> same direction. I have updated the wiki.
>
> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>
> Please take a look.
>
>
>
> I'm sure there are many missing details in this roadmap. I must say that
> something not being on this roadmap doesn't mean the community is not
> working on it or that it can't be included in Zeppelin. The roadmap
> represents community interest and overall direction.
>
> We're not changing the roadmap every day, but that doesn't mean the roadmap
> is set in stone and will never be changed. We can improve it continuously.
>
>
>
> Please feel free to fork this mail thread for any further discussion
> on a specific subject (e.g. job scheduling).
>
>
>
> Thanks,
>
> moon
>
>
>
> On Wed, Mar 2, 2016 at 12:31 AM Shabeel Syed <sh...@gmail.com>
> wrote:
>
> Also we need better rest api support for creating and fetching the
> notebooks and paragraphs.
>
> For example, if I can set a custom-defined notebookid and paragraphid, we
> can avoid multiple rest api calls.
>
>
>
> http://localhost:8080/#/notebook/
> <notebookid>/paragraph/<paragraphid>?asIframe
>
> should return an error if the notebook or paragraph does not exist.
>
>
>
> and while creating notebook or paragraph I should be able to mention my
> custom ids.
>
>
>
> Regards
>
> Shabeel
>
>
>
> On Wed, Mar 2, 2016 at 11:55 AM, Zhong Wang <wa...@gmail.com>
> wrote:
>
> +1 on @rick. quality is really important... I am still encountering bugs
> consistently
>
>
>
> On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV <te...@gmail.com>
> wrote:
>
> +1 on @rick
>
>
>
> On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bb...@gmail.com> wrote:
>
> I see in the Enterprise section that multi-tenancy will be included, will
> this have user impersonation too? In this way, the user executing will be
> the user owning the process.
>
>
>
> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <sh...@gmail.com> wrote:
>
>
>
> +1
>
>
>
> Hi Tamas,
>
>    Pluggable external visualization is really a GREAT feature to have. I'm
> looking forward to this :)
>
>
>
> Regards
>
> Shabeel
>
>
>
> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <ta...@odigeo.com>
> wrote:
>
> Hey,
>
>
>
> Really promising roadmap.
>
>
>
> I'd only push more visualization options. I agree built in visualization
> is needed with limited charting options but I think we also need somehow
> 'inject' external js visualizations also.
>
>
>
>
>
> For scheduling Zeppelin notebooks  we use
>  https://github.com/airbnb/airflow <https://github.com/airbnb/airflow> through
> the job rest api. It's an enterprise ready and very robust solution right
> now.
>
>
>
> *Tamas*
>
>
>
> On 1 March 2016 at 09:12, Eran Witkon <er...@gmail.com> wrote:
>
> One point to clarify, I don't want to suggest Oozie in specific, I want to
> think about which features we develop and which ones we integrate external,
> preferred Apache, technology? We don't think about building our own storage
> services so why build our own scheduler?
> Eran
>
> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <mo...@apache.org> wrote:
>
> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>
> Now I can see a lot of demands around enterprise level job scheduling.
> Either external or built-in, I completely agree having enterprise level job
> scheduling support on the roadmap.
>
> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137>,
> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
> related issues i can find in our JIRA.
>
>
>
> @Vinayak
>
> Regarding importing notebook from github, Zeppelin has pluggable notebook
> storage layer (see related package
> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
> So, github notebook sync can be implemented easily.
>
>
>
> @Shabeel
>
> Right, we need better memory management to prevent such OOM.
>
> And i think table is one of the most frequently used way of displaying
> data. So definitely, we'll need more features like filter, sort, etc.
>
> After this roadmap discussion, discussion for the next release will
> follow. Then we'll get idea when those features will be available.
>
>
>
> @Prasad
>
> Thanks for mentioning HA and DR. They're really important subject for
> enterprise use. Definitely Zeppelin will need to address them.
>
> And displaying meta information of notebook on top level page is good idea.
>
>
>
> It's really great to hear many opinions and ideas.
>
> And thanks @Rick for sharing valuable view to Zeppelin project.
>
>
>
> Thanks,
>
> moon
>
>
>
>
>
> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <ra...@gmail.com> wrote:
>
> Hi,
>
> For one, I know that there is rudimentary scheduling built into Zeppelin
> already (at least I fixed a bug in the test for a scheduling feature a few
> months ago).
>
> But another point is, that Zeppelin should also focus on quality,
> reproducibility and portability.
>
> Although this doesn't offer exciting new features, it would make
> development much easier.
>
> Cross-platform testability, Tests that pass when run sequentially,
> compatibility with Firefox, and many more open issues that make it so much
> harder to enhance Zeppelin and add features should be addressed soon,
> preferably before more features are added. Already Zeppelin is suffering -
> in my opinion - from quite a lot of feature creep, and we should avoid
> putting in the kitchen sink, at the cost of quality and maintainability.
> Instead modularity (ZEPPELIN-533 in particular) should be targeted.
>
> Oozie, in my opinion, is a dead end - it may de-facto still be in use on
> many clusters, but it's not getting the love it needs, and I wouldn't bet
> on it, when it comes to integrating scheduling. Instead, any external tool
> should be able to use the REST-API to trigger executions, if you want
> external scheduling.
>
> So, in conclusion, if we take Moon's list as a list of descending
> priorities, I fully agree, under the condition that code quality is
> included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
> SPNEGO SSO support is what we really want) with user and group rights
> assignment on the notebook level. We probably also need Knox-integration
> (ODP-Members looking at integrating Zeppelin should consider contributing
> this), and integration of something like Spree (
> https://github.com/hammerlab/spree) to be able to profile jobs.
>
> I'm hopeful that soon I can resume contributing some quality-oriented
> code, to drive this "necessary evil" forward ;)
>
>
>
> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
> sourav.mazumder00@gmail.com> wrote:
>
> I do agree with Vinayak. It need not be coupled with Oozie.
>
> Rather one should be able to call it from any scheduler typically used in
> enterprise level. May be support for BPML.
>
> I believe the existing ability to call/execute a Zeppelin Notebook or a
> specific paragraph within a notebook using REST API should take care of
> this requirement to some extent.
>
> Regards,
>
> Sourav
>
>
>
> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
> vinayakagrawal88@gmail.com> wrote:
>
> @Eran Witkon,
>
> Thanks for the suggestion Eran. I concur with your thought.
>
> If Zeppelin can be integrated with Oozie, that would be wonderful. Users
> will also be able to leverage their Oozie skills.
>
> This would be promising for now.
>
> However, in the future Hadoop might not necessarily be installed in a Spark
> cluster, and Oozie (since it installs with the Hadoop distribution) might
> not be available.
>
> So perhaps we should give some thought to this feature for the future.
> Should it depend on Oozie or should Zeppelin have its own scheduling?
>
> As Benjamin has iterated, the Databricks notebook has this as a core
> notebook feature.
>
>
>
> Also, would anybody give any suggestions regarding "sync with github"
> feature?
>
> -Exporting notebook to Github
>
> -Importing notebook from Github
>
>
>
> Thanks
>
> Vinayak
>
>
>
>
>
> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <er...@gmail.com> wrote:
>
> @*Vinayak Agrawal *I would suggest adding the ability to connect Zeppelin
> to existing scheduling tools\workflow tools such as
> https://oozie.apache.org/. This requires better hooks and status
> reporting but doesn't make Zeppelin an ETL\scheduler tool by itself.
>
>
>
>
>
> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
> vinayakagrawal88@gmail.com> wrote:
>
> Moon,
>
> The new roadmap looks very promising. I am very happy to see security in
> the list.
> I have some suggestions regarding Enterprise Ready features:
>
>
> 1. Job Scheduler - Can this be improved?
>
> Currently the scheduler can be used with Cron expression or a pre-set
> time. But in an enterprise solution, a notebook might be one piece of the
> workflow. Can we look towards the functionality of scheduling notebooks
> based on other notebooks finishing their job successfully?
>
> This requirement would arise in any ETL workflow, where all the downstream
> users wait for the ETL notebook to finish successfully. Only after that,
> other business oriented notebooks can be executed.
>
> 2. Importing a notebook - Is there a current requirement or future plan to
> implement a feature that allows import-notebook-from-github? This would
> allow users to share notebooks seamlessly.
>
> Thanks
>
> Vinayak
>
>
>
> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org> wrote:
>
> Zhong Wang,
>
> Right, Folder support would be quite useful. Thanks for the opinion.
>
> Hope i can finish the work pr-190
> <https://github.com/apache/incubator-zeppelin/pull/190>.
>
>
>
> Sourav,
>
> Regarding concurrent running, Zeppelin doesn't have a limitation on running
> paragraphs/queries concurrently. An interpreter can implement its own
> scheduling policy. For example, SparkSQL interpreter and ShellInterpreter
> can already run paragraphs/queries concurrently.
>
>
>
> SparkInterpreter is implemented with a FIFO scheduler, considering the
> nature of the scala compiler. That's why users cannot run multiple
> paragraphs concurrently when they work with SparkInterpreter.
>
> But as Zhong Wang mentioned, pr-703 enables each notebook to have a
> separate scala compiler, so paragraphs can run concurrently when they're in
> different notebooks.
>
> Thanks for the feedback!
>
>
>
> Best,
>
> moon
>
> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
> wrote:
>
> Sourav: I think this newly merged PR can help you
> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>
>
>
> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
> sourav.mazumder00@gmail.com> wrote:
>
> Hi Moon,
>
> This looks great.
>
> My only suggestion would be to include a PR/feature - Support for Running
> Concurrent paragraphs/queries in Zeppelin.
>
> Right now if more than one user tries to run paragraphs in multiple
> notebooks concurrently through a single Zeppelin instance (and single
> interpreter instance) the performance is very slow. It is obvious that the
> queue gets built up within the zeppelin process and interpreter process in
> that scenario as the time taken to move the status from start to pending
> and pending to running is very high compared to the actual running time of
> a paragraph.
>
> Without this the multi tenancy support would be meaningless as no one can
> practically use it in a situation where multiple users are trying to
> connect to the same instance of Zeppelin (and the related interpreter). A
> possible solution would be to spawn separate instance of the same
> interpreter at every notebook/user level.
>
> Regards,
>
> Sourav
>
> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org> wrote:
>
> Hi Zeppelin users and developers,
>
>
>
> The roadmap we have published at
>
> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>
> is almost 9 months old, and it no longer reflects where the community is
> going. It's time to update.
>
>
>
> Based on the mailing list, JIRA issues, pull requests, feedback from users,
> conferences, and meetings, I could summarize the major interests of users and
> developers in 7 categories: Enterprise ready, Usability improvement,
> Pluggability, Documentation, Backend integration, Notebook storage, and
> Visualization.
>
>
>
> And I could list related subjects under each category.
>
>
>    - Enterprise ready
>       - Authentication
>          - Shiro authentication ZEPPELIN-548
>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>       - Authorization
>          - Notebook authorization PR-681
>          <https://github.com/apache/incubator-zeppelin/pull/681>
>       - Security
>       - Multi-tenancy
>       - Stability
>    - Usability Improvement
>       - UX improvement
>       - Better Table data support
>          - Download data as csv, etc PR-725
>          <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>          <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>          <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>          <https://github.com/apache/incubator-zeppelin/pull/89>
>          - Featureful table data display (pagination, etc)
>    - Pluggability ZEPPELIN-533
>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>       - Pluggable visualization
>       - Dynamic Interpreter, notebook, visualization loading
>       - Repository and registry for pluggable components
>    - Improve documentation
>       - Improve contents and readability
>       - more tutorials, examples
>    - Interpreter
>       - Generic JDBC Interpreter
>       - (spark)R Interpreter
>       - Cluster manager for interpreter (Proposal
>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>       )
>       - more interpreters
>    - Notebook storage
>       - Versioning ZEPPELIN-540
>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>       - more notebook storages
>    - Visualization
>       - More visualizations PR-152
>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>       <https://github.com/apache/incubator-zeppelin/pull/321>
>       - Customize graph (show/hide label, color, etc)
>
> This will help anyone quickly grasp the overall interests and direction of the
> project. And based on this roadmap, we can discuss and redefine the scope and
> schedule of the next release, 0.6.0.
>
>
>
> What do you think? Any feedback would be appreciated.
>
>
>
> Thanks,
>
> moon
>
>
>
>
>
>
> --
>
> Vinayak Agrawal
>
>
>
> "To Strive, To Seek, To Find and Not to Yield!"
>
> ~Lord Alfred Tennyson
>
>
>
>
> --
>
> Vinayak Agrawal
>
> Big Data Analytics
>
> IBM
>
> "To Strive, To Seek, To Find and Not to Yield!"
>
> ~Lord Alfred Tennyson
>

RE: [DISCUSS] Update Roadmap

Posted by Joel Van Veluwen <Jo...@quantium.com.au>.

Hi Nikolay,

I raised this with MapR, and there don't appear to be any plans to add Zeppelin to 5.1:

https://community.mapr.com/message/40332

We are deploying it manually and everything is pretty stable – but it will vary depending on your environment.

Cheers,


Joel Van Veluwen
QUANTIUM
Level 25, 8 Chifley
8-12 Chifley Square
Sydney NSW 2000

T: +61 2 8224 8981
M: +61 403 153 265
F: +61 2 9292 6444

W: quantium.com.au<http://www.quantium.com.au>


From: Nikolay Voronchikhin [mailto:nvoronchikhin@gmail.com]
Sent: Tuesday, 22 March 2016 11:39 AM
To: users@zeppelin.incubator.apache.org
Subject: Re: [DISCUSS] Update Roadmap

Hi Zeppelin Users and Developers,

Do you know if MapR will be adding Zeppelin to its roadmap for the next version after MapR 5.1?

We see in Hue 3.9 that it provides notebooks for R Shell, Python Shell, PySpark, SparkR, Hive SQL, Impala SQL, and Spark SQL, but no Drill SQL notebook.
We are looking for an Apache Project that focuses on a Drill Notebook UI that performs better than the Drill Web Console UI itself.

Sincerely,
Nikolay Voronchikhin
Big Data/Data Warehouse/Data Science/Data Platforms Engineer at Cisco
https://www.linkedin.com/in/nvoronchikhin
E-mail: nvoronchikhin@gmail.com<ma...@gmail.com>
Mobile: 951-288-2778


On Mon, Mar 21, 2016 at 2:44 PM, rohit choudhary <rc...@gmail.com>> wrote:
Dear All,

I think direction setting is important for Enterprise readiness. I have a little bit of an overview of Ambari Views, which is very similar in nature to Zeppelin. Please let me explain:

Hive View - interacts with Hive
Pig View - interacts with Pig
Workflow Designer - interacts with Oozie

We have a very similar architecture in Zeppelin, where we interact with these systems through interpreters. The usage will also be similar, as both will interact with Hadoop clusters or, in some cases, Spark with YARN on HDFS. Our priorities should include:

- Design & implement for multi-tenancy
- Auditability from Data/State and Lineage perspective
- Ability to share Notebooks/Data/State across users, preferably through SparkContext sharing
- Security between Zeppelin and the other systems, not limited to Spark through Kerberos. (@Rick +1)

I will share an initial draft of the thoughts I have in mind, in the next couple of days.

Thanks,
Rohit.



On Thu, Mar 3, 2016 at 7:54 AM, moon soo Lee <mo...@apache.org>> wrote:
Shabeel, thanks for the feedback about the REST API and custom IDs. That might help avoid multiple REST API calls.

Thanks, everyone, for the valuable feedback. It looks like we're all going in the same direction. I have updated the wiki.
https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
Please take a look.

I'm sure many details are missing from this roadmap. I must say that something not being on this roadmap doesn't mean the community is not working on it or that it can't be included in Zeppelin. The roadmap represents community interest and overall direction.
We're not changing the roadmap every day, but that doesn't mean the roadmap is set in stone and will never change. We can improve it continuously.

Please feel free to fork this mail thread for any further discussion on a specific subject (e.g. job scheduling).

Thanks,
moon

On Wed, Mar 2, 2016 at 12:31 AM Shabeel Syed <sh...@gmail.com>> wrote:
Also, we need better REST API support for creating and fetching notebooks and paragraphs.
For example, if I could set a custom notebook ID and paragraph ID, we could avoid multiple REST API calls.

http://localhost:8080/#/notebook/<notebookid>/paragraph/<paragraphid>?asIframe
should return an error if the notebook or paragraph does not exist.

And while creating a notebook or paragraph, I should be able to specify my custom IDs.

Regards
Shabeel

On Wed, Mar 2, 2016 at 11:55 AM, Zhong Wang <wa...@gmail.com>> wrote:
+1 on @rick. quality is really important... I am still encountering bugs consistently

On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV <te...@gmail.com>> wrote:
+1 on @rick

On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bb...@gmail.com>> wrote:
I see in the Enterprise section that multi-tenancy will be included, will this have user impersonation too? In this way, the user executing will be the user owning the process.

On Mar 1, 2016, at 12:51 AM, Shabeel Syed <sh...@gmail.com>> wrote:

+1

Hi Tamas,
   Pluggable external visualization is really a GREAT feature to have. I'm looking forward to this :)

Regards
Shabeel

On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <ta...@odigeo.com>> wrote:
Hey,

Really promising roadmap.

I'd only push for more visualization options. I agree that built-in visualization with limited charting options is needed, but I think we also need a way to 'inject' external JS visualizations.


For scheduling Zeppelin notebooks we use https://github.com/airbnb/airflow through the job REST API. It's an enterprise-ready and very robust solution right now.

Tamas

On 1 March 2016 at 09:12, Eran Witkon <er...@gmail.com>> wrote:
One point to clarify: I don't want to suggest Oozie specifically. I want to think about which features we develop ourselves and which ones we integrate from external, preferably Apache, technology. We don't think about building our own storage services, so why build our own scheduler?
Eran
On Tue, 1 Mar 2016 at 09:49 moon soo Lee <mo...@apache.org>> wrote:
@Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
Now I can see a lot of demand for enterprise-level job scheduling. Whether external or built-in, I completely agree with having enterprise-level job scheduling support on the roadmap.
ZEPPELIN-137<https://issues.apache.org/jira/browse/ZEPPELIN-137>, ZEPPELIN-531<https://issues.apache.org/jira/browse/ZEPPELIN-531> are related issues I can find in our JIRA.

@Vinayak
Regarding importing notebooks from GitHub, Zeppelin has a pluggable notebook storage layer (see related package<https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>). So GitHub notebook sync can be implemented easily.

@Shabeel
Right, we need better memory management to prevent such OOMs.
And I think tables are one of the most frequently used ways of displaying data. So we'll definitely need more features like filter, sort, etc.
After this roadmap discussion, a discussion about the next release will follow. Then we'll get an idea of when those features will be available.

@Prasad
Thanks for mentioning HA and DR. They're really important subjects for enterprise use. Zeppelin will definitely need to address them.
And displaying meta information about notebooks on the top-level page is a good idea.

It's really great to hear so many opinions and ideas.
And thanks, @Rick, for sharing a valuable view of the Zeppelin project.

Thanks,
moon


On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <ra...@gmail.com>> wrote:
Hi,
For one, I know that there is rudimentary scheduling built into Zeppelin already (at least I fixed a bug in the test for a scheduling feature a few months ago).
But another point is that Zeppelin should also focus on quality, reproducibility and portability.
Although this doesn't offer exciting new features, it would make development much easier.
Cross-platform testability, Tests that pass when run sequentially, compatibility with Firefox, and many more open issues that make it so much harder to enhance Zeppelin and add features should be addressed soon, preferably before more features are added. Already Zeppelin is suffering - in my opinion - from quite a lot of feature creep, and we should avoid putting in the kitchen sink, at the cost of quality and maintainability. Instead modularity (ZEPPELIN-533 in particular) should be targeted.
Oozie, in my opinion, is a dead end - it may de-facto still be in use on many clusters, but it's not getting the love it needs, and I wouldn't bet on it, when it comes to integrating scheduling. Instead, any external tool should be able to use the REST-API to trigger executions, if you want external scheduling.
So, in conclusion, if we take Moon's list as a list of descending priorities, I fully agree, under the condition that code quality is included as a subset of enterprise-readiness. Auth* is paramount (Kerberos SPNEGO SSO support is what we really want) with user and group rights assignment on the notebook level. We probably also need Knox-integration (ODP-Members looking at integrating Zeppelin should consider contributing this), and integration of something like Spree (https://github.com/hammerlab/spree) to be able to profile jobs.
I'm hopeful that soon I can resume contributing some quality-oriented code, to drive this "necessary evil" forward ;)
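
As a minimal sketch of the external scheduling described above, any tool that can issue an HTTP request could trigger a notebook run. The example assumes the notebook job endpoint `POST /api/notebook/job/<notebookId>` from the Zeppelin REST API documentation; the server address and the note ID used in the usage note are hypothetical, and authentication is left out. Check the endpoint against the documentation for your Zeppelin version before relying on it.

```python
import json
import urllib.request


def job_url(base_url, note_id):
    """Build the URL of the notebook job endpoint (path is an assumption, see above)."""
    return f"{base_url.rstrip('/')}/api/notebook/job/{note_id}"


def build_run_request(base_url, note_id):
    """Prepare, but do not send, a POST request asking Zeppelin to run a note."""
    return urllib.request.Request(job_url(base_url, note_id), data=b"", method="POST")


def run_note(base_url, note_id, timeout=30):
    """Send the request and return the decoded JSON response body."""
    request = build_run_request(base_url, note_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

An external scheduler (cron, Airflow, or similar) would then call something like `run_note("http://localhost:8080", "2A94M5J1Z")`, where the note ID is a placeholder.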

On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <so...@gmail.com>> wrote:
I do agree with Vinayak. It need not be coupled with Oozie.
Rather, one should be able to call it from any scheduler typically used at the enterprise level. Maybe support for BPML.
I believe the existing ability to call/execute a Zeppelin Notebook or a specific paragraph within a notebook using REST API should take care of this requirement to some extent.
Regards,
Sourav

On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <vi...@gmail.com>> wrote:
@Eran Witkon,
Thanks for the suggestion Eran. I concur with your thought.
If Zeppelin can be integrated with Oozie, that would be wonderful. Users will also be able to leverage their Oozie skills.
This would be promising for now.
However, in the future Hadoop might not necessarily be installed in a Spark cluster, and Oozie (since it installs with the Hadoop distribution) might not be available.
So perhaps we should give some thought to this feature for the future. Should it depend on Oozie, or should Zeppelin have its own scheduling?
As Benjamin noted, the Databricks notebook has this as a core notebook feature.

Also, would anybody give any suggestions regarding "sync with github" feature?
-Exporting notebook to Github
-Importing notebook from Github

Thanks
Vinayak


On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <er...@gmail.com>> wrote:
@Vinayak Agrawal I would suggest adding the ability to connect Zeppelin to existing scheduling/workflow tools such as https://oozie.apache.org/. This requires better hooks and status reporting but doesn't make Zeppelin an ETL/scheduler tool by itself.


On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <vi...@gmail.com>> wrote:
Moon,
The new roadmap looks very promising. I am very happy to see security in the list.
I have some suggestions regarding Enterprise Ready features:

1. Job Scheduler - Can this be improved?
Currently the scheduler can be used with a cron expression or a pre-set time. But in an enterprise solution, a notebook might be one piece of the workflow. Can we look towards the functionality of scheduling notebooks based on other notebooks finishing their job successfully?
This requirement would arise in any ETL workflow, where all the downstream users wait for the ETL notebook to finish successfully. Only after that, other business oriented notebooks can be executed.
2. Importing a notebook - Is there a current requirement or future plan to implement a feature that allows import-notebook-from-github? This would allow users to share notebooks seamlessly.
Thanks
Vinayak
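
The dependency-based scheduling requested above (run downstream notebooks only after the ETL notebook finishes successfully) can be sketched as a topological ordering. This is only an illustration and not an existing Zeppelin feature: the notebook names are hypothetical, and `run_notebook` stands in for whatever actually executes a notebook (for example a REST call). It uses `graphlib` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter


def run_order(dependencies):
    """Return notebook names so that every notebook follows its upstreams.

    `dependencies` maps a notebook name to the set of notebooks that must
    finish successfully before it may start.
    """
    return list(TopologicalSorter(dependencies).static_order())


def run_workflow(dependencies, run_notebook):
    """Run notebooks in dependency order and skip anything downstream of a failure.

    `run_notebook(name)` is a caller-supplied function returning True on success.
    Returns a dict mapping each notebook to "ok", "failed", or "skipped".
    """
    status = {}
    for name in run_order(dependencies):
        upstream = dependencies.get(name, ())
        if any(status.get(dep) != "ok" for dep in upstream):
            status[name] = "skipped"
        else:
            status[name] = "ok" if run_notebook(name) else "failed"
    return status
```

With `{"sales_report": {"etl"}, "dashboard": {"sales_report"}}`, the `etl` notebook runs first, and `dashboard` is skipped if `sales_report` fails.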

On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org>> wrote:
Zhong Wang,
Right, Folder support would be quite useful. Thanks for the opinion.
I hope I can finish the work in pr-190<https://github.com/apache/incubator-zeppelin/pull/190>.

Sourav,
Regarding concurrent running, Zeppelin itself does not prevent paragraphs/queries from running concurrently. Each interpreter can implement its own scheduling policy. For example, the SparkSQL interpreter and ShellInterpreter can already run paragraphs/queries concurrently.

SparkInterpreter is implemented with a FIFO scheduler because of the nature of the Scala compiler. That's why users cannot run multiple paragraphs concurrently when they work with SparkInterpreter.
But as Zhong Wang mentioned, pr-703 gives each notebook its own Scala compiler, so paragraphs can run concurrently as long as they're in different notebooks.
Thanks for the feedback!

Best,
moon
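
The scheduling behaviour described above (FIFO within a notebook, concurrency across notebooks) can be illustrated with a small sketch. This is not Zeppelin's scheduler code, which is written in Java; the class name `PerNotebookFifo` is hypothetical, and a single-worker thread pool stands in for a FIFO queue bound to one compiler.

```python
from concurrent.futures import ThreadPoolExecutor


class PerNotebookFifo:
    """One single-threaded (FIFO) queue per notebook.

    Paragraphs of the same notebook run strictly in submission order,
    while paragraphs of different notebooks may overlap in time.
    """

    def __init__(self):
        self._queues = {}

    def submit(self, notebook_id, paragraph, *args):
        # max_workers=1 makes each executor behave as a FIFO queue.
        if notebook_id not in self._queues:
            self._queues[notebook_id] = ThreadPoolExecutor(max_workers=1)
        return self._queues[notebook_id].submit(paragraph, *args)

    def shutdown(self):
        # Wait for all queued paragraphs to finish.
        for queue in self._queues.values():
            queue.shutdown(wait=True)
```

A single shared queue for all notebooks would reproduce the slow "pending" behaviour Sourav reported; one queue per notebook removes that contention at the cost of one worker per notebook.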
On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>> wrote:
Sourav: I think this newly merged PR can help you https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537

On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <so...@gmail.com>> wrote:
Hi Moon,
This looks great.
My only suggestion would be to include a PR/feature - Support for Running Concurrent paragraphs/queries in Zeppelin.

Right now if more than one user tries to run paragraphs in multiple notebooks concurrently through a single Zeppelin instance (and single interpreter instance) the performance is very slow. It is obvious that the queue gets built up within the zeppelin process and interpreter process in that scenario as the time taken to move the status from start to pending and pending to running is very high compared to the actual running time of a paragraph.
Without this the multi tenancy support would be meaningless as no one can practically use it in a situation where multiple users are trying to connect to the same instance of Zeppelin (and the related interpreter). A possible solution would be to spawn separate instance of the same interpreter at every notebook/user level.
Regards,
Sourav
On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org>> wrote:
Hi Zeppelin users and developers,

The roadmap we have published at
https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
is almost 9 months old, and it no longer reflects where the community is going. It's time to update.

Based on the mailing list, JIRA issues, pull requests, feedback from users, conferences, and meetings, I could summarize the major interests of users and developers in 7 categories: Enterprise ready, Usability improvement, Pluggability, Documentation, Backend integration, Notebook storage, and Visualization.

And I could list related subjects under each category.

  *   Enterprise ready

     *   Authentication

        *   Shiro authentication ZEPPELIN-548<https://issues.apache.org/jira/browse/ZEPPELIN-548>

     *   Authorization

        *   Notebook authorization PR-681<https://github.com/apache/incubator-zeppelin/pull/681>

     *   Security
     *   Multi-tenancy
     *   Stability

  *   Usability Improvement

     *   UX improvement
     *   Better Table data support

        *   Download data as csv, etc PR-725<https://github.com/apache/incubator-zeppelin/pull/725>, PR-714<https://github.com/apache/incubator-zeppelin/pull/714>, PR-6<https://github.com/apache/incubator-zeppelin/pull/6>, PR-89<https://github.com/apache/incubator-zeppelin/pull/89>

        *   Featureful table data display (pagination, etc)

  *   Pluggability ZEPPELIN-533<https://issues.apache.org/jira/browse/ZEPPELIN-533>

     *   Pluggable visualization

     *   Dynamic Interpreter, notebook, visualization loading

     *   Repository and registry for pluggable components

  *   Improve documentation

     *   Improve contents and readability
     *   more tutorials, examples

  *   Interpreter

     *   Generic JDBC Interpreter
     *   (spark)R Interpreter
     *   Cluster manager for interpreter (Proposal<https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
     *   more interpreters

  *   Notebook storage

     *   Versioning ZEPPELIN-540<http://issues.apache.org/jira/browse/ZEPPELIN-540>
     *   more notebook storages

  *   Visualization

     *   More visualizations PR-152<https://github.com/apache/incubator-zeppelin/pull/152>, PR-728<https://github.com/apache/incubator-zeppelin/pull/728>, PR-336<https://github.com/apache/incubator-zeppelin/pull/336>, PR-321<https://github.com/apache/incubator-zeppelin/pull/321>

     *   Customize graph (show/hide label, color, etc)
This will help anyone quickly grasp the overall interests and direction of the project. And based on this roadmap, we can discuss and redefine the scope and schedule of the next release, 0.6.0.

What do you think? Any feedback would be appreciated.

Thanks,
moon




--
Vinayak Agrawal

"To Strive, To Seek, To Find and Not to Yield!"
~Lord Alfred Tennyson



--
Vinayak Agrawal
Big Data Analytics
IBM
"To Strive, To Seek, To Find and Not to Yield!"
~Lord Alfred Tennyson

Re: [DISCUSS] Update Roadmap

Posted by Nikolay Voronchikhin <nv...@gmail.com>.

Hi Zeppelin Users and Developers,

Do you know if MapR will be adding Zeppelin to its roadmap for the next
version after MapR 5.1?

We see in Hue 3.9 that it provides notebooks for R Shell, Python Shell,
PySpark, SparkR, Hive SQL, Impala SQL, and Spark SQL, but no Drill SQL
notebook.
We are looking for an Apache Project that focuses on a Drill Notebook UI
that performs better than the Drill Web Console UI itself.

Sincerely,
*Nikolay Voronchikhin*
*Big Data/Data Warehouse/Data Science/Data Platforms Engineer at Cisco*
https://www.linkedin.com/in/nvoronchikhin
*E-mail: nvoronchikhin@gmail.com <nv...@gmail.com>*
*Mobile: 951-288-2778*


On Mon, Mar 21, 2016 at 2:44 PM, rohit choudhary <rc...@gmail.com> wrote:

> Dear All,
>
> I think direction setting is important for Enterprise readiness. I have a
> little bit of an overview of Ambari Views, which is very similar in nature
> to Zeppelin. Please let me explain:
>
> Hive View - interacts with Hive
> Pig View - interacts with Pig
> Workflow Designer - interacts with Oozie
>
> We have a very similar architecture in Zeppelin, where we interact with
> these systems through interpreters. The usage will also be similar, as both
> will interact with Hadoop clusters or, in some cases, Spark with YARN on
> HDFS. Our priorities should include:
>
> - Design & implement for multi-tenancy
> - Auditability from Data/State and Lineage perspective
> - Ability to share Notebooks/Data/State across users, preferably through
> SparkContext sharing
> - Security between Zeppelin and the other systems, not limited to Spark
> through Kerberos. (@Rick +1)
>
> I will share an initial draft of the thoughts I have in mind, in the next
> couple of days.
>
> Thanks,
> Rohit.
>
>
>
> On Thu, Mar 3, 2016 at 7:54 AM, moon soo Lee <mo...@apache.org> wrote:
>
>> Shabeel, thanks for the feedback about the REST API and custom IDs. That might
>> help avoid multiple REST API calls.
>>
>> Thanks, everyone, for the valuable feedback. It looks like we're all going in the
>> same direction. I have updated the wiki.
>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>> Please take a look.
>>
>> I'm sure many details are missing from this roadmap. I must say that
>> something not being on this roadmap doesn't mean the community is not working
>> on it or that it can't be included in Zeppelin. The roadmap represents
>> community interest and overall direction.
>> We're not changing the roadmap every day, but that doesn't mean the roadmap is
>> set in stone and will never change. We can improve it continuously.
>>
>> Please feel free to fork this mail thread for any further discussion
>> on a specific subject (e.g. job scheduling).
>>
>> Thanks,
>> moon
>>
>> On Wed, Mar 2, 2016 at 12:31 AM Shabeel Syed <sh...@gmail.com>
>> wrote:
>>
>>> Also, we need better REST API support for creating and fetching
>>> notebooks and paragraphs.
>>> For example, if I could set a custom notebook ID and paragraph ID, we
>>> could avoid multiple REST API calls.
>>>
>>> http://localhost:8080/#/notebook/<notebookid>/paragraph/<paragraphid>?asIframe
>>> should return an error if the notebook or paragraph does not exist.
>>>
>>> And while creating a notebook or paragraph, I should be able to specify my
>>> custom IDs.
>>>
>>> Regards
>>> Shabeel
>>>
>>> On Wed, Mar 2, 2016 at 11:55 AM, Zhong Wang <wa...@gmail.com>
>>> wrote:
>>>
>>>> +1 on @rick. quality is really important... I am still encountering
>>>> bugs consistently
>>>>
>>>> On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV <
>>>> tejasrivastav@gmail.com> wrote:
>>>>
>>>>> +1 on @rick
>>>>>
>>>>> On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bb...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I see in the Enterprise section that multi-tenancy will be included,
>>>>>> will this have user impersonation too? In this way, the user executing will
>>>>>> be the user owning the process.
>>>>>>
>>>>>> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <sh...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> +1
>>>>>>
>>>>>> Hi Tamas,
>>>>>>    Pluggable external visualization is really a GREAT feature to
>>>>>> have. I'm looking forward to this :)
>>>>>>
>>>>>> Regards
>>>>>> Shabeel
>>>>>>
>>>>>> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <
>>>>>> tamas.szuromi@odigeo.com> wrote:
>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> Really promising roadmap.
>>>>>>>
>>>>>>> I'd only push for more visualization options. I agree that built-in
>>>>>>> visualization with limited charting options is needed, but I think we also
>>>>>>> need a way to 'inject' external JS visualizations.
>>>>>>>
>>>>>>>
>>>>>>> For scheduling Zeppelin notebooks we use
>>>>>>> https://github.com/airbnb/airflow through the job REST API. It's
>>>>>>> an enterprise-ready and very robust solution right now.
>>>>>>>
>>>>>>>
>>>>>>> *Tamas*
>>>>>>>
>>>>>>> On 1 March 2016 at 09:12, Eran Witkon <er...@gmail.com> wrote:
>>>>>>>
>>>>>>>> One point to clarify: I don't want to suggest Oozie specifically. I
>>>>>>>> want to think about which features we develop ourselves and which ones we
>>>>>>>> integrate from external, preferably Apache, technology. We don't think about
>>>>>>>> building our own storage services, so why build our own scheduler?
>>>>>>>> Eran
>>>>>>>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <mo...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>>>>>>>> Now I can see a lot of demands around enterprise level job
>>>>>>>>> scheduling. Either external or built-in, I completely agree having
>>>>>>>>> enterprise level job scheduling support on the roadmap.
>>>>>>>>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137>
>>>>>>>>> , ZEPPELIN-531
>>>>>>>>> <https://issues.apache.org/jira/browse/ZEPPELIN-531> are related
>>>>>>>>> issues i can find in our JIRA.
>>>>>>>>>
>>>>>>>>> @Vinayak
>>>>>>>>> Regarding importing notebook from github, Zeppelin has pluggable
>>>>>>>>> notebook storage layer (see related package
>>>>>>>>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
>>>>>>>>> So, github notebook sync can be implemented easily.
>>>>>>>>>
>>>>>>>>> @Shabeel
>>>>>>>>> Right, we need better memory management to prevent such OOMs.
>>>>>>>>> And i think table is one of the most frequently used way of
>>>>>>>>> displaying data. So definitely, we'll need more features like filter, sort,
>>>>>>>>> etc.
>>>>>>>>> After this roadmap discussion, discussion for the next release
>>>>>>>>> will follow. Then we'll get idea when those features will be available.
>>>>>>>>>
>>>>>>>>> @Prasad
>>>>>>>>> Thanks for mentioning HA and DR. They're really important subject
>>>>>>>>> for enterprise use. Definitely Zeppelin will need to address them.
>>>>>>>>> And displaying meta information of notebook on top level page is
>>>>>>>>> good idea.
>>>>>>>>>
>>>>>>>>> It's really great to hear many opinions and ideas.
>>>>>>>>> And thanks @Rick for sharing valuable view to Zeppelin project.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> moon
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <ra...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> For one, I know that there is rudimentary scheduling built into
>>>>>>>>>> Zeppelin already (at least I fixed a bug in the test for a scheduling
>>>>>>>>>> feature a few months ago).
>>>>>>>>>> But another point is that Zeppelin should also focus on quality,
>>>>>>>>>> reproducibility and portability.
>>>>>>>>>> Although this doesn't offer exciting new features, it would make
>>>>>>>>>> development much easier.
>>>>>>>>>>
>>>>>>>>>> Cross-platform testability, Tests that pass when run
>>>>>>>>>> sequentially, compatibility with Firefox, and many more open issues that
>>>>>>>>>> make it so much harder to enhance Zeppelin and add features should be
>>>>>>>>>> addressed soon, preferably before more features are added. Already Zeppelin
>>>>>>>>>> is suffering - in my opinion - from quite a lot of feature creep, and we
>>>>>>>>>> should avoid putting in the kitchen sink, at the cost of quality and
>>>>>>>>>> maintainability. Instead modularity (ZEPPELIN-533 in particular) should be
>>>>>>>>>> targeted.
>>>>>>>>>>
>>>>>>>>>> Oozie, in my opinion, is a dead end - it may de-facto still be in
>>>>>>>>>> use on many clusters, but it's not getting the love it needs, and I
>>>>>>>>>> wouldn't bet on it, when it comes to integrating scheduling. Instead, any
>>>>>>>>>> external tool should be able to use the REST-API to trigger executions, if
>>>>>>>>>> you want external scheduling.
>>>>>>>>>>
>>>>>>>>>> So, in conclusion, if we take Moon's list as a list of descending
>>>>>>>>>> priorities, I fully agree, under the condition that code quality is
>>>>>>>>>> included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
>>>>>>>>>> SPNEGO SSO support is what we really want) with user and group rights
>>>>>>>>>> assignment on the notebook level. We probably also need Knox-integration
>>>>>>>>>> (ODP-Members looking at integrating Zeppelin should consider contributing
>>>>>>>>>> this), and integration of something like Spree (
>>>>>>>>>> https://github.com/hammerlab/spree) to be able to profile jobs.
>>>>>>>>>>
>>>>>>>>>> I'm hopeful that soon I can resume contributing some
>>>>>>>>>> quality-oriented code, to drive this "necessary evil" forward ;)
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>>>>>>>>>> sourav.mazumder00@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>>>>>>>>>
>>>>>>>>>>> Rather one should be able to call it from any scheduler
>>>>>>>>>>> typically used in enterprise level. May be support for BPML.
>>>>>>>>>>>
>>>>>>>>>>> I believe the existing ability to call/execute a Zeppelin
>>>>>>>>>>> Notebook or a specific paragraph within a notebook using REST API should
>>>>>>>>>>> take care of this requirement to some extent.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Sourav
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>>>>>>>>>>> vinayakagrawal88@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> @Eran Witkon,
>>>>>>>>>>>> Thanks for the suggestion Eran. I concur with your thought.
>>>>>>>>>>>> If Zeppelin can be integrated with Oozie, that would be
>>>>>>>>>>>> wonderful. Users will also be able to leverage their Oozie skills.
>>>>>>>>>>>> This would be promising for now.
>>>>>>>>>>>> However, in the future Hadoop might not necessarily be
>>>>>>>>>>>> installed in a Spark cluster, and Oozie (since it installs with the Hadoop
>>>>>>>>>>>> distribution) might not be available.
>>>>>>>>>>>> So perhaps we should give some thought to this feature for the
>>>>>>>>>>>> future. Should it depend on Oozie, or should Zeppelin have its own
>>>>>>>>>>>> scheduling?
>>>>>>>>>>>>
>>>>>>>>>>>> As Benjamin noted, the Databricks notebook has this as a core
>>>>>>>>>>>> notebook feature.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Also, would anybody give any suggestions regarding "sync with
>>>>>>>>>>>> github" feature?
>>>>>>>>>>>> -Exporting notebook to Github
>>>>>>>>>>>> -Importing notebook from Github
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Vinayak
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <
>>>>>>>>>>>> eranwitkon@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> @Vinayak Agrawal I would suggest adding the ability to
>>>>>>>>>>>>> connect Zeppelin to existing scheduling/workflow tools such as
>>>>>>>>>>>>> https://oozie.apache.org/. This requires better hooks and
>>>>>>>>>>>>> status reporting but doesn't make Zeppelin an ETL/scheduler tool by
>>>>>>>>>>>>> itself.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>>>>>>>>>>>>> vinayakagrawal88@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Moon,
>>>>>>>>>>>>>> The new roadmap looks very promising. I am very happy to see
>>>>>>>>>>>>>> security in the list.
>>>>>>>>>>>>>> I have some suggestions regarding Enterprise Ready features:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. Job Scheduler - Can this be improved?
>>>>>>>>>>>>>> Currently the scheduler can be used with a cron expression or a
>>>>>>>>>>>>>> pre-set time. But in an enterprise solution, a notebook might be one piece
>>>>>>>>>>>>>> of a workflow. Can we look at scheduling
>>>>>>>>>>>>>> notebooks based on other notebooks finishing their jobs successfully?
>>>>>>>>>>>>>> This requirement would arise in any ETL workflow, where all
>>>>>>>>>>>>>> the downstream users wait for the ETL notebook to finish successfully. Only
>>>>>>>>>>>>>> after that can other business-oriented notebooks be executed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. Importing a notebook - Is there a current requirement or
>>>>>>>>>>>>>> future plan to implement a feature that allows import-notebook-from-github?
>>>>>>>>>>>>>> This would allow users to share notebooks seamlessly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Vinayak
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <
>>>>>>>>>>>>>> moon@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Zhong Wang,
>>>>>>>>>>>>>>> Right, folder support would be quite useful. Thanks for the
>>>>>>>>>>>>>>> opinion. I hope I can finish the work in pr-190
>>>>>>>>>>>>>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sourav,
>>>>>>>>>>>>>>> Regarding concurrent running, Zeppelin itself doesn't
>>>>>>>>>>>>>>> prevent paragraphs/queries from running concurrently. Each interpreter can
>>>>>>>>>>>>>>> implement its own scheduling policy. For example, the SparkSQL interpreter
>>>>>>>>>>>>>>> and ShellInterpreter can already run paragraphs/queries concurrently.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> SparkInterpreter is implemented with a FIFO scheduler
>>>>>>>>>>>>>>> because of the nature of the Scala compiler. That's why users cannot run
>>>>>>>>>>>>>>> multiple paragraphs concurrently when they work with SparkInterpreter.
>>>>>>>>>>>>>>> But as Zhong Wang mentioned, pr-703 gives each notebook
>>>>>>>>>>>>>>> a separate Scala compiler, so paragraphs can run concurrently as long as
>>>>>>>>>>>>>>> they're in different notebooks.
>>>>>>>>>>>>>>> Thanks for the feedback!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> moon
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <
>>>>>>>>>>>>>>> wangzhong.neu@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sourav: I think this newly merged PR can help you
>>>>>>>>>>>>>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>>>>>>>>>>>>>> sourav.mazumder00@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Moon,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This looks great.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My only suggestion would be to include a PR/feature -
>>>>>>>>>>>>>>>>> Support for Running Concurrent paragraphs/queries in Zeppelin.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Right now if more than one user tries to run paragraphs in
>>>>>>>>>>>>>>>>> multiple notebooks concurrently through a single Zeppelin instance (and
>>>>>>>>>>>>>>>>> single interpreter instance) the performance is very slow. It is obvious
>>>>>>>>>>>>>>>>> that the queue gets built up within the zeppelin process and interpreter
>>>>>>>>>>>>>>>>> process in that scenario as the time taken to move the status from start to
>>>>>>>>>>>>>>>>> pending and pending to running is very high compared to the actual running
>>>>>>>>>>>>>>>>> time of a paragraph.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Without this the multi tenancy support would be
>>>>>>>>>>>>>>>>> meaningless as no one can practically use it in a situation where multiple
>>>>>>>>>>>>>>>>> users are trying to connect to the same instance of Zeppelin (and the
>>>>>>>>>>>>>>>>> related interpreter). A possible solution would be to spawn separate
>>>>>>>>>>>>>>>>> instance of the same interpreter at every notebook/user level.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Sourav
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <
>>>>>>>>>>>>>>>>> moon@apache.org> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Zeppelin users and developers,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The roadmap we have published at
>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>>>>>>>>>>>>>> is almost 9 months old, and it no longer reflects where the
>>>>>>>>>>>>>>>>>> community is going. It's time to update.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Based on the mailing list, JIRA issues, pull requests,
>>>>>>>>>>>>>>>>>> feedback from users, conferences and meetings, I could summarize the major
>>>>>>>>>>>>>>>>>> interests of users and developers in 7 categories: Enterprise ready,
>>>>>>>>>>>>>>>>>> Usability improvement, Pluggability, Documentation, Backend integration,
>>>>>>>>>>>>>>>>>> Notebook storage, and Visualization.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> And I could list related subjects under each category.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    - Enterprise ready
>>>>>>>>>>>>>>>>>>       - Authentication
>>>>>>>>>>>>>>>>>>          - Shiro authentication ZEPPELIN-548
>>>>>>>>>>>>>>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>>>>>>>>>>>>>>       - Authorization
>>>>>>>>>>>>>>>>>>          - Notebook authorization PR-681
>>>>>>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>>>>>>>>>>>>>>       - Security
>>>>>>>>>>>>>>>>>>       - Multi-tenancy
>>>>>>>>>>>>>>>>>>       - Stability
>>>>>>>>>>>>>>>>>>    - Usability Improvement
>>>>>>>>>>>>>>>>>>       - UX improvement
>>>>>>>>>>>>>>>>>>       - Better Table data support
>>>>>>>>>>>>>>>>>>          - Download data as csv, etc PR-725
>>>>>>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>>>>>>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>>>>>>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>>>>>>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>>>>>>>>>>>>>>          - Featureful table data display (pagination, etc)
>>>>>>>>>>>>>>>>>>    - Pluggability ZEPPELIN-533
>>>>>>>>>>>>>>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>>>>>>>>>>>>>>       - Pluggable visualization
>>>>>>>>>>>>>>>>>>       - Dynamic Interpreter, notebook, visualization loading
>>>>>>>>>>>>>>>>>>       - Repository and registry for pluggable components
>>>>>>>>>>>>>>>>>>    - Improve documentation
>>>>>>>>>>>>>>>>>>       - Improve contents and readability
>>>>>>>>>>>>>>>>>>       - more tutorials, examples
>>>>>>>>>>>>>>>>>>    - Interpreter
>>>>>>>>>>>>>>>>>>       - Generic JDBC Interpreter
>>>>>>>>>>>>>>>>>>       - (spark)R Interpreter
>>>>>>>>>>>>>>>>>>       - Cluster manager for interpreter (Proposal
>>>>>>>>>>>>>>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>>>>>>>>>>>>>>>>>>       - more interpreters
>>>>>>>>>>>>>>>>>>    - Notebook storage
>>>>>>>>>>>>>>>>>>       - Versioning ZEPPELIN-540
>>>>>>>>>>>>>>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>>>>>>>>>>>>>>       - more notebook storages
>>>>>>>>>>>>>>>>>>    - Visualization
>>>>>>>>>>>>>>>>>>       - More visualizations PR-152
>>>>>>>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>>>>>>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>>>>>>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>>>>>>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>>>>>>>>>>>>>>       - Customize graph (show/hide label, color, etc)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It will help anyone quickly grasp the overall interests of the
>>>>>>>>>>>>>>>>>> project and its direction. And based on this roadmap, we can discuss and
>>>>>>>>>>>>>>>>>> re-define the scope of the next release, 0.6.0, and its schedule.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> moon
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Vinayak Agrawal
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>>>>>>>> ~Lord Alfred Tennyson
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Vinayak Agrawal
>>>>>>>>>>>> Big Data Analytics
>>>>>>>>>>>> IBM
>>>>>>>>>>>>
>>>>>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>>>>>> ~Lord Alfred Tennyson
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>

Re: [DISCUSS] Update Roadmap

Posted by rohit choudhary <rc...@gmail.com>.

Dear All,

I think direction setting is important for Enterprise readiness. I have a
little bit of an overview of Ambari Views, which is very similar in nature
to Zeppelin. Please let me explain:

Hive View - interacts with Hive
Pig View - interacts with Pig
Workflow Designer - interacts with Oozie

We have a very similar architecture in Zeppelin, where we interact with
these systems through Interpreters. The usage will also be similar, as both
will interact with Hadoop clusters or, in some cases, Spark with YARN on
HDFS. Our priorities should include:

- Design & implement for multi-tenancy
- Auditability from Data/State and Lineage perspective
- Ability to share Notebooks/Data/State across users, preferably through
SparkContext sharing
- Security between Zeppelin and the other systems, not limited to Spark
through Kerberos. (@Rick +1)

I will share an initial draft of the thoughts I have in mind, in the next
couple of days.

Thanks,
Rohit.



On Thu, Mar 3, 2016 at 7:54 AM, moon soo Lee <mo...@apache.org> wrote:

> Shabeel, thanks for the feedback about the REST API and custom ids. That might
> help avoid multiple REST API calls.
>
> Thanks everyone for the valuable feedback. It looks like we're all going in the
> same direction. I have updated the wiki:
> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
> Please take a look.
>
> I'm sure there are many missing details in this roadmap. I must say that
> something not being on this roadmap doesn't mean the community is not working
> on it or that it can't be included in Zeppelin. The roadmap represents
> community interest and overall direction.
> We're not changing the roadmap every day, but that doesn't mean the roadmap is
> set in stone and can never be changed. We can improve it continuously.
>
> Please feel free to fork this mail thread for any further discussion
> on a specific subject (e.g. job scheduling).
>
> Thanks,
> moon

Re: [DISCUSS] Update Roadmap

Posted by moon soo Lee <mo...@apache.org>.

Shabeel, thanks for the feedback about the REST API and custom ids. That might
help avoid multiple REST API calls.
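
To make the custom-id idea concrete: if callers could choose their own notebook
and paragraph ids, a client could build the embeddable paragraph link directly
instead of first asking the server for generated ids. A minimal Python sketch,
assuming the link format quoted in this thread; the function name and the
example ids are hypothetical and not part of Zeppelin:

```python
from urllib.parse import quote


def paragraph_iframe_url(base_url, notebook_id, paragraph_id):
    """Build the embeddable paragraph link in the form quoted in this thread.

    Assumption: notebook_id and paragraph_id are already known to the caller
    (for example because they were chosen by the caller at creation time).
    """
    base = base_url.rstrip("/")
    # Percent-encode the ids so that unusual characters cannot break the URL.
    return "{}/#/notebook/{}/paragraph/{}?asIframe".format(
        base, quote(notebook_id, safe=""), quote(paragraph_id, safe="")
    )


if __name__ == "__main__":
    # "etl-daily" and "summary" are made-up ids for illustration only.
    print(paragraph_iframe_url("http://localhost:8080", "etl-daily", "summary"))
```

With server-generated ids, the same link needs at least one extra lookup
before it can be built.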

Thanks everyone for the valuable feedback. It looks like we're all going in the
same direction. I have updated the wiki:
https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
Please take a look.

I'm sure there are many missing details in this roadmap. I must say that
something not being on this roadmap doesn't mean the community is not working
on it or that it can't be included in Zeppelin. The roadmap represents
community interest and overall direction.
We're not changing the roadmap every day, but that doesn't mean the roadmap is
set in stone and can never be changed. We can improve it continuously.

Please feel free to fork this mail thread for any further discussion on a
specific subject (e.g. job scheduling).
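
On job scheduling, the request raised earlier in the thread (run a notebook
only after the notebooks it depends on have finished successfully) could be
handled by an external driver. A minimal Python sketch under stated
assumptions: run_note is a hypothetical callback that would wrap whatever REST
call triggers a notebook run and reports success, and the note ids are made
up. It needs Python 3.9 or later for graphlib.

```python
from graphlib import TopologicalSorter


def run_in_dependency_order(depends_on, run_note):
    """Run each notebook only after all of its upstream notebooks succeeded.

    depends_on: dict mapping a note id to the set of note ids it waits for.
    run_note:   hypothetical callable(note_id) -> bool, True on success.
    Returns a dict mapping each note id to "ok", "failed" or "skipped".
    """
    status = {}
    # static_order() yields every note after all of its predecessors.
    for note_id in TopologicalSorter(depends_on).static_order():
        upstream = depends_on.get(note_id, ())
        if any(status[u] != "ok" for u in upstream):
            # An upstream notebook failed or was skipped, so do not run this one.
            status[note_id] = "skipped"
        else:
            status[note_id] = "ok" if run_note(note_id) else "failed"
    return status


if __name__ == "__main__":
    # Made-up workflow: "report" waits for "etl", "dashboard" waits for "report".
    workflow = {"report": {"etl"}, "dashboard": {"report"}}
    print(run_in_dependency_order(workflow, lambda note_id: True))
```

Keeping the ordering outside Zeppelin matches the suggestion in this thread
that any external tool should be able to trigger executions through the REST
API.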

Thanks,
moon

On Wed, Mar 2, 2016 at 12:31 AM Shabeel Syed <sh...@gmail.com> wrote:

> Also we need better REST API support for creating and fetching
> notebooks and paragraphs.
> For example, if I can set a custom-defined notebookid and paragraphid, we
> can avoid multiple REST API calls.
>
> http://localhost:8080/#/notebook/<notebookid>/paragraph/<paragraphid>?asIframe
> should return an error if the notebook or paragraph does not exist.
>
> And while creating a notebook or paragraph I should be able to specify my
> custom ids.
>
> Regards
> Shabeel
>
> On Wed, Mar 2, 2016 at 11:55 AM, Zhong Wang <wa...@gmail.com>
> wrote:
>
>> +1 on @rick. quality is really important... I am still encountering bugs
>> consistently
>>
>> On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV <te...@gmail.com>
>> wrote:
>>
>>> +1 on @rick
>>>
>>> On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bb...@gmail.com> wrote:
>>>
>>>> I see in the Enterprise section that multi-tenancy will be included,
>>>> will this have user impersonation too? In this way, the user executing will
>>>> be the user owning the process.
>>>>
>>>> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <sh...@gmail.com>
>>>> wrote:
>>>>
>>>> +1
>>>>
>>>> Hi Tamas,
>>>>    Pluggable external visualization is really a GREAT feature to have.
>>>> I'm looking forward to this :)
>>>>
>>>> Regards
>>>> Shabeel
>>>>
>>>> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <tamas.szuromi@odigeo.com>
>>>> wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> Really promising roadmap.
>>>>>
>>>>> I'd only push more visualization options. I agree built in
>>>>> visualization is needed with limited charting options but I think we also
>>>>> need somehow 'inject' external js visualizations also.
>>>>>
>>>>>
>>>>> For scheduling Zeppelin notebooks  we use
>>>>>  https://github.com/airbnb/airflow <https://github.com/airbnb/airflow>
>>>>>  through the job rest api. It's an enterprise ready and very robust
>>>>> solution right now.
>>>>>
>>>>>
>>>>> *Tamas*
>>>>>
>>>>> On 1 March 2016 at 09:12, Eran Witkon <er...@gmail.com> wrote:
>>>>>
>>>>>> One point to clarify, I don't want to suggest Oozie in specific, I
>>>>>> want to think about which features we develop and which ones we integrate
>>>>>> external, preferred Apache, technology? We don't think about building our
>>>>>> own storage services so why build our own scheduler?
>>>>>> Eran
>>>>>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <mo...@apache.org> wrote:
>>>>>>
>>>>>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>>>>>> Now I can see a lot of demands around enterprise level job
>>>>>>> scheduling. Either external or built-in, I completely agree having
>>>>>>> enterprise level job scheduling support on the roadmap.
>>>>>>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137>,
>>>>>>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
>>>>>>> related issues i can find in our JIRA.
>>>>>>>
>>>>>>> @Vinayak
>>>>>>> Regarding importing notebook from github, Zeppelin has pluggable
>>>>>>> notebook storage layer (see related package
>>>>>>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
>>>>>>> So, github notebook sync can be implemented easily.
>>>>>>>
>>>>>>> @Shabeel
>>>>>>> Right, we need better manage management to prevent such OOM.
>>>>>>> And i think table is one of the most frequently used way of
>>>>>>> displaying data. So definitely, we'll need more features like filter, sort,
>>>>>>> etc.
>>>>>>> After this roadmap discussion, discussion for the next release will
>>>>>>> follow. Then we'll get idea when those features will be available.
>>>>>>>
>>>>>>> @Prasad
>>>>>>> Thanks for mentioning HA and DR. They're really important subject
>>>>>>> for enterprise use. Definitely Zeppelin will need to address them.
>>>>>>> And displaying meta information of notebook on top level page is
>>>>>>> good idea.
>>>>>>>
>>>>>>> It's really great to hear many opinions and ideas.
>>>>>>> And thanks @Rick for sharing valuable view to Zeppelin project.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> moon
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <ra...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> For one, I know that there is rudimentary scheduling built into
>>>>>>>> Zeppelin already (at least I fixed a bug in the test for a scheduling
>>>>>>>> feature a few months ago).
>>>>>>>> But another point is, that Zeppelin should also focus on quality,
>>>>>>>> reproduceability and portability.
>>>>>>>> Although this doesn't offer exciting new features, it would make
>>>>>>>> development much easier.
>>>>>>>>
>>>>>>>> Cross-platform testability, Tests that pass when run sequentially,
>>>>>>>> compatibility with Firefox, and many more open issues that make it so much
>>>>>>>> harder to enhance Zeppelin and add features should be addressed soon,
>>>>>>>> preferably before more features are added. Already Zeppelin is suffering -
>>>>>>>> in my opinion - from quite a lot of feature creep, and we should avoid
>>>>>>>> putting in the kitchen sink, at the cost of quality and maintainability.
>>>>>>>> Instead modularity (ZEPPELIN-533 in particular) should be targeted.
>>>>>>>>
>>>>>>>> Oozie, in my opinion, is a dead end - it may de-facto still be in
>>>>>>>> use on many clusters, but it's not getting the love it needs, and I
>>>>>>>> wouldn't bet on it, when it comes to integrating scheduling. Instead, any
>>>>>>>> external tool should be able to use the REST-API to trigger executions, if
>>>>>>>> you want external scheduling.
>>>>>>>>
>>>>>>>> So, in conclusion, if we take Moon's list as a list of descending
>>>>>>>> priorities, I fully agree, under the condition that code quality is
>>>>>>>> included as a subset of enterprise-readyness. Auth* is paramount (Kerberos
>>>>>>>> SPNEGO SSO support is what we really want) with user and group rights
>>>>>>>> assignment on the notebook level. We probably also need Knox-integration
>>>>>>>> (ODP-Members looking at integrating Zeppelin should consider contributing
>>>>>>>> this), and integration of something like Spree (
>>>>>>>> https://github.com/hammerlab/spree) to be able to profile jobs.
>>>>>>>>
>>>>>>>> I'm hopeful that soon I can resume contributing some
>>>>>>>> quality-oriented code, to drive this "necessary evil" forward ;)
>>>>>>>>
>>>>>>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>>>>>>>> sourav.mazumder00@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>>>>>>>
>>>>>>>>> Rather one should be able to call it from any scheduler typically
>>>>>>>>> used in enterprise level. May be support for BPML.
>>>>>>>>>
>>>>>>>>> I believe the existing ability to call/execute a Zeppelin Notebook
>>>>>>>>> or a specific paragraph within a notebook using REST API should take care
>>>>>>>>> of this requirement to some extent.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Sourav
>>>>>>>>>
>>>>>>>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>>>>>>>>> vinayakagrawal88@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> @Eran Witkon,
>>>>>>>>>> Thanks for the suggestion Eran. I concur with your thought.
>>>>>>>>>> If Zepplin can be integrated with oozie, that would be wonderful.
>>>>>>>>>> Users will also be able to leverage their Oozie skills.
>>>>>>>>>> This would be promising for now.
>>>>>>>>>> However, in the future Hadoop might not necessarily be installed
>>>>>>>>>> in Spark Cluster and Oozie (since its installs with Hadoop Distribution)
>>>>>>>>>> might not be available.
>>>>>>>>>> So perhaps we should give a thought about this feature for the
>>>>>>>>>> future. Should it depend on oozie or should Zeppelin have its owns
>>>>>>>>>> scheduling?
>>>>>>>>>>
>>>>>>>>>> As Benjamin has iterated, Databrick notebook has this as a core
>>>>>>>>>> notebook feature.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Also, would anybody give any suggestions regarding "sync with
>>>>>>>>>> github" feature?
>>>>>>>>>> -Exporting notebook to Github
>>>>>>>>>> -Importing notebook from Github
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Vinayak
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <
>>>>>>>>>> eranwitkon@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> @Vinayak Agrawal I would suggest adding the ability to connect
>>>>>>>>>>> zeppelin to existing scheduling tools\workflow tools such as
>>>>>>>>>>> https://oozie.apache.org/. this requires betters hooks and
>>>>>>>>>>> status reporting but doesn't make zeppeling and ETL\scheduler tool by
>>>>>>>>>>> itself/
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>>>>>>>>>>> vinayakagrawal88@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Moon,
>>>>>>>>>>>> The new roadmap looks very promising. I am very happy to see
>>>>>>>>>>>> security in the list.
>>>>>>>>>>>> I have some suggestions regarding Enterprise Ready features:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Job Scheduler - Can this be improved?
>>>>>>>>>>>> Currently the scheduler can be used with Cron expression or a
>>>>>>>>>>>> pre-set time. But in an enterprise solution, a notebook might be one piece
>>>>>>>>>>>> of the workflow. Can we look towards the functionality of scheduling
>>>>>>>>>>>> notebook's based on other notebooks finishing their job successfully?
>>>>>>>>>>>> This requirement would arise in any ETL workflow, where all the
>>>>>>>>>>>> downstream users wait for the ETL notebook to finish successfully. Only
>>>>>>>>>>>> after that, other business oriented notebooks can be executed.
>>>>>>>>>>>>
>>>>>>>>>>>> 2. Importing a notebook - Is there a current requirement or
>>>>>>>>>>>> future plan to implement a feature that allows import-notebook-from-github?
>>>>>>>>>>>> This would allow users to share notebooks seamlessly.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Vinayak
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <moon@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Zhong Wang,
>>>>>>>>>>>>> Right, Folder support would be quite useful. Thanks for the
>>>>>>>>>>>>> opinion.
>>>>>>>>>>>>>
>>>>>>>>>>>> Hope i can finish the work pr-190
>>>>>>>>>>>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Sourav,
>>>>>>>>>>>>> Regarding concurrent running, Zeppelin doesn't have limitation
>>>>>>>>>>>>> of run paragraph/query concurrently. Interpreter can implement it's own
>>>>>>>>>>>>> scheduling policy. For example, SparkSQL interpreter and ShellInterpreter
>>>>>>>>>>>>> can already run paragraph/query concurrently.
>>>>>>>>>>>>>
>>>>>>>>>>>>> SparkInterpreter is implemented with FIFO scheduler
>>>>>>>>>>>>> considering nature of scala compiler. That's why user can not run multiple
>>>>>>>>>>>>> paragraph concurrently when they work with SparkInterpreter.
>>>>>>>>>>>>> But as Zhong Wang mentioned, pr-703 enables each notebook will
>>>>>>>>>>>>> have separate scala compiler so paragraphs run concurrently, while they're
>>>>>>>>>>>>> in different notebooks.
>>>>>>>>>>>>> Thanks for the feedback!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> moon
>>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <
>>>>>>>>>>>>> wangzhong.neu@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>> Sourav: I think this newly merged PR can help you
>>>>>>>>>>>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>>>>>>>>>>>> sourav.mazumder00@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Moon,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This looks great.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My only suggestion would be to include a PR/feature -
>>>>>>>>>>>>>>> Support for Running Concurrent paragraphs/queries in Zeppelin.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Right now if more than one user tries to run paragraphs in
>>>>>>>>>>>>>>> multiple notebooks concurrently through a single Zeppelin instance (and
>>>>>>>>>>>>>>> single interpreter instance) the performance is very slow. It is obvious
>>>>>>>>>>>>>>> that the queue gets built up within the zeppelin process and interpreter
>>>>>>>>>>>>>>> process in that scenario as the time taken to move the status from start to
>>>>>>>>>>>>>>> pending and pending to running is very high compared to the actual running
>>>>>>>>>>>>>>> time of a paragraph.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Without this the multi tenancy support would be meaningless
>>>>>>>>>>>>>>> as no one can practically use it in a situation where multiple users are
>>>>>>>>>>>>>>> trying to connect to the same instance of Zeppelin (and the related
>>>>>>>>>>>>>>> interpreter). A possible solution would be to spawn separate instance of
>>>>>>>>>>>>>>> the same interpreter at every notebook/user level.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Sourav
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <
>>>>>>>>>>>>>>> moon@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Zeppelin users and developers,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The roadmap we have published at
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>>>>>>>>>>>> is almost 9 month old, and it doesn't reflect where the
>>>>>>>>>>>>>>>> community goes anymore. It's time to update.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Based on mailing list, jira issues, pullrequests, feedbacks
>>>>>>>>>>>>>>>> from users, conferences and meetings, I could summarize the major interest
>>>>>>>>>>>>>>>> of users and developers in 7 categories. Enterprise ready, Usability
>>>>>>>>>>>>>>>> improvement, Pluggability, Documentation, Backend integration, Notebook
>>>>>>>>>>>>>>>> storage, and Visualization.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> And i could list related subjects under each categories.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    - Enterprise ready
>>>>>>>>>>>>>>>>       - Authentication
>>>>>>>>>>>>>>>>          - Shiro authentication ZEPPELIN-548
>>>>>>>>>>>>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>>>>>>>>>>>>       - Authorization
>>>>>>>>>>>>>>>>          - Notebook authorization PR-681
>>>>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>>>>>>>>>>>>       - Security
>>>>>>>>>>>>>>>>       - Multi-tenancy
>>>>>>>>>>>>>>>>       - Stability
>>>>>>>>>>>>>>>>    - Usability Improvement
>>>>>>>>>>>>>>>>       - UX improvement
>>>>>>>>>>>>>>>>       - Better Table data support
>>>>>>>>>>>>>>>>          - Download data as csv, etc PR-725
>>>>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>>>>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>>>>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>>>>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>>>>>>>>>>>>          - Featureful table data display (pagination, etc)
>>>>>>>>>>>>>>>>    - Pluggability ZEPPELIN-533
>>>>>>>>>>>>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>>>>>>>>>>>>       - Pluggable visualization
>>>>>>>>>>>>>>>>       - Dynamic Interpreter, notebook, visualization loading
>>>>>>>>>>>>>>>>       - Repository and registry for pluggable components
>>>>>>>>>>>>>>>>    - Improve documentation
>>>>>>>>>>>>>>>>       - Improve contents and readability
>>>>>>>>>>>>>>>>       - more tutorials, examples
>>>>>>>>>>>>>>>>    - Interpreter
>>>>>>>>>>>>>>>>       - Generic JDBC Interpreter
>>>>>>>>>>>>>>>>       - (spark)R Interpreter
>>>>>>>>>>>>>>>>       - Cluster manager for interpreter (Proposal
>>>>>>>>>>>>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>>>>>>>>>>>>>>>>       - more interpreters
>>>>>>>>>>>>>>>>    - Notebook storage
>>>>>>>>>>>>>>>>       - Versioning ZEPPELIN-540
>>>>>>>>>>>>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>>>>>>>>>>>>       - more notebook storages
>>>>>>>>>>>>>>>>    - Visualization
>>>>>>>>>>>>>>>>       - More visualizations PR-152
>>>>>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>>>>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>>>>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>>>>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>>>>>>>>>>>>       - Customize graph (show/hide label, color, etc)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It will help anyone quickly get overall interest of project
>>>>>>>>>>>>>>>> and the direction. And based on this roadmap, we can discuss and re-define
>>>>>>>>>>>>>>>> the next release 0.6.0 scope and it's schedule.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> moon
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Vinayak Agrawal
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>>>>>> ~Lord Alfred Tennyson
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Vinayak Agrawal
>>>>>>>>>> Big Data Analytics
>>>>>>>>>> IBM
>>>>>>>>>>
>>>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>>>> ~Lord Alfred Tennyson
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>>>>
>>
>

Re: [DISCUSS] Update Roadmap

Posted by Shabeel Syed <sh...@gmail.com>.

We also need better REST API support for creating and fetching
notebooks and paragraphs.
For example, if I could set a custom notebook ID and paragraph ID, we could
avoid multiple REST API calls.

http://localhost:8080/#/notebook/<notebookid>/paragraph/<paragraphid>?asIframe
should return an error if the notebook or paragraph does not exist.

And when creating a notebook or paragraph, I should be able to specify my
own custom IDs.

Regards
Shabeel
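
To illustrate the two-step pattern described above, here is a minimal Python
sketch. It assumes that a notebook is created with a POST to /api/notebook
(an assumption about the REST endpoint, not something confirmed in this
thread) and that the server, not the caller, chooses the notebook ID. The
helper names and the sample IDs are hypothetical; only the iframe URL shape is
taken from the message itself. Nothing is sent over the network.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def create_notebook_request(base_url, name):
    """Build (but do not send) a request that creates a notebook.

    Assumption: notebooks are created via POST <base_url>/api/notebook
    with a JSON body carrying the notebook name.
    """
    body = json.dumps({"name": name}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + "/api/notebook",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def paragraph_iframe_url(base_url, notebook_id, paragraph_id):
    """Build the embeddable paragraph link in the shape quoted above."""
    return "{}/#/notebook/{}/paragraph/{}?asIframe".format(
        base_url.rstrip("/"),
        quote(notebook_id, safe=""),
        quote(paragraph_id, safe=""),
    )


# Hypothetical IDs: without custom IDs, these would have to be read from
# earlier REST responses before the embed link can be built.
req = create_notebook_request("http://localhost:8080", "etl-report")
url = paragraph_iframe_url("http://localhost:8080", "2A94M5J1Z", "20160302-101010_1")
print(req.get_method(), req.full_url)
print(url)
```

If the caller could choose the notebook and paragraph IDs at creation time,
the embed link could be computed up front, without the extra lookup calls.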

On Wed, Mar 2, 2016 at 11:55 AM, Zhong Wang <wa...@gmail.com> wrote:

> +1 on @rick. quality is really important... I am still encountering bugs
> consistently
>
> On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV <te...@gmail.com>
> wrote:
>
>> +1 on @rick
>>
>> On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bb...@gmail.com> wrote:
>>
>>> I see in the Enterprise section that multi-tenancy will be included,
>>> will this have user impersonation too? In this way, the user executing will
>>> be the user owning the process.
>>>
>>> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <sh...@gmail.com> wrote:
>>>
>>> +1
>>>
>>> Hi Tamas,
>>>    Pluggable external visualization is really a GREAT feature to have.
>>> I'm looking forward to this :)
>>>
>>> Regards
>>> Shabeel
>>>
>>> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <ta...@odigeo.com>
>>> wrote:
>>>
>>>> Hey,
>>>>
>>>> Really promising roadmap.
>>>>
>>>> I'd only push more visualization options. I agree built in
>>>> visualization is needed with limited charting options but I think we also
>>>> need somehow 'inject' external js visualizations also.
>>>>
>>>>
>>>> For scheduling Zeppelin notebooks  we use
>>>>  https://github.com/airbnb/airflow <https://github.com/airbnb/airflow> through
>>>> the job rest api. It's an enterprise ready and very robust solution
>>>> right now.
>>>>
>>>>
>>>> *Tamas*
>>>>
>>>> On 1 March 2016 at 09:12, Eran Witkon <er...@gmail.com> wrote:
>>>>
>>>>> One point to clarify, I don't want to suggest Oozie in specific, I
>>>>> want to think about which features we develop and which ones we integrate
>>>>> external, preferred Apache, technology? We don't think about building our
>>>>> own storage services so why build our own scheduler?
>>>>> Eran
>>>>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <mo...@apache.org> wrote:
>>>>>
>>>>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>>>>> Now I can see a lot of demands around enterprise level job
>>>>>> scheduling. Either external or built-in, I completely agree having
>>>>>> enterprise level job scheduling support on the roadmap.
>>>>>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137>,
>>>>>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
>>>>>> related issues i can find in our JIRA.
>>>>>>
>>>>>> @Vinayak
>>>>>> Regarding importing notebook from github, Zeppelin has pluggable
>>>>>> notebook storage layer (see related package
>>>>>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
>>>>>> So, github notebook sync can be implemented easily.
>>>>>>
>>>>>> @Shabeel
>>>>>> Right, we need better manage management to prevent such OOM.
>>>>>> And i think table is one of the most frequently used way of
>>>>>> displaying data. So definitely, we'll need more features like filter, sort,
>>>>>> etc.
>>>>>> After this roadmap discussion, discussion for the next release will
>>>>>> follow. Then we'll get idea when those features will be available.
>>>>>>
>>>>>> @Prasad
>>>>>> Thanks for mentioning HA and DR. They're really important subject for
>>>>>> enterprise use. Definitely Zeppelin will need to address them.
>>>>>> And displaying meta information of notebook on top level page is good
>>>>>> idea.
>>>>>>
>>>>>> It's really great to hear many opinions and ideas.
>>>>>> And thanks @Rick for sharing valuable view to Zeppelin project.
>>>>>>
>>>>>> Thanks,
>>>>>> moon
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <ra...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> For one, I know that there is rudimentary scheduling built into
>>>>>>> Zeppelin already (at least I fixed a bug in the test for a scheduling
>>>>>>> feature a few months ago).
>>>>>>> But another point is, that Zeppelin should also focus on quality,
>>>>>>> reproduceability and portability.
>>>>>>> Although this doesn't offer exciting new features, it would make
>>>>>>> development much easier.
>>>>>>>
>>>>>>> Cross-platform testability, Tests that pass when run sequentially,
>>>>>>> compatibility with Firefox, and many more open issues that make it so much
>>>>>>> harder to enhance Zeppelin and add features should be addressed soon,
>>>>>>> preferably before more features are added. Already Zeppelin is suffering -
>>>>>>> in my opinion - from quite a lot of feature creep, and we should avoid
>>>>>>> putting in the kitchen sink, at the cost of quality and maintainability.
>>>>>>> Instead modularity (ZEPPELIN-533 in particular) should be targeted.
>>>>>>>
>>>>>>> Oozie, in my opinion, is a dead end - it may de-facto still be in
>>>>>>> use on many clusters, but it's not getting the love it needs, and I
>>>>>>> wouldn't bet on it, when it comes to integrating scheduling. Instead, any
>>>>>>> external tool should be able to use the REST-API to trigger executions, if
>>>>>>> you want external scheduling.
>>>>>>>
>>>>>>> So, in conclusion, if we take Moon's list as a list of descending
>>>>>>> priorities, I fully agree, under the condition that code quality is
>>>>>>> included as a subset of enterprise-readyness. Auth* is paramount (Kerberos
>>>>>>> SPNEGO SSO support is what we really want) with user and group rights
>>>>>>> assignment on the notebook level. We probably also need Knox-integration
>>>>>>> (ODP-Members looking at integrating Zeppelin should consider contributing
>>>>>>> this), and integration of something like Spree (
>>>>>>> https://github.com/hammerlab/spree) to be able to profile jobs.
>>>>>>>
>>>>>>> I'm hopeful that soon I can resume contributing some
>>>>>>> quality-oriented code, to drive this "necessary evil" forward ;)
>>>>>>>
>>>>>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>>>>>>> sourav.mazumder00@gmail.com> wrote:
>>>>>>>
>>>>>>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>>>>>>
>>>>>>>> Rather one should be able to call it from any scheduler typically
>>>>>>>> used in enterprise level. May be support for BPML.
>>>>>>>>
>>>>>>>> I believe the existing ability to call/execute a Zeppelin Notebook
>>>>>>>> or a specific paragraph within a notebook using REST API should take care
>>>>>>>> of this requirement to some extent.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Sourav
>>>>>>>>
>>>>>>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>>>>>>>> vinayakagrawal88@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> @Eran Witkon,
>>>>>>>>> Thanks for the suggestion Eran. I concur with your thought.
>>>>>>>>> If Zepplin can be integrated with oozie, that would be wonderful.
>>>>>>>>> Users will also be able to leverage their Oozie skills.
>>>>>>>>> This would be promising for now.
>>>>>>>>> However, in the future Hadoop might not necessarily be installed
>>>>>>>>> in Spark Cluster and Oozie (since its installs with Hadoop Distribution)
>>>>>>>>> might not be available.
>>>>>>>>> So perhaps we should give a thought about this feature for the
>>>>>>>>> future. Should it depend on oozie or should Zeppelin have its owns
>>>>>>>>> scheduling?
>>>>>>>>>
>>>>>>>>> As Benjamin has iterated, Databrick notebook has this as a core
>>>>>>>>> notebook feature.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Also, would anybody give any suggestions regarding "sync with
>>>>>>>>> github" feature?
>>>>>>>>> -Exporting notebook to Github
>>>>>>>>> -Importing notebook from Github
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Vinayak
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <eranwitkon@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> @Vinayak Agrawal I would suggest adding the ability to connect
>>>>>>>>>> zeppelin to existing scheduling tools\workflow tools such as
>>>>>>>>>> https://oozie.apache.org/. this requires betters hooks and
>>>>>>>>>> status reporting but doesn't make zeppeling and ETL\scheduler tool by
>>>>>>>>>> itself/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>>>>>>>>>> vinayakagrawal88@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Moon,
>>>>>>>>>>> The new roadmap looks very promising. I am very happy to see
>>>>>>>>>>> security in the list.
>>>>>>>>>>> I have some suggestions regarding Enterprise Ready features:
>>>>>>>>>>>
>>>>>>>>>>> 1. Job Scheduler - Can this be improved?
>>>>>>>>>>> Currently the scheduler can be used with Cron expression or a
>>>>>>>>>>> pre-set time. But in an enterprise solution, a notebook might be one piece
>>>>>>>>>>> of the workflow. Can we look towards the functionality of scheduling
>>>>>>>>>>> notebook's based on other notebooks finishing their job successfully?
>>>>>>>>>>> This requirement would arise in any ETL workflow, where all the
>>>>>>>>>>> downstream users wait for the ETL notebook to finish successfully. Only
>>>>>>>>>>> after that, other business oriented notebooks can be executed.
>>>>>>>>>>>
>>>>>>>>>>> 2. Importing a notebook - Is there a current requirement or
>>>>>>>>>>> future plan to implement a feature that allows import-notebook-from-github?
>>>>>>>>>>> This would allow users to share notebooks seamlessly.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Vinayak
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Zhong Wang,
>>>>>>>>>>>> Right, Folder support would be quite useful. Thanks for the
>>>>>>>>>>>> opinion.
>>>>>>>>>>>>
>>>>>>>>>>> Hope i can finish the work pr-190
>>>>>>>>>>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Sourav,
>>>>>>>>>>>> Regarding concurrent running, Zeppelin doesn't have limitation
>>>>>>>>>>>> of run paragraph/query concurrently. Interpreter can implement it's own
>>>>>>>>>>>> scheduling policy. For example, SparkSQL interpreter and ShellInterpreter
>>>>>>>>>>>> can already run paragraph/query concurrently.
>>>>>>>>>>>>
>>>>>>>>>>>> SparkInterpreter is implemented with FIFO scheduler considering
>>>>>>>>>>>> nature of scala compiler. That's why user can not run multiple paragraph
>>>>>>>>>>>> concurrently when they work with SparkInterpreter.
>>>>>>>>>>>> But as Zhong Wang mentioned, pr-703 enables each notebook will
>>>>>>>>>>>> have separate scala compiler so paragraphs run concurrently, while they're
>>>>>>>>>>>> in different notebooks.
>>>>>>>>>>>> Thanks for the feedback!
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> moon
>>>>>>>>>>>>
>>>>>>>>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <
>>>>>>>>>>>> wangzhong.neu@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>> Sourav: I think this newly merged PR can help you
>>>>>>>>>>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>>>>>>>>>>> sourav.mazumder00@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>> Hi Moon,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This looks great.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My only suggestion would be to include a PR/feature - Support
>>>>>>>>>>>>>> for Running Concurrent paragraphs/queries in Zeppelin.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Right now if more than one user tries to run paragraphs in
>>>>>>>>>>>>>> multiple notebooks concurrently through a single Zeppelin instance (and
>>>>>>>>>>>>>> single interpreter instance) the performance is very slow. It is obvious
>>>>>>>>>>>>>> that the queue gets built up within the zeppelin process and interpreter
>>>>>>>>>>>>>> process in that scenario as the time taken to move the status from start to
>>>>>>>>>>>>>> pending and pending to running is very high compared to the actual running
>>>>>>>>>>>>>> time of a paragraph.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Without this the multi tenancy support would be meaningless
>>>>>>>>>>>>>> as no one can practically use it in a situation where multiple users are
>>>>>>>>>>>>>> trying to connect to the same instance of Zeppelin (and the related
>>>>>>>>>>>>>> interpreter). A possible solution would be to spawn separate instance of
>>>>>>>>>>>>>> the same interpreter at every notebook/user level.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Sourav
>>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <
>>>>>>>>>>>>>> moon@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Zeppelin users and developers,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The roadmap we have published at
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>>>>>>>>>>> is almost 9 month old, and it doesn't reflect where the
>>>>>>>>>>>>>>> community goes anymore. It's time to update.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Based on mailing list, jira issues, pullrequests, feedbacks
>>>>>>>>>>>>>>> from users, conferences and meetings, I could summarize the major interest
>>>>>>>>>>>>>>> of users and developers in 7 categories. Enterprise ready, Usability
>>>>>>>>>>>>>>> improvement, Pluggability, Documentation, Backend integration, Notebook
>>>>>>>>>>>>>>> storage, and Visualization.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And i could list related subjects under each categories.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    - Enterprise ready
>>>>>>>>>>>>>>>       - Authentication
>>>>>>>>>>>>>>>          - Shiro authentication ZEPPELIN-548
>>>>>>>>>>>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>>>>>>>>>>>       - Authorization
>>>>>>>>>>>>>>>          - Notebook authorization PR-681
>>>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>>>>>>>>>>>       - Security
>>>>>>>>>>>>>>>       - Multi-tenancy
>>>>>>>>>>>>>>>       - Stability
>>>>>>>>>>>>>>>    - Usability Improvement
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    - UX improvement
>>>>>>>>>>>>>>>       - Better Table data support
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    - Download data as csv, etc PR-725
>>>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>
>>>>>>>>>>>>>>>          , PR-714
>>>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>
>>>>>>>>>>>>>>>          , PR-6
>>>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/6>
>>>>>>>>>>>>>>>          , PR-89
>>>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    - Featureful table data display (pagenation, etc)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    - Pluggability ZEPPELIN-533
>>>>>>>>>>>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>>>>>>>>>>>       - Pluggable visualization
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    - Dynamic Interpreter, notebook, visualization loading
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    - Repository and registry for pluggable components
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    - Improve documentation
>>>>>>>>>>>>>>>       - Improve contents and readability
>>>>>>>>>>>>>>>       - more tutorials, examples
>>>>>>>>>>>>>>>    - Interpreter
>>>>>>>>>>>>>>>       - Generic JDBC Interpreter
>>>>>>>>>>>>>>>       - (spark)R Interpreter
>>>>>>>>>>>>>>>       - Cluster manager for interpreter (Proposal
>>>>>>>>>>>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>>>>>>>>>>>>>>       )
>>>>>>>>>>>>>>>       - more interpreters
>>>>>>>>>>>>>>>    - Notebook storage
>>>>>>>>>>>>>>>       - Versioning ZEPPELIN-540
>>>>>>>>>>>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>>>>>>>>>>>       - more notebook storages
>>>>>>>>>>>>>>>    - Visualization
>>>>>>>>>>>>>>>       - More visualizations PR-152
>>>>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>>>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>>>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>>>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>>>>>>>>>>>       - Customize graph (show/hide label, color, etc)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It will help anyone quickly grasp the overall interests of the project
>>>>>>>>>>>>>>> and its direction. And based on this roadmap, we can discuss and re-define
>>>>>>>>>>>>>>> the scope of the next release, 0.6.0, and its schedule.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> moon
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Vinayak Agrawal
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>>>>> ~Lord Alfred Tennyson
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Vinayak Agrawal
>>>>>>>>> Big Data Analytics
>>>>>>>>> IBM
>>>>>>>>>
>>>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>>>> ~Lord Alfred Tennyson
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>
>>>
>>>
>
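
One request that recurs in this thread is running a notebook only after the
notebooks it depends on have finished successfully, with some external tool
triggering each run. A minimal sketch of that ordering logic in Python
follows. It is not Zeppelin code: the function name, the `deps` mapping and
the `trigger` callable are all hypothetical, and `trigger` stands in for
whatever actually starts a notebook (for example a wrapper around an HTTP
call to the notebook server). It assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def run_in_dependency_order(deps, trigger):
    """Run notebooks so that each starts only after its upstream ones succeeded.

    deps:    dict mapping a notebook id to the set of notebook ids it waits for
             (hypothetical structure, chosen for this sketch).
    trigger: callable(notebook_id) -> bool, True when the run succeeded
             (hypothetical; plug in a real call to your scheduler or server).
    Returns two lists: notebooks that succeeded, and notebooks that failed
    or were skipped because something upstream failed.
    """
    succeeded, failed_or_skipped = [], []
    failed = set()
    # static_order() yields every notebook after all of its predecessors.
    for note in TopologicalSorter(deps).static_order():
        if any(up in failed for up in deps.get(note, ())):
            # An upstream notebook did not finish successfully: skip this one.
            failed.add(note)
            failed_or_skipped.append(note)
            continue
        if trigger(note):
            succeeded.append(note)
        else:
            failed.add(note)
            failed_or_skipped.append(note)
    return succeeded, failed_or_skipped


if __name__ == "__main__":
    # An ETL notebook feeds a report, which feeds a dashboard.
    example = {"etl": set(), "report": {"etl"}, "dashboard": {"report"}}
    print(run_in_dependency_order(example, lambda note: True))
```

Keeping the trigger as a plain callable leaves the choice of scheduler open,
which matches the preference voiced in the thread for not tying notebooks to
one particular tool.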

Re: [DISCUSS] Update Roadmap

Posted by Zhong Wang <wa...@gmail.com>.

+1 on @rick. Quality is really important... I am still encountering bugs
consistently.

On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV <te...@gmail.com>
wrote:

> +1 on @rick

Re: [DISCUSS] Update Roadmap

Posted by TEJA SRIVASTAV <te...@gmail.com>.

+1 on @rick

On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim <bb...@gmail.com> wrote:

> I see in the Enterprise section that multi-tenancy will be included. Will
> this include user impersonation too? That way, the user executing will be
> the user owning the process.
>
> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <sh...@gmail.com> wrote:
>
> +1
>
> Hi Tamas,
>    Pluggable external visualization is really a GREAT feature to have.
> I'm looking forward to this :)
>
> Regards
> Shabeel
>
> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <ta...@odigeo.com>
> wrote:
>
>> Hey,
>>
>> Really promising roadmap.
>>
>> I'd only push for more visualization options. I agree that built-in
>> visualization with limited charting options is needed, but I think we also
>> need some way to 'inject' external JS visualizations.
>>
>>
>> For scheduling Zeppelin notebooks we use
>> https://github.com/airbnb/airflow through the job REST API. It's an
>> enterprise-ready and very robust solution right now.
>>
>>
>> *Tamas*
>>
>> On 1 March 2016 at 09:12, Eran Witkon <er...@gmail.com> wrote:
>>
>>> One point to clarify: I don't want to suggest Oozie specifically. I want
>>> to think about which features we develop ourselves and which ones we
>>> integrate from external, preferably Apache, technology. We don't think
>>> about building our own storage services, so why build our own scheduler?
>>> Eran
>>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <mo...@apache.org> wrote:
>>>
>>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>>> Now I can see a lot of demand for enterprise-level job scheduling.
>>>> Whether external or built-in, I completely agree with having
>>>> enterprise-level job scheduling support on the roadmap.
>>>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and
>>>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
>>>> related issues I can find in our JIRA.
>>>>
>>>> @Vinayak
>>>> Regarding importing notebooks from GitHub, Zeppelin has a pluggable
>>>> notebook storage layer (see related package
>>>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
>>>> So GitHub notebook sync can be implemented easily.
>>>>
>>>> @Shabeel
>>>> Right, we need better memory management to prevent such OOM.
>>>> And I think tables are one of the most frequently used ways of displaying
>>>> data. So we'll definitely need more features like filter, sort, etc.
>>>> After this roadmap discussion, discussion of the next release will
>>>> follow. Then we'll get an idea of when those features will be available.
>>>>
>>>> @Prasad
>>>> Thanks for mentioning HA and DR. They're really important subjects for
>>>> enterprise use. Zeppelin will definitely need to address them.
>>>> And displaying meta information of notebooks on the top-level page is a
>>>> good idea.
>>>>
>>>> It's really great to hear many opinions and ideas.
>>>> And thanks @Rick for sharing a valuable view of the Zeppelin project.
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>>
>>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <ra...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> For one, I know that there is rudimentary scheduling built into
>>>>> Zeppelin already (at least I fixed a bug in the test for a scheduling
>>>>> feature a few months ago).
>>>>> But another point is that Zeppelin should also focus on quality,
>>>>> reproducibility and portability.
>>>>> Although this doesn't offer exciting new features, it would make
>>>>> development much easier.
>>>>>
>>>>> Cross-platform testability, tests that pass when run sequentially,
>>>>> compatibility with Firefox, and many more open issues that make it so much
>>>>> harder to enhance Zeppelin and add features should be addressed soon,
>>>>> preferably before more features are added. Already Zeppelin is suffering -
>>>>> in my opinion - from quite a lot of feature creep, and we should avoid
>>>>> putting in the kitchen sink, at the cost of quality and maintainability.
>>>>> Instead modularity (ZEPPELIN-533 in particular) should be targeted.
>>>>>
>>>>> Oozie, in my opinion, is a dead end - it may de-facto still be in use
>>>>> on many clusters, but it's not getting the love it needs, and I wouldn't
>>>>> bet on it, when it comes to integrating scheduling. Instead, any external
>>>>> tool should be able to use the REST-API to trigger executions, if you want
>>>>> external scheduling.
>>>>>
>>>>> So, in conclusion, if we take Moon's list as a list of descending
>>>>> priorities, I fully agree, under the condition that code quality is
>>>>> included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
>>>>> SPNEGO SSO support is what we really want) with user and group rights
>>>>> assignment on the notebook level. We probably also need Knox-integration
>>>>> (ODP-Members looking at integrating Zeppelin should consider contributing
>>>>> this), and integration of something like Spree (
>>>>> https://github.com/hammerlab/spree) to be able to profile jobs.
>>>>>
>>>>> I'm hopeful that soon I can resume contributing some quality-oriented
>>>>> code, to drive this "necessary evil" forward ;)
>>>>>
>>>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>>>>> sourav.mazumder00@gmail.com> wrote:
>>>>>
>>>>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>>>>
>>>>>> Rather, one should be able to call it from any scheduler typically
>>>>>> used at the enterprise level, maybe with support for BPML.
>>>>>>
>>>>>> I believe the existing ability to call/execute a Zeppelin Notebook or
>>>>>> a specific paragraph within a notebook using REST API should take care of
>>>>>> this requirement to some extent.
>>>>>>
>>>>>> Regards,
>>>>>> Sourav
>>>>>>
>>>>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>>>>>> vinayakagrawal88@gmail.com> wrote:
>>>>>>
>>>>>>> @Eran Witkon,
>>>>>>> Thanks for the suggestion, Eran. I concur with your thought.
>>>>>>> If Zeppelin can be integrated with Oozie, that would be wonderful.
>>>>>>> Users will also be able to leverage their Oozie skills.
>>>>>>> This would be promising for now.
>>>>>>> However, in the future Hadoop might not necessarily be installed in a
>>>>>>> Spark cluster, and Oozie (since it is installed with the Hadoop
>>>>>>> distribution) might not be available.
>>>>>>> So perhaps we should give some thought to this feature for the
>>>>>>> future. Should it depend on Oozie, or should Zeppelin have its own
>>>>>>> scheduling?
>>>>>>>
>>>>>>> As Benjamin has reiterated, Databricks notebooks have this as a core
>>>>>>> notebook feature.
>>>>>>>
>>>>>>>
>>>>>>> Also, does anybody have suggestions regarding a "sync with
>>>>>>> GitHub" feature?
>>>>>>> - Exporting notebooks to GitHub
>>>>>>> - Importing notebooks from GitHub
>>>>>>>
>>>>>>> Thanks
>>>>>>> Vinayak
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <er...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> @Vinayak Agrawal I would suggest adding the ability to connect
>>>>>>>> Zeppelin to existing scheduling/workflow tools such as
>>>>>>>> https://oozie.apache.org/. This requires better hooks and status
>>>>>>>> reporting but doesn't make Zeppelin an ETL/scheduler tool by itself.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>>>>>>>> vinayakagrawal88@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Moon,
>>>>>>>>> The new roadmap looks very promising. I am very happy to see
>>>>>>>>> security in the list.
>>>>>>>>> I have some suggestions regarding Enterprise Ready features:
>>>>>>>>>
>>>>>>>>> 1. Job Scheduler - Can this be improved?
>>>>>>>>> Currently the scheduler can be used with a cron expression or a
>>>>>>>>> pre-set time. But in an enterprise solution, a notebook might be one piece
>>>>>>>>> of the workflow. Can we look towards the functionality of scheduling
>>>>>>>>> notebooks based on other notebooks finishing their jobs successfully?
>>>>>>>>> This requirement would arise in any ETL workflow, where all the
>>>>>>>>> downstream users wait for the ETL notebook to finish successfully. Only
>>>>>>>>> after that can other business-oriented notebooks be executed.
>>>>>>>>>
>>>>>>>>> 2. Importing a notebook - Is there a current requirement or future
>>>>>>>>> plan to implement a feature that allows import-notebook-from-github? This
>>>>>>>>> would allow users to share notebooks seamlessly.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Vinayak
>>>>>>>>>
>>>>>>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Zhong Wang,
>>>>>>>>>> Right, folder support would be quite useful. Thanks for the
>>>>>>>>>> opinion.
>>>>>>>>>>
>>>>>>>>>> Hope I can finish the work on pr-190
>>>>>>>>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>>>>>>>>
>>>>>>>>>> Sourav,
>>>>>>>>>> Regarding concurrent running, Zeppelin has no limitation on
>>>>>>>>>> running paragraphs/queries concurrently. An interpreter can implement its
>>>>>>>>>> own scheduling policy. For example, the SparkSQL interpreter and
>>>>>>>>>> ShellInterpreter can already run paragraphs/queries concurrently.
>>>>>>>>>>
>>>>>>>>>> SparkInterpreter is implemented with a FIFO scheduler, considering the
>>>>>>>>>> nature of the Scala compiler. That's why users cannot run multiple
>>>>>>>>>> paragraphs concurrently when they work with SparkInterpreter.
>>>>>>>>>> But as Zhong Wang mentioned, pr-703 gives each notebook a
>>>>>>>>>> separate Scala compiler, so paragraphs run concurrently as long as they're
>>>>>>>>>> in different notebooks.
>>>>>>>>>> Thanks for the feedback!
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> moon
>>>>>>>>>>
>>>>>>>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <
>>>>>>>>>> wangzhong.neu@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sourav: I think this newly merged PR can help you
>>>>>>>>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>>>>>>>>> sourav.mazumder00@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Moon,
>>>>>>>>>>>>
>>>>>>>>>>>> This looks great.
>>>>>>>>>>>>
>>>>>>>>>>>> My only suggestion would be to include a PR/feature - Support
>>>>>>>>>>>> for Running Concurrent paragraphs/queries in Zeppelin.
>>>>>>>>>>>>
>>>>>>>>>>>> Right now if more than one user tries to run paragraphs in
>>>>>>>>>>>> multiple notebooks concurrently through a single Zeppelin instance (and
>>>>>>>>>>>> single interpreter instance) the performance is very slow. It is obvious
>>>>>>>>>>>> that the queue gets built up within the zeppelin process and interpreter
>>>>>>>>>>>> process in that scenario as the time taken to move the status from start to
>>>>>>>>>>>> pending and pending to running is very high compared to the actual running
>>>>>>>>>>>> time of a paragraph.
>>>>>>>>>>>>
>>>>>>>>>>>> Without this the multi tenancy support would be meaningless as
>>>>>>>>>>>> no one can practically use it in a situation where multiple users are
>>>>>>>>>>>> trying to connect to the same instance of Zeppelin (and the related
>>>>>>>>>>>> interpreter). A possible solution would be to spawn separate instance of
>>>>>>>>>>>> the same interpreter at every notebook/user level.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Sourav
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Zeppelin users and developers,
>>>>>>>>>>>>>
>>>>>>>>>>>>> The roadmap we have published at
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>>>>>>>>> is almost 9 months old, and it no longer reflects where the
>>>>>>>>>>>>> community is going. It's time to update.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Based on the mailing list, JIRA issues, pull requests, feedback
>>>>>>>>>>>>> from users, conferences and meetings, I could summarize the major interests
>>>>>>>>>>>>> of users and developers in 7 categories: Enterprise ready, Usability
>>>>>>>>>>>>> improvement, Pluggability, Documentation, Backend integration, Notebook
>>>>>>>>>>>>> storage, and Visualization.
>>>>>>>>>>>>>
>>>>>>>>>>>>> And I could list related subjects under each category.
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - Enterprise ready
>>>>>>>>>>>>>       - Authentication
>>>>>>>>>>>>>          - Shiro authentication ZEPPELIN-548
>>>>>>>>>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>>>>>>>>>       - Authorization
>>>>>>>>>>>>>          - Notebook authorization PR-681
>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>>>>>>>>>       - Security
>>>>>>>>>>>>>       - Multi-tenancy
>>>>>>>>>>>>>       - Stability
>>>>>>>>>>>>>    - Usability Improvement
>>>>>>>>>>>>>       - UX improvement
>>>>>>>>>>>>>       - Better Table data support
>>>>>>>>>>>>>          - Download data as csv, etc PR-725
>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>>>>>>>>>          - Featureful table data display (pagination, etc)
>>>>>>>>>>>>>    - Pluggability ZEPPELIN-533
>>>>>>>>>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>>>>>>>>>       - Pluggable visualization
>>>>>>>>>>>>>       - Dynamic Interpreter, notebook, visualization loading
>>>>>>>>>>>>>       - Repository and registry for pluggable components
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - Improve documentation
>>>>>>>>>>>>>       - Improve contents and readability
>>>>>>>>>>>>>       - more tutorials, examples
>>>>>>>>>>>>>    - Interpreter
>>>>>>>>>>>>>       - Generic JDBC Interpreter
>>>>>>>>>>>>>       - (spark)R Interpreter
>>>>>>>>>>>>>       - Cluster manager for interpreter (Proposal
>>>>>>>>>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>>>>>>>>>>>>       )
>>>>>>>>>>>>>       - more interpreters
>>>>>>>>>>>>>    - Notebook storage
>>>>>>>>>>>>>       - Versioning ZEPPELIN-540
>>>>>>>>>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>>>>>>>>>       - more notebook storages
>>>>>>>>>>>>>    - Visualization
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - More visualizations PR-152
>>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>
>>>>>>>>>>>>>       , PR-728
>>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>
>>>>>>>>>>>>>       , PR-336
>>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>
>>>>>>>>>>>>>       , PR-321
>>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - Customize graph (show/hide label, color, etc)
>>>>>>>>>>>>>
>>>>>>>>>>>>> It will help anyone quickly get a sense of the overall interests of the
>>>>>>>>>>>>> project and its direction. And based on this roadmap, we can discuss and
>>>>>>>>>>>>> re-define the scope of the next release, 0.6.0, and its schedule.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> moon

Re: [DISCUSS] Update Roadmap

Posted by Benjamin Kim <bb...@gmail.com>.

I see in the Enterprise section that multi-tenancy will be included. Will this include user impersonation too? That way, the user executing a paragraph would be the user owning the process.

> On Mar 1, 2016, at 12:51 AM, Shabeel Syed <sh...@gmail.com> wrote:
> 
> +1
> 
> Hi Tamas,
>    Pluggable external visualization is really a GREAT feature to have. I'm looking forward to this :)
> 
> Regards
> Shabeel
> 
> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <tamas.szuromi@odigeo.com> wrote:
> Hey,
> 
> Really promising roadmap.
> 
> I'd only push more visualization options. I agree that built-in visualization with limited charting options is needed, but I think we also need some way to 'inject' external JS visualizations.
> 
> 
> For scheduling Zeppelin notebooks we use https://github.com/airbnb/airflow through the job REST API. It's an enterprise-ready and very robust solution right now.
> 
> Tamas
> 
> 
> On 1 March 2016 at 09:12, Eran Witkon <eranwitkon@gmail.com> wrote:
> One point to clarify: I don't want to suggest Oozie specifically. I want us to think about which features we develop ourselves and which ones we cover by integrating external, preferably Apache, technology. We don't think about building our own storage services, so why build our own scheduler?
> Eran
> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <moon@apache.org> wrote:
> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
> Now I can see a lot of demand around enterprise-level job scheduling. Whether external or built-in, I completely agree with having enterprise-level job scheduling support on the roadmap.
> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are related issues I can find in our JIRA.
> 
> @Vinayak
> Regarding importing notebooks from GitHub, Zeppelin has a pluggable notebook storage layer (see related package <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>). So GitHub notebook sync can be implemented easily.
> 
> @Shabeel
> Right, we need better memory management to prevent such OOM.
> And I think tables are one of the most frequently used ways of displaying data. So we'll definitely need more features like filter, sort, etc.
> After this roadmap discussion, discussion of the next release will follow. Then we'll get an idea of when those features will be available.
> 
> @Prasad
> Thanks for mentioning HA and DR. They're really important subjects for enterprise use. Zeppelin will definitely need to address them.
> And displaying meta information of notebooks on the top-level page is a good idea.
> 
> It's really great to hear many opinions and ideas.
> And thanks @Rick for sharing a valuable view of the Zeppelin project.
> 
> Thanks,
> moon
> 
> 
> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <rahvin@gmail.com> wrote:
> Hi,
> 
> For one, I know that there is rudimentary scheduling built into Zeppelin already (at least I fixed a bug in the test for a scheduling feature a few months ago).
> But another point is that Zeppelin should also focus on quality, reproducibility and portability.
> Although this doesn't offer exciting new features, it would make development much easier.
> 
> Cross-platform testability, tests that pass when run sequentially, compatibility with Firefox, and many more open issues that make it so much harder to enhance Zeppelin and add features should be addressed soon, preferably before more features are added. Zeppelin is already suffering - in my opinion - from quite a lot of feature creep, and we should avoid putting in the kitchen sink at the cost of quality and maintainability. Instead, modularity (ZEPPELIN-533 in particular) should be targeted.
> 
> Oozie, in my opinion, is a dead end - it may de facto still be in use on many clusters, but it's not getting the love it needs, and I wouldn't bet on it when it comes to integrating scheduling. Instead, any external tool should be able to use the REST API to trigger executions, if you want external scheduling.
> 
> So, in conclusion, if we take Moon's list as a list of descending priorities, I fully agree, under the condition that code quality is included as a subset of enterprise-readiness. Auth* is paramount (Kerberos SPNEGO SSO support is what we really want) with user and group rights assignment on the notebook level. We probably also need Knox integration (ODP members looking at integrating Zeppelin should consider contributing this), and integration of something like Spree (https://github.com/hammerlab/spree) to be able to profile jobs.
> 
> I'm hopeful that soon I can resume contributing some quality-oriented code, to drive this "necessary evil" forward ;)
> 
> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <sourav.mazumder00@gmail.com> wrote:
> I do agree with Vinayak. It need not be coupled with Oozie.
> 
> Rather, one should be able to call it from any scheduler typically used at the enterprise level, maybe with support for BPML.
> 
> I believe the existing ability to call/execute a Zeppelin notebook, or a specific paragraph within a notebook, using the REST API should take care of this requirement to some extent.
> 
> Regards,
> Sourav
> 
> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <vinayakagrawal88@gmail.com> wrote:
> @Eran Witkon,
> Thanks for the suggestion, Eran. I concur with your thought.
> If Zeppelin can be integrated with Oozie, that would be wonderful. Users will also be able to leverage their Oozie skills.
> This would be promising for now.
> However, in the future Hadoop might not necessarily be installed in a Spark cluster, and Oozie (since it installs with the Hadoop distribution) might not be available.
> So perhaps we should give some thought to this feature for the future. Should it depend on Oozie, or should Zeppelin have its own scheduling?
> 
> As Benjamin has reiterated, the Databricks notebook has this as a core notebook feature.
> 
> 
> Also, would anybody give any suggestions regarding a "sync with GitHub" feature?
> - Exporting notebook to GitHub
> - Importing notebook from GitHub
> 
> Thanks 
> Vinayak  
>  
> 
> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <eranwitkon@gmail.com> wrote:
> @Vinayak Agrawal I would suggest adding the ability to connect Zeppelin to existing scheduling/workflow tools such as https://oozie.apache.org/. This requires better hooks and status reporting but doesn't make Zeppelin an ETL/scheduler tool by itself.
> 
> 
> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <vinayakagrawal88@gmail.com> wrote:
> Moon,
> The new roadmap looks very promising. I am very happy to see security in the list.
> I have some suggestions regarding Enterprise Ready features:
> 
> 1. Job Scheduler - Can this be improved?
> Currently the scheduler can be used with a cron expression or a pre-set time. But in an enterprise solution, a notebook might be one piece of a workflow. Can we look at scheduling notebooks based on other notebooks finishing their jobs successfully?
> This requirement would arise in any ETL workflow, where all the downstream users wait for the ETL notebook to finish successfully. Only after that can other business-oriented notebooks be executed.
> 
> 2. Importing a notebook - Is there a current requirement or future plan to implement a feature that allows importing a notebook from GitHub? This would allow users to share notebooks seamlessly.
> 
> Thanks 
> Vinayak
> 
> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <moon@apache.org> wrote:
> Zhong Wang,
> Right, folder support would be quite useful. Thanks for the opinion.
> I hope I can finish the work in pr-190 <https://github.com/apache/incubator-zeppelin/pull/190>.
> 
> Sourav,
> Regarding concurrent running, Zeppelin has no limitation on running paragraphs/queries concurrently. An interpreter can implement its own scheduling policy. For example, the SparkSQL interpreter and ShellInterpreter can already run paragraphs/queries concurrently.
> 
> SparkInterpreter is implemented with a FIFO scheduler because of the nature of the Scala compiler. That's why users cannot run multiple paragraphs concurrently when they work with SparkInterpreter.
> But as Zhong Wang mentioned, pr-703 gives each notebook a separate Scala compiler, so paragraphs run concurrently as long as they're in different notebooks.
> Thanks for the feedback!
> 
> Best,
> moon
> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wangzhong.neu@gmail.com> wrote:
> Sourav: I think this newly merged PR can help you https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
> 
> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <sourav.mazumder00@gmail.com> wrote:
> Hi Moon,
> 
> This looks great.
> 
> My only suggestion would be to include a PR/feature - Support for Running Concurrent paragraphs/queries in Zeppelin. 
> 
> Right now if more than one user tries to run paragraphs in multiple notebooks concurrently through a single Zeppelin instance (and single interpreter instance) the performance is very slow. It is obvious that the queue gets built up within the zeppelin process and interpreter process in that scenario as the time taken to move the status from start to pending and pending to running is very high compared to the actual running time of a paragraph.
> 
> Without this the multi tenancy support would be meaningless as no one can practically use it in a situation where multiple users are trying to connect to the same instance of Zeppelin (and the related interpreter). A possible solution would be to spawn separate instance of the same interpreter at every notebook/user level.
> 
> Regards,
> Sourav
> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <moon@apache.org> wrote:
> Hi Zeppelin users and developers,
> 
> The roadmap we have published at
> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
> is almost 9 months old, and it doesn't reflect where the community is going anymore. It's time to update.
> 
> Based on the mailing list, JIRA issues, pull requests, feedback from users, conferences and meetings, I could summarize the major interests of users and developers in 7 categories: Enterprise ready, Usability improvement, Pluggability, Documentation, Backend integration, Notebook storage, and Visualization.
> 
> And I could list related subjects under each category.
> 
>    - Enterprise ready
>       - Authentication
>          - Shiro authentication ZEPPELIN-548 <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>       - Authorization
>          - Notebook authorization PR-681 <https://github.com/apache/incubator-zeppelin/pull/681>
>       - Security
>       - Multi-tenancy
>       - Stability
>    - Usability Improvement
>       - UX improvement
>       - Better Table data support
>          - Download data as csv, etc PR-725 <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714 <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
>          - Featureful table data display (pagination, etc)
>    - Pluggability ZEPPELIN-533 <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>       - Pluggable visualization
>       - Dynamic Interpreter, notebook, visualization loading
>       - Repository and registry for pluggable components
>    - Improve documentation
>       - Improve contents and readability
>       - more tutorials, examples
>    - Interpreter
>       - Generic JDBC Interpreter
>       - (spark)R Interpreter
>       - Cluster manager for interpreter (Proposal <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>)
>       - more interpreters
>    - Notebook storage
>       - Versioning ZEPPELIN-540 <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>       - more notebook storages
>    - Visualization
>       - More visualizations PR-152 <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728 <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336 <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321 <https://github.com/apache/incubator-zeppelin/pull/321>
>       - Customize graph (show/hide label, color, etc)
> 
> It will help anyone quickly get a sense of the overall interests of the project and its direction. And based on this roadmap, we can discuss and re-define the scope of the next release, 0.6.0, and its schedule.
> 
> What do you think? Any feedback would be appreciated.
> 
> Thanks,
> moon

Re: [DISCUSS] Update Roadmap

Posted by Shabeel Syed <sh...@gmail.com>.

+1

Hi Tamas,
   Pluggable external visualization is really a GREAT feature to have. I'm
looking forward to this :)

Regards
Shabeel

On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi <ta...@odigeo.com>
wrote:

> Hey,
>
> Really promising roadmap.
>
> I'd only push more visualization options. I agree that built-in visualization
> with limited charting options is needed, but I think we also need some way to
> 'inject' external JS visualizations.
>
>
> For scheduling Zeppelin notebooks we use
> https://github.com/airbnb/airflow through the job REST API. It's an
> enterprise-ready and very robust solution right now.
>
>
> *Tamas*
>
> On 1 March 2016 at 09:12, Eran Witkon <er...@gmail.com> wrote:
>
>> One point to clarify: I don't want to suggest Oozie specifically. I want
>> us to think about which features we develop ourselves and which ones we
>> cover by integrating external, preferably Apache, technology. We don't think
>> about building our own storage services, so why build our own scheduler?
>> Eran
>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <mo...@apache.org> wrote:
>>
>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>> Now I can see a lot of demand around enterprise-level job scheduling.
>>> Whether external or built-in, I completely agree with having
>>> enterprise-level job scheduling support on the roadmap.
>>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and
>>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
>>> related issues I can find in our JIRA.
>>>
>>> @Vinayak
>>> Regarding importing notebook from github, Zeppelin has pluggable
>>> notebook storage layer (see related package
>>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
>>> So, github notebook sync can be implemented easily.
>>>
>>> @Shabeel
>>> Right, we need better memory management to prevent such OOM.
>>> And I think tables are one of the most frequently used ways of displaying
>>> data. So we'll definitely need more features like filter, sort, etc.
>>> After this roadmap discussion, discussion of the next release will
>>> follow. Then we'll get an idea of when those features will be available.
>>>
>>> @Prasad
>>> Thanks for mentioning HA and DR. They're really important subject for
>>> enterprise use. Definitely Zeppelin will need to address them.
>>> And displaying meta information of notebook on top level page is good
>>> idea.
>>>
>>> It's really great to hear many opinions and ideas.
>>> And thanks @Rick for sharing valuable view to Zeppelin project.
>>>
>>> Thanks,
>>> moon
>>>
>>>
>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <ra...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> For one, I know that there is rudimentary scheduling built into
>>>> Zeppelin already (at least I fixed a bug in the test for a scheduling
>>>> feature a few months ago).
>>>> But another point is, that Zeppelin should also focus on quality,
>>>> reproduceability and portability.
>>>> Although this doesn't offer exciting new features, it would make
>>>> development much easier.
>>>>
>>>> Cross-platform testability, Tests that pass when run sequentially,
>>>> compatibility with Firefox, and many more open issues that make it so much
>>>> harder to enhance Zeppelin and add features should be addressed soon,
>>>> preferably before more features are added. Already Zeppelin is suffering -
>>>> in my opinion - from quite a lot of feature creep, and we should avoid
>>>> putting in the kitchen sink, at the cost of quality and maintainability.
>>>> Instead modularity (ZEPPELIN-533 in particular) should be targeted.
>>>>
>>>> Oozie, in my opinion, is a dead end - it may de-facto still be in use
>>>> on many clusters, but it's not getting the love it needs, and I wouldn't
>>>> bet on it, when it comes to integrating scheduling. Instead, any external
>>>> tool should be able to use the REST-API to trigger executions, if you want
>>>> external scheduling.
>>>>
>>>> So, in conclusion, if we take Moon's list as a list of descending
>>>> priorities, I fully agree, under the condition that code quality is
>>>> included as a subset of enterprise-readyness. Auth* is paramount (Kerberos
>>>> SPNEGO SSO support is what we really want) with user and group rights
>>>> assignment on the notebook level. We probably also need Knox-integration
>>>> (ODP-Members looking at integrating Zeppelin should consider contributing
>>>> this), and integration of something like Spree (
>>>> https://github.com/hammerlab/spree) to be able to profile jobs.
>>>>
>>>> I'm hopeful that soon I can resume contributing some quality-oriented
>>>> code, to drive this "necessary evil" forward ;)
>>>>
>>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>>>> sourav.mazumder00@gmail.com> wrote:
>>>>
>>>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>>>
>>>>> Rather one should be able to call it from any scheduler typically used
>>>>> in enterprise level. May be support for BPML.
>>>>>
>>>>> I believe the existing ability to call/execute a Zeppelin Notebook or
>>>>> a specific paragraph within a notebook using REST API should take care of
>>>>> this requirement to some extent.
>>>>>
>>>>> Regards,
>>>>> Sourav
>>>>>
>>>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>>>>> vinayakagrawal88@gmail.com> wrote:
>>>>>
>>>>>> @Eran Witkon,
>>>>>> Thanks for the suggestion Eran. I concur with your thought.
>>>>>> If Zeppelin can be integrated with Oozie, that would be wonderful.
>>>>>> Users will also be able to leverage their Oozie skills.
>>>>>> This would be promising for now.
>>>>>> However, in the future Hadoop might not necessarily be installed in
>>>>>> a Spark cluster, and Oozie (since it installs with the Hadoop
>>>>>> distribution) might not be available.
>>>>>> So perhaps we should give some thought to this feature for the
>>>>>> future. Should it depend on Oozie, or should Zeppelin have its own
>>>>>> scheduling?
>>>>>>
>>>>>> As Benjamin has reiterated, the Databricks notebook has this as a core
>>>>>> notebook feature.
>>>>>>
>>>>>>
>>>>>> Also, would anybody give any suggestions regarding "sync with github"
>>>>>> feature?
>>>>>> -Exporting notebook to Github
>>>>>> -Importing notebook from Github
>>>>>>
>>>>>> Thanks
>>>>>> Vinayak
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <er...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> @Vinayak Agrawal I would suggest adding the ability to connect
>>>>>>> Zeppelin to existing scheduling/workflow tools such as
>>>>>>> https://oozie.apache.org/. This requires better hooks and status
>>>>>>> reporting but doesn't make Zeppelin an ETL/scheduler tool by itself.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>>>>>>> vinayakagrawal88@gmail.com> wrote:
>>>>>>>
>>>>>>>> Moon,
>>>>>>>> The new roadmap looks very promising. I am very happy to see
>>>>>>>> security in the list.
>>>>>>>> I have some suggestions regarding Enterprise Ready features:
>>>>>>>>
>>>>>>>> 1. Job Scheduler - Can this be improved?
>>>>>>>> Currently the scheduler can be used with Cron expression or a
>>>>>>>> pre-set time. But in an enterprise solution, a notebook might be one piece
>>>>>>>> of the workflow. Can we look towards the functionality of scheduling
>>>>>>>> notebook's based on other notebooks finishing their job successfully?
>>>>>>>> This requirement would arise in any ETL workflow, where all the
>>>>>>>> downstream users wait for the ETL notebook to finish successfully. Only
>>>>>>>> after that, other business oriented notebooks can be executed.
>>>>>>>>
>>>>>>>> 2. Importing a notebook - Is there a current requirement or future
>>>>>>>> plan to implement a feature that allows import-notebook-from-github? This
>>>>>>>> would allow users to share notebooks seamlessly.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Vinayak
>>>>>>>>
>>>>>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Zhong Wang,
>>>>>>>>> Right, Folder support would be quite useful. Thanks for the
>>>>>>>>> opinion.
>>>>>>>>>
>>>>>>>> Hope i can finish the work pr-190
>>>>>>>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>>>>>>>
>>>>>>>>
>>>>>>>>> Sourav,
>>>>>>>>> Regarding concurrent running, Zeppelin doesn't have limitation of
>>>>>>>>> run paragraph/query concurrently. Interpreter can implement it's own
>>>>>>>>> scheduling policy. For example, SparkSQL interpreter and ShellInterpreter
>>>>>>>>> can already run paragraph/query concurrently.
>>>>>>>>>
>>>>>>>>> SparkInterpreter is implemented with FIFO scheduler considering
>>>>>>>>> nature of scala compiler. That's why user can not run multiple paragraph
>>>>>>>>> concurrently when they work with SparkInterpreter.
>>>>>>>>> But as Zhong Wang mentioned, pr-703 enables each notebook will
>>>>>>>>> have separate scala compiler so paragraphs run concurrently, while they're
>>>>>>>>> in different notebooks.
>>>>>>>>> Thanks for the feedback!
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> moon
>>>>>>>>>
>>>>>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>> Sourav: I think this newly merged PR can help you
>>>>>>>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>>>>>>>
>>>>>>>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>>>>>>>> sourav.mazumder00@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>> Hi Moon,
>>>>>>>>>>>
>>>>>>>>>>> This looks great.
>>>>>>>>>>>
>>>>>>>>>>> My only suggestion would be to include a PR/feature - Support
>>>>>>>>>>> for Running Concurrent paragraphs/queries in Zeppelin.
>>>>>>>>>>>
>>>>>>>>>>> Right now if more than one user tries to run paragraphs in
>>>>>>>>>>> multiple notebooks concurrently through a single Zeppelin instance (and
>>>>>>>>>>> single interpreter instance) the performance is very slow. It is obvious
>>>>>>>>>>> that the queue gets built up within the zeppelin process and interpreter
>>>>>>>>>>> process in that scenario as the time taken to move the status from start to
>>>>>>>>>>> pending and pending to running is very high compared to the actual running
>>>>>>>>>>> time of a paragraph.
>>>>>>>>>>>
>>>>>>>>>>> Without this the multi tenancy support would be meaningless as
>>>>>>>>>>> no one can practically use it in a situation where multiple users are
>>>>>>>>>>> trying to connect to the same instance of Zeppelin (and the related
>>>>>>>>>>> interpreter). A possible solution would be to spawn separate instance of
>>>>>>>>>>> the same interpreter at every notebook/user level.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Sourav
>>>>>>>>>>>
>>>>>>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>> Hi Zeppelin users and developers,
>>>>>>>>>>>>
>>>>>>>>>>>> The roadmap we have published at
>>>>>>>>>>>>
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>>>>>>>> is almost 9 month old, and it doesn't reflect where the
>>>>>>>>>>>> community goes anymore. It's time to update.
>>>>>>>>>>>>
>>>>>>>>>>>> Based on mailing list, jira issues, pullrequests, feedbacks
>>>>>>>>>>>> from users, conferences and meetings, I could summarize the major interest
>>>>>>>>>>>> of users and developers in 7 categories. Enterprise ready, Usability
>>>>>>>>>>>> improvement, Pluggability, Documentation, Backend integration, Notebook
>>>>>>>>>>>> storage, and Visualization.
>>>>>>>>>>>>
>>>>>>>>>>>> And i could list related subjects under each categories.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    - Enterprise ready
>>>>>>>>>>>>       - Authentication
>>>>>>>>>>>>          - Shiro authentication ZEPPELIN-548
>>>>>>>>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>>>>>>>>       - Authorization
>>>>>>>>>>>>          - Notebook authorization PR-681
>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>>>>>>>>       - Security
>>>>>>>>>>>>       - Multi-tenancy
>>>>>>>>>>>>       - Stability
>>>>>>>>>>>>    - Usability Improvement
>>>>>>>>>>>>       - UX improvement
>>>>>>>>>>>>       - Better Table data support
>>>>>>>>>>>>          - Download data as csv, etc PR-725
>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>>>>>>>>          - Featureful table data display (pagination, etc)
>>>>>>>>>>>>    - Pluggability ZEPPELIN-533
>>>>>>>>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>>>>>>>>       - Pluggable visualization
>>>>>>>>>>>>       - Dynamic Interpreter, notebook, visualization loading
>>>>>>>>>>>>       - Repository and registry for pluggable components
>>>>>>>>>>>>    - Improve documentation
>>>>>>>>>>>>       - Improve contents and readability
>>>>>>>>>>>>       - more tutorials, examples
>>>>>>>>>>>>    - Interpreter
>>>>>>>>>>>>       - Generic JDBC Interpreter
>>>>>>>>>>>>       - (spark)R Interpreter
>>>>>>>>>>>>       - Cluster manager for interpreter (Proposal
>>>>>>>>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>>>>>>>>>>>       )
>>>>>>>>>>>>       - more interpreters
>>>>>>>>>>>>    - Notebook storage
>>>>>>>>>>>>       - Versioning ZEPPELIN-540
>>>>>>>>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>>>>>>>>       - more notebook storages
>>>>>>>>>>>>    - Visualization
>>>>>>>>>>>>       - More visualizations PR-152
>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>>>>>>>>       - Customize graph (show/hide label, color, etc)
>>>>>>>>>>>>
>>>>>>>>>>>> It will help anyone quickly get overall interest of project and
>>>>>>>>>>>> the direction. And based on this roadmap, we can discuss and re-define the
>>>>>>>>>>>> next release 0.6.0 scope and it's schedule.
>>>>>>>>>>>>
>>>>>>>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> moon
>>>>>>>>>>>>
>>>>>>>>>>>>

Re: [DISCUSS] Update Roadmap

Posted by Tamas Szuromi <ta...@odigeo.com>.

Hey,

Really promising roadmap.

I'd only push more visualization options. I agree that built-in visualization
with limited charting options is needed, but I think we also need some way to
'inject' external JS visualizations.


For scheduling Zeppelin notebooks we use https://github.com/airbnb/airflow
through the job REST API. It's an enterprise-ready and very robust solution
right now.


*Tamas*
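
For readers who want to try the REST-based approach mentioned above, here is a
minimal sketch of how an external scheduler task might trigger a notebook run.
It assumes the "run all paragraphs" endpoint is POST /api/notebook/job/<noteId>,
as in the Zeppelin REST API docs of this period; please check the docs for your
version. The base URL, the note id and the helper names (build_run_request,
run_note) are hypothetical and only for illustration. It is not taken from any
actual Airflow setup.

```python
import json
import urllib.request


def build_run_request(base_url: str, note_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking Zeppelin to run all paragraphs of a note.

    Assumption: the endpoint path is /api/notebook/job/<noteId>.
    """
    url = f"{base_url.rstrip('/')}/api/notebook/job/{note_id}"
    # An empty body is enough here; method is set explicitly to POST.
    return urllib.request.Request(url, data=b"", method="POST")


def run_note(base_url: str, note_id: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response from the server."""
    request = build_run_request(base_url, note_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical server address and note id; replace with your own.
    print(run_note("http://localhost:8080", "2A94M5J1Z"))
```

An external scheduler (Airflow, cron, or anything else that can run a Python
callable or a script) could call run_note as one step of a workflow, which
keeps the scheduling logic outside Zeppelin itself.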

On 1 March 2016 at 09:12, Eran Witkon <er...@gmail.com> wrote:

> One point to clarify: I don't want to suggest Oozie specifically. I want us
> to think about which features we develop ourselves and which ones we cover
> by integrating external, preferably Apache, technology. We don't think about
> building our own storage services, so why build our own scheduler?
> Eran
> On Tue, 1 Mar 2016 at 09:49 moon soo Lee <mo...@apache.org> wrote:
>
>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>> Now I can see a lot of demand around enterprise-level job scheduling.
>> Whether external or built-in, I completely agree with having
>> enterprise-level job scheduling support on the roadmap.
>> ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137> and
>> ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
>> related issues I can find in our JIRA.
>>
>> @Vinayak
>> Regarding importing notebook from github, Zeppelin has pluggable notebook
>> storage layer (see related package
>> <https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
>> So, github notebook sync can be implemented easily.
>>
>> @Shabeel
>> Right, we need better manage management to prevent such OOM.
>> And i think table is one of the most frequently used way of displaying
>> data. So definitely, we'll need more features like filter, sort, etc.
>> After this roadmap discussion, discussion for the next release will
>> follow. Then we'll get idea when those features will be available.
>>
>> @Prasad
>> Thanks for mentioning HA and DR. They're really important subject for
>> enterprise use. Definitely Zeppelin will need to address them.
>> And displaying meta information of notebook on top level page is good
>> idea.
>>
>> It's really great to hear many opinions and ideas.
>> And thanks @Rick for sharing valuable view to Zeppelin project.
>>
>> Thanks,
>> moon
>>
>>
>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <ra...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> For one, I know that there is rudimentary scheduling built into Zeppelin
>>> already (at least I fixed a bug in the test for a scheduling feature a few
>>> months ago).
>>> But another point is, that Zeppelin should also focus on quality,
>>> reproduceability and portability.
>>> Although this doesn't offer exciting new features, it would make
>>> development much easier.
>>>
>>> Cross-platform testability, Tests that pass when run sequentially,
>>> compatibility with Firefox, and many more open issues that make it so much
>>> harder to enhance Zeppelin and add features should be addressed soon,
>>> preferably before more features are added. Already Zeppelin is suffering -
>>> in my opinion - from quite a lot of feature creep, and we should avoid
>>> putting in the kitchen sink, at the cost of quality and maintainability.
>>> Instead modularity (ZEPPELIN-533 in particular) should be targeted.
>>>
>>> Oozie, in my opinion, is a dead end - it may de-facto still be in use on
>>> many clusters, but it's not getting the love it needs, and I wouldn't bet
>>> on it, when it comes to integrating scheduling. Instead, any external tool
>>> should be able to use the REST-API to trigger executions, if you want
>>> external scheduling.
>>>
>>> So, in conclusion, if we take Moon's list as a list of descending
>>> priorities, I fully agree, under the condition that code quality is
>>> included as a subset of enterprise-readyness. Auth* is paramount (Kerberos
>>> SPNEGO SSO support is what we really want) with user and group rights
>>> assignment on the notebook level. We probably also need Knox-integration
>>> (ODP-Members looking at integrating Zeppelin should consider contributing
>>> this), and integration of something like Spree (
>>> https://github.com/hammerlab/spree) to be able to profile jobs.
>>>
>>> I'm hopeful that soon I can resume contributing some quality-oriented
>>> code, to drive this "necessary evil" forward ;)
>>>
>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>>> sourav.mazumder00@gmail.com> wrote:
>>>
>>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>>
>>>> Rather one should be able to call it from any scheduler typically used
>>>> in enterprise level. May be support for BPML.
>>>>
>>>> I believe the existing ability to call/execute a Zeppelin Notebook or a
>>>> specific paragraph within a notebook using REST API should take care of
>>>> this requirement to some extent.
>>>>
>>>> Regards,
>>>> Sourav
>>>>
>>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>>>> vinayakagrawal88@gmail.com> wrote:
>>>>
>>>>> @Eran Witkon,
>>>>> Thanks for the suggestion Eran. I concur with your thought.
>>>>> If Zepplin can be integrated with oozie, that would be wonderful.
>>>>> Users will also be able to leverage their Oozie skills.
>>>>> This would be promising for now.
>>>>> However, in the future Hadoop might not necessarily be installed in
>>>>> Spark Cluster and Oozie (since its installs with Hadoop Distribution) might
>>>>> not be available.
>>>>> So perhaps we should give a thought about this feature for the future.
>>>>> Should it depend on oozie or should Zeppelin have its owns scheduling?
>>>>>
>>>>> As Benjamin has iterated, Databrick notebook has this as a core
>>>>> notebook feature.
>>>>>
>>>>>
>>>>> Also, would anybody give any suggestions regarding "sync with github"
>>>>> feature?
>>>>> -Exporting notebook to Github
>>>>> -Importing notebook from Github
>>>>>
>>>>> Thanks
>>>>> Vinayak
>>>>>
>>>>>
>>>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <er...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> @Vinayak Agrawal I would suggest adding the ability to connect
>>>>>> zeppelin to existing scheduling tools\workflow tools such as
>>>>>> https://oozie.apache.org/. this requires betters hooks and status
>>>>>> reporting but doesn't make zeppeling and ETL\scheduler tool by itself/
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>>>>>> vinayakagrawal88@gmail.com> wrote:
>>>>>>
>>>>>>> Moon,
>>>>>>> The new roadmap looks very promising. I am very happy to see
>>>>>>> security in the list.
>>>>>>> I have some suggestions regarding Enterprise Ready features:
>>>>>>>
>>>>>>> 1. Job Scheduler - Can this be improved?
>>>>>>> Currently the scheduler can be used with Cron expression or a
>>>>>>> pre-set time. But in an enterprise solution, a notebook might be one piece
>>>>>>> of the workflow. Can we look towards the functionality of scheduling
>>>>>>> notebook's based on other notebooks finishing their job successfully?
>>>>>>> This requirement would arise in any ETL workflow, where all the
>>>>>>> downstream users wait for the ETL notebook to finish successfully. Only
>>>>>>> after that, other business oriented notebooks can be executed.
>>>>>>>
>>>>>>> 2. Importing a notebook - Is there a current requirement or future
>>>>>>> plan to implement a feature that allows import-notebook-from-github? This
>>>>>>> would allow users to share notebooks seamlessly.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Vinayak
>>>>>>>
>>>>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Zhong Wang,
>>>>>>>> Right, Folder support would be quite useful. Thanks for the
>>>>>>>> opinion.
>>>>>>>>
>>>>>>> Hope i can finish the work pr-190
>>>>>>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>>>>>>
>>>>>>>
>>>>>>>> Sourav,
>>>>>>>> Regarding concurrent running, Zeppelin doesn't have limitation of
>>>>>>>> run paragraph/query concurrently. Interpreter can implement it's own
>>>>>>>> scheduling policy. For example, SparkSQL interpreter and ShellInterpreter
>>>>>>>> can already run paragraph/query concurrently.
>>>>>>>>
>>>>>>>> SparkInterpreter is implemented with FIFO scheduler considering
>>>>>>>> nature of scala compiler. That's why user can not run multiple paragraph
>>>>>>>> concurrently when they work with SparkInterpreter.
>>>>>>>> But as Zhong Wang mentioned, pr-703 enables each notebook will have
>>>>>>>> separate scala compiler so paragraphs run concurrently, while they're in
>>>>>>>> different notebooks.
>>>>>>>> Thanks for the feedback!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> moon
>>>>>>>>
>>>>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>> Sourav: I think this newly merged PR can help you
>>>>>>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>>>>>>
>>>>>>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>>>>>>> sourav.mazumder00@gmail.com> wrote:
>>>>>>>>>
>>>>>>>> Hi Moon,
>>>>>>>>>>
>>>>>>>>>> This looks great.
>>>>>>>>>>
>>>>>>>>>> My only suggestion would be to include a PR/feature - Support for
>>>>>>>>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>>>>>>>>
>>>>>>>>>> Right now if more than one user tries to run paragraphs in
>>>>>>>>>> multiple notebooks concurrently through a single Zeppelin instance (and
>>>>>>>>>> single interpreter instance) the performance is very slow. It is obvious
>>>>>>>>>> that the queue gets built up within the zeppelin process and interpreter
>>>>>>>>>> process in that scenario as the time taken to move the status from start to
>>>>>>>>>> pending and pending to running is very high compared to the actual running
>>>>>>>>>> time of a paragraph.
>>>>>>>>>>
>>>>>>>>>> Without this the multi tenancy support would be meaningless as no
>>>>>>>>>> one can practically use it in a situation where multiple users are trying
>>>>>>>>>> to connect to the same instance of Zeppelin (and the related interpreter).
>>>>>>>>>> A possible solution would be to spawn separate instance of the same
>>>>>>>>>> interpreter at every notebook/user level.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Sourav
>>>>>>>>>>
>>>>>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>> Hi Zeppelin users and developers,
>>>>>>>>>>>
>>>>>>>>>>> The roadmap we have published at
>>>>>>>>>>>
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>>>>>>> is almost 9 month old, and it doesn't reflect where the
>>>>>>>>>>> community goes anymore. It's time to update.
>>>>>>>>>>>
>>>>>>>>>>> Based on mailing list, jira issues, pullrequests, feedbacks from
>>>>>>>>>>> users, conferences and meetings, I could summarize the major interest of
>>>>>>>>>>> users and developers in 7 categories. Enterprise ready, Usability
>>>>>>>>>>> improvement, Pluggability, Documentation, Backend integration, Notebook
>>>>>>>>>>> storage, and Visualization.
>>>>>>>>>>>
>>>>>>>>>>> And i could list related subjects under each categories.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>    - Enterprise ready
>>>>>>>>>>>       - Authentication
>>>>>>>>>>>          - Shiro authentication ZEPPELIN-548
>>>>>>>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>>>>>>>       - Authorization
>>>>>>>>>>>          - Notebook authorization PR-681
>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>>>>>>>       - Security
>>>>>>>>>>>       - Multi-tenancy
>>>>>>>>>>>       - Stability
>>>>>>>>>>>    - Usability Improvement
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    - UX improvement
>>>>>>>>>>>       - Better Table data support
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    - Download data as csv, etc PR-725
>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>
>>>>>>>>>>>          , PR-714
>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>
>>>>>>>>>>>          , PR-6
>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/6>,
>>>>>>>>>>>          PR-89
>>>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    - Featureful table data display (pagenation, etc)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    - Pluggability ZEPPELIN-533
>>>>>>>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>>>>>>>       - Pluggable visualization
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    - Dynamic Interpreter, notebook, visualization loading
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    - Repository and registry for pluggable components
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    - Improve documentation
>>>>>>>>>>>       - Improve contents and readability
>>>>>>>>>>>       - more tutorials, examples
>>>>>>>>>>>    - Interpreter
>>>>>>>>>>>       - Generic JDBC Interpreter
>>>>>>>>>>>       - (spark)R Interpreter
>>>>>>>>>>>       - Cluster manager for interpreter (Proposal
>>>>>>>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>>>>>>>>>>       )
>>>>>>>>>>>       - more interpreters
>>>>>>>>>>>    - Notebook storage
>>>>>>>>>>>       - Versioning ZEPPELIN-540
>>>>>>>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>>>>>>>       - more notebook storages
>>>>>>>>>>>    - Visualization
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    - More visualizations PR-152
>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>,
>>>>>>>>>>>       PR-728
>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>,
>>>>>>>>>>>       PR-336
>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>,
>>>>>>>>>>>       PR-321
>>>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    - Customize graph (show/hide label, color, etc)
>>>>>>>>>>>
>>>>>>>>>>> It will help anyone quickly get overall interest of project and
>>>>>>>>>>> the direction. And based on this roadmap, we can discuss and re-define the
>>>>>>>>>>> next release 0.6.0 scope and it's schedule.
>>>>>>>>>>>
>>>>>>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> moon
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Vinayak Agrawal
>>>>>>>
>>>>>>>
>>>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>>>> ~Lord Alfred Tennyson
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Vinayak Agrawal
>>>>> Big Data Analytics
>>>>> IBM
>>>>>
>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>> ~Lord Alfred Tennyson
>>>>>
>>>>
>>>>
>>>

Re: [DISCUSS] Update Roadmap

Posted by Eran Witkon <er...@gmail.com>.

One point to clarify: I'm not suggesting Oozie specifically. I want us to
think about which features we develop ourselves and which ones we cover by
integrating external, preferably Apache, technology. We don't think about
building our own storage services, so why build our own scheduler?
Eran

Re: [DISCUSS] Update Roadmap

Posted by moon soo Lee <mo...@apache.org>.

@Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
Now I can see a lot of demand for enterprise-level job scheduling.
Whether external or built-in, I completely agree with having enterprise-level
job scheduling support on the roadmap.
ZEPPELIN-137 <https://issues.apache.org/jira/browse/ZEPPELIN-137>,
ZEPPELIN-531 <https://issues.apache.org/jira/browse/ZEPPELIN-531> are
the related issues I can find in our JIRA.

@Vinayak
Regarding importing notebooks from GitHub, Zeppelin has a pluggable notebook
storage layer (see the related package
<https://github.com/apache/incubator-zeppelin/tree/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo>).
So GitHub notebook sync can be implemented easily.
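
As a rough illustration of why a pluggable storage layer makes sync simple,
here is a small sketch. It is not Zeppelin's API: the real interface is the
Java one in the package linked above, and every name below (NotebookStore,
InMemoryStore, sync) is hypothetical. A GitHub-backed store would implement
the same four methods; an in-memory store stands in for it here.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class NotebookStore(ABC):
    """Hypothetical storage interface; notes are kept as JSON strings."""

    @abstractmethod
    def list_ids(self) -> List[str]: ...

    @abstractmethod
    def get(self, note_id: str) -> str: ...

    @abstractmethod
    def save(self, note_id: str, note_json: str) -> None: ...

    @abstractmethod
    def remove(self, note_id: str) -> None: ...


class InMemoryStore(NotebookStore):
    """Stand-in backend; a GitHub-backed store would expose the same methods."""

    def __init__(self) -> None:
        self._notes: Dict[str, str] = {}

    def list_ids(self) -> List[str]:
        return sorted(self._notes)

    def get(self, note_id: str) -> str:
        return self._notes[note_id]

    def save(self, note_id: str, note_json: str) -> None:
        self._notes[note_id] = note_json

    def remove(self, note_id: str) -> None:
        del self._notes[note_id]


def sync(source: NotebookStore, target: NotebookStore) -> List[str]:
    """Copy notes that are missing or different in target; return the ids copied."""
    copied = []
    target_ids = set(target.list_ids())
    for note_id in source.list_ids():
        body = source.get(note_id)
        if note_id not in target_ids or target.get(note_id) != body:
            target.save(note_id, body)
            copied.append(note_id)
    return copied
```

Because sync only talks to the interface, the same function works in either
direction (export or import) once a second backend exists.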

@Shabeel
Right, we need better memory management to prevent such OOM errors.
And I think the table is one of the most frequently used ways of displaying
data. So we'll definitely need more features like filter, sort, etc.
After this roadmap discussion, a discussion of the next release will follow.
Then we'll get an idea of when those features will be available.

@Prasad
Thanks for mentioning HA and DR. They're really important subjects for
enterprise use. Zeppelin will definitely need to address them.
And displaying meta information of a notebook on the top-level page is a good
idea.

It's really great to hear so many opinions and ideas.
And thanks @Rick for sharing a valuable view of the Zeppelin project.

Thanks,
moon


On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz <ra...@gmail.com> wrote:

> Hi,
>
> For one, I know that there is rudimentary scheduling built into Zeppelin
> already (at least I fixed a bug in the test for a scheduling feature a few
> months ago).
> But another point is that Zeppelin should also focus on quality,
> reproducibility and portability.
> Although this doesn't offer exciting new features, it would make
> development much easier.
>
> Cross-platform testability, tests that pass when run sequentially,
> compatibility with Firefox, and many more open issues that make it so much
> harder to enhance Zeppelin and add features should be addressed soon,
> preferably before more features are added. Already Zeppelin is suffering -
> in my opinion - from quite a lot of feature creep, and we should avoid
> putting in the kitchen sink, at the cost of quality and maintainability.
> Instead modularity (ZEPPELIN-533 in particular) should be targeted.
>
> Oozie, in my opinion, is a dead end - it may de-facto still be in use on
> many clusters, but it's not getting the love it needs, and I wouldn't bet
> on it, when it comes to integrating scheduling. Instead, any external tool
> should be able to use the REST-API to trigger executions, if you want
> external scheduling.
>
> So, in conclusion, if we take Moon's list as a list of descending
> priorities, I fully agree, under the condition that code quality is
> included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
> SPNEGO SSO support is what we really want) with user and group rights
> assignment on the notebook level. We probably also need Knox-integration
> (ODP-Members looking at integrating Zeppelin should consider contributing
> this), and integration of something like Spree (
> https://github.com/hammerlab/spree) to be able to profile jobs.
>
> I'm hopeful that soon I can resume contributing some quality-oriented
> code, to drive this "necessary evil" forward ;)
>
> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
> sourav.mazumder00@gmail.com> wrote:
>
>> I do agree with Vinayak. It need not be coupled with Oozie.
>>
>> Rather, one should be able to call it from any scheduler typically used at
>> the enterprise level, maybe with support for BPML.
>>
>> I believe the existing ability to call/execute a Zeppelin Notebook or a
>> specific paragraph within a notebook using REST API should take care of
>> this requirement to some extent.
>>
>> Regards,
>> Sourav
>>
>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>> vinayakagrawal88@gmail.com> wrote:
>>
>>> @Eran Witkon,
>>> Thanks for the suggestion Eran. I concur with your thought.
>>> If Zeppelin can be integrated with Oozie, that would be wonderful. Users
>>> will also be able to leverage their Oozie skills.
>>> This would be promising for now.
>>> However, in the future Hadoop might not necessarily be installed on a
>>> Spark cluster, and Oozie (since it installs with the Hadoop distribution)
>>> might not be available.
>>> So perhaps we should give some thought to this feature for the future.
>>> Should it depend on Oozie or should Zeppelin have its own scheduling?
>>>
>>> As Benjamin has reiterated, the Databricks notebook has this as a core
>>> notebook feature.
>>>
>>>
>>> Also, would anybody give any suggestions regarding "sync with github"
>>> feature?
>>> -Exporting notebook to Github
>>> -Importing notebook from Github
>>>
>>> Thanks
>>> Vinayak
>>>
>>>
>>> On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <er...@gmail.com>
>>> wrote:
>>>
>>>> @Vinayak Agrawal I would suggest adding the ability to connect
>>>> zeppelin to existing scheduling tools\workflow tools such as
>>>> https://oozie.apache.org/. this requires betters hooks and status
>>>> reporting but doesn't make zeppeling and ETL\scheduler tool by itself/
>>>>
>>>>
>>>> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
>>>> vinayakagrawal88@gmail.com> wrote:
>>>>
>>>>> Moon,
>>>>> The new roadmap looks very promising. I am very happy to see security
>>>>> in the list.
>>>>> I have some suggestions regarding Enterprise Ready features:
>>>>>
>>>>> 1. Job Scheduler - Can this be improved?
>>>>> Currently the scheduler can be used with Cron expression or a pre-set
>>>>> time. But in an enterprise solution, a notebook might be one piece of the
>>>>> workflow. Can we look towards the functionality of scheduling notebook's
>>>>> based on other notebooks finishing their job successfully?
>>>>> This requirement would arise in any ETL workflow, where all the
>>>>> downstream users wait for the ETL notebook to finish successfully. Only
>>>>> after that, other business oriented notebooks can be executed.
>>>>>
>>>>> 2. Importing a notebook - Is there a current requirement or future
>>>>> plan to implement a feature that allows import-notebook-from-github? This
>>>>> would allow users to share notebooks seamlessly.
>>>>>
>>>>> Thanks
>>>>> Vinayak
>>>>>
>>>>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Zhong Wang,
>>>>>> Right, Folder support would be quite useful. Thanks for the opinion.
>>>>>>
>>>>> Hope i can finish the work pr-190
>>>>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>>>>
>>>>>
>>>>>> Sourav,
>>>>>> Regarding concurrent running, Zeppelin doesn't have limitation of run
>>>>>> paragraph/query concurrently. Interpreter can implement it's own scheduling
>>>>>> policy. For example, SparkSQL interpreter and ShellInterpreter can already
>>>>>> run paragraph/query concurrently.
>>>>>>
>>>>>> SparkInterpreter is implemented with FIFO scheduler considering
>>>>>> nature of scala compiler. That's why user can not run multiple paragraph
>>>>>> concurrently when they work with SparkInterpreter.
>>>>>> But as Zhong Wang mentioned, pr-703 enables each notebook will have
>>>>>> separate scala compiler so paragraphs run concurrently, while they're in
>>>>>> different notebooks.
>>>>>> Thanks for the feedback!
>>>>>>
>>>>>> Best,
>>>>>> moon
>>>>>>
>>>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>> Sourav: I think this newly merged PR can help you
>>>>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>>>>
>>>>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>>>>> sourav.mazumder00@gmail.com> wrote:
>>>>>>>
>>>>>> Hi Moon,
>>>>>>>>
>>>>>>>> This looks great.
>>>>>>>>
>>>>>>>> My only suggestion would be to include a PR/feature - Support for
>>>>>>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>>>>>>
>>>>>>>> Right now if more than one user tries to run paragraphs in multiple
>>>>>>>> notebooks concurrently through a single Zeppelin instance (and single
>>>>>>>> interpreter instance) the performance is very slow. It is obvious that the
>>>>>>>> queue gets built up within the zeppelin process and interpreter process in
>>>>>>>> that scenario as the time taken to move the status from start to pending
>>>>>>>> and pending to running is very high compared to the actual running time of
>>>>>>>> a paragraph.
>>>>>>>>
>>>>>>>> Without this the multi tenancy support would be meaningless as no
>>>>>>>> one can practically use it in a situation where multiple users are trying
>>>>>>>> to connect to the same instance of Zeppelin (and the related interpreter).
>>>>>>>> A possible solution would be to spawn separate instance of the same
>>>>>>>> interpreter at every notebook/user level.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Sourav
>>>>>>>>
>>>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>> Hi Zeppelin users and developers,
>>>>>>>>>
>>>>>>>>> The roadmap we have published at
>>>>>>>>>
>>>>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>>>>> is almost 9 month old, and it doesn't reflect where the community
>>>>>>>>> goes anymore. It's time to update.
>>>>>>>>>
>>>>>>>>> Based on mailing list, jira issues, pullrequests, feedbacks from
>>>>>>>>> users, conferences and meetings, I could summarize the major interest of
>>>>>>>>> users and developers in 7 categories. Enterprise ready, Usability
>>>>>>>>> improvement, Pluggability, Documentation, Backend integration, Notebook
>>>>>>>>> storage, and Visualization.
>>>>>>>>>
>>>>>>>>> And i could list related subjects under each categories.
>>>>>>>>>
>>>>>>>>
>>>>>>>>>    - Enterprise ready
>>>>>>>>>       - Authentication
>>>>>>>>>          - Shiro authentication ZEPPELIN-548
>>>>>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>>>>>       - Authorization
>>>>>>>>>          - Notebook authorization PR-681
>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>>>>>       - Security
>>>>>>>>>       - Multi-tenancy
>>>>>>>>>       - Stability
>>>>>>>>>    - Usability Improvement
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    - UX improvement
>>>>>>>>>       - Better Table data support
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    - Download data as csv, etc PR-725
>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>,
>>>>>>>>>          PR-714
>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>,
>>>>>>>>>          PR-6
>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/6>,
>>>>>>>>>          PR-89
>>>>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    - Featureful table data display (pagenation, etc)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    - Pluggability ZEPPELIN-533
>>>>>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>>>>>       - Pluggable visualization
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    - Dynamic Interpreter, notebook, visualization loading
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    - Repository and registry for pluggable components
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    - Improve documentation
>>>>>>>>>       - Improve contents and readability
>>>>>>>>>       - more tutorials, examples
>>>>>>>>>    - Interpreter
>>>>>>>>>       - Generic JDBC Interpreter
>>>>>>>>>       - (spark)R Interpreter
>>>>>>>>>       - Cluster manager for interpreter (Proposal
>>>>>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>>>>>>>>       )
>>>>>>>>>       - more interpreters
>>>>>>>>>    - Notebook storage
>>>>>>>>>       - Versioning ZEPPELIN-540
>>>>>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>>>>>       - more notebook storages
>>>>>>>>>    - Visualization
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    - More visualizations PR-152
>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>,
>>>>>>>>>       PR-728
>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>,
>>>>>>>>>       PR-336
>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>,
>>>>>>>>>       PR-321
>>>>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    - Customize graph (show/hide label, color, etc)
>>>>>>>>>
>>>>>>>>> It will help anyone quickly get overall interest of project and
>>>>>>>>> the direction. And based on this roadmap, we can discuss and re-define the
>>>>>>>>> next release 0.6.0 scope and it's schedule.
>>>>>>>>>
>>>>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> moon
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Vinayak Agrawal
>>>>>
>>>>>
>>>>> "To Strive, To Seek, To Find and Not to Yield!"
>>>>> ~Lord Alfred Tennyson
>>>>>
>>>>
>>>
>>>
>>> --
>>> Vinayak Agrawal
>>> Big Data Analytics
>>> IBM
>>>
>>> "To Strive, To Seek, To Find and Not to Yield!"
>>> ~Lord Alfred Tennyson
>>>
>>
>>
>

Re: [DISCUSS] Update Roadmap

Posted by Rick Moritz <ra...@gmail.com>.

Hi,

For one, I know that there is rudimentary scheduling built into Zeppelin
already (at least I fixed a bug in the test for a scheduling feature a few
months ago).
But another point is that Zeppelin should also focus on quality,
reproducibility and portability.
Although this doesn't offer exciting new features, it would make
development much easier.

Cross-platform testability, tests that pass when run sequentially,
compatibility with Firefox, and many more open issues that make it so much
harder to enhance Zeppelin and add features should be addressed soon,
preferably before more features are added. Zeppelin is already suffering -
in my opinion - from quite a lot of feature creep, and we should avoid
putting in the kitchen sink at the cost of quality and maintainability.
Instead, modularity (ZEPPELIN-533 in particular) should be targeted.

Oozie, in my opinion, is a dead end - it may de facto still be in use on
many clusters, but it's not getting the love it needs, and I wouldn't bet
on it when it comes to integrating scheduling. Instead, any external tool
should be able to use the REST API to trigger executions, if you want
external scheduling.
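
To illustrate the point about external tools, here is a minimal Python sketch of how a scheduler could trigger a notebook run over HTTP. The `/api/notebook/job/...` path is an assumption based on the Zeppelin REST API documentation and should be checked against the version in use; the base URL, the helper names and any notebook or paragraph ids are hypothetical.

```python
import urllib.request

# Assumed base URL of a Zeppelin server; adjust host and port for your setup.
ZEPPELIN_URL = "http://localhost:8080"


def job_url(note_id, paragraph_id=None, base_url=ZEPPELIN_URL):
    """Build the job endpoint URL for a whole notebook or a single paragraph.

    The /api/notebook/job/... path is an assumption taken from the Zeppelin
    REST API documentation; verify it against the version you run.
    """
    url = "{}/api/notebook/job/{}".format(base_url.rstrip("/"), note_id)
    if paragraph_id is not None:
        url += "/" + paragraph_id
    return url


def build_run_request(note_id, paragraph_id=None, base_url=ZEPPELIN_URL):
    """Prepare (but do not send) the POST request that starts execution."""
    return urllib.request.Request(
        job_url(note_id, paragraph_id, base_url), data=b"", method="POST"
    )


def trigger(note_id, paragraph_id=None, base_url=ZEPPELIN_URL, timeout=10):
    """Send the request and return the HTTP status code."""
    request = build_run_request(note_id, paragraph_id, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Any scheduler that can run a script or issue an HTTP call (cron, Oozie, or something else) could call `trigger(...)` with a notebook id, which keeps the scheduling logic outside Zeppelin.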

So, in conclusion, if we take Moon's list as a list of descending
priorities, I fully agree, under the condition that code quality is
included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
SPNEGO SSO support is what we really want) with user and group rights
assignment on the notebook level. We probably also need Knox integration
(ODP members looking at integrating Zeppelin should consider contributing
this), and integration of something like Spree (
https://github.com/hammerlab/spree) to be able to profile jobs.

I'm hopeful that soon I can resume contributing some quality-oriented code,
to drive this "necessary evil" forward ;)

On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
sourav.mazumder00@gmail.com> wrote:

> I do agree with Vinayak. It need not be coupled with Oozie.
>
> Rather, one should be able to call it from any scheduler typically used at
> the enterprise level. Maybe support for BPML.
>
> I believe the existing ability to call/execute a Zeppelin notebook, or a
> specific paragraph within a notebook, using the REST API should take care of
> this requirement to some extent.
>
> Regards,
> Sourav
>

Re: [DISCUSS] Update Roadmap

Posted by Sourav Mazumder <so...@gmail.com>.

I do agree with Vinayak. It need not be coupled with Oozie.

Rather, one should be able to call it from any scheduler typically used at
the enterprise level. Maybe support for BPML.

I believe the existing ability to call/execute a Zeppelin notebook, or a
specific paragraph within a notebook, using the REST API should take care of
this requirement to some extent.
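
The requirement in question (running a notebook only after the notebooks it depends on have finished successfully) could be handled outside Zeppelin along these lines. This is only a sketch under stated assumptions: `run_notebook` is a hypothetical caller-supplied function that executes one notebook (for example through the REST API) and returns True on success, and the notebook ids are made up. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter


def run_in_dependency_order(dependencies, run_notebook):
    """Run notebooks so that each starts only after its upstreams succeeded.

    dependencies maps a notebook id to the set of notebook ids it waits for.
    run_notebook is a caller-supplied (hypothetical) callable that executes
    one notebook and returns True on success.
    Returns three lists of notebook ids: (succeeded, failed, skipped).
    """
    succeeded, failed, skipped = [], [], []
    # static_order() yields every notebook after all of its dependencies.
    for note_id in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(note_id, ())
        if any(dep in failed or dep in skipped for dep in upstream):
            # An upstream notebook did not finish successfully: do not run.
            skipped.append(note_id)
        elif run_notebook(note_id):
            succeeded.append(note_id)
        else:
            failed.append(note_id)
    return succeeded, failed, skipped
```

For example, with `{"etl": set(), "report": {"etl"}}` the report notebook runs only if the ETL notebook reported success; otherwise it ends up in the skipped list.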

Regards,
Sourav

On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
vinayakagrawal88@gmail.com> wrote:

> @Eran Witkon,
> Thanks for the suggestion Eran. I concur with your thought.
> If Zeppelin can be integrated with Oozie, that would be wonderful. Users
> will also be able to leverage their Oozie skills.
> This would be promising for now.
> However, in the future Hadoop might not necessarily be installed in the Spark
> cluster, and Oozie (since it is installed with the Hadoop distribution) might
> not be available.
> So perhaps we should give some thought to this feature for the future.
> Should it depend on Oozie, or should Zeppelin have its own scheduling?
>
> As Benjamin has iterated, the Databricks notebook has this as a core
> feature.
>
>
> Also, would anybody give any suggestions regarding a "sync with GitHub"
> feature?
> - Exporting a notebook to GitHub
> - Importing a notebook from GitHub
>
> Thanks
> Vinayak
>
>

Re: [DISCUSS] Update Roadmap

Posted by Vinayak Agrawal <vi...@gmail.com>.

@Eran Witkon,
Thanks for the suggestion Eran. I concur with your thought.
If Zeppelin can be integrated with Oozie, that would be wonderful. Users
will also be able to leverage their Oozie skills.
This would be promising for now.
However, in the future Hadoop might not necessarily be installed in the Spark
cluster, and Oozie (since it is installed with the Hadoop distribution) might
not be available.
So perhaps we should give some thought to this feature for the future.
Should it depend on Oozie, or should Zeppelin have its own scheduling?

As Benjamin has iterated, the Databricks notebook has this as a core
feature.


Also, would anybody give any suggestions regarding a "sync with GitHub"
feature?
- Exporting a notebook to GitHub
- Importing a notebook from GitHub

Thanks
Vinayak


On Mon, Feb 29, 2016 at 4:17 AM, Eran Witkon <er...@gmail.com> wrote:

> @Vinayak Agrawal I would suggest adding the ability to connect Zeppelin
> to existing scheduling/workflow tools such as
> https://oozie.apache.org/. This requires better hooks and status
> reporting, but doesn't make Zeppelin an ETL/scheduler tool by itself.
>
>
> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <
> vinayakagrawal88@gmail.com> wrote:
>
>> Moon,
>> The new roadmap looks very promising. I am very happy to see security in
>> the list.
>> I have some suggestions regarding Enterprise Ready features:
>>
>> 1. Job Scheduler - Can this be improved?
>> Currently the scheduler can be used with a cron expression or a pre-set
>> time. But in an enterprise solution, a notebook might be one piece of the
>> workflow. Can we look towards the functionality of scheduling notebooks
>> based on other notebooks finishing their job successfully?
>> This requirement would arise in any ETL workflow, where all the
>> downstream users wait for the ETL notebook to finish successfully. Only
>> after that can other business-oriented notebooks be executed.
>>
>> 2. Importing a notebook - Is there a current requirement or future plan
>> to implement a feature that allows import-notebook-from-github? This would
>> allow users to share notebooks seamlessly.
>>
>> Thanks
>> Vinayak
>>
>> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org> wrote:
>>
>>> Zhong Wang,
>>> Right, Folder support would be quite useful. Thanks for the opinion.
>>> Hope I can finish the work pr-190
>>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>>
>>> Sourav,
>>> Regarding concurrent running, Zeppelin itself doesn't limit running
>>> paragraphs/queries concurrently. Each interpreter can implement its own
>>> scheduling policy. For example, the SparkSQL interpreter and
>>> ShellInterpreter can already run paragraphs/queries concurrently.
>>>
>>> SparkInterpreter is implemented with a FIFO scheduler, considering the
>>> nature of the Scala compiler. That's why users cannot run multiple
>>> paragraphs concurrently when they work with SparkInterpreter.
>>> But as Zhong Wang mentioned, pr-703 gives each notebook a separate
>>> Scala compiler, so paragraphs can run concurrently as long as they're in
>>> different notebooks.
>>> Thanks for the feedback!
>>>
>>> Best,
>>> moon
>>>
>>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
>>> wrote:
>>>
>>>> Sourav: I think this newly merged PR can help you
>>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>>
>>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>>> sourav.mazumder00@gmail.com> wrote:
>>>>
>>>>> Hi Moon,
>>>>>
>>>>> This looks great.
>>>>>
>>>>> My only suggestion would be to include a PR/feature - Support for
>>>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>>>
>>>>> Right now if more than one user tries to run paragraphs in multiple
>>>>> notebooks concurrently through a single Zeppelin instance (and single
>>>>> interpreter instance) the performance is very slow. It is obvious that the
>>>>> queue gets built up within the zeppelin process and interpreter process in
>>>>> that scenario as the time taken to move the status from start to pending
>>>>> and pending to running is very high compared to the actual running time of
>>>>> a paragraph.
>>>>>
>>>>> Without this the multi tenancy support would be meaningless as no one
>>>>> can practically use it in a situation where multiple users are trying to
>>>>> connect to the same instance of Zeppelin (and the related interpreter). A
>>>>> possible solution would be to spawn separate instance of the same
>>>>> interpreter at every notebook/user level.
>>>>>
>>>>> Regards,
>>>>> Sourav
>>>>>
>>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org> wrote:
>>>>>
>>>>>> Hi Zeppelin users and developers,
>>>>>>
>>>>>> The roadmap we have published at
>>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>>> is almost 9 month old, and it doesn't reflect where the community
>>>>>> goes anymore. It's time to update.
>>>>>>
>>>>>> Based on mailing list, jira issues, pullrequests, feedbacks from
>>>>>> users, conferences and meetings, I could summarize the major interest of
>>>>>> users and developers in 7 categories. Enterprise ready, Usability
>>>>>> improvement, Pluggability, Documentation, Backend integration, Notebook
>>>>>> storage, and Visualization.
>>>>>>
>>>>>> And i could list related subjects under each categories.
>>>>>>
>>>>>
>>>>>>    - Enterprise ready
>>>>>>       - Authentication
>>>>>>          - Shiro authentication ZEPPELIN-548
>>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>>       - Authorization
>>>>>>          - Notebook authorization PR-681
>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>>       - Security
>>>>>>       - Multi-tenancy
>>>>>>       - Stability
>>>>>>    - Usability Improvement
>>>>>>       - UX improvement
>>>>>>       - Better Table data support
>>>>>>          - Download data as csv, etc PR-725
>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>,
>>>>>>          PR-714
>>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>,
>>>>>>          PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>,
>>>>>>          PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>>          - Featureful table data display (pagenation, etc)
>>>>>>    - Pluggability ZEPPELIN-533
>>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>>       - Pluggable visualization
>>>>>>       - Dynamic Interpreter, notebook, visualization loading
>>>>>>       - Repository and registry for pluggable components
>>>>>>    - Improve documentation
>>>>>>       - Improve contents and readability
>>>>>>       - more tutorials, examples
>>>>>>    - Interpreter
>>>>>>       - Generic JDBC Interpreter
>>>>>>       - (spark)R Interpreter
>>>>>>       - Cluster manager for interpreter (Proposal
>>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>>>>>       )
>>>>>>       - more interpreters
>>>>>>    - Notebook storage
>>>>>>       - Versioning ZEPPELIN-540
>>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>>       - more notebook storages
>>>>>>    - Visualization
>>>>>>       - More visualizations PR-152
>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>>       - Customize graph (show/hide label, color, etc)
>>>>>>
>>>>>> It will help anyone quickly get overall interest of project and the
>>>>>> direction. And based on this roadmap, we can discuss and re-define the next
>>>>>> release 0.6.0 scope and it's schedule.
>>>>>>
>>>>>> What do you think? Any feedback would be appreciated.
>>>>>>
>>>>>> Thanks,
>>>>>> moon
>>>>>>
>>>>>>
>>
>>
>> --
>> Vinayak Agrawal
>>
>>
>> "To Strive, To Seek, To Find and Not to Yield!"
>> ~Lord Alfred Tennyson
>>
>


-- 
Vinayak Agrawal
Big Data Analytics
IBM

"To Strive, To Seek, To Find and Not to Yield!"
~Lord Alfred Tennyson

Re: [DISCUSS] Update Roadmap

Posted by Eran Witkon <er...@gmail.com>.

@Vinayak Agrawal I would suggest adding the ability to connect Zeppelin to
existing scheduling/workflow tools such as https://oozie.apache.org/.
This requires better hooks and status reporting, but doesn't make Zeppelin
an ETL/scheduler tool by itself.


On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal <vi...@gmail.com>
wrote:

> Moon,
> The new roadmap looks very promising. I am very happy to see security in
> the list.
> I have some suggestions regarding Enterprise Ready features:
>
> 1. Job Scheduler - Can this be improved?
> Currently the scheduler can be used with Cron expression or a pre-set
> time. But in an enterprise solution, a notebook might be one piece of the
> workflow. Can we look towards the functionality of scheduling notebook's
> based on other notebooks finishing their job successfully?
> This requirement would arise in any ETL workflow, where all the downstream
> users wait for the ETL notebook to finish successfully. Only after that,
> other business oriented notebooks can be executed.
>
> 2. Importing a notebook - Is there a current requirement or future plan to
> implement a feature that allows import-notebook-from-github? This would
> allow users to share notebooks seamlessly.
>
> Thanks
> Vinayak
>
> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org> wrote:
>
>> Zhong Wang,
>> Right, Folder support would be quite useful. Thanks for the opinion.
>> Hope i can finish the work pr-190
>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>
>> Sourav,
>> Regarding concurrent running, Zeppelin doesn't have limitation of run
>> paragraph/query concurrently. Interpreter can implement it's own scheduling
>> policy. For example, SparkSQL interpreter and ShellInterpreter can already
>> run paragraph/query concurrently.
>>
>> SparkInterpreter is implemented with FIFO scheduler considering nature of
>> scala compiler. That's why user can not run multiple paragraph concurrently
>> when they work with SparkInterpreter.
>> But as Zhong Wang mentioned, pr-703 enables each notebook will have
>> separate scala compiler so paragraphs run concurrently, while they're in
>> different notebooks.
>> Thanks for the feedback!
>>
>> Best,
>> moon
>>
>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
>> wrote:
>>
>>> Sourav: I think this newly merged PR can help you
>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>
>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>> sourav.mazumder00@gmail.com> wrote:
>>>
>>>> Hi Moon,
>>>>
>>>> This looks great.
>>>>
>>>> My only suggestion would be to include a PR/feature - Support for
>>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>>
>>>> Right now if more than one user tries to run paragraphs in multiple
>>>> notebooks concurrently through a single Zeppelin instance (and single
>>>> interpreter instance) the performance is very slow. It is obvious that the
>>>> queue gets built up within the zeppelin process and interpreter process in
>>>> that scenario as the time taken to move the status from start to pending
>>>> and pending to running is very high compared to the actual running time of
>>>> a paragraph.
>>>>
>>>> Without this the multi tenancy support would be meaningless as no one
>>>> can practically use it in a situation where multiple users are trying to
>>>> connect to the same instance of Zeppelin (and the related interpreter). A
>>>> possible solution would be to spawn separate instance of the same
>>>> interpreter at every notebook/user level.
>>>>
>>>> Regards,
>>>> Sourav
>>>>
>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org> wrote:
>>>>
>>>>> Hi Zeppelin users and developers,
>>>>>
>>>>> The roadmap we have published at
>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>> is almost 9 month old, and it doesn't reflect where the community goes
>>>>> anymore. It's time to update.
>>>>>
>>>>> Based on mailing list, jira issues, pullrequests, feedbacks from
>>>>> users, conferences and meetings, I could summarize the major interest of
>>>>> users and developers in 7 categories. Enterprise ready, Usability
>>>>> improvement, Pluggability, Documentation, Backend integration, Notebook
>>>>> storage, and Visualization.
>>>>>
>>>>> And i could list related subjects under each categories.
>>>>>
>>>>
>>>>>    - Enterprise ready
>>>>>       - Authentication
>>>>>          - Shiro authentication ZEPPELIN-548
>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>       - Authorization
>>>>>          - Notebook authorization PR-681
>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>       - Security
>>>>>       - Multi-tenancy
>>>>>       - Stability
>>>>>    - Usability Improvement
>>>>>       - UX improvement
>>>>>       - Better Table data support
>>>>>          - Download data as csv, etc PR-725
>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>,
>>>>>          PR-714
>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>>>>>          <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>>>>>          <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>          - Featureful table data display (pagenation, etc)
>>>>>    - Pluggability ZEPPELIN-533
>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>       - Pluggable visualization
>>>>>       - Dynamic Interpreter, notebook, visualization loading
>>>>>       - Repository and registry for pluggable components
>>>>>    - Improve documentation
>>>>>       - Improve contents and readability
>>>>>       - more tutorials, examples
>>>>>    - Interpreter
>>>>>       - Generic JDBC Interpreter
>>>>>       - (spark)R Interpreter
>>>>>       - Cluster manager for interpreter (Proposal
>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>>>>       )
>>>>>       - more interpreters
>>>>>    - Notebook storage
>>>>>       - Versioning ZEPPELIN-540
>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>       - more notebook storages
>>>>>    - Visualization
>>>>>       - More visualizations PR-152
>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>       - Customize graph (show/hide label, color, etc)
>>>>>
>>>>> It will help anyone quickly get overall interest of project and the
>>>>> direction. And based on this roadmap, we can discuss and re-define the next
>>>>> release 0.6.0 scope and it's schedule.
>>>>>
>>>>> What do you think? Any feedback would be appreciated.
>>>>>
>>>>> Thanks,
>>>>> moon
>>>>>
>>>>>
>
>
> --
> Vinayak Agrawal
>
>
> "To Strive, To Seek, To Find and Not to Yield!"
> ~Lord Alfred Tennyson
>

Re: [DISCUSS] Update Roadmap

Posted by Shabeel Syed <sh...@gmail.com>.

Hi Moon,

Some of my requirements:

   1. Can we achieve better memory management for notebooks? I'm also
   facing a similar OOM issue, like the one Dafeng mentioned in another
   discussion. I'm using the iframe view of a paragraph; can we load that
   code and its results into memory only when requested? I think this is
   one area to focus on.
   2. In the table/graph view, can we include the features below along
   with pagination?
      a) Search, similar to
      https://docs.angularjs.org/api/ng/filter/filter
      b) Sorting of columns, including custom sorting algorithms?

Also, any idea on GA for these suggested improvements?


Regards
Shabeel
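
As a rough illustration of the search and column sorting requested in
point 2, here is a minimal Python sketch. It is not Zeppelin code: the
function names search_rows and sort_rows, and the representation of a table
as a list of dicts, are assumptions made only for this example. The search
is a case-insensitive substring match across all cells, loosely in the
spirit of the AngularJS filter linked above.

```python
def search_rows(rows, text):
    """Keep rows where any cell contains `text` (case-insensitive).

    `rows` is assumed to be a list of dicts, one dict per table row.
    """
    needle = text.lower()
    return [
        row for row in rows
        if any(needle in str(value).lower() for value in row.values())
    ]


def sort_rows(rows, column, key=None, descending=False):
    """Sort rows by one column; `key` is an optional custom sort function
    applied to the cell value (hypothetical hook for custom orderings)."""
    key = key or (lambda value: value)
    return sorted(rows, key=lambda row: key(row[column]), reverse=descending)


if __name__ == "__main__":
    table = [
        {"name": "b", "n": 10},
        {"name": "a", "n": 2},
        {"name": "ab", "n": 33},
    ]
    print(search_rows(table, "A"))
    print(sort_rows(table, "n"))
    print(sort_rows(table, "name", key=len, descending=True))
```

Passing a function such as len as key is one way a "custom sorting
algorithm" could be plugged in without changing the table code itself.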

On Mon, Feb 29, 2016 at 1:51 PM, Vinayak Agrawal <vinayakagrawal88@gmail.com
> wrote:

> Moon,
> The new roadmap looks very promising. I am very happy to see security in
> the list.
> I have some suggestions regarding Enterprise Ready features:
>
> 1. Job Scheduler - Can this be improved?
> Currently the scheduler can be used with Cron expression or a pre-set
> time. But in an enterprise solution, a notebook might be one piece of the
> workflow. Can we look towards the functionality of scheduling notebook's
> based on other notebooks finishing their job successfully?
> This requirement would arise in any ETL workflow, where all the downstream
> users wait for the ETL notebook to finish successfully. Only after that,
> other business oriented notebooks can be executed.
>
> 2. Importing a notebook - Is there a current requirement or future plan to
> implement a feature that allows import-notebook-from-github? This would
> allow users to share notebooks seamlessly.
>
> Thanks
> Vinayak
>
> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org> wrote:
>
>> Zhong Wang,
>> Right, Folder support would be quite useful. Thanks for the opinion.
>> Hope i can finish the work pr-190
>> <https://github.com/apache/incubator-zeppelin/pull/190>.
>>
>> Sourav,
>> Regarding concurrent running, Zeppelin doesn't have limitation of run
>> paragraph/query concurrently. Interpreter can implement it's own scheduling
>> policy. For example, SparkSQL interpreter and ShellInterpreter can already
>> run paragraph/query concurrently.
>>
>> SparkInterpreter is implemented with FIFO scheduler considering nature of
>> scala compiler. That's why user can not run multiple paragraph concurrently
>> when they work with SparkInterpreter.
>> But as Zhong Wang mentioned, pr-703 enables each notebook will have
>> separate scala compiler so paragraphs run concurrently, while they're in
>> different notebooks.
>> Thanks for the feedback!
>>
>> Best,
>> moon
>>
>> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
>> wrote:
>>
>>> Sourav: I think this newly merged PR can help you
>>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>>
>>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>>> sourav.mazumder00@gmail.com> wrote:
>>>
>>>> Hi Moon,
>>>>
>>>> This looks great.
>>>>
>>>> My only suggestion would be to include a PR/feature - Support for
>>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>>
>>>> Right now if more than one user tries to run paragraphs in multiple
>>>> notebooks concurrently through a single Zeppelin instance (and single
>>>> interpreter instance) the performance is very slow. It is obvious that the
>>>> queue gets built up within the zeppelin process and interpreter process in
>>>> that scenario as the time taken to move the status from start to pending
>>>> and pending to running is very high compared to the actual running time of
>>>> a paragraph.
>>>>
>>>> Without this the multi tenancy support would be meaningless as no one
>>>> can practically use it in a situation where multiple users are trying to
>>>> connect to the same instance of Zeppelin (and the related interpreter). A
>>>> possible solution would be to spawn separate instance of the same
>>>> interpreter at every notebook/user level.
>>>>
>>>> Regards,
>>>> Sourav
>>>>
>>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org> wrote:
>>>>
>>>>> Hi Zeppelin users and developers,
>>>>>
>>>>> The roadmap we have published at
>>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>>> is almost 9 month old, and it doesn't reflect where the community goes
>>>>> anymore. It's time to update.
>>>>>
>>>>> Based on mailing list, jira issues, pullrequests, feedbacks from
>>>>> users, conferences and meetings, I could summarize the major interest of
>>>>> users and developers in 7 categories. Enterprise ready, Usability
>>>>> improvement, Pluggability, Documentation, Backend integration, Notebook
>>>>> storage, and Visualization.
>>>>>
>>>>> And i could list related subjects under each categories.
>>>>>
>>>>>    - Enterprise ready
>>>>>       - Authentication
>>>>>          - Shiro authentication ZEPPELIN-548
>>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>>       - Authorization
>>>>>          - Notebook authorization PR-681
>>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>>       - Security
>>>>>       - Multi-tenancy
>>>>>       - Stability
>>>>>    - Usability Improvement
>>>>>       - UX improvement
>>>>>       - Better Table data support
>>>>>          - Download data as csv, etc PR-725
>>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>,
>>>>>          PR-714
>>>>>          <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>>>>>          <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>>>>>          <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>>          - Featureful table data display (pagenation, etc)
>>>>>       - Pluggability ZEPPELIN-533
>>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>>       - Pluggable visualization
>>>>>       - Dynamic Interpreter, notebook, visualization loading
>>>>>       - Repository and registry for pluggable components
>>>>>    - Improve documentation
>>>>>       - Improve contents and readability
>>>>>       - more tutorials, examples
>>>>>    - Interpreter
>>>>>       - Generic JDBC Interpreter
>>>>>       - (spark)R Interpreter
>>>>>       - Cluster manager for interpreter (Proposal
>>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>>>>       )
>>>>>       - more interpreters
>>>>>    - Notebook storage
>>>>>       - Versioning ZEPPELIN-540
>>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>>       - more notebook storages
>>>>>    - Visualization
>>>>>       - More visualizations PR-152
>>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>>       - Customize graph (show/hide label, color, etc)
>>>>>
>>>>>
>>>>> It will help anyone quickly get overall interest of project and the
>>>>> direction. And based on this roadmap, we can discuss and re-define the next
>>>>> release 0.6.0 scope and it's schedule.
>>>>>
>>>>> What do you think? Any feedback would be appreciated.
>>>>>
>>>>> Thanks,
>>>>> moon
>>>>>
>>>>>
>>>>
>
>
> --
> Vinayak Agrawal
>
>
> "To Strive, To Seek, To Find and Not to Yield!"
> ~Lord Alfred Tennyson
>

Re: [DISCUSS] Update Roadmap

Posted by Vinayak Agrawal <vi...@gmail.com>.

Moon,
The new roadmap looks very promising. I am very happy to see security in
the list.
I have some suggestions regarding Enterprise Ready features:

1. Job Scheduler - Can this be improved?
Currently the scheduler can be used with a cron expression or a pre-set time.
But in an enterprise solution, a notebook might be one piece of the
workflow. Can we look at scheduling notebooks based on other notebooks
finishing their jobs successfully?
This requirement would arise in any ETL workflow, where all the downstream
users wait for the ETL notebook to finish successfully. Only after that
can other business-oriented notebooks be executed.

2. Importing a notebook - Is there a current requirement or future plan to
implement a feature that allows import-notebook-from-github? This would
allow users to share notebooks seamlessly.

Thanks
Vinayak
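
The dependency-based scheduling described in point 1 can be sketched as a
topological ordering over notebooks. The Python below is only an
illustration under stated assumptions, not an existing Zeppelin feature:
run_in_dependency_order and the run_notebook callback are hypothetical
names, and the callback is assumed to return True when a notebook finishes
successfully. A notebook is skipped when any of its upstream notebooks
failed or was itself skipped.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_in_dependency_order(deps, run_notebook):
    """Run each notebook only after all of its upstream notebooks succeeded.

    deps maps a notebook name to the set of notebooks it depends on.
    run_notebook is a hypothetical callable returning True on success.
    Returns a dict of notebook name -> "ok", "failed", or "skipped".
    """
    status = {}
    # static_order() yields every notebook after all of its dependencies.
    for name in TopologicalSorter(deps).static_order():
        upstream = deps.get(name, ())
        if any(status[u] != "ok" for u in upstream):
            status[name] = "skipped"
        else:
            status[name] = "ok" if run_notebook(name) else "failed"
    return status


if __name__ == "__main__":
    # The ETL notebook must finish before the business notebooks run.
    deps = {"etl": set(), "report": {"etl"}, "dashboard": {"report"}}
    print(run_in_dependency_order(deps, lambda name: True))
```

An external workflow tool could play the same role; the sketch only shows
the ordering and skip-on-failure logic.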

On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org> wrote:

> Zhong Wang,
> Right, Folder support would be quite useful. Thanks for the opinion.
> Hope i can finish the work pr-190
> <https://github.com/apache/incubator-zeppelin/pull/190>.
>
> Sourav,
> Regarding concurrent running, Zeppelin doesn't have limitation of run
> paragraph/query concurrently. Interpreter can implement it's own scheduling
> policy. For example, SparkSQL interpreter and ShellInterpreter can already
> run paragraph/query concurrently.
>
> SparkInterpreter is implemented with FIFO scheduler considering nature of
> scala compiler. That's why user can not run multiple paragraph concurrently
> when they work with SparkInterpreter.
> But as Zhong Wang mentioned, pr-703 enables each notebook will have
> separate scala compiler so paragraphs run concurrently, while they're in
> different notebooks.
> Thanks for the feedback!
>
> Best,
> moon
>
> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
> wrote:
>
>> Sourav: I think this newly merged PR can help you
>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>
>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>> sourav.mazumder00@gmail.com> wrote:
>>
>>> Hi Moon,
>>>
>>> This looks great.
>>>
>>> My only suggestion would be to include a PR/feature - Support for
>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>
>>> Right now if more than one user tries to run paragraphs in multiple
>>> notebooks concurrently through a single Zeppelin instance (and single
>>> interpreter instance) the performance is very slow. It is obvious that the
>>> queue gets built up within the zeppelin process and interpreter process in
>>> that scenario as the time taken to move the status from start to pending
>>> and pending to running is very high compared to the actual running time of
>>> a paragraph.
>>>
>>> Without this the multi tenancy support would be meaningless as no one
>>> can practically use it in a situation where multiple users are trying to
>>> connect to the same instance of Zeppelin (and the related interpreter). A
>>> possible solution would be to spawn separate instance of the same
>>> interpreter at every notebook/user level.
>>>
>>> Regards,
>>> Sourav
>>>
>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org> wrote:
>>>
>>>> Hi Zeppelin users and developers,
>>>>
>>>> The roadmap we have published at
>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>> is almost 9 month old, and it doesn't reflect where the community goes
>>>> anymore. It's time to update.
>>>>
>>>> Based on mailing list, jira issues, pullrequests, feedbacks from users,
>>>> conferences and meetings, I could summarize the major interest of users and
>>>> developers in 7 categories. Enterprise ready, Usability improvement,
>>>> Pluggability, Documentation, Backend integration, Notebook storage, and
>>>> Visualization.
>>>>
>>>> And i could list related subjects under each categories.
>>>>
>>>>    - Enterprise ready
>>>>       - Authentication
>>>>          - Shiro authentication ZEPPELIN-548
>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>       - Authorization
>>>>          - Notebook authorization PR-681
>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>       - Security
>>>>       - Multi-tenancy
>>>>       - Stability
>>>>    - Usability Improvement
>>>>       - UX improvement
>>>>       - Better Table data support
>>>>          - Download data as csv, etc PR-725
>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>,
>>>>          PR-714 <https://github.com/apache/incubator-zeppelin/pull/714>
>>>>          , PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>,
>>>>          PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>          - Featureful table data display (pagenation, etc)
>>>>       - Pluggability ZEPPELIN-533
>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>       - Pluggable visualization
>>>>       - Dynamic Interpreter, notebook, visualization loading
>>>>       - Repository and registry for pluggable components
>>>>    - Improve documentation
>>>>       - Improve contents and readability
>>>>       - more tutorials, examples
>>>>    - Interpreter
>>>>       - Generic JDBC Interpreter
>>>>       - (spark)R Interpreter
>>>>       - Cluster manager for interpreter (Proposal
>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>>>       )
>>>>       - more interpreters
>>>>    - Notebook storage
>>>>       - Versioning ZEPPELIN-540
>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>       - more notebook storages
>>>>    - Visualization
>>>>       - More visualizations PR-152
>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>       - Customize graph (show/hide label, color, etc)
>>>>
>>>>
>>>> It will help anyone quickly get overall interest of project and the
>>>> direction. And based on this roadmap, we can discuss and re-define the next
>>>> release 0.6.0 scope and it's schedule.
>>>>
>>>> What do you think? Any feedback would be appreciated.
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>>
>>>


-- 
Vinayak Agrawal


"To Strive, To Seek, To Find and Not to Yield!"
~Lord Alfred Tennyson

Re: [DISCUSS] Update Roadmap

Posted by Vinayak Agrawal <vi...@gmail.com>.

Moon,
The new roadmap looks very promising. I am very happy to see security in
the list.
I have some suggestions regarding Enterprise Ready features:

1. Job Scheduler - Can this be improved?
Currently the scheduler can be used with a cron expression or a pre-set time.
But in an enterprise solution, a notebook might be one piece of a larger
workflow. Can we look into scheduling notebooks based on other notebooks
finishing their jobs successfully?
This requirement would arise in any ETL workflow, where all the downstream
users wait for the ETL notebook to finish successfully. Only after that
can the other business-oriented notebooks be executed.
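
As a rough illustration of the dependency-based scheduling described above,
here is a minimal sketch in Python. It is not an existing Zeppelin feature or
API: the notebook names, the run_workflow function and the run_notebook
callable are all hypothetical, and it only shows the ordering rule (a
notebook runs only after every notebook it depends on has succeeded).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(dependencies, run_notebook):
    """Run notebooks in dependency order.

    dependencies: dict mapping a notebook name to the set of notebooks
                  that must finish successfully before it may start.
    run_notebook: hypothetical callable taking a notebook name and
                  returning True on success, False on failure.
    Returns a dict of notebook name -> "SUCCESS", "FAILED" or "SKIPPED".
    """
    status = {}
    # static_order() yields each notebook after all of its upstream notebooks.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, ())
        if any(status[u] != "SUCCESS" for u in upstream):
            status[name] = "SKIPPED"  # an upstream notebook did not succeed
        elif run_notebook(name):
            status[name] = "SUCCESS"
        else:
            status[name] = "FAILED"
    return status


# Example: two business notebooks wait for a (hypothetical) "etl" notebook.
deps = {"sales_report": {"etl"}, "churn_model": {"etl"}}
print(run_workflow(deps, lambda name: True))
```

If the ETL notebook fails, both downstream notebooks are marked SKIPPED
instead of being run against incomplete data.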

2. Importing a notebook - Is there a current requirement or future plan to
implement a feature for importing a notebook from GitHub? This would
allow users to share notebooks seamlessly.

Thanks
Vinayak

On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee <mo...@apache.org> wrote:

> Zhong Wang,
> Right, Folder support would be quite useful. Thanks for the opinion.
> Hope i can finish the work pr-190
> <https://github.com/apache/incubator-zeppelin/pull/190>.
>
> Sourav,
> Regarding concurrent running, Zeppelin doesn't have limitation of run
> paragraph/query concurrently. Interpreter can implement it's own scheduling
> policy. For example, SparkSQL interpreter and ShellInterpreter can already
> run paragraph/query concurrently.
>
> SparkInterpreter is implemented with FIFO scheduler considering nature of
> scala compiler. That's why user can not run multiple paragraph concurrently
> when they work with SparkInterpreter.
> But as Zhong Wang mentioned, pr-703 enables each notebook will have
> separate scala compiler so paragraphs run concurrently, while they're in
> different notebooks.
> Thanks for the feedback!
>
> Best,
> moon
>
> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com>
> wrote:
>
>> Sourav: I think this newly merged PR can help you
>> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>>
>> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
>> sourav.mazumder00@gmail.com> wrote:
>>
>>> Hi Moon,
>>>
>>> This looks great.
>>>
>>> My only suggestion would be to include a PR/feature - Support for
>>> Running Concurrent paragraphs/queries in Zeppelin.
>>>
>>> Right now if more than one user tries to run paragraphs in multiple
>>> notebooks concurrently through a single Zeppelin instance (and single
>>> interpreter instance) the performance is very slow. It is obvious that the
>>> queue gets built up within the zeppelin process and interpreter process in
>>> that scenario as the time taken to move the status from start to pending
>>> and pending to running is very high compared to the actual running time of
>>> a paragraph.
>>>
>>> Without this the multi tenancy support would be meaningless as no one
>>> can practically use it in a situation where multiple users are trying to
>>> connect to the same instance of Zeppelin (and the related interpreter). A
>>> possible solution would be to spawn separate instance of the same
>>> interpreter at every notebook/user level.
>>>
>>> Regards,
>>> Sourav
>>>
>>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org> wrote:
>>>
>>>> Hi Zeppelin users and developers,
>>>>
>>>> The roadmap we have published at
>>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>>> is almost 9 month old, and it doesn't reflect where the community goes
>>>> anymore. It's time to update.
>>>>
>>>> Based on mailing list, jira issues, pullrequests, feedbacks from users,
>>>> conferences and meetings, I could summarize the major interest of users and
>>>> developers in 7 categories. Enterprise ready, Usability improvement,
>>>> Pluggability, Documentation, Backend integration, Notebook storage, and
>>>> Visualization.
>>>>
>>>> And i could list related subjects under each categories.
>>>>
>>>>    - Enterprise ready
>>>>       - Authentication
>>>>          - Shiro authentication ZEPPELIN-548
>>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>>       - Authorization
>>>>          - Notebook authorization PR-681
>>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>>>       - Security
>>>>       - Multi-tenancy
>>>>       - Stability
>>>>    - Usability Improvement
>>>>       - UX improvement
>>>>       - Better Table data support
>>>>          - Download data as csv, etc PR-725
>>>>          <https://github.com/apache/incubator-zeppelin/pull/725>,
>>>>          PR-714 <https://github.com/apache/incubator-zeppelin/pull/714>
>>>>          , PR-6 <https://github.com/apache/incubator-zeppelin/pull/6>,
>>>>          PR-89 <https://github.com/apache/incubator-zeppelin/pull/89>
>>>>          - Featureful table data display (pagenation, etc)
>>>>       - Pluggability ZEPPELIN-533
>>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>>       - Pluggable visualization
>>>>       - Dynamic Interpreter, notebook, visualization loading
>>>>       - Repository and registry for pluggable components
>>>>    - Improve documentation
>>>>       - Improve contents and readability
>>>>       - more tutorials, examples
>>>>    - Interpreter
>>>>       - Generic JDBC Interpreter
>>>>       - (spark)R Interpreter
>>>>       - Cluster manager for interpreter (Proposal
>>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>>>       )
>>>>       - more interpreters
>>>>    - Notebook storage
>>>>       - Versioning ZEPPELIN-540
>>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>>       - more notebook storages
>>>>    - Visualization
>>>>       - More visualizations PR-152
>>>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>>>       - Customize graph (show/hide label, color, etc)
>>>>
>>>>
>>>> It will help anyone quickly get overall interest of project and the
>>>> direction. And based on this roadmap, we can discuss and re-define the next
>>>> release 0.6.0 scope and it's schedule.
>>>>
>>>> What do you think? Any feedback would be appreciated.
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>>
>>>


-- 
Vinayak Agrawal


"To Strive, To Seek, To Find and Not to Yield!"
~Lord Alfred Tennyson

Re: [DISCUSS] Update Roadmap

Posted by moon soo Lee <mo...@apache.org>.

Zhong Wang,
Right, folder support would be quite useful. Thanks for the opinion.
I hope I can finish the work on pr-190
<https://github.com/apache/incubator-zeppelin/pull/190>.

Sourav,
Regarding concurrent running, Zeppelin itself doesn't limit running
paragraphs/queries concurrently. Each interpreter can implement its own
scheduling policy. For example, the SparkSQL interpreter and ShellInterpreter
can already run paragraphs/queries concurrently.
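
To make the idea of a per-interpreter scheduling policy concrete, here is a
minimal sketch in Python. It is only an illustration built on thread pools,
not Zeppelin's actual scheduler code; make_fifo_scheduler,
make_parallel_scheduler, run_paragraph and run_all are names made up for the
example. One policy runs paragraphs strictly one at a time in submission
order, the other lets several run at once.

```python
from concurrent.futures import ThreadPoolExecutor


def make_fifo_scheduler():
    # One worker thread: paragraphs run one at a time, in submission order.
    return ThreadPoolExecutor(max_workers=1)


def make_parallel_scheduler(max_concurrency):
    # Several worker threads: up to max_concurrency paragraphs run at once.
    return ThreadPoolExecutor(max_workers=max_concurrency)


def run_paragraph(text):
    # Stand-in for handing a paragraph to an interpreter.
    return "ran: " + text


def run_all(scheduler, paragraphs, run=run_paragraph):
    # Submit every paragraph, then collect the results in submission order.
    with scheduler:
        futures = [scheduler.submit(run, p) for p in paragraphs]
        return [f.result() for f in futures]


print(run_all(make_fifo_scheduler(), ["p1", "p2", "p3"]))
print(run_all(make_parallel_scheduler(3), ["q1", "q2", "q3"]))
```

Both calls return results in submission order; the difference is only in how
many paragraphs may be executing at the same moment.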

SparkInterpreter is implemented with a FIFO scheduler because of the nature of
the Scala compiler. That's why users cannot run multiple paragraphs
concurrently when they work with SparkInterpreter.
But as Zhong Wang mentioned, pr-703 gives each notebook a separate Scala
compiler, so paragraphs can run concurrently as long as they're in
different notebooks.
Thanks for the feedback!

Best,
moon

On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang <wa...@gmail.com> wrote:

> Sourav: I think this newly merged PR can help you
> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537
>
> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
> sourav.mazumder00@gmail.com> wrote:
>
>> Hi Moon,
>>
>> This looks great.
>>
>> My only suggestion would be to include a PR/feature - Support for Running
>> Concurrent paragraphs/queries in Zeppelin.
>>
>> Right now if more than one user tries to run paragraphs in multiple
>> notebooks concurrently through a single Zeppelin instance (and single
>> interpreter instance) the performance is very slow. It is obvious that the
>> queue gets built up within the zeppelin process and interpreter process in
>> that scenario as the time taken to move the status from start to pending
>> and pending to running is very high compared to the actual running time of
>> a paragraph.
>>
>> Without this the multi tenancy support would be meaningless as no one can
>> practically use it in a situation where multiple users are trying to
>> connect to the same instance of Zeppelin (and the related interpreter). A
>> possible solution would be to spawn separate instance of the same
>> interpreter at every notebook/user level.
>>
>> Regards,
>> Sourav
>>
>> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org> wrote:
>>
>>> Hi Zeppelin users and developers,
>>>
>>> The roadmap we have published at
>>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>> is almost 9 month old, and it doesn't reflect where the community goes
>>> anymore. It's time to update.
>>>
>>> Based on mailing list, jira issues, pullrequests, feedbacks from users,
>>> conferences and meetings, I could summarize the major interest of users and
>>> developers in 7 categories. Enterprise ready, Usability improvement,
>>> Pluggability, Documentation, Backend integration, Notebook storage, and
>>> Visualization.
>>>
>>> And i could list related subjects under each categories.
>>>
>>>    - Enterprise ready
>>>       - Authentication
>>>          - Shiro authentication ZEPPELIN-548
>>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>>       - Authorization
>>>          - Notebook authorization PR-681
>>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>>       - Security
>>>       - Multi-tenancy
>>>       - Stability
>>>    - Usability Improvement
>>>       - UX improvement
>>>       - Better Table data support
>>>          - Download data as csv, etc PR-725
>>>          <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>>>          <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>>>          <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>>>          <https://github.com/apache/incubator-zeppelin/pull/89>
>>>          - Featureful table data display (pagenation, etc)
>>>       - Pluggability ZEPPELIN-533
>>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>>       - Pluggable visualization
>>>       - Dynamic Interpreter, notebook, visualization loading
>>>       - Repository and registry for pluggable components
>>>    - Improve documentation
>>>       - Improve contents and readability
>>>       - more tutorials, examples
>>>    - Interpreter
>>>       - Generic JDBC Interpreter
>>>       - (spark)R Interpreter
>>>       - Cluster manager for interpreter (Proposal
>>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>>       )
>>>       - more interpreters
>>>    - Notebook storage
>>>       - Versioning ZEPPELIN-540
>>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>>       - more notebook storages
>>>    - Visualization
>>>       - More visualizations PR-152
>>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>>       - Customize graph (show/hide label, color, etc)
>>>
>>>
>>> It will help anyone quickly get overall interest of project and the
>>> direction. And based on this roadmap, we can discuss and re-define the next
>>> release 0.6.0 scope and it's schedule.
>>>
>>> What do you think? Any feedback would be appreciated.
>>>
>>> Thanks,
>>> moon
>>>
>>>
>>

Re: [DISCUSS] Update Roadmap

Posted by Zhong Wang <wa...@gmail.com>.

Sourav: I think this newly merged PR can help you
https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537

On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
sourav.mazumder00@gmail.com> wrote:

> Hi Moon,
>
> This looks great.
>
> My only suggestion would be to include a PR/feature - Support for Running
> Concurrent paragraphs/queries in Zeppelin.
>
> Right now if more than one user tries to run paragraphs in multiple
> notebooks concurrently through a single Zeppelin instance (and single
> interpreter instance) the performance is very slow. It is obvious that the
> queue gets built up within the zeppelin process and interpreter process in
> that scenario as the time taken to move the status from start to pending
> and pending to running is very high compared to the actual running time of
> a paragraph.
>
> Without this the multi tenancy support would be meaningless as no one can
> practically use it in a situation where multiple users are trying to
> connect to the same instance of Zeppelin (and the related interpreter). A
> possible solution would be to spawn separate instance of the same
> interpreter at every notebook/user level.
>
> Regards,
> Sourav
>
> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org> wrote:
>
>> Hi Zeppelin users and developers,
>>
>> The roadmap we have published at
>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>> is almost 9 month old, and it doesn't reflect where the community goes
>> anymore. It's time to update.
>>
>> Based on mailing list, jira issues, pullrequests, feedbacks from users,
>> conferences and meetings, I could summarize the major interest of users and
>> developers in 7 categories. Enterprise ready, Usability improvement,
>> Pluggability, Documentation, Backend integration, Notebook storage, and
>> Visualization.
>>
>> And i could list related subjects under each categories.
>>
>>    - Enterprise ready
>>       - Authentication
>>          - Shiro authentication ZEPPELIN-548
>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>       - Authorization
>>          - Notebook authorization PR-681
>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>       - Security
>>       - Multi-tenancy
>>       - Stability
>>    - Usability Improvement
>>       - UX improvement
>>       - Better Table data support
>>          - Download data as csv, etc PR-725
>>          <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>>          <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>>          <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>>          <https://github.com/apache/incubator-zeppelin/pull/89>
>>          - Featureful table data display (pagenation, etc)
>>       - Pluggability ZEPPELIN-533
>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>       - Pluggable visualization
>>       - Dynamic Interpreter, notebook, visualization loading
>>       - Repository and registry for pluggable components
>>    - Improve documentation
>>       - Improve contents and readability
>>       - more tutorials, examples
>>    - Interpreter
>>       - Generic JDBC Interpreter
>>       - (spark)R Interpreter
>>       - Cluster manager for interpreter (Proposal
>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>       )
>>       - more interpreters
>>    - Notebook storage
>>       - Versioning ZEPPELIN-540
>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>       - more notebook storages
>>    - Visualization
>>       - More visualizations PR-152
>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>       - Customize graph (show/hide label, color, etc)
>>
>>
>> It will help anyone quickly get overall interest of project and the
>> direction. And based on this roadmap, we can discuss and re-define the next
>> release 0.6.0 scope and it's schedule.
>>
>> What do you think? Any feedback would be appreciated.
>>
>> Thanks,
>> moon
>>
>>
>

Re: [DISCUSS] Update Roadmap

Posted by Zhong Wang <wa...@gmail.com>.

This is awesome! Really glad to see that the roadmap is adjusted based on
the community's needs. One feature I hope to see in 0.6.0 is folder
support, which can benefit both "UX improvement" and "Multi-tenancy".

Zhong

On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
sourav.mazumder00@gmail.com> wrote:

> Hi Moon,
>
> This looks great.
>
> My only suggestion would be to include a PR/feature - Support for Running
> Concurrent paragraphs/queries in Zeppelin.
>
> Right now if more than one user tries to run paragraphs in multiple
> notebooks concurrently through a single Zeppelin instance (and single
> interpreter instance) the performance is very slow. It is obvious that the
> queue gets built up within the zeppelin process and interpreter process in
> that scenario as the time taken to move the status from start to pending
> and pending to running is very high compared to the actual running time of
> a paragraph.
>
> Without this the multi tenancy support would be meaningless as no one can
> practically use it in a situation where multiple users are trying to
> connect to the same instance of Zeppelin (and the related interpreter). A
> possible solution would be to spawn separate instance of the same
> interpreter at every notebook/user level.
>
> Regards,
> Sourav
>
> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org> wrote:
>
>> Hi Zeppelin users and developers,
>>
>> The roadmap we have published at
>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>> is almost 9 month old, and it doesn't reflect where the community goes
>> anymore. It's time to update.
>>
>> Based on mailing list, jira issues, pullrequests, feedbacks from users,
>> conferences and meetings, I could summarize the major interest of users and
>> developers in 7 categories. Enterprise ready, Usability improvement,
>> Pluggability, Documentation, Backend integration, Notebook storage, and
>> Visualization.
>>
>> And i could list related subjects under each categories.
>>
>>    - Enterprise ready
>>       - Authentication
>>          - Shiro authentication ZEPPELIN-548
>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>       - Authorization
>>          - Notebook authorization PR-681
>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>       - Security
>>       - Multi-tenancy
>>       - Stability
>>    - Usability Improvement
>>       - UX improvement
>>       - Better Table data support
>>          - Download data as csv, etc PR-725
>>          <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>>          <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>>          <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>>          <https://github.com/apache/incubator-zeppelin/pull/89>
>>          - Featureful table data display (pagenation, etc)
>>       - Pluggability ZEPPELIN-533
>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>       - Pluggable visualization
>>       - Dynamic Interpreter, notebook, visualization loading
>>       - Repository and registry for pluggable components
>>    - Improve documentation
>>       - Improve contents and readability
>>       - more tutorials, examples
>>    - Interpreter
>>       - Generic JDBC Interpreter
>>       - (spark)R Interpreter
>>       - Cluster manager for interpreter (Proposal
>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>       )
>>       - more interpreters
>>    - Notebook storage
>>       - Versioning ZEPPELIN-540
>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>       - more notebook storages
>>    - Visualization
>>       - More visualizations PR-152
>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>       - Customize graph (show/hide label, color, etc)
>>
>>
>> It will help anyone quickly get overall interest of project and the
>> direction. And based on this roadmap, we can discuss and re-define the next
>> release 0.6.0 scope and it's schedule.
>>
>> What do you think? Any feedback would be appreciated.
>>
>> Thanks,
>> moon
>>
>>
>

Re: [DISCUSS] Update Roadmap

Posted by Zhong Wang <wa...@gmail.com>.

Sourav: I think this newly merged PR can help you
https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537

On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder <
sourav.mazumder00@gmail.com> wrote:

> Hi Moon,
>
> This looks great.
>
> My only suggestion would be to include a PR/feature: support for running
> concurrent paragraphs/queries in Zeppelin.
>
> Right now, if more than one user runs paragraphs in multiple notebooks
> concurrently through a single Zeppelin instance (and a single interpreter
> instance), performance is very slow. A queue evidently builds up within
> the Zeppelin process and the interpreter process in that scenario, because
> the time taken to move the status from start to pending and from pending
> to running is very high compared to the actual running time of a
> paragraph.
>
> Without this, multi-tenancy support would be meaningless, as no one can
> practically use it when multiple users are trying to connect to the same
> instance of Zeppelin (and the related interpreter). A possible solution
> would be to spawn a separate instance of the same interpreter for each
> notebook or user.
>
> Regards,
> Sourav
>
> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org> wrote:
>
>> Hi Zeppelin users and developers,
>>
>> The roadmap we have published at
>> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>> is almost 9 months old, and it no longer reflects where the community is
>> heading. It's time to update it.
>>
>> Based on the mailing list, JIRA issues, pull requests, user feedback,
>> conferences, and meetings, I could summarize the major interests of users
>> and developers in 7 categories: Enterprise ready, Usability improvement,
>> Pluggability, Documentation, Backend integration, Notebook storage, and
>> Visualization.
>>
>> I have listed related subjects under each category.
>>
>>    - Enterprise ready
>>       - Authentication
>>          - Shiro authentication ZEPPELIN-548
>>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>>       - Authorization
>>          - Notebook authorization PR-681
>>          <https://github.com/apache/incubator-zeppelin/pull/681>
>>       - Security
>>       - Multi-tenancy
>>       - Stability
>>    - Usability Improvement
>>       - UX improvement
>>       - Better Table data support
>>          - Download data as csv, etc PR-725
>>          <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>>          <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>>          <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>>          <https://github.com/apache/incubator-zeppelin/pull/89>
>>          - Featureful table data display (pagination, etc.)
>>    - Pluggability ZEPPELIN-533
>>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>>       - Pluggable visualization
>>       - Dynamic Interpreter, notebook, visualization loading
>>       - Repository and registry for pluggable components
>>    - Improve documentation
>>       - Improve contents and readability
>>       - more tutorials, examples
>>    - Interpreter
>>       - Generic JDBC Interpreter
>>       - (spark)R Interpreter
>>       - Cluster manager for interpreter (Proposal
>>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>>       )
>>       - more interpreters
>>    - Notebook storage
>>       - Versioning ZEPPELIN-540
>>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>>       - more notebook storages
>>    - Visualization
>>       - More visualizations PR-152
>>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>>       <https://github.com/apache/incubator-zeppelin/pull/321>
>>       - Customize graph (show/hide label, color, etc)
>>
>>
>> It will help anyone quickly grasp the project's overall interests and
>> direction. Based on this roadmap, we can discuss and redefine the scope
>> and schedule of the next release, 0.6.0.
>>
>> What do you think? Any feedback would be appreciated.
>>
>> Thanks,
>> moon
>>
>>
>

Re: [DISCUSS] Update Roadmap

Posted by Sourav Mazumder <so...@gmail.com>.

Hi Moon,

This looks great.

My only suggestion would be to include a PR/feature: support for running
concurrent paragraphs/queries in Zeppelin.

Right now, if more than one user runs paragraphs in multiple notebooks
concurrently through a single Zeppelin instance (and a single interpreter
instance), performance is very slow. A queue evidently builds up within
the Zeppelin process and the interpreter process in that scenario, because
the time taken to move the status from start to pending and from pending
to running is very high compared to the actual running time of a
paragraph.

Without this, multi-tenancy support would be meaningless, as no one can
practically use it when multiple users are trying to connect to the same
instance of Zeppelin (and the related interpreter). A possible solution
would be to spawn a separate instance of the same interpreter for each
notebook or user.
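
To make the queueing argument concrete, here is a rough toy model (not
Zeppelin's actual scheduler; the function name queue_waits and the job tuples
are made up for illustration). It assumes all paragraphs are submitted at the
same time and that each interpreter instance runs its paragraphs one by one
in FIFO order, then compares the pending time with one shared interpreter
against one interpreter per user.

```python
from collections import defaultdict


def queue_waits(jobs, key):
    """Return how long each job stays pending before it starts running.

    jobs: list of (user, run_seconds) tuples, all submitted at time 0.
    key:  function mapping a job to the (hypothetical) interpreter
          instance it is queued on; each instance runs jobs FIFO.
    """
    busy_until = defaultdict(float)  # instance -> time at which it is free
    waits = []
    for job in jobs:
        instance = key(job)
        waits.append(busy_until[instance])  # pending time for this job
        busy_until[instance] += job[1]      # instance is busy while it runs
    return waits


jobs = [("alice", 2.0), ("bob", 2.0), ("carol", 2.0)]

# One interpreter instance shared by everyone: later jobs wait for earlier ones.
shared = queue_waits(jobs, key=lambda job: "shared")

# One interpreter instance per user: nobody waits behind another user.
per_user = queue_waits(jobs, key=lambda job: job[0])

print(shared)    # [0.0, 2.0, 4.0]
print(per_user)  # [0.0, 0.0, 0.0]
```

In this simplified model the pending time with a shared instance grows with
the number of users, while per-user instances remove the wait at the cost of
running more interpreter processes.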

Regards,
Sourav

On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee <mo...@apache.org> wrote:

> Hi Zeppelin users and developers,
>
> The roadmap we have published at
> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
> is almost 9 months old, and it no longer reflects where the community is
> heading. It's time to update it.
>
> Based on the mailing list, JIRA issues, pull requests, user feedback,
> conferences, and meetings, I could summarize the major interests of users
> and developers in 7 categories: Enterprise ready, Usability improvement,
> Pluggability, Documentation, Backend integration, Notebook storage, and
> Visualization.
>
> I have listed related subjects under each category.
>
>    - Enterprise ready
>       - Authentication
>          - Shiro authentication ZEPPELIN-548
>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>       - Authorization
>          - Notebook authorization PR-681
>          <https://github.com/apache/incubator-zeppelin/pull/681>
>       - Security
>       - Multi-tenancy
>       - Stability
>    - Usability Improvement
>       - UX improvement
>       - Better Table data support
>          - Download data as csv, etc PR-725
>          <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>          <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>          <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>          <https://github.com/apache/incubator-zeppelin/pull/89>
>          - Featureful table data display (pagination, etc.)
>    - Pluggability ZEPPELIN-533
>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>       - Pluggable visualization
>       - Dynamic Interpreter, notebook, visualization loading
>       - Repository and registry for pluggable components
>    - Improve documentation
>       - Improve contents and readability
>       - more tutorials, examples
>    - Interpreter
>       - Generic JDBC Interpreter
>       - (spark)R Interpreter
>       - Cluster manager for interpreter (Proposal
>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>       )
>       - more interpreters
>    - Notebook storage
>       - Versioning ZEPPELIN-540
>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>       - more notebook storages
>    - Visualization
>       - More visualizations PR-152
>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>       <https://github.com/apache/incubator-zeppelin/pull/321>
>       - Customize graph (show/hide label, color, etc)
>
>
> It will help anyone quickly grasp the project's overall interests and
> direction. Based on this roadmap, we can discuss and redefine the scope
> and schedule of the next release, 0.6.0.
>
> What do you think? Any feedback would be appreciated.
>
> Thanks,
> moon
>
>

Re: [DISCUSS] Update Roadmap

Posted by DuyHai Doan <do...@gmail.com>.

It's a great update, Moon.

On Monday I'll give a talk about Zeppelin at Voxxed Days Vienna. Your email
will be helpful for giving some hints about the future of Zeppelin.

On Sat, Feb 27, 2016 at 9:48 PM, moon soo Lee <mo...@apache.org> wrote:

> Hi Zeppelin users and developers,
>
> The roadmap we have published at
> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
> is almost 9 months old, and it no longer reflects where the community is
> heading. It's time to update it.
>
> Based on the mailing list, JIRA issues, pull requests, user feedback,
> conferences, and meetings, I could summarize the major interests of users
> and developers in 7 categories: Enterprise ready, Usability improvement,
> Pluggability, Documentation, Backend integration, Notebook storage, and
> Visualization.
>
> I have listed related subjects under each category.
>
>    - Enterprise ready
>       - Authentication
>          - Shiro authentication ZEPPELIN-548
>          <https://issues.apache.org/jira/browse/ZEPPELIN-548>
>       - Authorization
>          - Notebook authorization PR-681
>          <https://github.com/apache/incubator-zeppelin/pull/681>
>       - Security
>       - Multi-tenancy
>       - Stability
>    - Usability Improvement
>       - UX improvement
>       - Better Table data support
>          - Download data as csv, etc PR-725
>          <https://github.com/apache/incubator-zeppelin/pull/725>, PR-714
>          <https://github.com/apache/incubator-zeppelin/pull/714>, PR-6
>          <https://github.com/apache/incubator-zeppelin/pull/6>, PR-89
>          <https://github.com/apache/incubator-zeppelin/pull/89>
>          - Featureful table data display (pagination, etc.)
>    - Pluggability ZEPPELIN-533
>    <https://issues.apache.org/jira/browse/ZEPPELIN-533>
>       - Pluggable visualization
>       - Dynamic Interpreter, notebook, visualization loading
>       - Repository and registry for pluggable components
>    - Improve documentation
>       - Improve contents and readability
>       - more tutorials, examples
>    - Interpreter
>       - Generic JDBC Interpreter
>       - (spark)R Interpreter
>       - Cluster manager for interpreter (Proposal
>       <https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal>
>       )
>       - more interpreters
>    - Notebook storage
>       - Versioning ZEPPELIN-540
>       <http://issues.apache.org/jira/browse/ZEPPELIN-540>
>       - more notebook storages
>    - Visualization
>       - More visualizations PR-152
>       <https://github.com/apache/incubator-zeppelin/pull/152>, PR-728
>       <https://github.com/apache/incubator-zeppelin/pull/728>, PR-336
>       <https://github.com/apache/incubator-zeppelin/pull/336>, PR-321
>       <https://github.com/apache/incubator-zeppelin/pull/321>
>       - Customize graph (show/hide label, color, etc)
>
>
> It will help anyone quickly grasp the project's overall interests and
> direction. Based on this roadmap, we can discuss and redefine the scope
> and schedule of the next release, 0.6.0.
>
> What do you think? Any feedback would be appreciated.
>
> Thanks,
> moon
>
>