Posted to dev@zeppelin.apache.org by "Jesang Yoon (JIRA)" <ji...@apache.org> on 2016/07/08 00:40:10 UTC

[jira] [Created] (ZEPPELIN-1135) Provide a manifest for data & interface to use it

Jesang Yoon created ZEPPELIN-1135:
-------------------------------------

             Summary: Provide a manifest for data & interface to use it
                 Key: ZEPPELIN-1135
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1135
             Project: Zeppelin
          Issue Type: New Feature
          Components: documentation, GUI, zeppelin-interpreter
    Affects Versions: 0.7.0
            Reporter: Jesang Yoon
            Priority: Minor


While using data from various sources (different URLs) to run mixed data analyses in Zeppelin, my team has run into problems managing the many data source URLs and sharing them between teammates.
So I propose solving this problem by providing a "manifest of data and an interface to use it", and I want to build consensus among contributors and the PPMC before building and committing any code.

h4. Pain points
* Files and resources tend to be scattered across various locations (HDFS, Web, etc.).
* It is somewhat complicated to remember and identify the location of data, and to use a long URL for it.
* A URL alone is not enough to describe what the data contains.

h4. How to resolve it
# Define a format for a web-based document (XML/JSON/YAML) that contains a manifest (or metadata) of the data the team can use:
#* Title of data
#* Location of data (URL)
#* Description of data
#* Tags of data (for search)
# Build a Zeppelin interface function to search and view the descriptions of data defined in step 1.
# Build a Zeppelin interface function that returns the real location of the data found in step 2, for use with the load() functions of various interpreters.
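
To make the three steps above more concrete, here is a rough sketch in Python. Nothing in it is an existing Zeppelin API: the JSON layout, the sample entries and URLs, and the function names load_manifest(), search() and resolve() are all assumptions for illustration. Only the four fields (title, URL, description, tags) come from the list above.

```python
import json

# Hypothetical manifest. The fields (title, url, description, tags) follow the
# proposal; the JSON layout and the sample entries are assumptions.
MANIFEST_JSON = """
{
  "datasets": [
    {
      "title": "web-logs-2016",
      "url": "hdfs://namenode.example.com/data/logs/2016/",
      "description": "Raw web server access logs for 2016",
      "tags": ["logs", "web"]
    },
    {
      "title": "customers",
      "url": "https://data.example.com/exports/customers.csv",
      "description": "Customer master table exported nightly",
      "tags": ["crm", "csv"]
    }
  ]
}
"""


def load_manifest(text):
    """Step 1: parse the manifest document and index its entries by title."""
    entries = json.loads(text)["datasets"]
    return {entry["title"]: entry for entry in entries}


def search(manifest, keyword):
    """Step 2: return entries whose title, description or tags mention keyword."""
    needle = keyword.lower()
    hits = []
    for entry in manifest.values():
        haystack = [entry["title"], entry["description"]] + entry["tags"]
        if any(needle in field.lower() for field in haystack):
            hits.append(entry)
    return hits


def resolve(manifest, title):
    """Step 3: return the real location (URL) registered for a dataset title."""
    return manifest[title]["url"]


manifest = load_manifest(MANIFEST_JSON)
for entry in search(manifest, "logs"):
    print(entry["title"], "-", entry["description"])
print(resolve(manifest, "customers"))
```

With something like this, a notebook would pass resolve(manifest, "customers") to an interpreter's load function instead of a hard-coded URL, so moving the data only requires updating the manifest.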

h4. Effects
* Teammates can share a single, clean source of information about data.
* There is no need to track down and change every URL in notebooks when the location of data changes.
* Data is easy to search for and use in analysis code.

Please review this idea and give comments :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)