Posted to dev@zeppelin.apache.org by Jeff Zhang <zj...@gmail.com> on 2018/08/13 00:33:41 UTC

[DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Motivation

   The motivation of ZEPPELIN-2619 is to change the note storage
structure. Currently we store each note as {noteId}/note.json; we’d like to
change that to {note_name}_{note_id}.zpln. There are several reasons for
this change.


   1. {noteId}/note.json is not scalable. We put all notes in one root folder
      in a flat structure, and when the zeppelin server starts we need to read
      every note.json to get the note name and build the note folder structure
      (the note name, which we need to build the notebook menu, is stored in
      note.json). This would be a nightmare when you have a large number of
      notes.
   2. {noteId}/note.json is not maintainable. It is difficult for a
      developer/administrator to find a note file based on the note name.
   3. {noteId}/note.json has no folder structure. Currently zeppelin has to
      build the folder structure internally in memory from the note names,
      which is a big overhead.


New Approach

   As I mentioned above, I propose to change the note storage structure to
{note_name}_{note_id}.zpln. note_name could contain folders, e.g.
folder_1/mynote_abcd.zpln

This kind of note storage structure could bring several benefits.

   1. We don’t need to load all notes when zeppelin starts. We just need to
      list each folder to get the note name and note_id.
   2. It is much more maintainable, since it is easy to find the note file
      based on the note name.
   3. It already has a folder structure, which can be mapped directly to the
      note folder structure.
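
To make the first benefit concrete, here is a minimal sketch (in Python, for illustration only; it is not Zeppelin's actual code) of building the note index purely from file names, without opening any note file. The function names `parse_note_file` and `list_notes` are hypothetical, and the sketch assumes the note id is everything after the last underscore in the file name.

```python
import os
from pathlib import Path

NOTE_SUFFIX = ".zpln"


def parse_note_file(rel_path):
    """Split 'folder/name_id.zpln' into (note_path, note_id); None if it does not match."""
    folder, _, filename = rel_path.rpartition("/")
    if not filename.endswith(NOTE_SUFFIX):
        return None
    stem = filename[: -len(NOTE_SUFFIX)]
    # Assumption: the id follows the LAST underscore, so note names may contain '_'.
    name, sep, note_id = stem.rpartition("_")
    if not sep or not name or not note_id:
        return None
    note_path = f"{folder}/{name}" if folder else name
    return note_path, note_id


def list_notes(root):
    """Walk the notebook directory and map note_id -> note path using file names only."""
    notes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            rel = Path(dirpath, filename).relative_to(root).as_posix()
            parsed = parse_note_file(rel)
            if parsed is not None:
                note_path, note_id = parsed
                notes[note_id] = note_path
    return notes
```

Listing a directory tree like this touches only file system metadata, which is the point of the proposal: the notebook menu can be built without parsing any JSON.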


Side Effect

This approach only works for file system storage, which means we have to
drop support for MongoNotebookRepo. I think that is OK because I haven’t seen
any users talk about it in the community, so I assume no one is using it.


This is the overall design; any comments and feedback are welcome. Thanks.


Here's the Google doc; you can also comment there.

https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing

[DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by "Partridge, Lucas (GE Aviation)" <Lu...@ge.com>.
Ok, thanks Jeff – that all makes sense!  Yes, rendering and diffing notebooks in github would be very nice.

From: Jeff Zhang <zj...@gmail.com>
Sent: 13 August 2018 10:50
To: users@zeppelin.apache.org
Subject: EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

>>> Do we need the note id in the file name at all? What’s wrong with just note_name.zpln?
The reason I keep the note id is that we currently use noteId to identify a note, e.g. in both the websocket API and the REST API. It is almost impossible to remove noteId in the current architecture. If we put the note id into the file content of note_name.zpln, then we would have to read the note file every time, and we would hit the issues I mentioned above again.

>>> If the file content is json then why not use note_name.json instead of .zpln? That would make it easier for editors to know how to load/highlight the file contents.
I don't feel strongly about *.zpln, but one purpose is to help third parties identify zeppelin notes properly, e.g. github can identify jupyter notebooks (*.ipynb) and render them properly.

>>> Is there any reason for not using real folders or directories for organising the notebooks rather than embedding the folder hierarchy in the names of the notebooks?  If someone wants to ‘move’ the notebooks to another folder they’d have to manually rename all the files/notebooks at present.  That’s not very user-friendly.

Actually my proposal is to use real folders. What users see in the zeppelin note menu is the actual notes folder structure. If they want to move notebooks to another folder, they can change the folder name just as they would in a file system.
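
As a rough illustration of that point (a sketch only, not Zeppelin's implementation), moving a note under this layout is just a file rename: the file name, and therefore the note id embedded in it, stays the same while the folder changes. The helper name `move_note` is hypothetical.

```python
from pathlib import Path


def move_note(root, old_rel_path, new_folder):
    """Move a note file into another folder under root, keeping its file name (and id)."""
    src = Path(root, old_rel_path)
    dest_dir = Path(root, new_folder)
    # Create the target folder if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    src.rename(dest)
    return dest.relative_to(root).as_posix()
```

For example, moving `folder_1/mynote_abcd.zpln` into `archive/2018` yields `archive/2018/mynote_abcd.zpln`; anything that refers to the note by id `abcd` is unaffected.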





Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Mon, Aug 13, 2018 at 4:43 PM:
Hi Jeff,
I have some questions about this proposal (I can’t edit the design doc):


  1.  Do we need the note id in the file name at all? What’s wrong with just note_name.zpln?
  2.  If the file content is json then why not use note_name.json instead of .zpln? That would make it easier for editors to know how to load/highlight the file contents.
  3.  Is there any reason for not using real folders or directories for organising the notebooks rather than embedding the folder hierarchy in the names of the notebooks?  If someone wants to ‘move’ the notebooks to another folder they’d have to manually rename all the files/notebooks at present.  That’s not very user-friendly.

Thanks, Lucas.
From: Jeff Zhang <zj...@gmail.com>
Sent: 13 August 2018 09:06
To: users@zeppelin.apache.org
Cc: dev <de...@zeppelin.apache.org>
Subject: EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

In that case, zeppelin should fail to create the note.

Felix Cheung <fe...@hotmail.com> wrote on Mon, Aug 13, 2018 at 3:47 PM:
Perhaps one concern is users having characters in note name that are invalid for file name/file path?
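
A minimal sketch of what "fail to create the note" could look like, assuming a Linux-style file system where only `\0` and `/` are invalid inside a file name. The function name `validate_note_name` and the extra checks (empty segments, `.`/`..`, leading or trailing spaces) are assumptions for illustration, not Zeppelin's actual rules.

```python
def validate_note_name(note_name):
    """Return the normalized note path, or raise ValueError if it cannot map to a file path."""
    trimmed = note_name.strip("/")
    if not trimmed:
        raise ValueError("note name is empty")
    for segment in trimmed.split("/"):
        # '/' separates folders, so an empty segment means a doubled slash.
        if segment in ("", ".", ".."):
            raise ValueError(f"invalid path segment: {segment!r}")
        # NUL can never appear in a file name.
        if "\0" in segment:
            raise ValueError("note name contains a NUL character")
        # Assumption: reject leading/trailing spaces, which are easy to miss in a file listing.
        if segment != segment.strip():
            raise ValueError(f"segment has leading/trailing whitespace: {segment!r}")
    return trimmed
```

Non-Latin names such as Russian or Chinese text pass this check unchanged; whether the underlying file system encodes them consistently is a separate question.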


________________________________
From: Mohit Jaggi <mo...@gmail.com>
Sent: Sunday, August 12, 2018 6:02 PM
To: users@zeppelin.apache.org
Cc: dev
Subject: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

sounds like a good idea!

Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by Jeff Zhang <zj...@gmail.com>.
I also agree to keep noteId; lots of zeppelin APIs use noteId to identify
a note. Removing it would cause lots of changes.


andreas.weise@gmail.com <an...@gmail.com> wrote on Tue, Sep 4, 2018 at 3:03 AM:

> Somehow subject was deleted in earlier mail. So here again for adding my
> thoughts to the proper thread:
>
> Sure, there might exist some kind of naming policy (or let's better call
> it naming convention) in zeppelin multiuser environments. But as long as
> there is no way to technically enforce a certain naming convention to turn
> it into a naming policy (which would be a nice idea BTW), it's IMHO a vague
> assumption, that there exist policies and users are following these. I'm
> thinking here of problems that might come up when changing the existing
> implementation and then deal with migration, because assumptions do not
> match reality.
>
> My real life scenario here is, that zeppelin can be configured to make
> notebooks visible only to the owner (and invisible to any other user) by
> default: ZEPPELIN_NOTEBOOK_PUBLIC=false, which is IMHO a good idea when
> setting up zeppelin as multiuser environment in larger scenarios. In this
> case note owners can use any or no naming convention they like when
> creating and using a note for personal purposes only, because only the
> owner will see it - also and even if there exists certain naming policies
> on an global organisation level. A global naming convention must only be
> followed when users start sharing notes (means: adding at least reader
> permissions to any other user).
>
> So I think noteId is a must have in the filename.
>
> Andreas
>
> On 2018/08/31 01:46:38, Jongyoul Lee <jo...@gmail.com> wrote:
> > Hi,
> >
> > I have a bit different thoughts about the conflicts of the name of a new
> > note created. In a multiuser environment, AFAIK, most teams and companies,
> > generally, use a prefix for the group policy internally. In my case,
> > user/{user_id}/{notebook_name_they_want}.zpln. In this case, naming
> > conflicts rarely happen. And it will be stored under a specific folder. If
> > someone needed two different same named notes in the same directory, it
> > might not be appropriate. WDYT?
> >
> > JL
> >
> > On Fri, Aug 31, 2018 at 4:44 AM, andreas.weise@gmail.com <
> > andreas.weise@gmail.com> wrote:
> >
> > > another reason for keeping noteId is uniqueness in case of multi-user
> > > environments. In that case users have separate zeppelin workspaces, which
> > > is something we are using in production: see ZEPPELIN_NOTEBOOK_PUBLIC=false
> > > in the doc [1]. In that case users might be very confused when they can not
> > > create notebooks with a name that already exists, but they most likely
> > > don't see (yet).
> > >
> > > So I like the proposal {note_name}_{note_id}.zpln. where note_name could
> > > contains folders, e.g. folder_1/mynote_abcd.zpln. Even though I like
> > > {note_name}.{note_id}.zpln (dot in between note_name and note_id) even
> > > better :-)
> > >
> > > Regards
> > > Andreas
> > >
> > >
> > > [1] http://zeppelin.apache.org/docs/0.8.0/setup/security/notebook_authorization.html#separate-notebook-workspaces-public-vs-private
> > >
> > > On 2018/08/18 08:42:44, Jeff Zhang <zj...@gmail.com> wrote:
> > > > BTW, I also prefer to use the note name as the identifier of a note, if
> > > > the issue I mentioned before is acceptable for most users.
> > > >
> > > > Jeff Zhang <zj...@gmail.com> wrote on Sat, Aug 18, 2018 at 4:40 PM:
> > > >
> > > > > I am afraid we cannot remove noteId, as noteId is the unique, immutable
> > > > > identifier of a note and is used in a lot of places, such as paragraph
> > > > > share and the rest api.
> > > > > If we use the note name as the note id, then it may break users' apps if
> > > > > the note name is changed.
> > > > >
> > > > > Jongyoul Lee <jo...@gmail.com> wrote on Sat, Aug 18, 2018 at 2:33 PM:
> > > > >
> > > > >> Hi, thanks for this kind of discussion.
> > > > >>
> > > > >> About noteId, how about changing note id to note name? AFAIK, note id
> > > > >> is just an identifier and we can set any value to it.
> > > > >>
> > > > >> There are two potential problems. First, we should be more careful in
> > > > >> handling the note id, as it could contain a wide variety of characters.
> > > > >> Second, in case someone changes a note name, those who are viewing and
> > > > >> updating the same note would no longer be able to access it. We could
> > > > >> handle that by using websockets.
> > > > >>
> > > > >> WDYT?
> > > > >>
> > > > >> On Tue, 14 Aug 2018 at 6:14 PM Jeff Zhang <zj...@gmail.com> wrote:
> > > > >>
> > > > >>> >>> But I’m still not comfortable with note ids in the name of the
> > > > >>> notebook itself.  Those names would look ugly if you shared your
> > > > >>> notebooks on github for example.  You don’t see Jupyter notebooks with
> > > > >>> names like that. If you have to keep the note ids with the notebooks
> > > > >>> could you not simply put the note id at the top of the notebook as
> > > > >>> Ruslan suggested? Then you’d only have to read the first line of each
> > > > >>> notebook.
> > > > >>>
> > > > >>> I know putting note_id in the note file name is not so elegant, but this
> > > > >>> is the compromise we have to make to keep compatibility, as we use
> > > > >>> noteId to uniquely identify a note right now. And I don't think putting
> > > > >>> noteId on the first line of the note would help much. We would still
> > > > >>> have to read the note files, which takes much more time than just
> > > > >>> reading the file names via the file system.
> > > > >>>
> > > > >>> Regarding the readability of the note file name, I think it won't be
> > > > >>> affected much. E.g. a notebook file name would look like:
> > > > >>> *My Project/My Spark Tutorial Note_2A94M5J1Z.zpln*
> > > > >>> What the user sees in the notebook menu is still
> > > > >>> *My Project/My Spark Tutorial Note*, which is no different from what we
> > > > >>> see now.
> > > > >>>
> > > > >>> And thanks again for the feedback and comments, I am so glad to see so
> > > > >>> much discussion in the community.
> > > > >>>
> > > > >>> Partridge, Lucas (GE Aviation) <Lucas.Partridge@ge.com> wrote on Tue, Aug 14, 2018 at 4:29 PM:
> > > > >>>
> > > > >>>> I agree you’re inviting consistency issues if you maintained a separate
> > > > >>>> note id-to-note name mapping file.
> > > > >>>>
> > > > >>>> But I’m still not comfortable with note ids in the name of the notebook
> > > > >>>> itself.  Those names would look ugly if you shared your notebooks on
> > > > >>>> github for example.  You don’t see Jupyter notebooks with names like
> > > > >>>> that.  If you have to keep the note ids with the notebooks could you not
> > > > >>>> simply put the note id at the top of the notebook as Ruslan suggested?
> > > > >>>> Then you’d only have to read the first line of each notebook.
> > > > >>>>
> > > > >>>> Presumably if you copied the notebooks to another Zeppelin server they
> > > > >>>> would be restored with the same note ids there too? And hopefully there
> > > > >>>> would be no id clash with notebooks already on that server…
> > > > >>>>
> > > > >>>> *From:* Jeff Zhang <zj...@gmail.com>
> > > > >>>> *Sent:* 14 August 2018 03:49
> > > > >>>> *To:* users@zeppelin.apache.org
> > > > >>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> > > > >>>> instead of [NOTEID]/note.json
> > > > >>>>
> > > > >>>> Thanks for the discussion.
> > > > >>>>
> > > > >>>> >>> I'm afraid about non-latin symbols in folder and note name. And what
> > > > >>>> about hieroglyphs?
> > > > >>>>
> > > > >>>> AFAIK, linux allows all characters in a file name except `\0` and '/'.
> > > > >>>> I can create file names with Chinese characters in linux, so I guess you
> > > > >>>> can use Russian as well.
> > > > >>>>
> > > > >>>> >>> If I understand correctly, this is being done solely to speed up
> > > > >>>> loading list of notebooks? What if a list of notebook names, their ids,
> > > > >>>> folder structure, etc can be *cached* in a separate small json file? Or
> > > > >>>> perhaps in a small embedded key-value store, like www.mapdb.org would
> > > > >>>> do? Just thinking out loud. This would require a way to lazily re-sync
> > > > >>>> the cache.
> > > > >>>>
> > > > >>>> This is not only to speed up loading but also to make the system
> > > > >>>> architecture easier to maintain. For now we have to build the folder
> > > > >>>> structure of notes in memory, and a lot of code in zeppelin does this
> > > > >>>> (personally I don't think we would need any code for this function if we
> > > > >>>> could get the folder structure from the note file storage system). Using
> > > > >>>> another storage to keep the mapping of note name and note id would bring
> > > > >>>> in another classic problem of distributed systems: consistency. How do
> > > > >>>> we ensure consistency between the real note file and this mapping
> > > > >>>> component? If we create/rename/remove a note, we have to update both the
> > > > >>>> notebook repo and the mapping storage. In my experience, any bug in that
> > > > >>>> code would cause inconsistency issues.
> > > > >>>>
> > > > >>>>
> > > > >>>> Ruslan Dautkhanov <da...@gmail.com> wrote on Tue, Aug 14, 2018 at 3:58 AM:
> > > > >>>>
> > > > >>>> Thanks for bringing this up for discussion. My 2 cents below.
> > > > >>>>
> > > > >>>> I am with Maksim and Felix on concerns with special characters now
> > > > >>>> allowed in notebook names, and also concerns with different charsets.
> > > > >>>> Russian language, for example, most commonly uses iso-8859-5, koi-8r/u,
> > > > >>>> windows-1251 charsets etc. This seems like it will bring a whole new set
> > > > >>>> of localization issues.
> > > > >>>>
> > > > >>>> If I understand correctly, this is being done solely to speed up
> > > > >>>> loading list of notebooks? What if a list of notebook names, their ids,
> > > > >>>> folder structure, etc can be *cached* in a separate small json file? Or
> > > > >>>> perhaps in a small embedded key-value store, like www.mapdb.org would
> > > > >>>> do? Just thinking out loud. This would require a way to lazily re-sync
> > > > >>>> the cache.
> > > > >>>>
> > > > >>>> Another way to speed up json reads is to somehow force the "name"
> > > > >>>> attribute to be at the top of the json document that's written to disk.
> > > > >>>> Then re-implement the json file reader to read just the header of the
> > > > >>>> file and do a partial json parse (or, for lack of options, grab the
> > > > >>>> "name" attribute from the json file header by a regex, for example).
> > > > >>>>
> > > > >>>> Back to filenames and charsets, I think the issue may be more
> > > > >>>> complicated if you store notebooks on a remote filesystem (nfs/samba
> > > > >>>> etc). What if the remote server and the local nfs client differ in
> > > > >>>> their default fs charsets?
> > > > >>>>
> > > > >>>> Ideally all filesystems would use UTF-8, for example, but I am not
> > > > >>>> certain that's a good assumption to make. Also exposing notebook names
> > > > >>>> can bring some other issues, like I know some users occasionally add
> > > > >>>> trailing/leading spaces etc.
> > > > >>>>
> > > > >>>>
> > > > >>>> On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <
> > > > >>>> m.belousov@tinkoff.ru> wrote:
> > > > >>>>
> > > > >>>> The use of Russian and other specific letters in the note name is a big
> > > > >>>> advantage of Zeppelin. I would not like to give up this functionality.
> > > > >>>>
> > > > >>>> I support the idea about `zpln` file extension.
> > > > >>>>
> > > > >>>> The folder structure also sounds good.
> > > > >>>>
> > > > >>>> I'm afraid about non-latin symbols in folder and note name. And what
> > > > >>>> about hieroglyphs?
> > > > >>>>
> > > > >>>> Apache Zeppelin may be the first to use Russian letters in the file
> > > > >>>> system in our company.
> > > > >>>>
> > > > >>>> I see a lot of risks in using non-latin symbols and a lot of issues in
> > > > >>>> making the new folder structure stable.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> ------------------------------
> > > > >>>>
> > > > >>>> *От:* Jeff Zhang <zj...@gmail.com>
> > > > >>>> *Отправлено:* 13 августа 2018 г. 12:50
> > > > >>>> *Кому:* users@zeppelin.apache.org
> > > > >>>> *Тема:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> > > instead
> > > > >>>> of [NOTEID]/note.json
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> >>> Do we need the note id in the file name at all? What’s wrong
> > > with
> > > > >>>> just note_name.zpln?
> > > > >>>>
> > > > >>>> The reason I keep note id is because currently we use noteId to
> > > > >>>> identify one note. e.g. we use note id in both websocket api and
> > > rest api.
> > > > >>>> It is almost impossible to remove noteId for the current
> > > architecture. If
> > > > >>>> we put note id into file content of note_name.zpln, then we
> have to
> > > read
> > > > >>>> the note file every time, then we meet the issues I mentioned
> above
> > > again.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> >>> If the file content is json then why not use note_name.json
> > > instead
> > > > >>>> of .zpln? That would make it easier for editors to know how to
> > > > >>>> load/highlight the file contents.
> > > > >>>>
> > > > >>>> I am not strongly biased on *.zpln. But I think one purpose is
> to
> > > help
> > > > >>>> third parties to identify zeppelin note properly. e.g. github
> can
> > > identify
> > > > >>>> jupyter notebook (*.ipynb) and render it properly.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> >>> Is there any reason for not using *real* folders or
> directories
> > > > >>>> for organising the notebooks rather than embedding the folder
> > > hierarchy in
> > > > >>>> the names of the notebooks?  If someone wants to ‘move’ the
> > > notebooks to
> > > > >>>> another folder they’d have to manually rename all the
> > > files/notebooks at
> > > > >>>> present.  That’s not very user-friendly.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> Actually my proposal is to use real folders. What user see in
> > > zeppelin
> > > > >>>> note menu is the actual notes folder structure. If they want to
> > > move the
> > > > >>>> notebooks to another folder, they can change the folder name
> just
> > > like what
> > > > >>>> user did in file system.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> Partridge, Lucas (GE Aviation) <Lucas.Partridge@ge.com
> >于2018年8月13日周一
> > > 下午
> > > > >>>> 4:43写道:
> > > > >>>>
> > > > >>>> Hi Jeff,
> > > > >>>>
> > > > >>>> I have some questions about this proposal (I can’t edit the
> design
> > > doc):
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>    1. Do we need the note id in the file name at all? What’s
> wrong
> > > > >>>>    with just note_name.zpln?
> > > > >>>>    2. If the file content is json then why not use
> note_name.json
> > > > >>>>    instead of .zpln? That would make it easier for editors to
> know
> > > how to
> > > > >>>>    load/highlight the file contents.
> > > > >>>>    3. Is there any reason for not using *real* folders or
> > > directories
> > > > >>>>    for organising the notebooks rather than embedding the folder
> > > hierarchy in
> > > > >>>>    the names of the notebooks?  If someone wants to ‘move’ the
> > > notebooks to
> > > > >>>>    another folder they’d have to manually rename all the
> > > files/notebooks at
> > > > >>>>    present.  That’s not very user-friendly.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> Thanks, Lucas.
> > > > >>>>
> > > > >>>> *From:* Jeff Zhang <zj...@gmail.com>
> > > > >>>> *Sent:* 13 August 2018 09:06
> > > > >>>> *To:* users@zeppelin.apache.org
> > > > >>>> *Cc:* dev <de...@zeppelin.apache.org>
> > > > >>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in
> > > [Title].zpln
> > > > >>>> instead of [NOTEID]/note.json
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> In that case, zeppelin should fail to create note.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> Felix Cheung <fe...@hotmail.com>于2018年8月13日周一 下午3:47写道:
> > > > >>>>
> > > > >>>> Perhaps one concern is users having characters in note name
> that are
> > > > >>>> invalid for file name/file path?
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> ------------------------------
> > > > >>>>
> > > > >>>> *From:* Mohit Jaggi <mo...@gmail.com>
> > > > >>>> *Sent:* Sunday, August 12, 2018 6:02 PM
> > > > >>>> *To:* users@zeppelin.apache.org
> > > > >>>> *Cc:* dev
> > > > >>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in
> [Title].zpln
> > > > >>>> instead of [NOTEID]/note.json
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> sounds like a good idea!
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zj...@gmail.com>
> > > wrote:
> > > > >>>>
> > > > >>>> Motivation
> > > > >>>>
> > > > >>>>    The motivation of ZEPPELIN-2619 is to change the notes
> storage
> > > > >>>> structure. Previously we store it using {noteId}/note.json, we’d
> > > like to
> > > > >>>> change it into {note_name}_{note_id}.zpln. There are several
> > > reasons for
> > > > >>>> this change.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>    1. {noteId}/note.json is not scalable. We put all notes in
> one
> > > root
> > > > >>>>    folder in flat structure. And when zeppelin server starts, we
> > > need to read
> > > > >>>>    all note.json to get the note file name and build the note
> > > folder structure
> > > > >>>>    (Because we need to get the note name which is stored in
> > > note.json to build
> > > > >>>>    the notebook menu). This would be a nightmare when you have
> > > large amounts
> > > > >>>>    of notes.
> > > > >>>>    2. {noteId}/note.json is not maintainable. It is difficult
> for a
> > > > >>>>    developer/administrator to find note file based on note name.
> > > > >>>>    3. {noteId}/note.json has no folder structure. Currently
> zeppelin
> > > > >>>>    have to build the folder structure internally in memory
> > > according note name
> > > > >>>>    which is a big overhead.
> > > > >>>>
> > > > >>>>
> > > > >>>> New Approach
> > > > >>>>
> > > > >>>>    As I mentioned above, I propose to change the note storage
> > > structure
> > > > >>>> to {note_name}_{note_id}.zpln.  note_name could contains
> folders,
> > > e.g.
> > > > >>>> folder_1/mynote_abcd.zpln
> > > > >>>>
> > > > >>>> This kind of note storage structure could bring several
> benefits.
> > > > >>>>
> > > > >>>>    1. We don’t need to load all notes when zeppelin starts. We
> just
> > > > >>>>    need to list each folder to get the note name and note_id.
> > > > >>>>    2. It is much maintainable so that it is easy to find the
> note
> > > file
> > > > >>>>    based on note name.
> > > > >>>>    3. It has the folder structure already. That can be mapped
> to the
> > > > >>>>    note folder structure.
> > > > >>>>
> > > > >>>>
> > > > >>>> Side Effect
> > > > >>>>
> > > > >>>> This approach only works for file system storage, so that means
> we
> > > have
> > > > >>>> to drop support for MongoNotebookRepo. I think it is ok because
> I
> > > didn’t
> > > > >>>> see any users talk about this in community, so I assume no one
> is
> > > using it.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> This is overall design, welcome any comments and feedback.
> Thanks.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> Here's the google docs, you can also comment it here.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8T
> > > ATYcGkDL1DNZoE/edit?usp=sharing
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> --
> > > > >> 이종열, Jongyoul Lee, 李宗烈
> > > > >> http://madeng.net
> > > > >>
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > 이종열, Jongyoul Lee, 李宗烈
> > http://madeng.net
> >
>

Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by an...@gmail.com, an...@gmail.com.
Somehow subject was deleted in earlier mail. So here again for adding my thoughts to the proper thread:

Sure, there might exist some kind of naming policy (or let's better call it naming convention) in zeppelin multiuser environments. But as long as there is no way to technically enforce a certain naming convention to turn it into a naming policy (which would be a nice idea BTW), it's IMHO a vague assumption, that there exist policies and users are following these. I'm thinking here of problems that might come up when changing the existing implementation and then deal with migration, because assumptions do not match reality.

My real life scenario here is, that zeppelin can be configured to make notebooks visible only to the owner (and invisible to any other user) by default: ZEPPELIN_NOTEBOOK_PUBLIC=false, which is IMHO a good idea when setting up zeppelin as multiuser environment in larger scenarios. In this case note owners can use any or no naming convention they like when creating and using a note for personal purposes only, because only the owner will see it - also and even if there exists certain naming policies on an global organisation level. A global naming convention must only be followed when users start sharing notes (means: adding at least reader permissions to any other user).

So I think noteId is a must have in the filename.

Andreas

On 2018/08/31 01:46:38, Jongyoul Lee <jo...@gmail.com> wrote: 
> Hi,
> 
> I have a bit different thoughts about the conflicts of the name of a new
> note created. In a multiuser environment, AFAIK, most teams and companies,
> generally, use a prefix for the group policy internally. In my case,
> user/{user_id}/{notebook_name_they_want}.zpln. In this case, naming
> conflicts rarely happen. And it will be stored under a specific folder. If
> someone needed two different same named notes in the same directory, I
> might not be appropriate. WDYT?
> 
> JL
> 
> On Fri, Aug 31, 2018 at 4:44 AM, andreas.weise@gmail.com <
> andreas.weise@gmail.com> wrote:
> 
> > another reason for keeping noteId is uniqueness in case of multi-user
> > environments. In that case users have separate zeppelin workspaces, which
> > is something we are using in production: see ZEPPELIN_NOTEBOOK_PUBLIC=false
> > in the doc [1]. In that case users might be very confused when they can not
> > create notebooks with a name that already exists, but they most likely
> > don't see (yet).
> >
> > So I like the proposal {note_name}_{note_id}.zpln. where note_name could
> > contains folders, e.g. folder_1/mynote_abcd.zpln. Even though I like
> > {note_name}.{note_id}.zpln (dot in between note_name and note_id) even
> > better :-)
> >
> > Regards
> > Andreas
> >
> >
> > [1] http://zeppelin.apache.org/docs/0.8.0/setup/security/
> > notebook_authorization.html#separate-notebook-workspaces-public-vs-private
> >
> > On 2018/08/18 08:42:44, Jeff Zhang <zj...@gmail.com> wrote:
> > > BTW, I would also prefer to use the note name as the identifier of a
> > > note, if the issue I mentioned before is acceptable to most users.
> > >
> > > Jeff Zhang <zj...@gmail.com> wrote on Sat, Aug 18, 2018 at 4:40 PM:
> > >
> > > > I am afraid we cannot remove noteId: it is the unique, immutable
> > > > identifier of a note and is used in a lot of places, such as
> > > > paragraph sharing and the REST API.
> > > > If we used the note name as the note id, renaming a note could
> > > > break users' apps.
> > > >
> > > > Jongyoul Lee <jo...@gmail.com> wrote on Sat, Aug 18, 2018 at 2:33 PM:
> > > >
> > > >> Hi, thanks for this kind of discussion.
> > > >>
> > > >> About noteId: how about changing the note id to the note name? AFAIK,
> > > >> the note id is just an identifier and we can set any value for it.
> > > >>
> > > >> There are two potential problems. First, we would have to handle the
> > > >> note id more carefully, as it could contain a wide variety of
> > > >> characters. Second, when someone changes a note name, others who are
> > > >> viewing or updating the same note would lose access to it. We could
> > > >> handle that by using websockets.
> > > >>
> > > >> WDYT?
> > > >>
> > > >> On Tue, 14 Aug 2018 at 6:14 PM Jeff Zhang <zj...@gmail.com> wrote:
> > > >>
> > > >>> >>> But I’m still not comfortable with note ids in the name of the
> > > >>> notebook itself.  Those names would look ugly if you shared your
> > > >>> notebooks on github for example.  You don’t see Jupyter notebooks with
> > > >>> names like that. If you have to keep the note ids with the notebooks
> > > >>> could you not simply put the note id at the top of the notebook as
> > > >>> Ruslan suggested? Then you’d only have to read the first line of each
> > > >>> notebook.
> > > >>>
> > > >>> I know putting note_id in the note file name is not so elegant, but
> > > >>> this is the compromise we have to make to keep compatibility, as we use
> > > >>> noteId to uniquely identify a note right now. And I don't think putting
> > > >>> noteId on the first line of the note would help much. We would still
> > > >>> have to read the note files, which takes much more time than just
> > > >>> reading the file names from the file system.
> > > >>>
> > > >>> Regarding the readability of the note file name, I think it won't
> > > >>> matter much. E.g. the notebook file name would look like:
> > > >>> *My Project/My Spark Tutorial Note_2A94M5J1Z.zpln*
> > > >>> What the user sees in the notebook menu is still
> > > >>> *My Project/My Spark Tutorial Note*, which is no different from what
> > > >>> we see now.
> > > >>>
> > > >>> And thanks again for the feedback and comments, I am glad to see so
> > > >>> much discussion in the community.
> > > >>>
> > > >>>
> > > >>>
> > > >>> Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Tue, Aug 14, 2018 at 4:29 PM:
> > > >>>
> > > >>>> I agree you’re inviting consistency issues if you maintained a
> > > >>>> separate note id-to-note name mapping file.
> > > >>>>
> > > >>>> But I’m still not comfortable with note ids in the name of the
> > > >>>> notebook itself.  Those names would look ugly if you shared your
> > > >>>> notebooks on github for example.  You don’t see Jupyter notebooks with
> > > >>>> names like that.  If you have to keep the note ids with the notebooks
> > > >>>> could you not simply put the note id at the top of the notebook as
> > > >>>> Ruslan suggested? Then you’d only have to read the first line of each
> > > >>>> notebook.
> > > >>>>
> > > >>>> Presumably if you copied the notebooks to another Zeppelin server they
> > > >>>> would be restored with the same note ids there too? And hopefully
> > > >>>> there would be no id clash with notebooks already on that server…
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> *From:* Jeff Zhang <zj...@gmail.com>
> > > >>>> *Sent:* 14 August 2018 03:49
> > > >>>> *To:* users@zeppelin.apache.org
> > > >>>>
> > > >>>>
> > > >>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in
> > [Title].zpln
> > > >>>> instead of [NOTEID]/note.json
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Thanks for the discussion.
> > > >>>>
> > > >>>> >>> I'm afraid about non-latin symbols in folder and note name. And
> > > >>>> what about hieroglyphs?
> > > >>>>
> > > >>>> AFAIK, Linux allows every character in a file name except `\0` and
> > > >>>> '/'.  I can create file names with Chinese characters on Linux, so I
> > > >>>> guess you can use Russian as well.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> >>> If I understand correctly, this is being done solely to speed up
> > > >>>> loading list of notebooks? What if a list of notebook names, their
> > > >>>> ids, folder structure, etc can be *cached* in a separate small json
> > > >>>> file? Or perhaps in a small embedded key-value store, like
> > > >>>> www.mapdb.org would do? Just thinking out loud. This would require a
> > > >>>> way to lazily re-sync the cache.
> > > >>>>
> > > >>>> This is not only to speed up loading but also to make the system
> > > >>>> architecture easier to maintain. For now we have to build the folder
> > > >>>> structure of notes in memory, and a lot of code in Zeppelin does this
> > > >>>> (personally I don't think we would need any of that code if we could
> > > >>>> get the folder structure from the note file storage system). Using
> > > >>>> another storage to keep the mapping between note name and note id
> > > >>>> would bring in another classic distributed-systems problem:
> > > >>>> consistency. How do we ensure consistency between the real note file
> > > >>>> and this mapping component? If we create/rename/remove a note, we have
> > > >>>> to update both the notebook repo and the mapping storage. In my
> > > >>>> experience, any bug in that code would cause inconsistency issues.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Ruslan Dautkhanov <da...@gmail.com> wrote on Tue, Aug 14, 2018 at 3:58 AM:
> > > >>>>
> > > >>>> Thanks for bringing this up for discussion. My 2 cents below.
> > > >>>>
> > > >>>> I am with Maksim and Felix on the concerns about special characters
> > > >>>> now allowed in notebook names, and also about different charsets.
> > > >>>> Russian, for example, most commonly uses the iso-8859-5, koi8-r/u and
> > > >>>> windows-1251 charsets. This looks like it will bring a whole new set
> > > >>>> of localization issues.
> > > >>>>
> > > >>>> If I understand correctly, this is being done solely to speed up
> > > >>>> loading the list of notebooks? What if the list of notebook names,
> > > >>>> their ids, folder structure, etc. were *cached* in a separate small
> > > >>>> json file? Or perhaps in a small embedded key-value store, like
> > > >>>> www.mapdb.org? Just thinking out loud. This would require a way to
> > > >>>> lazily re-sync the cache.
> > > >>>>
> > > >>>> Another way to speed up json reads is to somehow force the "name"
> > > >>>> attribute to be at the top of the json document that's written to
> > > >>>> disk, and then re-implement the json file reader to read just the
> > > >>>> header of the file and do a partial json parse (or, for lack of other
> > > >>>> options, grab the "name" attribute from the json file header with a
> > > >>>> regex, for example).
> > > >>>>
> > > >>>> Back to filenames and charsets: I think the issue may be more
> > > >>>> complicated if you store notebooks on a remote filesystem (nfs, samba
> > > >>>> etc.). What if the remote server and the local nfs client differ in
> > > >>>> their default fs charsets?
> > > >>>>
> > > >>>> Ideally all filesystems would use UTF-8, for example, but I am not
> > > >>>> certain that's a good assumption to make. Also, exposing notebook
> > > >>>> names can bring some other issues; for example, I know some users
> > > >>>> occasionally add trailing/leading spaces etc.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <
> > > >>>> m.belousov@tinkoff.ru> wrote:
> > > >>>>
> > > >>>> The use of Russian and other language-specific letters in note names
> > > >>>> is a big advantage of Zeppelin. I would not like to give up this
> > > >>>> functionality.
> > > >>>>
> > > >>>> I support the idea of the `zpln` file extension.
> > > >>>>
> > > >>>> The folder structure also sounds good.
> > > >>>>
> > > >>>> I'm afraid about non-latin symbols in folder and note name. And what
> > > >>>> about hieroglyphs?
> > > >>>>
> > > >>>> Apache Zeppelin may be the first to use Russian letters in the file
> > > >>>> system in our company.
> > > >>>>
> > > >>>> I see a lot of risks in using non-Latin symbols and a lot of issues in
> > > >>>> making the new folder structure stable.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> ------------------------------
> > > >>>>
> > > >>>> *From:* Jeff Zhang <zj...@gmail.com>
> > > >>>> *Sent:* 13 August 2018 12:50
> > > >>>> *To:* users@zeppelin.apache.org
> > > >>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> > > >>>> instead of [NOTEID]/note.json
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> >>> Do we need the note id in the file name at all? What’s wrong with
> > > >>>> just note_name.zpln?
> > > >>>>
> > > >>>> The reason I keep the note id is that we currently use noteId to
> > > >>>> identify a note, e.g. in both the websocket API and the REST API.
> > > >>>> It is almost impossible to remove noteId in the current architecture.
> > > >>>> If we put the note id into the file content of note_name.zpln, then we
> > > >>>> have to read the note file every time, and we hit the issues I
> > > >>>> mentioned above again.
> > > >>>>
> > > >>>> >>> If the file content is json then why not use note_name.json
> > > >>>> instead of .zpln? That would make it easier for editors to know how to
> > > >>>> load/highlight the file contents.
> > > >>>>
> > > >>>> I am not strongly attached to *.zpln. But I think one purpose is to
> > > >>>> help third parties identify a Zeppelin note properly, e.g. github can
> > > >>>> identify a jupyter notebook (*.ipynb) and render it properly.
> > > >>>>
> > > >>>> >>> Is there any reason for not using *real* folders or directories
> > > >>>> for organising the notebooks rather than embedding the folder
> > > >>>> hierarchy in the names of the notebooks?  If someone wants to ‘move’
> > > >>>> the notebooks to another folder they’d have to manually rename all the
> > > >>>> files/notebooks at present.  That’s not very user-friendly.
> > > >>>>
> > > >>>> Actually my proposal is to use real folders. What the user sees in the
> > > >>>> Zeppelin note menu is the actual notes folder structure. If they want
> > > >>>> to move the notebooks to another folder, they can change the folder
> > > >>>> name just as they would in the file system.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Mon, Aug 13, 2018 at 4:43 PM:
> > > >>>>
> > > >>>> Hi Jeff,
> > > >>>>
> > > >>>> I have some questions about this proposal (I can’t edit the design
> > > >>>> doc):
> > > >>>>
> > > >>>>    1. Do we need the note id in the file name at all? What’s wrong
> > > >>>>    with just note_name.zpln?
> > > >>>>    2. If the file content is json then why not use note_name.json
> > > >>>>    instead of .zpln? That would make it easier for editors to know
> > > >>>>    how to load/highlight the file contents.
> > > >>>>    3. Is there any reason for not using *real* folders or directories
> > > >>>>    for organising the notebooks rather than embedding the folder
> > > >>>>    hierarchy in the names of the notebooks?  If someone wants to
> > > >>>>    ‘move’ the notebooks to another folder they’d have to manually
> > > >>>>    rename all the files/notebooks at present.  That’s not very
> > > >>>>    user-friendly.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Thanks, Lucas.
> > > >>>>
> > > >>>> *From:* Jeff Zhang <zj...@gmail.com>
> > > >>>> *Sent:* 13 August 2018 09:06
> > > >>>> *To:* users@zeppelin.apache.org
> > > >>>> *Cc:* dev <de...@zeppelin.apache.org>
> > > >>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in
> > [Title].zpln
> > > >>>> instead of [NOTEID]/note.json
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> In that case, zeppelin should fail to create note.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Felix Cheung <fe...@hotmail.com> wrote on Mon, Aug 13, 2018 at 3:47 PM:
> > > >>>>
> > > >>>> Perhaps one concern is users having characters in note name that are
> > > >>>> invalid for file name/file path?
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> ------------------------------
> > > >>>>
> > > >>>> *From:* Mohit Jaggi <mo...@gmail.com>
> > > >>>> *Sent:* Sunday, August 12, 2018 6:02 PM
> > > >>>> *To:* users@zeppelin.apache.org
> > > >>>> *Cc:* dev
> > > >>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> > > >>>> instead of [NOTEID]/note.json
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> sounds like a good idea!
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zj...@gmail.com>
> > wrote:
> > > >>>>
> > > >>>> Motivation
> > > >>>>
> > > >>>>    The motivation of ZEPPELIN-2619 is to change the notes storage
> > > >>>> structure. Previously we stored each note as {noteId}/note.json; we’d
> > > >>>> like to change that to {note_name}_{note_id}.zpln. There are several
> > > >>>> reasons for this change.
> > > >>>>
> > > >>>>    1. {noteId}/note.json is not scalable. We put all notes in one
> > > >>>>    root folder in a flat structure, and when the Zeppelin server
> > > >>>>    starts we need to read every note.json to get the note name and
> > > >>>>    build the note folder structure (because the note name, which is
> > > >>>>    stored in note.json, is needed to build the notebook menu). This
> > > >>>>    would be a nightmare when you have a large number of notes.
> > > >>>>    2. {noteId}/note.json is not maintainable. It is difficult for a
> > > >>>>    developer/administrator to find a note file based on the note name.
> > > >>>>    3. {noteId}/note.json has no folder structure. Currently Zeppelin
> > > >>>>    has to build the folder structure internally in memory according
> > > >>>>    to the note name, which is a big overhead.
> > > >>>>
> > > >>>>
> > > >>>> New Approach
> > > >>>>
> > > >>>>    As I mentioned above, I propose to change the note storage
> > > >>>> structure to {note_name}_{note_id}.zpln.  note_name could contain
> > > >>>> folders, e.g. folder_1/mynote_abcd.zpln
> > > >>>>
> > > >>>> This kind of note storage structure could bring several benefits.
> > > >>>>
> > > >>>>    1. We don’t need to load all notes when Zeppelin starts. We just
> > > >>>>    need to list each folder to get the note name and note_id.
> > > >>>>    2. It is much more maintainable, since it is easy to find the note
> > > >>>>    file based on the note name.
> > > >>>>    3. It has the folder structure already. That can be mapped to the
> > > >>>>    note folder structure.
> > > >>>>
> > > >>>>
> > > >>>> Side Effect
> > > >>>>
> > > >>>> This approach only works for file system storage, which means we have
> > > >>>> to drop support for MongoNotebookRepo. I think that is OK because I
> > > >>>> haven't seen any users talk about it in the community, so I assume no
> > > >>>> one is using it.
> > > >>>>
> > > >>>> This is the overall design; any comments and feedback are welcome.
> > > >>>> Thanks.
> > > >>>>
> > > >>>> Here's the Google doc; you can also comment there.
> > > >>>>
> > > >>>> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >> 이종열, Jongyoul Lee, 李宗烈
> > > >> http://madeng.net
> > > >>
> > > >
> > >
> >
> 
> 
> 
> -- 
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
> 

Posted by an...@gmail.com, an...@gmail.com.
Sure, some kind of naming policy (or better, naming convention) might exist in Zeppelin multi-user environments. But as long as there is no way to technically enforce a naming convention and so turn it into a naming policy (which would be a nice idea, BTW), it is IMHO a shaky assumption that such policies exist and that users follow them. I'm thinking of the problems that might come up when we change the existing implementation and then have to deal with migration because assumptions do not match reality.

My real-life scenario is this: Zeppelin can be configured so that notebooks are visible only to their owner (and invisible to every other user) by default, via ZEPPELIN_NOTEBOOK_PUBLIC=false, which IMHO is a good idea when setting up Zeppelin as a multi-user environment at larger scale. In that case note owners can use any naming convention they like, or none, when creating a note for personal use, because only the owner will see it, even if naming policies exist at a global organisation level. A global naming convention only has to be followed once users start sharing notes (that is, granting at least reader permission to another user).

So I think noteId is a must-have in the filename.

Andreas

On 2018/08/31 01:46:38, Jongyoul Lee <jo...@gmail.com> wrote: 
> Hi,
> 
> I have a slightly different view on name conflicts for newly created
> notes. In a multi-user environment, AFAIK, most teams and companies
> use a prefix as an internal group policy. In my case it is
> user/{user_id}/{notebook_name_they_want}.zpln. With that, naming
> conflicts rarely happen, and each note is stored under a specific folder.
> If someone needed two notes with the same name in the same directory, it
> might not be appropriate. WDYT?
> 
> JL
> 
> On Fri, Aug 31, 2018 at 4:44 AM, andreas.weise@gmail.com <
> andreas.weise@gmail.com> wrote:
> 
> > Another reason for keeping noteId is uniqueness in multi-user
> > environments. There, users have separate Zeppelin workspaces, which
> > is something we use in production: see ZEPPELIN_NOTEBOOK_PUBLIC=false
> > in the doc [1]. Users might be very confused when they cannot
> > create a notebook because its name is already taken by a notebook that
> > they most likely cannot see (yet).
> >
> > So I like the proposal {note_name}_{note_id}.zpln, where note_name could
> > contain folders, e.g. folder_1/mynote_abcd.zpln. Though I like
> > {note_name}.{note_id}.zpln (a dot between note_name and note_id) even
> > better :-)
> >
> > Regards
> > Andreas
> >
> >
> > [1] http://zeppelin.apache.org/docs/0.8.0/setup/security/notebook_authorization.html#separate-notebook-workspaces-public-vs-private
> >
> > On 2018/08/18 08:42:44, Jeff Zhang <zj...@gmail.com> wrote:
> > > BTW, I would also prefer to use the note name as the identifier of a
> > > note, if the issue I mentioned before is acceptable to most users.
> > >
> > > Jeff Zhang <zj...@gmail.com> wrote on Sat, Aug 18, 2018 at 4:40 PM:
> > >
> > > > I am afraid we cannot remove noteId: it is the unique, immutable
> > > > identifier of a note and is used in a lot of places, such as
> > > > paragraph sharing and the REST API.
> > > > If we used the note name as the note id, renaming a note could
> > > > break users' apps.
> > > >
> > > > Jongyoul Lee <jo...@gmail.com> wrote on Sat, Aug 18, 2018 at 2:33 PM:
> > > >
> > > >> Hi, thanks for this kind of discussion.
> > > >>
> > > >> About noteId: how about changing the note id to the note name? AFAIK,
> > > >> the note id is just an identifier and we can set any value for it.
> > > >>
> > > >> There are two potential problems. First, we would have to handle the
> > > >> note id more carefully, as it could contain a wide variety of
> > > >> characters. Second, when someone changes a note name, others who are
> > > >> viewing or updating the same note would lose access to it. We could
> > > >> handle that by using websockets.
> > > >>
> > > >> WDYT?
> > > >>
> > > >> On Tue, 14 Aug 2018 at 6:14 PM Jeff Zhang <zj...@gmail.com> wrote:
> > > >>
> > > >>> >>> But I’m still not comfortable with note ids in the name of the
> > > >>> notebook itself.  Those names would look ugly if you shared your
> > > >>> notebooks on github for example.  You don’t see Jupyter notebooks with
> > > >>> names like that. If you have to keep the note ids with the notebooks
> > > >>> could you not simply put the note id at the top of the notebook as
> > > >>> Ruslan suggested? Then you’d only have to read the first line of each
> > > >>> notebook.
> > > >>>
> > > >>> I know putting note_id in the note file name is not so elegant, but
> > > >>> this is the compromise we have to make to keep compatibility, as we use
> > > >>> noteId to uniquely identify a note right now. And I don't think putting
> > > >>> noteId on the first line of the note would help much. We would still
> > > >>> have to read the note files, which takes much more time than just
> > > >>> reading the file names from the file system.
> > > >>>
> > > >>> Regarding the readability of the note file name, I think it won't
> > > >>> matter much. E.g. the notebook file name would look like:
> > > >>> *My Project/My Spark Tutorial Note_2A94M5J1Z.zpln*
> > > >>> What the user sees in the notebook menu is still
> > > >>> *My Project/My Spark Tutorial Note*, which is no different from what
> > > >>> we see now.
> > > >>>
> > > >>> And thanks again for the feedback and comments, I am glad to see so
> > > >>> much discussion in the community.
> > > >>>
> > > >>>
> > > >>>
> > > >>> Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Tue, Aug 14, 2018 at 4:29 PM:
> > > >>>
> > > >>>> I agree you’re inviting consistency issues if you maintained a
> > > >>>> separate note id-to-note name mapping file.
> > > >>>>
> > > >>>> But I’m still not comfortable with note ids in the name of the
> > > >>>> notebook itself.  Those names would look ugly if you shared your
> > > >>>> notebooks on github for example.  You don’t see Jupyter notebooks with
> > > >>>> names like that.  If you have to keep the note ids with the notebooks
> > > >>>> could you not simply put the note id at the top of the notebook as
> > > >>>> Ruslan suggested? Then you’d only have to read the first line of each
> > > >>>> notebook.
> > > >>>>
> > > >>>> Presumably if you copied the notebooks to another Zeppelin server they
> > > >>>> would be restored with the same note ids there too? And hopefully
> > > >>>> there would be no id clash with notebooks already on that server…
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> *From:* Jeff Zhang <zj...@gmail.com>
> > > >>>> *Sent:* 14 August 2018 03:49
> > > >>>> *To:* users@zeppelin.apache.org
> > > >>>>
> > > >>>>
> > > >>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in
> > [Title].zpln
> > > >>>> instead of [NOTEID]/note.json
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Thanks for the discussion.
> > > >>>>
> > > >>>> >>> I'm afraid about non-latin symbols in folder and note name. And
> > > >>>> what about hieroglyphs?
> > > >>>>
> > > >>>> AFAIK, Linux allows every character in a file name except `\0` and
> > > >>>> '/'.  I can create file names with Chinese characters on Linux, so I
> > > >>>> guess you can use Russian as well.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> >>> If I understand correctly, this is being done solely to speed up
> > > >>>> loading list of notebooks? What if a list of notebook names, their
> > > >>>> ids, folder structure, etc can be *cached* in a separate small json
> > > >>>> file? Or perhaps in a small embedded key-value store, like
> > > >>>> www.mapdb.org would do? Just thinking out loud. This would require a
> > > >>>> way to lazily re-sync the cache.
> > > >>>>
> > > >>>> This is not only to speed up loading but also to make the system
> > > >>>> architecture easier to maintain. For now we have to build the folder
> > > >>>> structure of notes in memory, and a lot of code in Zeppelin does this
> > > >>>> (personally I don't think we would need any of that code if we could
> > > >>>> get the folder structure from the note file storage system). Using
> > > >>>> another storage to keep the mapping between note name and note id
> > > >>>> would bring in another classic distributed-systems problem:
> > > >>>> consistency. How do we ensure consistency between the real note file
> > > >>>> and this mapping component? If we create/rename/remove a note, we have
> > > >>>> to update both the notebook repo and the mapping storage. In my
> > > >>>> experience, any bug in that code would cause inconsistency issues.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Ruslan Dautkhanov <da...@gmail.com> wrote on Tue, Aug 14, 2018 at 3:58 AM:
> > > >>>>
> > > >>>> Thanks for bringing this up for discussion. My 2 cents below.
> > > >>>>
> > > >>>> I am with Maksim and Felix on the concerns about special characters
> > > >>>> now allowed in notebook names, and also about different charsets.
> > > >>>> Russian, for example, most commonly uses the iso-8859-5, koi8-r/u and
> > > >>>> windows-1251 charsets. This looks like it will bring a whole new set
> > > >>>> of localization issues.
> > > >>>>
> > > >>>> If I understand correctly, this is being done solely to speed up
> > > >>>> loading the list of notebooks? What if the list of notebook names,
> > > >>>> their ids, folder structure, etc. were *cached* in a separate small
> > > >>>> json file? Or perhaps in a small embedded key-value store, like
> > > >>>> www.mapdb.org? Just thinking out loud. This would require a way to
> > > >>>> lazily re-sync the cache.
> > > >>>>
> > > >>>> Another way to speed up json reads is to somehow force the "name"
> > > >>>> attribute to be at the top of the json document that's written to
> > > >>>> disk, and then re-implement the json file reader to read just the
> > > >>>> header of the file and do a partial json parse (or, for lack of other
> > > >>>> options, grab the "name" attribute from the json file header with a
> > > >>>> regex, for example).
> > > >>>>
> > > >>>> Back to filenames and charsets: I think the issue may be more
> > > >>>> complicated if you store notebooks on a remote filesystem (nfs, samba
> > > >>>> etc.). What if the remote server and the local nfs client differ in
> > > >>>> their default fs charsets?
> > > >>>>
> > > >>>> Ideally all filesystems would use UTF-8, for example, but I am not
> > > >>>> certain that's a good assumption to make. Also, exposing notebook
> > > >>>> names can bring some other issues; for example, I know some users
> > > >>>> occasionally add trailing/leading spaces etc.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <
> > > >>>> m.belousov@tinkoff.ru> wrote:
> > > >>>>
> > > >>>> The use of Russian and other language-specific letters in note names
> > > >>>> is a big advantage of Zeppelin. I would not like to give up this
> > > >>>> functionality.
> > > >>>>
> > > >>>> I support the idea of the `zpln` file extension.
> > > >>>>
> > > >>>> The folder structure also sounds good.
> > > >>>>
> > > >>>> I'm afraid about non-latin symbols in folder and note name. And what
> > > >>>> about hieroglyphs?
> > > >>>>
> > > >>>> Apache Zeppelin may be the first to use Russian letters in the file
> > > >>>> system in our company.
> > > >>>>
> > > >>>> I see a lot of risks in using non-Latin symbols and a lot of issues in
> > > >>>> making the new folder structure stable.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> ------------------------------
> > > >>>>
> > > >>>> *From:* Jeff Zhang <zj...@gmail.com>
> > > >>>> *Sent:* 13 August 2018 12:50
> > > >>>> *To:* users@zeppelin.apache.org
> > > >>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> > > >>>> instead of [NOTEID]/note.json
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> >>> Do we need the note id in the file name at all? What’s wrong with
> > > >>>> just note_name.zpln?
> > > >>>>
> > > >>>> The reason I keep the note id is that we currently use noteId to
> > > >>>> identify a note, e.g. in both the websocket API and the REST API.
> > > >>>> It is almost impossible to remove noteId in the current architecture.
> > > >>>> If we put the note id into the file content of note_name.zpln, then we
> > > >>>> have to read the note file every time, and we hit the issues I
> > > >>>> mentioned above again.
> > > >>>>
> > > >>>> >>> If the file content is json then why not use note_name.json
> > > >>>> instead of .zpln? That would make it easier for editors to know how to
> > > >>>> load/highlight the file contents.
> > > >>>>
> > > >>>> I am not strongly attached to *.zpln. But I think one purpose is to
> > > >>>> help third parties identify a Zeppelin note properly, e.g. github can
> > > >>>> identify a jupyter notebook (*.ipynb) and render it properly.
> > > >>>>
> > > >>>> >>> Is there any reason for not using *real* folders or directories
> > > >>>> for organising the notebooks rather than embedding the folder
> > > >>>> hierarchy in the names of the notebooks?  If someone wants to ‘move’
> > > >>>> the notebooks to another folder they’d have to manually rename all the
> > > >>>> files/notebooks at present.  That’s not very user-friendly.
> > > >>>>
> > > >>>> Actually my proposal is to use real folders. What the user sees in the
> > > >>>> Zeppelin note menu is the actual notes folder structure. If they want
> > > >>>> to move the notebooks to another folder, they can change the folder
> > > >>>> name just as they would in the file system.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Mon, Aug 13, 2018 at 4:43 PM:
> > > >>>>
> > > >>>> Hi Jeff,
> > > >>>>
> > > >>>> I have some questions about this proposal (I can’t edit the design
> > > >>>> doc):
> > > >>>>
> > > >>>>    1. Do we need the note id in the file name at all? What’s wrong
> > > >>>>    with just note_name.zpln?
> > > >>>>    2. If the file content is json then why not use note_name.json
> > > >>>>    instead of .zpln? That would make it easier for editors to know
> > > >>>>    how to load/highlight the file contents.
> > > >>>>    3. Is there any reason for not using *real* folders or directories
> > > >>>>    for organising the notebooks rather than embedding the folder
> > > >>>>    hierarchy in the names of the notebooks?  If someone wants to
> > > >>>>    ‘move’ the notebooks to another folder they’d have to manually
> > > >>>>    rename all the files/notebooks at present.  That’s not very
> > > >>>>    user-friendly.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Thanks, Lucas.
> > > >>>>
> > > >>>> *From:* Jeff Zhang <zj...@gmail.com>
> > > >>>> *Sent:* 13 August 2018 09:06
> > > >>>> *To:* users@zeppelin.apache.org
> > > >>>> *Cc:* dev <de...@zeppelin.apache.org>
> > > >>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in
> > [Title].zpln
> > > >>>> instead of [NOTEID]/note.json
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> In that case, zeppelin should fail to create note.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Felix Cheung <fe...@hotmail.com> wrote on Mon, Aug 13, 2018 at 3:47 PM:
> > > >>>>
> > > >>>> Perhaps one concern is users having characters in note name that are
> > > >>>> invalid for file name/file path?
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> ------------------------------
> > > >>>>
> > > >>>> *From:* Mohit Jaggi <mo...@gmail.com>
> > > >>>> *Sent:* Sunday, August 12, 2018 6:02 PM
> > > >>>> *To:* users@zeppelin.apache.org
> > > >>>> *Cc:* dev
> > > >>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> > > >>>> instead of [NOTEID]/note.json
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> sounds like a good idea!
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zj...@gmail.com>
> > wrote:
> > > >>>>
> > > >>>> Motivation
> > > >>>>
> > > >>>>    The motivation of ZEPPELIN-2619 is to change the notes storage
> > > >>>> structure. Previously we store it using {noteId}/note.json, we’d
> > like to
> > > >>>> change it into {note_name}_{note_id}.zpln. There are several
> > reasons for
> > > >>>> this change.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>    1. {noteId}/note.json is not scalable. We put all notes in one
> > root
> > > >>>>    folder in flat structure. And when zeppelin server starts, we
> > need to read
> > > >>>>    all note.json to get the note file name and build the note
> > folder structure
> > > >>>>    (Because we need to get the note name which is stored in
> > note.json to build
> > > >>>>    the notebook menu). This would be a nightmare when you have
> > large amounts
> > > >>>>    of notes.
> > > >>>>    2. {noteId}/note.json is not maintainable. It is difficult for a
> > > >>>>    developer/administrator to find note file based on note name.
> > > >>>>    3. {noteId}/note.json has no folder structure. Currently zeppelin
> > > >>>>    have to build the folder structure internally in memory
> > according note name
> > > >>>>    which is a big overhead.
> > > >>>>
> > > >>>>
> > > >>>> New Approach
> > > >>>>
> > > >>>>    As I mentioned above, I propose to change the note storage
> > structure
> > > >>>> to {note_name}_{note_id}.zpln.  note_name could contains folders,
> > e.g.
> > > >>>> folder_1/mynote_abcd.zpln
> > > >>>>
> > > >>>> This kind of note storage structure could bring several benefits.
> > > >>>>
> > > >>>>    1. We don’t need to load all notes when zeppelin starts. We just
> > > >>>>    need to list each folder to get the note name and note_id.
> > > >>>>    2. It is much maintainable so that it is easy to find the note
> > file
> > > >>>>    based on note name.
> > > >>>>    3. It has the folder structure already. That can be mapped to the
> > > >>>>    note folder structure.
> > > >>>>
> > > >>>>
> > > >>>> Side Effect
> > > >>>>
> > > >>>> This approach only works for file system storage, so that means we
> > have
> > > >>>> to drop support for MongoNotebookRepo. I think it is ok because I
> > didn’t
> > > >>>> see any users talk about this in community, so I assume no one is
> > using it.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> This is overall design, welcome any comments and feedback. Thanks.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Here's the google docs, you can also comment it here.
> > > >>>>
> > > >>>>
> > > >>>> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >> 이종열, Jongyoul Lee, 李宗烈
> > > >> http://madeng.net
> > > >>
> > > >
> > >
> >
> 
> 
> 
> -- 
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
> 

Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by Jongyoul Lee <jo...@gmail.com>.
Hi,

I have a slightly different view on name conflicts when a new note is
created. In a multi-user environment, AFAIK, most teams and companies use a
prefix as an internal group policy. In my case it is
user/{user_id}/{notebook_name_they_want}.zpln. With such a prefix, naming
conflicts rarely happen, and each note is stored under a specific folder. If
someone needs two notes with the same name in the same directory, I think
that might not be appropriate. WDYT?
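
To make the prefix idea concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: note_path and has_conflict are hypothetical helper names, not Zeppelin code, and the layout is the user/{user_id}/{note_name}.zpln convention described above.

```python
from pathlib import PurePosixPath


def note_path(user_id: str, note_name: str) -> PurePosixPath:
    """Build user/{user_id}/{note_name}.zpln (hypothetical layout, not Zeppelin's API)."""
    return PurePosixPath("user") / user_id / f"{note_name}.zpln"


def has_conflict(existing: set, candidate: PurePosixPath) -> bool:
    """A name conflict exists only when the full relative path is already taken."""
    return candidate in existing


existing = {note_path("alice", "Spark Tutorial")}

# Same note name under a different user prefix: the paths differ, so no conflict.
print(has_conflict(existing, note_path("bob", "Spark Tutorial")))    # False

# Same note name in the same user folder: this is the rare conflicting case.
print(has_conflict(existing, note_path("alice", "Spark Tutorial")))  # True
```

With a per-user prefix the only way to collide is to reuse a name inside one's own folder, which is the case I think could reasonably be rejected.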

JL

On Fri, Aug 31, 2018 at 4:44 AM, andreas.weise@gmail.com <
andreas.weise@gmail.com> wrote:

> another reason for keeping noteId is uniqueness in case of multi-user
> environments. In that case users have separate zeppelin workspaces, which
> is something we are using in production: see ZEPPELIN_NOTEBOOK_PUBLIC=false
> in the doc [1]. In that case users might be very confused when they can not
> create notebooks with a name that already exists, but they most likely
> don't see (yet).
>
> So I like the proposal {note_name}_{note_id}.zpln. where note_name could
> contains folders, e.g. folder_1/mynote_abcd.zpln. Even though I like
> {note_name}.{note_id}.zpln (dot in between note_name and note_id) even
> better :-)
>
> Regards
> Andreas
>
>
> [1] http://zeppelin.apache.org/docs/0.8.0/setup/security/notebook_authorization.html#separate-notebook-workspaces-public-vs-private
>
> On 2018/08/18 08:42:44, Jeff Zhang <zj...@gmail.com> wrote:
> > BTW, I would also prefer to use the note name as the identifier of a note,
> > if the issue I mentioned before is acceptable for most users.
> >
> >
> >
> > Jeff Zhang <zj...@gmail.com> wrote on Sat, Aug 18, 2018 at 4:40 PM:
> >
> > >
> > > I am afraid we can not remove noteId, as noteId is the unique
> identifier
> > > of note and is immutable which is used in a lot places, such as
> paragraph
> > > share and rest api.
> > > If we use note name as note id then it may break user's app if note
> name
> > > is changed
> > >
> > >
> > > Jongyoul Lee <jo...@gmail.com> wrote on Sat, Aug 18, 2018 at 2:33 PM:
> > >
> > >> Hi, thanks for this kind of discussion.
> > >>
> > >> About noteId, How about changing note id to note name? AFAIK, Note id
> is
> > >> just an identifier and we can set any value to it.
> > >>
> > >> There’re two potential problems. We should be more careful to handle
> note
> > >> id as it could have very various type of characters. And Second, in
> case
> > >> where someone changes a note name, those who are seeing and updating
> the
> > >> same note wouldn’t access that note. We could handle it by using
> websockets.
> > >>
> > >> WDYT?
> > >>
> > >> On Tue, 14 Aug 2018 at 6:14 PM Jeff Zhang <zj...@gmail.com> wrote:
> > >>
> > >>> >>> But I’m still not comfortable with note ids in the name of the
> > >>> notebook itself.  Those names would look ugly if you shared your
> notebooks
> > >>> on github for example.  You don’t see Jupyter notebooks with names
> like
> > >>> that. If you have to keep the note ids with the notebooks could you
> not
> > >>> simply put the note id at the top of the notebook as Ruslan
> suggested? Then
> > >>> you’d only have to read the first line of each notebook.
> > >>>
> > >>> I know putting note_id in the note file name is not so elegant, but
> this
> > >>> is what we have to compromise to keep compatibility as we use noteId
> to
> > >>> uniquely identify note right now. And I don't think putting noteId
> in the
> > >>> top first line of note would help much. We still have to read note
> files
> > >>> which take much more time than just read the file names via file
> system.
> > >>>
> > >>> Regarding the readability of note file name, I think it won't affect
> > >>> much. E.g. This is the note book file name like:  *My Project/My
> Spark
> > >>> Tutorial Note_2A94M5J1Z.zpln*
> > >>> What user see in notebook menu is still *My Project/My Spark
> Tutorial* *Note
> > >>> *which is no difference from what we see now.
> > >>>
> > >>> And thanks again for the feedback and comments, I am so glad to see
> so
> > >>> many discussion in community.
> > >>>
> > >>>
> > >>>
> > >>> Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Tue, Aug 14, 2018 at 4:29 PM:
> > >>>
> > >>>> I agree you’re inviting consistency issues if you maintained a
> separate
> > >>>> note id-to-note name mapping file.
> > >>>>
> > >>>>
> > >>>>
> > >>>> But I’m still not comfortable with note ids in the name of the
> notebook
> > >>>> itself.  Those names would look ugly if you shared your notebooks
> on github
> > >>>> for example.  You don’t see Jupyter notebooks with names like
> that.  If you
> > >>>> have to keep the note ids with the notebooks could you not simply
> put the
> > >>>> note id at the top of the notebook as Ruslan suggested? Then you’d
> only
> > >>>> have to read the first line of each notebook.
> > >>>>
> > >>>>
> > >>>>
> > >>>> Presumably if you copied the notebooks to another Zeppelin server
> they
> > >>>> would be restored with the same note ids there too? And hopefully
> there
> > >>>> would be no id clash with notebooks already on that server…
> > >>>>
> > >>>>
> > >>>>
> > >>>> *From:* Jeff Zhang <zj...@gmail.com>
> > >>>> *Sent:* 14 August 2018 03:49
> > >>>> *To:* users@zeppelin.apache.org
> > >>>>
> > >>>>
> > >>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in
> [Title].zpln
> > >>>> instead of [NOTEID]/note.json
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> Thanks for the discussion.
> > >>>>
> > >>>> >>> I'm afraid about non-latin symbols in folder and note name. And
> > >>>> what about hieroglyphs?
> > >>>>
> > >>>> AFAIK, linux allow all the characters to be file name except `\0`
> and
> > >>>> '/'.  I can create file name with Chinese character in linux, I
> guess you
> > >>>> can use Russian as well.
> > >>>>
> > >>>>
> > >>>>
> > >>>> >>> If I understand correctly, this is being done solely to speed up
> > >>>> loading list of notebooks? What if a list of notebook names, their
> ids,
> > >>>> folder structure, etc can be *cached* in a separate small json
> file? Or
> > >>>> perhaps in a small embedded key-value store, like www.mapdb.org
> would
> > >>>> do? Just thinking out loud. This would require a way to lazily
> re-sync the
> > >>>> cache.
> > >>>>
> > >>>>
> > >>>>
> > >>>> This not only to speed up the loading but also make the system
> > >>>> architecture easy to maintain. Because for now we have to build the
> folder
> > >>>> structure of notes in memory, many code in zeppelin is doing this
> > >>>> (Personally I don't think we need any code for this function if we
> could
> > >>>> get the folder structure from the note file storage system). Use
> another
> > >>>> storage to keep the mapping of note name and note id will bring
> another
> > >>>> classic problem of distributed system: consistency. How do we make
> sure the
> > >>>> consistency between the real note file and this mapping component.
> If we
> > >>>> create/rename/remove note, we have to both update the notebook repo
> and the
> > >>>> mapping storage. Any bug in code would bring inconsistency issue
> based on
> > >>>> my experience.
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> Ruslan Dautkhanov <da...@gmail.com> wrote on Tue, Aug 14, 2018 at 3:58 AM:
> > >>>>
> > >>>> Thanks for bringing this up for discussion. My 2 cents below.
> > >>>>
> > >>>>
> > >>>>
> > >>>> I am with Maksim and Felix on concerns with special characters now
> > >>>> allowed in notebook names, and also concerns with different
> charsets.
> > >>>> Russian language, for example, most commonly use iso-8859-5,
> koi-8r/u,
> > >>>> windows-1251 charsets etc. This seems like will bring whole new set
> of
> > >>>> localization issues.
> > >>>>
> > >>>>
> > >>>>
> > >>>> If I understand correctly, this is being done solely to speed up
> > >>>> loading list of notebooks? What if a list of notebook names, their
> ids,
> > >>>> folder structure, etc can be *cached* in a separate small json
> file? Or
> > >>>> perhaps in a small embedded key-value store, like www.mapdb.org
> would
> > >>>> do? Just thinking out loud. This would require a way to lazily
> re-sync the
> > >>>> cache.
> > >>>>
> > >>>>
> > >>>>
> > >>>> Another way to speed up json reads is to somehow force "name"
> attribute
> > >>>> to be at the top of the json document that's written to disk. Then
> > >>>> re-implement json files reader to read just header of the file and
> do a
> > >>>> partial json parse ( or in the lack of options, grab "name"
> attribute from
> > >>>> the json file header by a regex for example).
> > >>>>
> > >>>>
> > >>>>
> > >>>> Back to filenames and charsets, I think issue may be more
> complicated,
> > >>>> if you store notebooks on a remote filesystem (nfs/ samba etc), and
> what if
> > >>>> remote server and local nfs client have differences in default fs
> charsets?
> > >>>>
> > >>>>
> > >>>>
> > >>>> Ideally would be if all filesystems would use UTF-8 for example,
> but I
> > >>>> am not certain that's a good assumption to make. Also exposing
> notebook
> > >>>> names can bring some other issues, like I know some users
> occasionally add
> > >>>> trailing/leading spaces etc.
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <
> > >>>> m.belousov@tinkoff.ru> wrote:
> > >>>>
> > >>>> The use of Russian and other specific letters in the note name is
> big
> > >>>> advantage of Zeppelin. I would not like to give up this
> functionality.
> > >>>>
> > >>>>
> > >>>>
> > >>>> I support the idea about `zpln` file extension.
> > >>>>
> > >>>> The folder structure also sounds good.
> > >>>>
> > >>>>
> > >>>>
> > >>>> I'm afraid about non-latin symbols in folder and note name. And what
> > >>>> about hieroglyphs?
> > >>>>
> > >>>>
> > >>>>
> > >>>> Apache Zeppelin may be the first to use Russian letters in file
> system
> > >>>> in our company.
> > >>>>
> > >>>> I see a lot of risks to use non-latin symbols and a lot of issues to
> > >>>> make new folder structure stable.
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> ------------------------------
> > >>>>
> > >>>> *From:* Jeff Zhang <zj...@gmail.com>
> > >>>> *Sent:* 13 August 2018 12:50
> > >>>> *To:* users@zeppelin.apache.org
> > >>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead
> > >>>> of [NOTEID]/note.json
> > >>>>
> > >>>>
> > >>>>
> > >>>> >>> Do we need the note id in the file name at all? What’s wrong
> with
> > >>>> just note_name.zpln?
> > >>>>
> > >>>> The reason I keep note id is because currently we use noteId to
> > >>>> identify one note. e.g. we use note id in both websocket api and
> rest api.
> > >>>> It is almost impossible to remove noteId for the current
> architecture. If
> > >>>> we put note id into file content of note_name.zpln, then we have to
> read
> > >>>> the note file every time, then we meet the issues I mentioned above
> again.
> > >>>>
> > >>>>
> > >>>>
> > >>>> >>> If the file content is json then why not use note_name.json
> instead
> > >>>> of .zpln? That would make it easier for editors to know how to
> > >>>> load/highlight the file contents.
> > >>>>
> > >>>> I am not strongly biased on *.zpln. But I think one purpose is to
> help
> > >>>> third parties to identify zeppelin note properly. e.g. github can
> identify
> > >>>> jupyter notebook (*.ipynb) and render it properly.
> > >>>>
> > >>>>
> > >>>>
> > >>>> >>> Is there any reason for not using *real* folders or directories
> > >>>> for organising the notebooks rather than embedding the folder
> hierarchy in
> > >>>> the names of the notebooks?  If someone wants to ‘move’ the
> notebooks to
> > >>>> another folder they’d have to manually rename all the
> files/notebooks at
> > >>>> present.  That’s not very user-friendly.
> > >>>>
> > >>>>
> > >>>>
> > >>>> Actually my proposal is to use real folders. What user see in
> zeppelin
> > >>>> note menu is the actual notes folder structure. If they want to
> move the
> > >>>> notebooks to another folder, they can change the folder name just
> like what
> > >>>> user did in file system.
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Mon, Aug 13, 2018 at 4:43 PM:
> > >>>>
> > >>>> Hi Jeff,
> > >>>>
> > >>>> I have some questions about this proposal (I can’t edit the design
> doc):
> > >>>>
> > >>>>
> > >>>>
> > >>>>    1. Do we need the note id in the file name at all? What’s wrong
> > >>>>    with just note_name.zpln?
> > >>>>    2. If the file content is json then why not use note_name.json
> > >>>>    instead of .zpln? That would make it easier for editors to know
> how to
> > >>>>    load/highlight the file contents.
> > >>>>    3. Is there any reason for not using *real* folders or
> directories
> > >>>>    for organising the notebooks rather than embedding the folder
> hierarchy in
> > >>>>    the names of the notebooks?  If someone wants to ‘move’ the
> notebooks to
> > >>>>    another folder they’d have to manually rename all the
> files/notebooks at
> > >>>>    present.  That’s not very user-friendly.
> > >>>>
> > >>>>
> > >>>>
> > >>>> Thanks, Lucas.
> > >>>>
> > >>>> *From:* Jeff Zhang <zj...@gmail.com>
> > >>>> *Sent:* 13 August 2018 09:06
> > >>>> *To:* users@zeppelin.apache.org
> > >>>> *Cc:* dev <de...@zeppelin.apache.org>
> > >>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in
> [Title].zpln
> > >>>> instead of [NOTEID]/note.json
> > >>>>
> > >>>>
> > >>>>
> > >>>> In that case, zeppelin should fail to create note.
> > >>>>
> > >>>>
> > >>>>
> > >>>> Felix Cheung <fe...@hotmail.com> wrote on Mon, Aug 13, 2018 at 3:47 PM:
> > >>>>
> > >>>> Perhaps one concern is users having characters in note name that are
> > >>>> invalid for file name/file path?
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> ------------------------------
> > >>>>
> > >>>> *From:* Mohit Jaggi <mo...@gmail.com>
> > >>>> *Sent:* Sunday, August 12, 2018 6:02 PM
> > >>>> *To:* users@zeppelin.apache.org
> > >>>> *Cc:* dev
> > >>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> > >>>> instead of [NOTEID]/note.json
> > >>>>
> > >>>>
> > >>>>
> > >>>> sounds like a good idea!
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zj...@gmail.com>
> wrote:
> > >>>>
> > >>>> Motivation
> > >>>>
> > >>>>    The motivation of ZEPPELIN-2619 is to change the notes storage
> > >>>> structure. Previously we store it using {noteId}/note.json, we’d
> like to
> > >>>> change it into {note_name}_{note_id}.zpln. There are several
> reasons for
> > >>>> this change.
> > >>>>
> > >>>>
> > >>>>
> > >>>>    1. {noteId}/note.json is not scalable. We put all notes in one
> root
> > >>>>    folder in flat structure. And when zeppelin server starts, we
> need to read
> > >>>>    all note.json to get the note file name and build the note
> folder structure
> > >>>>    (Because we need to get the note name which is stored in
> note.json to build
> > >>>>    the notebook menu). This would be a nightmare when you have
> large amounts
> > >>>>    of notes.
> > >>>>    2. {noteId}/note.json is not maintainable. It is difficult for a
> > >>>>    developer/administrator to find note file based on note name.
> > >>>>    3. {noteId}/note.json has no folder structure. Currently zeppelin
> > >>>>    have to build the folder structure internally in memory
> according note name
> > >>>>    which is a big overhead.
> > >>>>
> > >>>>
> > >>>> New Approach
> > >>>>
> > >>>>    As I mentioned above, I propose to change the note storage
> structure
> > >>>> to {note_name}_{note_id}.zpln.  note_name could contains folders,
> e.g.
> > >>>> folder_1/mynote_abcd.zpln
> > >>>>
> > >>>> This kind of note storage structure could bring several benefits.
> > >>>>
> > >>>>    1. We don’t need to load all notes when zeppelin starts. We just
> > >>>>    need to list each folder to get the note name and note_id.
> > >>>>    2. It is much maintainable so that it is easy to find the note
> file
> > >>>>    based on note name.
> > >>>>    3. It has the folder structure already. That can be mapped to the
> > >>>>    note folder structure.
> > >>>>
> > >>>>
> > >>>> Side Effect
> > >>>>
> > >>>> This approach only works for file system storage, so that means we
> have
> > >>>> to drop support for MongoNotebookRepo. I think it is ok because I
> didn’t
> > >>>> see any users talk about this in community, so I assume no one is
> using it.
> > >>>>
> > >>>>
> > >>>>
> > >>>> This is overall design, welcome any comments and feedback. Thanks.
> > >>>>
> > >>>>
> > >>>>
> > >>>> Here's the google docs, you can also comment it here.
> > >>>>
> > >>>>
> > >>>> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >> 이종열, Jongyoul Lee, 李宗烈
> > >> http://madeng.net
> > >>
> > >
> >
>



-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net

Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by an...@gmail.com, an...@gmail.com.
Another reason for keeping noteId is uniqueness in multi-user environments. There, users have separate Zeppelin workspaces, which is something we use in production: see ZEPPELIN_NOTEBOOK_PUBLIC=false in the doc [1]. Users might be very confused when they cannot create a notebook because its name is already taken by a notebook that they most likely cannot see (yet).

So I like the proposal {note_name}_{note_id}.zpln, where note_name could contain folders, e.g. folder_1/mynote_abcd.zpln. That said, I like {note_name}.{note_id}.zpln (a dot between note_name and note_id) even better :-)
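
As a rough illustration of how either separator could be read back from a file name without opening the note, here is a small Python sketch. It is not Zeppelin's implementation: parse_note_file and NoteRef are hypothetical names, and I am assuming the note id consists only of letters and digits, as in the 2A94M5J1Z example from this thread.

```python
import re
from typing import NamedTuple, Optional


class NoteRef(NamedTuple):
    name: str     # note name, possibly including folders
    note_id: str  # immutable note id taken from the file name


# Assumption: the id is the last run of letters/digits before ".zpln",
# separated from the name by either "_" or ".".
_PATTERN = re.compile(r"^(?P<name>.+)[_.](?P<id>[A-Za-z0-9]+)\.zpln$")


def parse_note_file(rel_path: str) -> Optional[NoteRef]:
    """Split a relative note path into (name, id); return None if it does not fit."""
    m = _PATTERN.match(rel_path)
    if m is None:
        return None
    return NoteRef(m.group("name"), m.group("id"))


print(parse_note_file("folder_1/mynote_abcd.zpln"))
print(parse_note_file("folder_1/mynote.abcd.zpln"))
print(parse_note_file("My Project/My Spark Tutorial Note_2A94M5J1Z.zpln"))
```

Because the id comes last, underscores or dots inside the note name itself do not confuse the split, as long as the id never contains a separator.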

Regards
Andreas


[1] http://zeppelin.apache.org/docs/0.8.0/setup/security/notebook_authorization.html#separate-notebook-workspaces-public-vs-private

On 2018/08/18 08:42:44, Jeff Zhang <zj...@gmail.com> wrote: 
> BTW, I would also prefer to use the note name as the identifier of a note,
> if the issue I mentioned before is acceptable for most users.
> 
> 
> 
> Jeff Zhang <zj...@gmail.com> wrote on Sat, Aug 18, 2018 at 4:40 PM:
> 
> >
> > I am afraid we can not remove noteId, as noteId is the unique identifier
> > of note and is immutable which is used in a lot places, such as paragraph
> > share and rest api.
> > If we use note name as note id then it may break user's app if note name
> > is changed
> >
> >
> > Jongyoul Lee <jo...@gmail.com> wrote on Sat, Aug 18, 2018 at 2:33 PM:
> >
> >> Hi, thanks for this kind of discussion.
> >>
> >> About noteId, How about changing note id to note name? AFAIK, Note id is
> >> just an identifier and we can set any value to it.
> >>
> >> There’re two potential problems. We should be more careful to handle note
> >> id as it could have very various type of characters. And Second, in case
> >> where someone changes a note name, those who are seeing and updating the
> >> same note wouldn’t access that note. We could handle it by using websockets.
> >>
> >> WDYT?
> >>
> >> On Tue, 14 Aug 2018 at 6:14 PM Jeff Zhang <zj...@gmail.com> wrote:
> >>
> >>> >>> But I’m still not comfortable with note ids in the name of the
> >>> notebook itself.  Those names would look ugly if you shared your notebooks
> >>> on github for example.  You don’t see Jupyter notebooks with names like
> >>> that. If you have to keep the note ids with the notebooks could you not
> >>> simply put the note id at the top of the notebook as Ruslan suggested? Then
> >>> you’d only have to read the first line of each notebook.
> >>>
> >>> I know putting note_id in the note file name is not so elegant, but this
> >>> is what we have to compromise to keep compatibility as we use noteId to
> >>> uniquely identify note right now. And I don't think putting noteId in the
> >>> top first line of note would help much. We still have to read note files
> >>> which take much more time than just read the file names via file system.
> >>>
> >>> Regarding the readability of note file name, I think it won't affect
> >>> much. E.g. This is the note book file name like:  *My Project/My Spark
> >>> Tutorial Note_2A94M5J1Z.zpln*
> >>> What user see in notebook menu is still *My Project/My Spark Tutorial* *Note
> >>> *which is no difference from what we see now.
> >>>
> >>> And thanks again for the feedback and comments, I am so glad to see so
> >>> many discussion in community.
> >>>
> >>>
> >>>
> >>> Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Tue, Aug 14, 2018 at 4:29 PM:
> >>>
> >>>> I agree you’re inviting consistency issues if you maintained a separate
> >>>> note id-to-note name mapping file.
> >>>>
> >>>>
> >>>>
> >>>> But I’m still not comfortable with note ids in the name of the notebook
> >>>> itself.  Those names would look ugly if you shared your notebooks on github
> >>>> for example.  You don’t see Jupyter notebooks with names like that.  If you
> >>>> have to keep the note ids with the notebooks could you not simply put the
> >>>> note id at the top of the notebook as Ruslan suggested? Then you’d only
> >>>> have to read the first line of each notebook.
> >>>>
> >>>>
> >>>>
> >>>> Presumably if you copied the notebooks to another Zeppelin server they
> >>>> would be restored with the same note ids there too? And hopefully there
> >>>> would be no id clash with notebooks already on that server…
> >>>>
> >>>>
> >>>>
> >>>> *From:* Jeff Zhang <zj...@gmail.com>
> >>>> *Sent:* 14 August 2018 03:49
> >>>> *To:* users@zeppelin.apache.org
> >>>>
> >>>>
> >>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> >>>> instead of [NOTEID]/note.json
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Thanks for the discussion.
> >>>>
> >>>> >>> I'm afraid about non-latin symbols in folder and note name. And
> >>>> what about hieroglyphs?
> >>>>
> >>>> AFAIK, linux allow all the characters to be file name except `\0` and
> >>>> '/'.  I can create file name with Chinese character in linux, I guess you
> >>>> can use Russian as well.
> >>>>
> >>>>
> >>>>
> >>>> >>> If I understand correctly, this is being done solely to speed up
> >>>> loading list of notebooks? What if a list of notebook names, their ids,
> >>>> folder structure, etc can be *cached* in a separate small json file? Or
> >>>> perhaps in a small embedded key-value store, like www.mapdb.org would
> >>>> do? Just thinking out loud. This would require a way to lazily re-sync the
> >>>> cache.
> >>>>
> >>>>
> >>>>
> >>>> This not only to speed up the loading but also make the system
> >>>> architecture easy to maintain. Because for now we have to build the folder
> >>>> structure of notes in memory, many code in zeppelin is doing this
> >>>> (Personally I don't think we need any code for this function if we could
> >>>> get the folder structure from the note file storage system). Use another
> >>>> storage to keep the mapping of note name and note id will bring another
> >>>> classic problem of distributed system: consistency. How do we make sure the
> >>>> consistency between the real note file and this mapping component. If we
> >>>> create/rename/remove note, we have to both update the notebook repo and the
> >>>> mapping storage. Any bug in code would bring inconsistency issue based on
> >>>> my experience.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Ruslan Dautkhanov <da...@gmail.com> wrote on Tue, Aug 14, 2018 at 3:58 AM:
> >>>>
> >>>> Thanks for bringing this up for discussion. My 2 cents below.
> >>>>
> >>>>
> >>>>
> >>>> I am with Maksim and Felix on concerns with special characters now
> >>>> allowed in notebook names, and also concerns with different charsets.
> >>>> Russian language, for example, most commonly use iso-8859-5, koi-8r/u,
> >>>> windows-1251 charsets etc. This seems like will bring whole new set of
> >>>> localization issues.
> >>>>
> >>>>
> >>>>
> >>>> If I understand correctly, this is being done solely to speed up
> >>>> loading list of notebooks? What if a list of notebook names, their ids,
> >>>> folder structure, etc can be *cached* in a separate small json file? Or
> >>>> perhaps in a small embedded key-value store, like www.mapdb.org would
> >>>> do? Just thinking out loud. This would require a way to lazily re-sync the
> >>>> cache.
> >>>>
> >>>>
> >>>>
> >>>> Another way to speed up json reads is to somehow force "name" attribute
> >>>> to be at the top of the json document that's written to disk. Then
> >>>> re-implement json files reader to read just header of the file and do a
> >>>> partial json parse ( or in the lack of options, grab "name" attribute from
> >>>> the json file header by a regex for example).
> >>>>
> >>>>
> >>>>
> >>>> Back to filenames and charsets, I think issue may be more complicated,
> >>>> if you store notebooks on a remote filesystem (nfs/ samba etc), and what if
> >>>> remote server and local nfs client have differences in default fs charsets?
> >>>>
> >>>>
> >>>>
> >>>> Ideally would be if all filesystems would use UTF-8 for example, but I
> >>>> am not certain that's a good assumption to make. Also exposing notebook
> >>>> names can bring some other issues, like I know some users occasionally add
> >>>> trailing/leading spaces etc.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <
> >>>> m.belousov@tinkoff.ru> wrote:
> >>>>
> >>>> The use of Russian and other specific letters in the note name is big
> >>>> advantage of Zeppelin. I would not like to give up this functionality.
> >>>>
> >>>>
> >>>>
> >>>> I support the idea about `zpln` file extension.
> >>>>
> >>>> The folder structure also sounds good.
> >>>>
> >>>>
> >>>>
> >>>> I'm afraid about non-latin symbols in folder and note name. And what
> >>>> about hieroglyphs?
> >>>>
> >>>>
> >>>>
> >>>> Apache Zeppelin may be the first to use Russian letters in file system
> >>>> in our company.
> >>>>
> >>>> I see a lot of risks to use non-latin symbols and a lot of issues to
> >>>> make new folder structure stable.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> ------------------------------
> >>>>
> >>>> *From:* Jeff Zhang <zj...@gmail.com>
> >>>> *Sent:* 13 August 2018 12:50
> >>>> *To:* users@zeppelin.apache.org
> >>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead
> >>>> of [NOTEID]/note.json
> >>>>
> >>>>
> >>>>
> >>>> >>> Do we need the note id in the file name at all? What’s wrong with
> >>>> just note_name.zpln?
> >>>>
> >>>> The reason I keep note id is because currently we use noteId to
> >>>> identify one note. e.g. we use note id in both websocket api and rest api.
> >>>> It is almost impossible to remove noteId for the current architecture. If
> >>>> we put note id into file content of note_name.zpln, then we have to read
> >>>> the note file every time, then we meet the issues I mentioned above again.
> >>>>
> >>>>
> >>>>
> >>>> >>> If the file content is json then why not use note_name.json instead
> >>>> of .zpln? That would make it easier for editors to know how to
> >>>> load/highlight the file contents.
> >>>>
> >>>> I am not strongly biased on *.zpln. But I think one purpose is to help
> >>>> third parties to identify zeppelin note properly. e.g. github can identify
> >>>> jupyter notebook (*.ipynb) and render it properly.
> >>>>
> >>>>
> >>>>
> >>>> >>> Is there any reason for not using *real* folders or directories
> >>>> for organising the notebooks rather than embedding the folder hierarchy in
> >>>> the names of the notebooks?  If someone wants to ‘move’ the notebooks to
> >>>> another folder they’d have to manually rename all the files/notebooks at
> >>>> present.  That’s not very user-friendly.
> >>>>
> >>>>
> >>>>
> >>>> Actually my proposal is to use real folders. What user see in zeppelin
> >>>> note menu is the actual notes folder structure. If they want to move the
> >>>> notebooks to another folder, they can change the folder name just like what
> >>>> user did in file system.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Mon, Aug 13, 2018 at 4:43 PM:
> >>>>
> >>>> Hi Jeff,
> >>>>
> >>>> I have some questions about this proposal (I can’t edit the design doc):
> >>>>
> >>>>
> >>>>
> >>>>    1. Do we need the note id in the file name at all? What’s wrong
> >>>>    with just note_name.zpln?
> >>>>    2. If the file content is json then why not use note_name.json
> >>>>    instead of .zpln? That would make it easier for editors to know how to
> >>>>    load/highlight the file contents.
> >>>>    3. Is there any reason for not using *real* folders or directories
> >>>>    for organising the notebooks rather than embedding the folder hierarchy in
> >>>>    the names of the notebooks?  If someone wants to ‘move’ the notebooks to
> >>>>    another folder they’d have to manually rename all the files/notebooks at
> >>>>    present.  That’s not very user-friendly.
> >>>>
> >>>>
> >>>>
> >>>> Thanks, Lucas.
> >>>>
> >>>> *From:* Jeff Zhang <zj...@gmail.com>
> >>>> *Sent:* 13 August 2018 09:06
> >>>> *To:* users@zeppelin.apache.org
> >>>> *Cc:* dev <de...@zeppelin.apache.org>
> >>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> >>>> instead of [NOTEID]/note.json
> >>>>
> >>>>
> >>>>
> >>>> In that case, zeppelin should fail to create note.
> >>>>
> >>>>
> >>>>
> >>>> Felix Cheung <fe...@hotmail.com> wrote on Mon, Aug 13, 2018 at 3:47 PM:
> >>>>
> >>>> Perhaps one concern is users having characters in note name that are
> >>>> invalid for file name/file path?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> ------------------------------
> >>>>
> >>>> *From:* Mohit Jaggi <mo...@gmail.com>
> >>>> *Sent:* Sunday, August 12, 2018 6:02 PM
> >>>> *To:* users@zeppelin.apache.org
> >>>> *Cc:* dev
> >>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> >>>> instead of [NOTEID]/note.json
> >>>>
> >>>>
> >>>>
> >>>> sounds like a good idea!
> >>>>
> >>>>
> >>>>
> >>>> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zj...@gmail.com> wrote:
> >>>>
> >>>> Motivation
> >>>>
> >>>>    The motivation of ZEPPELIN-2619 is to change the notes storage
> >>>> structure. Previously we store it using {noteId}/note.json, we’d like to
> >>>> change it into {note_name}_{note_id}.zpln. There are several reasons for
> >>>> this change.
> >>>>
> >>>>
> >>>>
> >>>>    1. {noteId}/note.json is not scalable. We put all notes in one root
> >>>>    folder in flat structure. And when zeppelin server starts, we need to read
> >>>>    all note.json to get the note file name and build the note folder structure
> >>>>    (Because we need to get the note name which is stored in note.json to build
> >>>>    the notebook menu). This would be a nightmare when you have large amounts
> >>>>    of notes.
> >>>>    2. {noteId}/note.json is not maintainable. It is difficult for a
> >>>>    developer/administrator to find note file based on note name.
> >>>>    3. {noteId}/note.json has no folder structure. Currently zeppelin
> >>>>    have to build the folder structure internally in memory according note name
> >>>>    which is a big overhead.
> >>>>
> >>>>
> >>>> New Approach
> >>>>
> >>>>    As I mentioned above, I propose to change the note storage structure
> >>>> to {note_name}_{note_id}.zpln.  note_name could contains folders, e.g.
> >>>> folder_1/mynote_abcd.zpln
> >>>>
> >>>> This kind of note storage structure could bring several benefits.
> >>>>
> >>>>    1. We don’t need to load all notes when zeppelin starts. We just
> >>>>    need to list each folder to get the note name and note_id.
> >>>>    2. It is much maintainable so that it is easy to find the note file
> >>>>    based on note name.
> >>>>    3. It has the folder structure already. That can be mapped to the
> >>>>    note folder structure.
> >>>>
> >>>>
> >>>> Side Effect
> >>>>
> >>>> This approach only works for file system storage, so that means we have
> >>>> to drop support for MongoNotebookRepo. I think it is ok because I didn’t
> >>>> see any users talk about this in community, so I assume no one is using it.
> >>>>
> >>>>
> >>>>
> >>>> This is overall design, welcome any comments and feedback. Thanks.
> >>>>
> >>>>
> >>>>
> >>>> Here's the google docs, you can also comment it here.
> >>>>
> >>>>
> >>>> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >> 이종열, Jongyoul Lee, 李宗烈
> >> http://madeng.net
> >>
> >
> 

Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by Jeff Zhang <zj...@gmail.com>.
BTW, I would also prefer to use the note name as the identifier of a note,
if the issue I mentioned before is acceptable to most users.



Jeff Zhang <zj...@gmail.com> wrote on Sat, Aug 18, 2018 at 4:40 PM:

>
> I am afraid we can not remove noteId, as noteId is the unique identifier
> of note and is immutable which is used in a lot places, such as paragraph
> share and rest api.
> If we use note name as note id then it may break user's app if note name
> is changed
>
>
> Jongyoul Lee <jo...@gmail.com> wrote on Sat, Aug 18, 2018 at 2:33 PM:
>
>> Hi, thanks for this kind of discussion.
>>
>> About noteId, How about changing note id to note name? AFAIK, Note id is
>> just an identifier and we can set any value to it.
>>
>> There’re two potential problems. We should be more careful to handle note
>> id as it could have very various type of characters. And Second, in case
>> where someone changes a note name, those who are seeing and updating the
>> same note wouldn’t access that note. We could handle it by using websockets.
>>
>> WDYT?
>>
>> On Tue, 14 Aug 2018 at 6:14 PM Jeff Zhang <zj...@gmail.com> wrote:
>>
>>> >>> But I’m still not comfortable with note ids in the name of the
>>> notebook itself.  Those names would look ugly if you shared your notebooks
>>> on github for example.  You don’t see Jupyter notebooks with names like
>>> that. If you have to keep the note ids with the notebooks could you not
>>> simply put the note id at the top of the notebook as Ruslan suggested? Then
>>> you’d only have to read the first line of each notebook.
>>>
>>> I know putting note_id in the note file name is not so elegant, but this
>>> is what we have to compromise to keep compatibility as we use noteId to
>>> uniquely identify note right now. And I don't think putting noteId in the
>>> top first line of note would help much. We still have to read note files
>>> which take much more time than just read the file names via file system.
>>>
>>> Regarding the readability of note file name, I think it won't affect
>>> much. E.g. This is the note book file name like:  *My Project/My Spark
>>> Tutorial Note_2A94M5J1Z.zpln*
>>> What user see in notebook menu is still *My Project/My Spark Tutorial Note*
>>> which is no difference from what we see now.
>>>
>>> And thanks again for the feedback and comments, I am so glad to see so
>>> many discussion in community.
>>>
>>>
>>>
>>> Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Tue, Aug 14, 2018
>>> at 4:29 PM:
>>>
>>>> I agree you’re inviting consistency issues if you maintained a separate
>>>> note id-to-note name mapping file.
>>>>
>>>>
>>>>
>>>> But I’m still not comfortable with note ids in the name of the notebook
>>>> itself.  Those names would look ugly if you shared your notebooks on github
>>>> for example.  You don’t see Jupyter notebooks with names like that.  If you
>>>> have to keep the note ids with the notebooks could you not simply put the
>>>> note id at the top of the notebook as Ruslan suggested? Then you’d only
>>>> have to read the first line of each notebook.
>>>>
>>>>
>>>>
>>>> Presumably if you copied the notebooks to another Zeppelin server they
>>>> would be restored with the same note ids there too? And hopefully there
>>>> would be no id clash with notebooks already on that server…
>>>>
>>>>
>>>>
>>>> *From:* Jeff Zhang <zj...@gmail.com>
>>>> *Sent:* 14 August 2018 03:49
>>>> *To:* users@zeppelin.apache.org
>>>>
>>>>
>>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
>>>> instead of [NOTEID]/note.json
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Thanks for the discussion.
>>>>
>>>> >>> I'm afraid about non-latin symbols in folder and note name. And
>>>> what about hieroglyphs?
>>>>
>>>> AFAIK, linux allow all the characters to be file name except `\0` and
>>>> '/'.  I can create file name with Chinese character in linux, I guess you
>>>> can use Russian as well.
>>>>
>>>>
>>>>
>>>> >>> If I understand correctly, this is being done solely to speed up
>>>> loading list of notebooks? What if a list of notebook names, their ids,
>>>> folder structure, etc can be *cached* in a separate small json file? Or
>>>> perhaps in a small embedded key-value store, like www.mapdb.org would
>>>> do? Just thinking out loud. This would require a way to lazily re-sync the
>>>> cache.
>>>>
>>>>
>>>>
>>>> This not only to speed up the loading but also make the system
>>>> architecture easy to maintain. Because for now we have to build the folder
>>>> structure of notes in memory, many code in zeppelin is doing this
>>>> (Personally I don't think we need any code for this function if we could
>>>> get the folder structure from the note file storage system). Use another
>>>> storage to keep the mapping of note name and note id will bring another
>>>> classic problem of distributed system: consistency. How do we make sure the
>>>> consistency between the real note file and this mapping component. If we
>>>> create/rename/remove note, we have to both update the notebook repo and the
>>>> mapping storage. Any bug in code would bring inconsistency issue based on
>>>> my experience.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Ruslan Dautkhanov <da...@gmail.com> wrote on Tue, Aug 14, 2018 at 3:58 AM:
>>>>
>>>> Thanks for bringing this up for discussion. My 2 cents below.
>>>>
>>>>
>>>>
>>>> I am with Maksim and Felix on concerns with special characters now
>>>> allowed in notebook names, and also concerns with different charsets.
>>>> Russian language, for example, most commonly use iso-8859-5, koi-8r/u,
>>>> windows-1251 charsets etc. This seems like will bring whole new set of
>>>> localization issues.
>>>>
>>>>
>>>>
>>>> If I understand correctly, this is being done solely to speed up
>>>> loading list of notebooks? What if a list of notebook names, their ids,
>>>> folder structure, etc can be *cached* in a separate small json file? Or
>>>> perhaps in a small embedded key-value store, like www.mapdb.org would
>>>> do? Just thinking out loud. This would require a way to lazily re-sync the
>>>> cache.
>>>>
>>>>
>>>>
>>>> Another way to speed up json reads is to somehow force "name" attribute
>>>> to be at the top of the json document that's written to disk. Then
>>>> re-implement json files reader to read just header of the file and do a
>>>> partial json parse ( or in the lack of options, grab "name" attribute from
>>>> the json file header by a regex for example).
>>>>
>>>>
>>>>
>>>> Back to filenames and charsets, I think issue may be more complicated,
>>>> if you store notebooks on a remote filesystem (nfs/ samba etc), and what if
>>>> remote server and local nfs client have differences in default fs charsets?
>>>>
>>>>
>>>>
>>>> Ideally would be if all filesystems would use UTF-8 for example, but I
>>>> am not certain that's a good assumption to make. Also exposing notebook
>>>> names can bring some other issues, like I know some users occasionally add
>>>> trailing/leading spaces etc.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <
>>>> m.belousov@tinkoff.ru> wrote:
>>>>
>>>> The use of Russian and other specific letters in the note name is big
>>>> advantage of Zeppelin. I would not like to give up this functionality.
>>>>
>>>>
>>>>
>>>> I support the idea about `zpln` file extension.
>>>>
>>>> The folder structure also sounds good.
>>>>
>>>>
>>>>
>>>> I'm afraid about non-latin symbols in folder and note name. And what
>>>> about hieroglyphs?
>>>>
>>>>
>>>>
>>>> Apache Zeppelin may be the first to use Russian letters in file system
>>>> in our company.
>>>>
>>>> I see a lot of risks to use non-latin symbols and a lot of issues to
>>>> make new folder structure stable.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>> *From:* Jeff Zhang <zj...@gmail.com>
>>>> *Sent:* 13 August 2018 12:50
>>>> *To:* users@zeppelin.apache.org
>>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead
>>>> of [NOTEID]/note.json
>>>>
>>>>
>>>>
>>>> >>> Do we need the note id in the file name at all? What’s wrong with
>>>> just note_name.zpln?
>>>>
>>>> The reason I keep note id is because currently we use noteId to
>>>> identify one note. e.g. we use note id in both websocket api and rest api.
>>>> It is almost impossible to remove noteId for the current architecture. If
>>>> we put note id into file content of note_name.zpln, then we have to read
>>>> the note file every time, then we meet the issues I mentioned above again.
>>>>
>>>>
>>>>
>>>> >>> If the file content is json then why not use note_name.json instead
>>>> of .zpln? That would make it easier for editors to know how to
>>>> load/highlight the file contents.
>>>>
>>>> I am not strongly biased on *.zpln. But I think one purpose is to help
>>>> third parties to identify zeppelin note properly. e.g. github can identify
>>>> jupyter notebook (*.ipynb) and render it properly.
>>>>
>>>>
>>>>
>>>> >>> Is there any reason for not using *real* folders or directories
>>>> for organising the notebooks rather than embedding the folder hierarchy in
>>>> the names of the notebooks?  If someone wants to ‘move’ the notebooks to
>>>> another folder they’d have to manually rename all the files/notebooks at
>>>> present.  That’s not very user-friendly.
>>>>
>>>>
>>>>
>>>> Actually my proposal is to use real folders. What user see in zeppelin
>>>> note menu is the actual notes folder structure. If they want to move the
>>>> notebooks to another folder, they can change the folder name just like what
>>>> user did in file system.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Mon, Aug 13, 2018
>>>> at 4:43 PM:
>>>>
>>>> Hi Jeff,
>>>>
>>>> I have some questions about this proposal (I can’t edit the design doc):
>>>>
>>>>
>>>>
>>>>    1. Do we need the note id in the file name at all? What’s wrong
>>>>    with just note_name.zpln?
>>>>    2. If the file content is json then why not use note_name.json
>>>>    instead of .zpln? That would make it easier for editors to know how to
>>>>    load/highlight the file contents.
>>>>    3. Is there any reason for not using *real* folders or directories
>>>>    for organising the notebooks rather than embedding the folder hierarchy in
>>>>    the names of the notebooks?  If someone wants to ‘move’ the notebooks to
>>>>    another folder they’d have to manually rename all the files/notebooks at
>>>>    present.  That’s not very user-friendly.
>>>>
>>>>
>>>>
>>>> Thanks, Lucas.
>>>>
>>>> *From:* Jeff Zhang <zj...@gmail.com>
>>>> *Sent:* 13 August 2018 09:06
>>>> *To:* users@zeppelin.apache.org
>>>> *Cc:* dev <de...@zeppelin.apache.org>
>>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
>>>> instead of [NOTEID]/note.json
>>>>
>>>>
>>>>
>>>> In that case, zeppelin should fail to create note.
>>>>
>>>>
>>>>
>>>> Felix Cheung <fe...@hotmail.com> wrote on Mon, Aug 13, 2018 at 3:47 PM:
>>>>
>>>> Perhaps one concern is users having characters in note name that are
>>>> invalid for file name/file path?
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>> *From:* Mohit Jaggi <mo...@gmail.com>
>>>> *Sent:* Sunday, August 12, 2018 6:02 PM
>>>> *To:* users@zeppelin.apache.org
>>>> *Cc:* dev
>>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
>>>> instead of [NOTEID]/note.json
>>>>
>>>>
>>>>
>>>> sounds like a good idea!
>>>>
>>>>
>>>>
>>>> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zj...@gmail.com> wrote:
>>>>
>>>> Motivation
>>>>
>>>>    The motivation of ZEPPELIN-2619 is to change the notes storage
>>>> structure. Previously we store it using {noteId}/note.json, we’d like to
>>>> change it into {note_name}_{note_id}.zpln. There are several reasons for
>>>> this change.
>>>>
>>>>
>>>>
>>>>    1. {noteId}/note.json is not scalable. We put all notes in one root
>>>>    folder in flat structure. And when zeppelin server starts, we need to read
>>>>    all note.json to get the note file name and build the note folder structure
>>>>    (Because we need to get the note name which is stored in note.json to build
>>>>    the notebook menu). This would be a nightmare when you have large amounts
>>>>    of notes.
>>>>    2. {noteId}/note.json is not maintainable. It is difficult for a
>>>>    developer/administrator to find note file based on note name.
>>>>    3. {noteId}/note.json has no folder structure. Currently zeppelin
>>>>    have to build the folder structure internally in memory according note name
>>>>    which is a big overhead.
>>>>
>>>>
>>>> New Approach
>>>>
>>>>    As I mentioned above, I propose to change the note storage structure
>>>> to {note_name}_{note_id}.zpln.  note_name could contains folders, e.g.
>>>> folder_1/mynote_abcd.zpln
>>>>
>>>> This kind of note storage structure could bring several benefits.
>>>>
>>>>    1. We don’t need to load all notes when zeppelin starts. We just
>>>>    need to list each folder to get the note name and note_id.
>>>>    2. It is much maintainable so that it is easy to find the note file
>>>>    based on note name.
>>>>    3. It has the folder structure already. That can be mapped to the
>>>>    note folder structure.
>>>>
>>>>
>>>> Side Effect
>>>>
>>>> This approach only works for file system storage, so that means we have
>>>> to drop support for MongoNotebookRepo. I think it is ok because I didn’t
>>>> see any users talk about this in community, so I assume no one is using it.
>>>>
>>>>
>>>>
>>>> This is overall design, welcome any comments and feedback. Thanks.
>>>>
>>>>
>>>>
>>>> Here's the google docs, you can also comment it here.
>>>>
>>>>
>>>> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>> 이종열, Jongyoul Lee, 李宗烈
>> http://madeng.net
>>
>

Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by Jeff Zhang <zj...@gmail.com>.
I am afraid we cannot remove noteId: it is the unique, immutable identifier
of a note and is used in a lot of places, such as paragraph sharing and the
REST API.
If we used the note name as the note id, renaming a note could break users'
apps.
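
To illustrate the point, here is a minimal sketch (in Python, not Zeppelin's
actual code) of how a file name following the proposed
{note_name}_{note_id}.zpln scheme could be split back into its two parts. The
helper name parse_note_path is hypothetical, and the sketch assumes the note id
itself never contains an underscore. It shows that the id can be recovered from
the file name alone and stays the same when only the name part changes.

```python
from pathlib import PurePosixPath
from typing import Tuple


def parse_note_path(path: str) -> Tuple[str, str]:
    """Split 'folder/name_id.zpln' into (note name incl. folders, note id).

    Hypothetical helper: assumes the last '_' in the file name separates
    the note name from the note id.
    """
    p = PurePosixPath(path)
    if p.suffix != ".zpln":
        raise ValueError(f"not a .zpln file: {path!r}")
    # rpartition splits on the LAST underscore, so names may contain '_'.
    name, sep, note_id = p.stem.rpartition("_")
    if not sep or not name or not note_id:
        raise ValueError(f"expected <name>_<id>.zpln, got: {path!r}")
    # Re-attach the folder part so the result is the full note name.
    return str(p.with_name(name)), note_id


if __name__ == "__main__":
    # File name taken from the example earlier in this thread.
    print(parse_note_path("My Project/My Spark Tutorial Note_2A94M5J1Z.zpln"))
    # Same note after a rename: the name differs, the id does not.
    print(parse_note_path("Archive/Spark Intro_2A94M5J1Z.zpln"))
```

Anything that refers to the note by id (for example a REST call or a shared
paragraph link) would keep working after such a rename, which is not the case
if the name itself is the id.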


Jongyoul Lee <jo...@gmail.com> wrote on Sat, Aug 18, 2018 at 2:33 PM:

> Hi, thanks for this kind of discussion.
>
> About noteId, How about changing note id to note name? AFAIK, Note id is
> just an identifier and we can set any value to it.
>
> There’re two potential problems. We should be more careful to handle note
> id as it could have very various type of characters. And Second, in case
> where someone changes a note name, those who are seeing and updating the
> same note wouldn’t access that note. We could handle it by using websockets.
>
> WDYT?
>
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>

Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by Jongyoul Lee <jo...@gmail.com>.
Hi, thanks for this kind of discussion.

About noteId: how about using the note name as the note id? AFAIK, the note
id is just an identifier and we can set any value for it.

There are two potential problems. First, we would have to handle the note id
more carefully, as it could contain a wide variety of characters. Second, if
someone changes a note name, others who are viewing or updating the same note
would no longer be able to access it. We could handle that by using websockets.
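
As a rough sketch of the first problem (Python, for illustration only, not
Zeppelin code): if the note name doubles as an identifier and a file path, it
needs some validation. The function name invalid_name_reason is hypothetical.
The NUL and '/' rule assumes a Linux-like file system, as mentioned elsewhere
in this thread, and the whitespace rule is just one possible policy for the
leading/trailing spaces concern raised there.

```python
from typing import Optional


def invalid_name_reason(note_name: str) -> Optional[str]:
    """Return why note_name cannot be used as a relative path, or None if OK.

    Hypothetical policy: '/' separates folders, so each component must be
    a valid single file name on a Linux-like file system.
    """
    if not note_name:
        return "empty name"
    for part in note_name.split("/"):
        # Empty parts come from leading, trailing, or doubled slashes.
        if part in ("", ".", ".."):
            return f"invalid path component {part!r}"
        if "\0" in part:
            return "NUL character in name"
        # Assumed policy: reject names that differ only by outer whitespace.
        if part != part.strip():
            return f"leading/trailing whitespace in {part!r}"
    return None


if __name__ == "__main__":
    for candidate in ["My Project/My Spark Tutorial Note",
                      "Проект/Заметка", "a//b", " padded"]:
        print(repr(candidate), "->", invalid_name_reason(candidate))
```

Non-Latin names pass this check as long as the underlying file system and its
charset settings accept them, which is a separate question.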

WDYT?

이종열, Jongyoul Lee, 李宗烈
http://madeng.net

Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by Jeff Zhang <zj...@gmail.com>.
>>> But I’m still not comfortable with note ids in the name of the notebook
itself.  Those names would look ugly if you shared your notebooks on github
for example.  You don’t see Jupyter notebooks with names like that. If you
have to keep the note ids with the notebooks could you not simply put the
note id at the top of the notebook as Ruslan suggested? Then you’d only
have to read the first line of each notebook.

I know putting the note_id in the note file name is not so elegant, but it
is a compromise we have to make to keep compatibility, since we currently
use noteId to uniquely identify a note. And I don't think putting the noteId
on the first line of the note would help much. We would still have to read
the note files, which takes much more time than just reading the file names
from the file system.

Regarding the readability of the note file name, I don't think it will
matter much. E.g. the note file name would look like: *My Project/My Spark
Tutorial Note_2A94M5J1Z.zpln*
What the user sees in the notebook menu is still *My Project/My Spark
Tutorial Note*, which is no different from what we see now.
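
To illustrate the mapping from file name to menu entry, here is a minimal sketch in Python. It is not Zeppelin's actual code: the function name parse_note_path is hypothetical, and it assumes the note id is whatever follows the last underscore in the file name, so that note names may themselves contain underscores.

```python
from pathlib import PurePosixPath


def parse_note_path(relative_path):
    """Split '<folders>/<note name>_<note id>.zpln' into (menu_path, note_id).

    Hypothetical helper for illustration only; the real parsing rules in
    Zeppelin may differ.
    """
    path = PurePosixPath(relative_path)
    if path.suffix != ".zpln":
        raise ValueError(f"not a .zpln file: {relative_path}")
    # Assumption: the note id is the part after the LAST underscore.
    name, sep, note_id = path.stem.rpartition("_")
    if not sep or not name or not note_id:
        raise ValueError(f"missing note name or note id: {relative_path}")
    # The menu path keeps the folders and drops the id and the extension.
    return str(path.with_name(name)), note_id


menu_path, note_id = parse_note_path(
    "My Project/My Spark Tutorial Note_2A94M5J1Z.zpln"
)
print(menu_path)  # My Project/My Spark Tutorial Note
print(note_id)    # 2A94M5J1Z
```

Only the file name is inspected here; the note file itself is never opened, which is the point of keeping the id in the name.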

And thanks again for the feedback and comments. I am glad to see so much
discussion in the community.



Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Tue, Aug 14, 2018
at 4:29 PM:

> I agree you’re inviting consistency issues if you maintained a separate
> note id-to-note name mapping file.
>
>
>
> But I’m still not comfortable with note ids in the name of the notebook
> itself.  Those names would look ugly if you shared your notebooks on github
> for example.  You don’t see Jupyter notebooks with names like that.  If you
> have to keep the note ids with the notebooks could you not simply put the
> note id at the top of the notebook as Ruslan suggested? Then you’d only
> have to read the first line of each notebook.
>
>
>
> Presumably if you copied the notebooks to another Zeppelin server they
> would be restored with the same note ids there too? And hopefully there
> would be no id clash with notebooks already on that server…
>
>
>
> *From:* Jeff Zhang <zj...@gmail.com>
> *Sent:* 14 August 2018 03:49
> *To:* users@zeppelin.apache.org
>
>
> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> instead of [NOTEID]/note.json
>
>
>
>
>
> Thanks for the discussion.
>
> >>> I'm afraid about non-latin symbols in folder and note name. And what
> about hieroglyphs?
>
> AFAIK, Linux allows any character in a file name except `\0` and
> '/'.  I can create file names with Chinese characters on Linux, so I guess
> you can use Russian as well.
>
>
>
> >>> If I understand correctly, this is being done solely to speed up
> loading list of notebooks? What if a list of notebook names, their ids,
> folder structure, etc can be *cached* in a separate small json file? Or
> perhaps in a small embedded key-value store, like www.mapdb.org would do?
> Just thinking out loud. This would require a way to lazily re-sync the
> cache.
>
>
>
> This is not only to speed up loading but also to make the system
> architecture easier to maintain. Right now we have to build the folder
> structure of notes in memory, and a lot of code in Zeppelin does this
> (personally I don't think we would need any code for this if we could
> get the folder structure from the note file storage system). Using another
> store to keep the mapping between note name and note id would bring in a
> classic distributed-systems problem: consistency. How do we keep the real
> note files and this mapping component consistent? If we
> create/rename/remove a note, we have to update both the notebook repo and
> the mapping storage. In my experience, any bug in that code would cause
> inconsistency.
>
>
>
>
>
>
>
>
>
> Ruslan Dautkhanov <da...@gmail.com> wrote on Tue, Aug 14, 2018 at 3:58 AM:
>
> Thanks for bringing this up for discussion. My 2 cents below.
>
>
>
> I am with Maksim and Felix on concerns with special characters now allowed
> in notebook names, and also concerns with different charsets. Russian
> language, for example, most commonly use iso-8859-5, koi-8r/u, windows-1251
> charsets etc. This seems like will bring whole new set of localization
> issues.
>
>
>
> If I understand correctly, this is being done solely to speed up loading
> list of notebooks? What if a list of notebook names, their ids, folder
> structure, etc can be *cached* in a separate small json file? Or perhaps in
> a small embedded key-value store, like www.mapdb.org would do? Just
> thinking out loud. This would require a way to lazily re-sync the cache.
>
>
>
> Another way to speed up json reads is to somehow force "name" attribute to
> be at the top of the json document that's written to disk. Then
> re-implement json files reader to read just header of the file and do a
> partial json parse ( or in the lack of options, grab "name" attribute from
> the json file header by a regex for example).
>
>
>
> Back to filenames and charsets, I think issue may be more complicated, if
> you store notebooks on a remote filesystem (nfs/ samba etc), and what if
> remote server and local nfs client have differences in default fs charsets?
>
>
>
> Ideally would be if all filesystems would use UTF-8 for example, but I am
> not certain that's a good assumption to make. Also exposing notebook names
> can bring some other issues, like I know some users occasionally add
> trailing/leading spaces etc.
>
>
>
>
>
> On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <
> m.belousov@tinkoff.ru> wrote:
>
> The use of Russian and other specific letters in the note name is big
> advantage of Zeppelin. I would not like to give up this functionality.
>
>
>
> I support the idea about `zpln` file extension.
>
> The folder structure also sounds good.
>
>
>
> I'm afraid about non-latin symbols in folder and note name. And what about
> hieroglyphs?
>
>
>
> Apache Zeppelin may be the first to use Russian letters in file system in
> our company.
>
> I see a lot of risks to use non-latin symbols and a lot of issues to make
> new folder structure stable.
>
>
>
>
>
>
> ------------------------------
>
> *From:* Jeff Zhang <zj...@gmail.com>
> *Sent:* 13 August 2018, 12:50
> *To:* users@zeppelin.apache.org
> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of
> [NOTEID]/note.json
>
>
>
> >>> Do we need the note id in the file name at all? What’s wrong with
> just note_name.zpln?
>
> The reason I keep the note id is that we currently use noteId to identify
> a note, e.g. we use the note id in both the websocket API and the REST API.
> It is almost impossible to remove noteId in the current architecture. If we
> put the note id into the file content of note_name.zpln, then we have to
> read the note file every time, and we run into the issues I mentioned above
> again.
>
>
>
> >>> If the file content is json then why not use note_name.json instead of
> .zpln? That would make it easier for editors to know how to load/highlight
> the file contents.
>
> I don't feel strongly about *.zpln. But I think one purpose is to help
> third parties identify Zeppelin notes properly, e.g. GitHub can identify a
> Jupyter notebook (*.ipynb) and render it properly.
>
>
>
> >>> Is there any reason for not using *real* folders or directories for
> organising the notebooks rather than embedding the folder hierarchy in the
> names of the notebooks?  If someone wants to ‘move’ the notebooks to
> another folder they’d have to manually rename all the files/notebooks at
> present.  That’s not very user-friendly.
>
>
>
> Actually my proposal is to use real folders. What users see in the Zeppelin
> note menu is the actual folder structure of the notes. If they want to move
> notebooks to another folder, they can change the folder name, just as they
> would in the file system.
>
>
>
>
>
>
>
>
>
>
>
> Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Mon, Aug 13, 2018
> at 4:43 PM:
>
> Hi Jeff,
>
> I have some questions about this proposal (I can’t edit the design doc):
>
>
>
>    1. Do we need the note id in the file name at all? What’s wrong with
>    just note_name.zpln?
>    2. If the file content is json then why not use note_name.json instead
>    of .zpln? That would make it easier for editors to know how to
>    load/highlight the file contents.
>    3. Is there any reason for not using *real* folders or directories for
>    organising the notebooks rather than embedding the folder hierarchy in the
>    names of the notebooks?  If someone wants to ‘move’ the notebooks to
>    another folder they’d have to manually rename all the files/notebooks at
>    present.  That’s not very user-friendly.
>
>
>
> Thanks, Lucas.
>
> *From:* Jeff Zhang <zj...@gmail.com>
> *Sent:* 13 August 2018 09:06
> *To:* users@zeppelin.apache.org
> *Cc:* dev <de...@zeppelin.apache.org>
> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> instead of [NOTEID]/note.json
>
>
>
> In that case, zeppelin should fail to create note.
>
>
>
> Felix Cheung <fe...@hotmail.com> wrote on Mon, Aug 13, 2018 at 3:47 PM:
>
> Perhaps one concern is users having characters in note name that are
> invalid for file name/file path?
>
>
>
>
> ------------------------------
>
> *From:* Mohit Jaggi <mo...@gmail.com>
> *Sent:* Sunday, August 12, 2018 6:02 PM
> *To:* users@zeppelin.apache.org
> *Cc:* dev
> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead
> of [NOTEID]/note.json
>
>
>
> sounds like a good idea!
>
>
>
> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zj...@gmail.com> wrote:
>
> Motivation
>
>    The motivation of ZEPPELIN-2619 is to change the notes storage
> structure. Previously we store it using {noteId}/note.json, we’d like to
> change it into {note_name}_{note_id}.zpln. There are several reasons for
> this change.
>
>
>
>    1. {noteId}/note.json is not scalable. We put all notes in one root
>    folder in flat structure. And when zeppelin server starts, we need to read
>    all note.json to get the note file name and build the note folder structure
>    (Because we need to get the note name which is stored in note.json to build
>    the notebook menu). This would be a nightmare when you have large amounts
>    of notes.
>    2. {noteId}/note.json is not maintainable. It is difficult for a
>    developer/administrator to find note file based on note name.
>    3. {noteId}/note.json has no folder structure. Currently zeppelin have
>    to build the folder structure internally in memory according note name
>    which is a big overhead.
>
>
> New Approach
>
>    As I mentioned above, I propose to change the note storage structure to
> {note_name}_{note_id}.zpln.  note_name could contains folders, e.g.
> folder_1/mynote_abcd.zpln
>
> This kind of note storage structure could bring several benefits.
>
>    1. We don’t need to load all notes when zeppelin starts. We just need
>    to list each folder to get the note name and note_id.
>    2. It is much maintainable so that it is easy to find the note file
>    based on note name.
>    3. It has the folder structure already. That can be mapped to the note
>    folder structure.
>
>
> Side Effect
>
> This approach only works for file system storage, so that means we have to
> drop support for MongoNotebookRepo. I think it is ok because I didn’t see
> any users talk about this in community, so I assume no one is using it.
>
>
>
> This is overall design, welcome any comments and feedback. Thanks.
>
>
>
> Here's the google docs, you can also comment it here.
>
>
> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
>
>
>
>
>
>

Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by Szuromi Tamás <tr...@gmail.com>.
Hey guys,

Great idea.
FYI, I created a feature request for GitLab to render Zeppelin notebooks
once this issue is finalized and you have switched to .zpln:
https://gitlab.com/gitlab-org/gitlab-ce/issues/50244

Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote (on Tue, Aug 14,
2018, 10:29):

> I agree you’re inviting consistency issues if you maintained a separate
> note id-to-note name mapping file.
>
>
>
> But I’m still not comfortable with note ids in the name of the notebook
> itself.  Those names would look ugly if you shared your notebooks on github
> for example.  You don’t see Jupyter notebooks with names like that.  If you
> have to keep the note ids with the notebooks could you not simply put the
> note id at the top of the notebook as Ruslan suggested? Then you’d only
> have to read the first line of each notebook.
>
>
>
> Presumably if you copied the notebooks to another Zeppelin server they
> would be restored with the same note ids there too? And hopefully there
> would be no id clash with notebooks already on that server…
>
>
>
> *From:* Jeff Zhang <zj...@gmail.com>
> *Sent:* 14 August 2018 03:49
> *To:* users@zeppelin.apache.org
> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> instead of [NOTEID]/note.json
>
>
>
>
>
> Thanks for the discussion.
>
> >>> I'm afraid about non-latin symbols in folder and note name. And what
> about hieroglyphs?
>
> AFAIK, Linux allows any character in a file name except `\0` and
> '/'.  I can create file names with Chinese characters on Linux, so I guess
> you can use Russian as well.
>
>
>
> >>> If I understand correctly, this is being done solely to speed up
> loading list of notebooks? What if a list of notebook names, their ids,
> folder structure, etc can be *cached* in a separate small json file? Or
> perhaps in a small embedded key-value store, like www.mapdb.org would do?
> Just thinking out loud. This would require a way to lazily re-sync the
> cache.
>
>
>
> This is not only to speed up loading but also to make the system
> architecture easier to maintain. Right now we have to build the folder
> structure of notes in memory, and a lot of code in Zeppelin does this
> (personally I don't think we would need any code for this if we could
> get the folder structure from the note file storage system). Using another
> store to keep the mapping between note name and note id would bring in a
> classic distributed-systems problem: consistency. How do we keep the real
> note files and this mapping component consistent? If we
> create/rename/remove a note, we have to update both the notebook repo and
> the mapping storage. In my experience, any bug in that code would cause
> inconsistency.
>
>
>
>
>
>
>
>
>
> Ruslan Dautkhanov <da...@gmail.com> wrote on Tue, Aug 14, 2018 at 3:58 AM:
>
> Thanks for bringing this up for discussion. My 2 cents below.
>
>
>
> I am with Maksim and Felix on concerns with special characters now allowed
> in notebook names, and also concerns with different charsets. Russian
> language, for example, most commonly use iso-8859-5, koi-8r/u, windows-1251
> charsets etc. This seems like will bring whole new set of localization
> issues.
>
>
>
> If I understand correctly, this is being done solely to speed up loading
> list of notebooks? What if a list of notebook names, their ids, folder
> structure, etc can be *cached* in a separate small json file? Or perhaps in
> a small embedded key-value store, like www.mapdb.org would do? Just
> thinking out loud. This would require a way to lazily re-sync the cache.
>
>
>
> Another way to speed up json reads is to somehow force "name" attribute to
> be at the top of the json document that's written to disk. Then
> re-implement json files reader to read just header of the file and do a
> partial json parse ( or in the lack of options, grab "name" attribute from
> the json file header by a regex for example).
>
>
>
> Back to filenames and charsets, I think issue may be more complicated, if
> you store notebooks on a remote filesystem (nfs/ samba etc), and what if
> remote server and local nfs client have differences in default fs charsets?
>
>
>
> Ideally would be if all filesystems would use UTF-8 for example, but I am
> not certain that's a good assumption to make. Also exposing notebook names
> can bring some other issues, like I know some users occasionally add
> trailing/leading spaces etc.
>
>
>
>
>
> On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <
> m.belousov@tinkoff.ru> wrote:
>
> The use of Russian and other specific letters in the note name is big
> advantage of Zeppelin. I would not like to give up this functionality.
>
>
>
> I support the idea about `zpln` file extension.
>
> The folder structure also sounds good.
>
>
>
> I'm afraid about non-latin symbols in folder and note name. And what about
> hieroglyphs?
>
>
>
> Apache Zeppelin may be the first to use Russian letters in file system in
> our company.
>
> I see a lot of risks to use non-latin symbols and a lot of issues to make
> new folder structure stable.
>
>
>
>
>
>
> ------------------------------
>
> *From:* Jeff Zhang <zj...@gmail.com>
> *Sent:* 13 August 2018, 12:50
> *To:* users@zeppelin.apache.org
> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of
> [NOTEID]/note.json
>
>
>
> >>> Do we need the note id in the file name at all? What’s wrong with
> just note_name.zpln?
>
> The reason I keep the note id is that we currently use noteId to identify
> a note, e.g. we use the note id in both the websocket API and the REST API.
> It is almost impossible to remove noteId in the current architecture. If we
> put the note id into the file content of note_name.zpln, then we have to
> read the note file every time, and we run into the issues I mentioned above
> again.
>
>
>
> >>> If the file content is json then why not use note_name.json instead of
> .zpln? That would make it easier for editors to know how to load/highlight
> the file contents.
>
> I don't feel strongly about *.zpln. But I think one purpose is to help
> third parties identify Zeppelin notes properly, e.g. GitHub can identify a
> Jupyter notebook (*.ipynb) and render it properly.
>
>
>
> >>> Is there any reason for not using *real* folders or directories for
> organising the notebooks rather than embedding the folder hierarchy in the
> names of the notebooks?  If someone wants to ‘move’ the notebooks to
> another folder they’d have to manually rename all the files/notebooks at
> present.  That’s not very user-friendly.
>
>
>
> Actually my proposal is to use real folders. What users see in the Zeppelin
> note menu is the actual folder structure of the notes. If they want to move
> notebooks to another folder, they can change the folder name, just as they
> would in the file system.
>
>
>
>
>
>
>
>
>
>
>
> Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Mon, Aug 13, 2018
> at 4:43 PM:
>
> Hi Jeff,
>
> I have some questions about this proposal (I can’t edit the design doc):
>
>
>
>    1. Do we need the note id in the file name at all? What’s wrong with
>    just note_name.zpln?
>    2. If the file content is json then why not use note_name.json instead
>    of .zpln? That would make it easier for editors to know how to
>    load/highlight the file contents.
>    3. Is there any reason for not using *real* folders or directories for
>    organising the notebooks rather than embedding the folder hierarchy in the
>    names of the notebooks?  If someone wants to ‘move’ the notebooks to
>    another folder they’d have to manually rename all the files/notebooks at
>    present.  That’s not very user-friendly.
>
>
>
> Thanks, Lucas.
>
> *From:* Jeff Zhang <zj...@gmail.com>
> *Sent:* 13 August 2018 09:06
> *To:* users@zeppelin.apache.org
> *Cc:* dev <de...@zeppelin.apache.org>
> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> instead of [NOTEID]/note.json
>
>
>
> In that case, zeppelin should fail to create note.
>
>
>
> Felix Cheung <fe...@hotmail.com> wrote on Mon, Aug 13, 2018 at 3:47 PM:
>
> Perhaps one concern is users having characters in note name that are
> invalid for file name/file path?
>
>
>
>
> ------------------------------
>
> *From:* Mohit Jaggi <mo...@gmail.com>
> *Sent:* Sunday, August 12, 2018 6:02 PM
> *To:* users@zeppelin.apache.org
> *Cc:* dev
> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead
> of [NOTEID]/note.json
>
>
>
> sounds like a good idea!
>
>
>
> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zj...@gmail.com> wrote:
>
> Motivation
>
>    The motivation of ZEPPELIN-2619 is to change the notes storage
> structure. Previously we store it using {noteId}/note.json, we’d like to
> change it into {note_name}_{note_id}.zpln. There are several reasons for
> this change.
>
>
>
>    1. {noteId}/note.json is not scalable. We put all notes in one root
>    folder in flat structure. And when zeppelin server starts, we need to read
>    all note.json to get the note file name and build the note folder structure
>    (Because we need to get the note name which is stored in note.json to build
>    the notebook menu). This would be a nightmare when you have large amounts
>    of notes.
>    2. {noteId}/note.json is not maintainable. It is difficult for a
>    developer/administrator to find note file based on note name.
>    3. {noteId}/note.json has no folder structure. Currently zeppelin have
>    to build the folder structure internally in memory according note name
>    which is a big overhead.
>
>
> New Approach
>
>    As I mentioned above, I propose to change the note storage structure to
> {note_name}_{note_id}.zpln.  note_name could contains folders, e.g.
> folder_1/mynote_abcd.zpln
>
> This kind of note storage structure could bring several benefits.
>
>    1. We don’t need to load all notes when zeppelin starts. We just need
>    to list each folder to get the note name and note_id.
>    2. It is much maintainable so that it is easy to find the note file
>    based on note name.
>    3. It has the folder structure already. That can be mapped to the note
>    folder structure.
>
>
> Side Effect
>
> This approach only works for file system storage, so that means we have to
> drop support for MongoNotebookRepo. I think it is ok because I didn’t see
> any users talk about this in community, so I assume no one is using it.
>
>
>
> This is overall design, welcome any comments and feedback. Thanks.
>
>
>
> Here's the google docs, you can also comment it here.
>
>
> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
>
>
>
>
>
>

[DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by "Partridge, Lucas (GE Aviation)" <Lu...@ge.com>.
I agree you’re inviting consistency issues if you maintained a separate note id-to-note name mapping file.

But I’m still not comfortable with note ids in the name of the notebook itself.  Those names would look ugly if you shared your notebooks on github for example.  You don’t see Jupyter notebooks with names like that.  If you have to keep the note ids with the notebooks could you not simply put the note id at the top of the notebook as Ruslan suggested? Then you’d only have to read the first line of each notebook.

Presumably if you copied the notebooks to another Zeppelin server they would be restored with the same note ids there too? And hopefully there would be no id clash with notebooks already on that server…

From: Jeff Zhang <zj...@gmail.com>
Sent: 14 August 2018 03:49
To: users@zeppelin.apache.org
Subject: EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json


Thanks for the discussion.
>>> I'm afraid about non-latin symbols in folder and note name. And what about hieroglyphs?
AFAIK, linux allow all the characters to be file name except `\0` and '/'.  I can create file name with Chinese character in linux, I guess you can use Russian as well.

>>> If I understand correctly, this is being done solely to speed up loading list of notebooks? What if a list of notebook names, their ids, folder structure, etc can be *cached* in a separate small json file? Or perhaps in a small embedded key-value store, like www.mapdb.org would do? Just thinking out loud. This would require a way to lazily re-sync the cache.

This not only to speed up the loading but also make the system architecture easy to maintain. Because for now we have to build the folder structure of notes in memory, many code in zeppelin is doing this (Personally I don't think we need any code for this function if we could get the folder structure from the note file storage system). Use another storage to keep the mapping of note name and note id will bring another classic problem of distributed system: consistency. How do we make sure the consistency between the real note file and this mapping component. If we create/rename/remove note, we have to both update the notebook repo and the mapping storage. Any bug in code would bring inconsistency issue based on my experience.




Ruslan Dautkhanov <da...@gmail.com> wrote on Tue, Aug 14, 2018 at 3:58 AM:
Thanks for bringing this up for discussion. My 2 cents below.

I am with Maksim and Felix on concerns with special characters now allowed in notebook names, and also concerns with different charsets. Russian language, for example, most commonly use iso-8859-5, koi-8r/u, windows-1251 charsets etc. This seems like will bring whole new set of localization issues.

If I understand correctly, this is being done solely to speed up loading list of notebooks? What if a list of notebook names, their ids, folder structure, etc can be *cached* in a separate small json file? Or perhaps in a small embedded key-value store, like www.mapdb.org would do? Just thinking out loud. This would require a way to lazily re-sync the cache.

Another way to speed up json reads is to somehow force "name" attribute to be at the top of the json document that's written to disk. Then re-implement json files reader to read just header of the file and do a partial json parse ( or in the lack of options, grab "name" attribute from the json file header by a regex for example).

Back to filenames and charsets, I think issue may be more complicated, if you store notebooks on a remote filesystem (nfs/ samba etc), and what if remote server and local nfs client have differences in default fs charsets?

Ideally would be if all filesystems would use UTF-8 for example, but I am not certain that's a good assumption to make. Also exposing notebook names can bring some other issues, like I know some users occasionally add trailing/leading spaces etc.


On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <m....@tinkoff.ru> wrote:
The use of Russian and other specific letters in the note name is big advantage of Zeppelin. I would not like to give up this functionality.

I support the idea about `zpln` file extension.
The folder structure also sounds good.

I'm afraid about non-latin symbols in folder and note name. And what about hieroglyphs?

Apache Zeppelin may be the first to use Russian letters in file system in our company.
I see a lot of risks to use non-latin symbols and a lot of issues to make new folder structure stable.





________________________________
From: Jeff Zhang <zj...@gmail.com>
Sent: 13 August 2018 12:50
To: users@zeppelin.apache.org
Subject: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

>>> Do we need the note id in the file name at all? What’s wrong with just note_name.zpln?
The reason I keep note id is because currently we use noteId to identify one note. e.g. we use note id in both websocket api and rest api. It is almost impossible to remove noteId for the current architecture. If we put note id into file content of note_name.zpln, then we have to read the note file every time, then we meet the issues I mentioned above again.

>>> If the file content is json then why not use note_name.json instead of .zpln? That would make it easier for editors to know how to load/highlight the file contents.
I am not strongly biased on *.zpln. But I think one purpose is to help third parties to identify zeppelin note properly. e.g. github can identify jupyter notebook (*.ipynb) and render it properly.

>>> Is there any reason for not using real folders or directories for organising the notebooks rather than embedding the folder hierarchy in the names of the notebooks?  If someone wants to ‘move’ the notebooks to another folder they’d have to manually rename all the files/notebooks at present.  That’s not very user-friendly.

Actually my proposal is to use real folders. What user see in zeppelin note menu is the actual notes folder structure. If they want to move the notebooks to another folder, they can change the folder name just like what user did in file system.





Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Mon, Aug 13, 2018 at 4:43 PM:
Hi Jeff,
I have some questions about this proposal (I can’t edit the design doc):


  1.  Do we need the note id in the file name at all? What’s wrong with just note_name.zpln?
  2.  If the file content is json then why not use note_name.json instead of .zpln? That would make it easier for editors to know how to load/highlight the file contents.
  3.  Is there any reason for not using real folders or directories for organising the notebooks rather than embedding the folder hierarchy in the names of the notebooks?  If someone wants to ‘move’ the notebooks to another folder they’d have to manually rename all the files/notebooks at present.  That’s not very user-friendly.

Thanks, Lucas.
From: Jeff Zhang <zj...@gmail.com>
Sent: 13 August 2018 09:06
To: users@zeppelin.apache.org
Cc: dev <de...@zeppelin.apache.org>
Subject: EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

In that case, zeppelin should fail to create note.

Felix Cheung <fe...@hotmail.com> wrote on Mon, Aug 13, 2018 at 3:47 PM:
Perhaps one concern is users having characters in note name that are invalid for file name/file path?


________________________________
From: Mohit Jaggi <mo...@gmail.com>
Sent: Sunday, August 12, 2018 6:02 PM
To: users@zeppelin.apache.org
Cc: dev
Subject: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

sounds like a good idea!

On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zj...@gmail.com> wrote:
Motivation

   The motivation of ZEPPELIN-2619 is to change the notes storage structure. Previously we store it using {noteId}/note.json, we’d like to change it into {note_name}_{note_id}.zpln. There are several reasons for this change.


  1.  {noteId}/note.json is not scalable. We put all notes in one root folder in flat structure. And when zeppelin server starts, we need to read all note.json to get the note file name and build the note folder structure (Because we need to get the note name which is stored in note.json to build the notebook menu). This would be a nightmare when you have large amounts of notes.
  2.  {noteId}/note.json is not maintainable. It is difficult for a developer/administrator to find note file based on note name.
  3.  {noteId}/note.json has no folder structure. Currently zeppelin have to build the folder structure internally in memory according note name which is a big overhead.

New Approach

   As I mentioned above, I propose to change the note storage structure to {note_name}_{note_id}.zpln.  note_name could contains folders, e.g. folder_1/mynote_abcd.zpln

This kind of note storage structure could bring several benefits.

  1.  We don’t need to load all notes when zeppelin starts. We just need to list each folder to get the note name and note_id.
  2.  It is much maintainable so that it is easy to find the note file based on note name.
  3.  It has the folder structure already. That can be mapped to the note folder structure.

Side Effect

This approach only works for file system storage, so that means we have to drop support for MongoNotebookRepo. I think it is ok because I didn’t see any users talk about this in community, so I assume no one is using it.



This is overall design, welcome any comments and feedback. Thanks.



Here's the google docs, you can also comment it here.

https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing




Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by Jeff Zhang <zj...@gmail.com>.
Thanks for the discussion.
>>> I'm afraid about non-latin symbols in folder and note name. And what
about hieroglyphs?
AFAIK, Linux allows any character in a file name except `\0` and '/'.
I can create files with Chinese characters in their names on Linux, so I
guess Russian works as well.
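
As a rough illustration of that rule, here is a minimal sketch (not Zeppelin
code; the function name is hypothetical). It assumes '/' in a note name is
treated as a folder separator, as in the proposal, and it also assumes the
255-byte per-name limit that is common on Linux filesystems but not
universal.

```python
def invalid_path_components(note_path):
    """Return the components of note_path that cannot be used as Linux file names."""
    bad = []
    for part in note_path.split("/"):
        # Empty names, "." and ".." are reserved, and NUL can never appear.
        if part in ("", ".", "..") or "\0" in part:
            bad.append(part)
        # Assumption: the filesystem limits one name to 255 bytes (NAME_MAX).
        elif len(part.encode("utf-8")) > 255:
            bad.append(part)
    return bad
```

Note that the length check counts bytes, not characters, so a Cyrillic or
Chinese name reaches the limit sooner than a Latin one of the same length.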

>>> If I understand correctly, this is being done solely to speed up
loading list of notebooks? What if a list of notebook names, their ids,
folder structure, etc can be *cached* in a separate small json file? Or
perhaps in a small embedded key-value store, like www.mapdb.org would do?
Just thinking out loud. This would require a way to lazily re-sync the
cache.

This is not only to speed up loading but also to make the system
architecture easier to maintain. For now we have to build the folder
structure of notes in memory, and a lot of code in Zeppelin does this
(personally I don't think we would need any code for this if we could get
the folder structure from the note file storage system). Using another
store to keep the mapping between note name and note id brings in a classic
distributed-systems problem: consistency. How do we keep the real note
files and this mapping component consistent? If we create/rename/remove a
note, we have to update both the notebook repo and the mapping storage. In
my experience, any bug in that code would cause inconsistency.
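
To make the point concrete, here is a small sketch of how the name and id
could be recovered from a directory listing alone, without opening any note
file. It is only an illustration, not the actual implementation: the
function name is hypothetical, and it assumes the note id never contains an
underscore, so the id is whatever follows the last '_' in the file name.

```python
import os

def list_notes(root):
    """Map note id -> note path (relative to root, without id or extension)."""
    notes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".zpln"):
                continue
            stem = filename[: -len(".zpln")]
            # Split on the LAST underscore so note names may contain '_'.
            name, sep, note_id = stem.rpartition("_")
            if not sep or not note_id:
                continue  # not in {note_name}_{note_id}.zpln form
            rel_dir = os.path.relpath(dirpath, root)
            if rel_dir == ".":
                note_path = name
            else:
                note_path = os.path.join(rel_dir, name).replace(os.sep, "/")
            notes[note_id] = note_path
    return notes
```

With a file stored as folder_1/mynote_abcd.zpln, this yields the id "abcd"
and the note path "folder_1/mynote", so the file system itself is the only
source of truth and there is no second mapping to keep in sync.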




Ruslan Dautkhanov <da...@gmail.com> wrote on Tue, Aug 14, 2018 at 3:58 AM:

> Thanks for bringing this up for discussion. My 2 cents below.
>
> I am with Maksim and Felix on concerns with special characters now allowed
> in notebook names, and also concerns with different charsets. Russian
> language, for example, most commonly use iso-8859-5, koi-8r/u, windows-1251
> charsets etc. This seems like will bring whole new set of localization
> issues.
>
> If I understand correctly, this is being done solely to speed up loading
> list of notebooks? What if a list of notebook names, their ids, folder
> structure, etc can be *cached* in a separate small json file? Or perhaps in
> a small embedded key-value store, like www.mapdb.org would do? Just
> thinking out loud. This would require a way to lazily re-sync the cache.
>
> Another way to speed up json reads is to somehow force "name" attribute to
> be at the top of the json document that's written to disk. Then
> re-implement json files reader to read just header of the file and do a
> partial json parse ( or in the lack of options, grab "name" attribute from
> the json file header by a regex for example).
>
> Back to filenames and charsets, I think issue may be more complicated, if
> you store notebooks on a remote filesystem (nfs/ samba etc), and what if
> remote server and local nfs client have differences in default fs charsets?
>
> Ideally would be if all filesystems would use UTF-8 for example, but I am
> not certain that's a good assumption to make. Also exposing notebook names
> can bring some other issues, like I know some users occasionally add
> trailing/leading spaces etc.
>
>
> On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <
> m.belousov@tinkoff.ru> wrote:
>
>> The use of Russian and other specific letters in the note name is big
>> advantage of Zeppelin. I would not like to give up this functionality.
>>
>> I support the idea about `zpln` file extension.
>> The folder structure also sounds good.
>>
>> I'm afraid about non-latin symbols in folder and note name. And what
>> about hieroglyphs?
>>
>> Apache Zeppelin may be the first to use Russian letters in file system in
>> our company.
>> I see a lot of risks to use non-latin symbols and a lot of issues to make
>> new folder structure stable.
>>
>>
>>
>> ------------------------------
>> *From:* Jeff Zhang <zj...@gmail.com>
>> *Sent:* 13 August 2018 12:50
>> *To:* users@zeppelin.apache.org
>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead
>> of [NOTEID]/note.json
>>
>> >>> Do we need the note id in the file name at all? What’s wrong with
>> just note_name.zpln?
>> The reason I keep note id is because currently we use noteId to identify
>> one note. e.g. we use note id in both websocket api and rest api. It is
>> almost impossible to remove noteId for the current architecture. If we put
>> note id into file content of note_name.zpln, then we have to read the note
>> file every time, then we meet the issues I mentioned above again.
>>
>> >>> If the file content is json then why not use note_name.json instead
>> of .zpln? That would make it easier for editors to know how to
>> load/highlight the file contents.
>> I am not strongly biased on *.zpln. But I think one purpose is to help
>> third parties to identify zeppelin note properly. e.g. github can identify
>> jupyter notebook (*.ipynb) and render it properly.
>>
>> >>> Is there any reason for not using *real* folders or directories for
>> organising the notebooks rather than embedding the folder hierarchy in the
>> names of the notebooks?  If someone wants to ‘move’ the notebooks to
>> another folder they’d have to manually rename all the files/notebooks at
>> present.  That’s not very user-friendly.
>>
>> Actually my proposal is to use real folders. What user see in zeppelin
>> note menu is the actual notes folder structure. If they want to move the
>> notebooks to another folder, they can change the folder name just like what
>> user did in file system.
>>
>>
>>
>>
>>
>> Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Mon, Aug 13, 2018
>> at 4:43 PM:
>>
>>> Hi Jeff,
>>>
>>> I have some questions about this proposal (I can’t edit the design doc):
>>>
>>>
>>>
>>>    1. Do we need the note id in the file name at all? What’s wrong with
>>>    just note_name.zpln?
>>>
>>>    2. If the file content is json then why not use note_name.json
>>>    instead of .zpln? That would make it easier for editors to know how to
>>>    load/highlight the file contents.
>>>
>>>    3. Is there any reason for not using *real* folders or directories
>>>    for organising the notebooks rather than embedding the folder hierarchy in
>>>    the names of the notebooks?  If someone wants to ‘move’ the notebooks to
>>>    another folder they’d have to manually rename all the files/notebooks at
>>>    present.  That’s not very user-friendly.
>>>
>>>
>>>
>>> Thanks, Lucas.
>>>
>>> *From:* Jeff Zhang <zj...@gmail.com>
>>> *Sent:* 13 August 2018 09:06
>>> *To:* users@zeppelin.apache.org
>>> *Cc:* dev <de...@zeppelin.apache.org>
>>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
>>> instead of [NOTEID]/note.json
>>>
>>>
>>>
>>> In that case, zeppelin should fail to create note.
>>>
>>>
>>>
>>> Felix Cheung <fe...@hotmail.com> wrote on Mon, Aug 13, 2018 at 3:47 PM:
>>>
>>> Perhaps one concern is users having characters in note name that are
>>> invalid for file name/file path?
>>>
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> *From:* Mohit Jaggi <mo...@gmail.com>
>>> *Sent:* Sunday, August 12, 2018 6:02 PM
>>> *To:* users@zeppelin.apache.org
>>> *Cc:* dev
>>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
>>> instead of [NOTEID]/note.json
>>>
>>>
>>>
>>> sounds like a good idea!
>>>
>>>
>>>
>>> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zj...@gmail.com> wrote:
>>>
>>> Motivation
>>>
>>>    The motivation of ZEPPELIN-2619 is to change the notes storage
>>> structure. Previously we store it using {noteId}/note.json, we’d like to
>>> change it into {note_name}_{note_id}.zpln. There are several reasons for
>>> this change.
>>>
>>>
>>>
>>>    1. {noteId}/note.json is not scalable. We put all notes in one root
>>>    folder in flat structure. And when zeppelin server starts, we need to read
>>>    all note.json to get the note file name and build the note folder structure
>>>    (Because we need to get the note name which is stored in note.json to build
>>>    the notebook menu). This would be a nightmare when you have large amounts
>>>    of notes.
>>>    2. {noteId}/note.json is not maintainable. It is difficult for a
>>>    developer/administrator to find note file based on note name.
>>>    3. {noteId}/note.json has no folder structure. Currently zeppelin
>>>    have to build the folder structure internally in memory according note name
>>>    which is a big overhead.
>>>
>>>
>>> New Approach
>>>
>>>    As I mentioned above, I propose to change the note storage structure
>>> to {note_name}_{note_id}.zpln.  note_name could contains folders, e.g.
>>> folder_1/mynote_abcd.zpln
>>>
>>> This kind of note storage structure could bring several benefits.
>>>
>>>    1. We don’t need to load all notes when zeppelin starts. We just
>>>    need to list each folder to get the note name and note_id.
>>>    2. It is much maintainable so that it is easy to find the note file
>>>    based on note name.
>>>    3. It has the folder structure already. That can be mapped to the
>>>    note folder structure.
>>>
>>>
>>> Side Effect
>>>
>>> This approach only works for file system storage, so that means we have
>>> to drop support for MongoNotebookRepo. I think it is ok because I didn’t
>>> see any users talk about this in community, so I assume no one is using it.
>>>
>>>
>>>
>>> This is overall design, welcome any comments and feedback. Thanks.
>>>
>>>
>>>
>>> Here's the google docs, you can also comment it here.
>>>
>>>
>>> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
>>>
>>>
>>>
>>>
>>>
>>>

Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by Ruslan Dautkhanov <da...@gmail.com>.
Thanks for bringing this up for discussion. My 2 cents below.

I am with Maksim and Felix on concerns with special characters now allowed
in notebook names, and also concerns with different charsets. The Russian
language, for example, most commonly uses the iso-8859-5, koi-8r/u and
windows-1251 charsets, among others. This seems like it will bring a whole
new set of localization issues.
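
A small sketch of why this matters: the same Cyrillic name becomes a
different byte sequence under each charset, and bytes written under one
charset read back as a different name under another. The note name below is
just an example.

```python
name = "Заметка"  # an example Cyrillic note name ("Note")

encodings = ["utf-8", "koi8_r", "cp1251", "iso8859_5"]
# The same name as raw bytes under each charset.
encoded = {enc: name.encode(enc) for enc in encodings}

# Reading UTF-8 bytes back as KOI8-R raises no error;
# it silently produces a different, garbled name.
misread = encoded["utf-8"].decode("koi8_r")
```

All four byte sequences differ, so a file name written by a client using
one charset may not match the name another client looks up.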

If I understand correctly, this is being done solely to speed up loading
the list of notebooks? What if the list of notebook names, their ids,
folder structure, etc. could be *cached* in a separate small json file? Or
perhaps a small embedded key-value store like www.mapdb.org would do? Just
thinking out loud. This would require a way to lazily re-sync the cache.

Another way to speed up json reads is to somehow force the "name" attribute
to be at the top of the json document that's written to disk. Then
re-implement the json file reader to read just the header of the file and
do a partial json parse (or, for lack of other options, grab the "name"
attribute from the json file header with a regex, for example).
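
A minimal sketch of the regex variant, under stated assumptions: the
top-level "name" key is written before any nested "name" keys, it fits in
the first 4096 bytes, and the file is UTF-8. The function name and the
header size are made up for illustration.

```python
import json
import re

# Matches "name": "<json string>", allowing escaped characters in the value.
NAME_RE = re.compile(rb'"name"\s*:\s*("(?:[^"\\]|\\.)*")')

def read_note_name(path, header_bytes=4096):
    """Return the first "name" value found in the file header, or None."""
    with open(path, "rb") as f:
        header = f.read(header_bytes)
    match = NAME_RE.search(header)
    if match is None:
        return None
    # Let the json module handle escapes such as \" and \uXXXX.
    return json.loads(match.group(1).decode("utf-8"))
```

This reads a fixed number of bytes per note regardless of how large the
note is, but it is only correct if the writer guarantees the key order.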

Back to filenames and charsets: I think the issue may be more complicated
if you store notebooks on a remote filesystem (nfs/samba etc). What if the
remote server and the local nfs client have different default fs charsets?

Ideally all filesystems would use UTF-8, for example, but I am not certain
that's a good assumption to make. Exposing notebook names as file names can
also bring some other issues; for example, I know some users occasionally
add trailing/leading spaces.


On Mon, Aug 13, 2018 at 10:38 AM Belousov Maksim Eduardovich <
m.belousov@tinkoff.ru> wrote:

> The use of Russian and other specific letters in the note name is big
> advantage of Zeppelin. I would not like to give up this functionality.
>
> I support the idea about `zpln` file extension.
> The folder structure also sounds good.
>
> I'm afraid about non-latin symbols in folder and note name. And what about
> hieroglyphs?
>
> Apache Zeppelin may be the first to use Russian letters in file system in
> our company.
> I see a lot of risks to use non-latin symbols and a lot of issues to make
> new folder structure stable.
>
>
>
> ------------------------------
> *From:* Jeff Zhang <zj...@gmail.com>
> *Sent:* 13 August 2018 12:50
> *To:* users@zeppelin.apache.org
> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of
> [NOTEID]/note.json
>
> >>> Do we need the note id in the file name at all? What’s wrong with
> just note_name.zpln?
> The reason I keep note id is because currently we use noteId to identify
> one note. e.g. we use note id in both websocket api and rest api. It is
> almost impossible to remove noteId for the current architecture. If we put
> note id into file content of note_name.zpln, then we have to read the note
> file every time, then we meet the issues I mentioned above again.
>
> >>> If the file content is json then why not use note_name.json instead
> of .zpln? That would make it easier for editors to know how to
> load/highlight the file contents.
> I am not strongly biased on *.zpln. But I think one purpose is to help
> third parties to identify zeppelin note properly. e.g. github can identify
> jupyter notebook (*.ipynb) and render it properly.
>
> >>> Is there any reason for not using *real* folders or directories for
> organising the notebooks rather than embedding the folder hierarchy in the
> names of the notebooks?  If someone wants to ‘move’ the notebooks to
> another folder they’d have to manually rename all the files/notebooks at
> present.  That’s not very user-friendly.
>
> Actually my proposal is to use real folders. What user see in zeppelin
> note menu is the actual notes folder structure. If they want to move the
> notebooks to another folder, they can change the folder name just like what
> user did in file system.
>
>
>
>
>
> Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Mon, Aug 13, 2018
> at 4:43 PM:
>
>> Hi Jeff,
>>
>> I have some questions about this proposal (I can’t edit the design doc):
>>
>>
>>
>>    1. Do we need the note id in the file name at all? What’s wrong with
>>    just note_name.zpln?
>>
>>    2. If the file content is json then why not use note_name.json
>>    instead of .zpln? That would make it easier for editors to know how to
>>    load/highlight the file contents.
>>
>>    3. Is there any reason for not using *real* folders or directories
>>    for organising the notebooks rather than embedding the folder hierarchy in
>>    the names of the notebooks?  If someone wants to ‘move’ the notebooks to
>>    another folder they’d have to manually rename all the files/notebooks at
>>    present.  That’s not very user-friendly.
>>
>>
>>
>> Thanks, Lucas.
>>
>> *From:* Jeff Zhang <zj...@gmail.com>
>> *Sent:* 13 August 2018 09:06
>> *To:* users@zeppelin.apache.org
>> *Cc:* dev <de...@zeppelin.apache.org>
>> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
>> instead of [NOTEID]/note.json
>>
>>
>>
>> In that case, zeppelin should fail to create note.
>>
>>
>>
>> Felix Cheung <fe...@hotmail.com> wrote on Mon, Aug 13, 2018 at 3:47 PM:
>>
>> Perhaps one concern is users having characters in note name that are
>> invalid for file name/file path?
>>
>>
>>
>>
>> ------------------------------
>>
>> *From:* Mohit Jaggi <mo...@gmail.com>
>> *Sent:* Sunday, August 12, 2018 6:02 PM
>> *To:* users@zeppelin.apache.org
>> *Cc:* dev
>> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
>> instead of [NOTEID]/note.json
>>
>>
>>
>> sounds like a good idea!
>>
>>
>>
>> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zj...@gmail.com> wrote:
>>
>> Motivation
>>
>>    The motivation of ZEPPELIN-2619 is to change the notes storage
>> structure. Previously we store it using {noteId}/note.json, we’d like to
>> change it into {note_name}_{note_id}.zpln. There are several reasons for
>> this change.
>>
>>
>>
>>    1. {noteId}/note.json is not scalable. We put all notes in one root
>>    folder in flat structure. And when zeppelin server starts, we need to read
>>    all note.json to get the note file name and build the note folder structure
>>    (Because we need to get the note name which is stored in note.json to build
>>    the notebook menu). This would be a nightmare when you have large amounts
>>    of notes.
>>    2. {noteId}/note.json is not maintainable. It is difficult for a
>>    developer/administrator to find note file based on note name.
>>    3. {noteId}/note.json has no folder structure. Currently zeppelin
>>    have to build the folder structure internally in memory according note name
>>    which is a big overhead.
>>
>>
>> New Approach
>>
>>    As I mentioned above, I propose to change the note storage structure
>> to {note_name}_{note_id}.zpln.  note_name could contains folders, e.g.
>> folder_1/mynote_abcd.zpln
>>
>> This kind of note storage structure could bring several benefits.
>>
>>    1. We don’t need to load all notes when zeppelin starts. We just need
>>    to list each folder to get the note name and note_id.
>>    2. It is much maintainable so that it is easy to find the note file
>>    based on note name.
>>    3. It has the folder structure already. That can be mapped to the
>>    note folder structure.
>>
>>
>> Side Effect
>>
>> This approach only works for file system storage, so that means we have
>> to drop support for MongoNotebookRepo. I think it is ok because I didn’t
>> see any users talk about this in community, so I assume no one is using it.
>>
>>
>>
>> This is overall design, welcome any comments and feedback. Thanks.
>>
>>
>>
>> Here's the google docs, you can also comment it here.
>>
>>
>> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
>>
>>
>>
>>
>>
>>

Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by Belousov Maksim Eduardovich <m....@tinkoff.ru>.
The use of Russian and other language-specific letters in the note name is a big advantage of Zeppelin. I would not like to give up this functionality.

I support the idea about `zpln` file extension.
The folder structure also sounds good.

I'm worried about non-Latin symbols in folder and note names. And what about hieroglyphs?

Apache Zeppelin may be the first application in our company to use Russian letters in the file system.
I see a lot of risks in using non-Latin symbols and a lot of issues in making the new folder structure stable.




________________________________
From: Jeff Zhang <zj...@gmail.com>
Sent: 13 August 2018 12:50
To: users@zeppelin.apache.org
Subject: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

>>> Do we need the note id in the file name at all? What’s wrong with just note_name.zpln?
The reason I keep note id is because currently we use noteId to identify one note. e.g. we use note id in both websocket api and rest api. It is almost impossible to remove noteId for the current architecture. If we put note id into file content of note_name.zpln, then we have to read the note file every time, then we meet the issues I mentioned above again.

>>> If the file content is json then why not use note_name.json instead of .zpln? That would make it easier for editors to know how to load/highlight the file contents.
I am not strongly biased on *.zpln. But I think one purpose is to help third parties to identify zeppelin note properly. e.g. github can identify jupyter notebook (*.ipynb) and render it properly.

>>> Is there any reason for not using real folders or directories for organising the notebooks rather than embedding the folder hierarchy in the names of the notebooks?  If someone wants to ‘move’ the notebooks to another folder they’d have to manually rename all the files/notebooks at present.  That’s not very user-friendly.

Actually my proposal is to use real folders. What user see in zeppelin note menu is the actual notes folder structure. If they want to move the notebooks to another folder, they can change the folder name just like what user did in file system.





Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Mon, Aug 13, 2018 at 4:43 PM:
Hi Jeff,
I have some questions about this proposal (I can’t edit the design doc):


  1.  Do we need the note id in the file name at all? What’s wrong with just note_name.zpln?

  2.  If the file content is json then why not use note_name.json instead of .zpln? That would make it easier for editors to know how to load/highlight the file contents.

  3.  Is there any reason for not using real folders or directories for organising the notebooks rather than embedding the folder hierarchy in the names of the notebooks?  If someone wants to ‘move’ the notebooks to another folder they’d have to manually rename all the files/notebooks at present.  That’s not very user-friendly.

Thanks, Lucas.
From: Jeff Zhang <zj...@gmail.com>
Sent: 13 August 2018 09:06
To: users@zeppelin.apache.org
Cc: dev <de...@zeppelin.apache.org>
Subject: EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

In that case, zeppelin should fail to create note.

Felix Cheung <fe...@hotmail.com> wrote on Mon, Aug 13, 2018 at 3:47 PM:
Perhaps one concern is users having characters in note name that are invalid for file name/file path?


________________________________
From: Mohit Jaggi <mo...@gmail.com>
Sent: Sunday, August 12, 2018 6:02 PM
To: users@zeppelin.apache.org
Cc: dev
Subject: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

sounds like a good idea!

On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zj...@gmail.com> wrote:
Motivation

   The motivation of ZEPPELIN-2619 is to change the notes storage structure. Previously we store it using {noteId}/note.json, we’d like to change it into {note_name}_{note_id}.zpln. There are several reasons for this change.


  1.  {noteId}/note.json is not scalable. We put all notes in one root folder in flat structure. And when zeppelin server starts, we need to read all note.json to get the note file name and build the note folder structure (Because we need to get the note name which is stored in note.json to build the notebook menu). This would be a nightmare when you have large amounts of notes.
  2.  {noteId}/note.json is not maintainable. It is difficult for a developer/administrator to find note file based on note name.
  3.  {noteId}/note.json has no folder structure. Currently zeppelin have to build the folder structure internally in memory according note name which is a big overhead.

New Approach

   As I mentioned above, I propose to change the note storage structure to {note_name}_{note_id}.zpln.  note_name could contains folders, e.g. folder_1/mynote_abcd.zpln

This kind of note storage structure could bring several benefits.

  1.  We don’t need to load all notes when zeppelin starts. We just need to list each folder to get the note name and note_id.
  2.  It is much maintainable so that it is easy to find the note file based on note name.
  3.  It has the folder structure already. That can be mapped to the note folder structure.

Side Effect

This approach only works for file system storage, so that means we have to drop support for MongoNotebookRepo. I think it is ok because I didn’t see any users talk about this in community, so I assume no one is using it.



This is overall design, welcome any comments and feedback. Thanks.



Here's the google docs, you can also comment it here.

https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing




Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by Jeff Zhang <zj...@gmail.com>.
>>> Do we need the note id in the file name at all? What’s wrong with just
note_name.zpln?
I keep the note id because we currently use noteId to identify a note,
e.g. in both the websocket API and the REST API. It is almost impossible
to remove noteId in the current architecture. If we put the note id into
the file content of note_name.zpln, we would have to read the note file
every time, and we would hit the issues I mentioned above again.
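
As a rough illustration of the point above, the sketch below recovers a
note's name and id from a path of the form {note_name}_{note_id}.zpln
without opening the file. It is not Zeppelin's implementation: the helper
name parse_note_path is hypothetical, and it assumes the note id itself
contains no underscore, so that the last underscore in the file name
separates the name from the id.

```python
from pathlib import PurePosixPath


def parse_note_path(relative_path):
    """Split 'folder/name_id.zpln' into (note_path, note_id).

    Hypothetical helper: assumes the note id contains no underscore,
    so the last underscore in the file name is the separator.
    """
    path = PurePosixPath(relative_path)
    if path.suffix != ".zpln":
        raise ValueError(f"not a .zpln file: {relative_path}")
    name, sep, note_id = path.stem.rpartition("_")
    if not sep or not name or not note_id:
        raise ValueError(f"expected <name>_<id>.zpln: {relative_path}")
    # Re-attach the folder part so the note keeps its place in the menu.
    note_path = name if str(path.parent) == "." else str(path.parent / name)
    return note_path, note_id


print(parse_note_path("folder_1/mynote_abcd.zpln"))
```

Because only the file name is inspected, underscores and non-Latin
letters inside the note name itself do not affect how the id is found.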

>>> If the file content is json then why not use note_name.json instead of
.zpln? That would make it easier for editors to know how to load/highlight
the file contents.
I don't have a strong preference for *.zpln, but one purpose is to help
third parties identify Zeppelin notes properly, e.g. GitHub can identify
a Jupyter notebook (*.ipynb) and render it properly.

>>> Is there any reason for not using *real* folders or directories for
organising the notebooks rather than embedding the folder hierarchy in the
names of the notebooks?  If someone wants to ‘move’ the notebooks to
another folder they’d have to manually rename all the files/notebooks at
present.  That’s not very user-friendly.

Actually my proposal is to use real folders. What the user sees in the
Zeppelin note menu is the actual notes folder structure. If they want to
move the notebooks to another folder, they can rename the folder just as
they would in the file system.
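
To make the "real folders" idea concrete, here is a minimal sketch that
builds a note index purely from a directory listing and shows that moving
notes is an ordinary folder rename. The function index_notes and the
temporary directory layout are hypothetical, and the note id is again
assumed to follow the last underscore in the file name; this is not
Zeppelin's actual code.

```python
import os
import tempfile


def index_notes(root):
    """Map note id -> note path (folders included) by listing files only.

    Hypothetical sketch: file contents are never opened, and the note id
    is assumed to follow the last underscore of the file name.
    """
    index = {}
    for folder, _dirs, files in os.walk(root):
        for file_name in files:
            stem, ext = os.path.splitext(file_name)
            name, sep, note_id = stem.rpartition("_")
            if ext != ".zpln" or not sep or not name or not note_id:
                continue  # not a note file in the assumed layout
            rel_folder = os.path.relpath(folder, root)
            note_path = name if rel_folder == "." else f"{rel_folder}/{name}"
            index[note_id] = note_path.replace(os.sep, "/")
    return index


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "folder_1"))
    open(os.path.join(root, "folder_1", "mynote_abcd.zpln"), "w").close()
    before = index_notes(root)
    # Moving notes to another folder is just a rename on disk.
    os.rename(os.path.join(root, "folder_1"), os.path.join(root, "folder_2"))
    after = index_notes(root)

print(before, after)
```

The note id stays the same across the rename, so anything that refers to
the note by id would keep working while the menu path changes.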





Partridge, Lucas (GE Aviation) <Lu...@ge.com> wrote on Mon, Aug 13, 2018 at 4:43 PM:

> Hi Jeff,
>
> I have some questions about this proposal (I can’t edit the design doc):
>
>
>
>    1. Do we need the note id in the file name at all? What’s wrong with
>    just note_name.zpln?
>
>    2. If the file content is json then why not use note_name.json instead
>    of .zpln? That would make it easier for editors to know how to
>    load/highlight the file contents.
>
>    3. Is there any reason for not using *real* folders or directories for
>    organising the notebooks rather than embedding the folder hierarchy in the
>    names of the notebooks?  If someone wants to ‘move’ the notebooks to
>    another folder they’d have to manually rename all the files/notebooks at
>    present.  That’s not very user-friendly.
>
>
>
> Thanks, Lucas.
>
> *From:* Jeff Zhang <zj...@gmail.com>
> *Sent:* 13 August 2018 09:06
> *To:* users@zeppelin.apache.org
> *Cc:* dev <de...@zeppelin.apache.org>
> *Subject:* EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln
> instead of [NOTEID]/note.json
>
>
>
> In that case, zeppelin should fail to create note.
>
>
>
> Felix Cheung <fe...@hotmail.com> wrote on Mon, Aug 13, 2018 at 3:47 PM:
>
> Perhaps one concern is users having characters in note name that are
> invalid for file name/file path?
>
>
>
>
> ------------------------------
>
> *From:* Mohit Jaggi <mo...@gmail.com>
> *Sent:* Sunday, August 12, 2018 6:02 PM
> *To:* users@zeppelin.apache.org
> *Cc:* dev
> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead
> of [NOTEID]/note.json
>
>
>
> sounds like a good idea!
>
>
>
> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zj...@gmail.com> wrote:
>
> Motivation
>
>    The motivation of ZEPPELIN-2619 is to change the notes storage
> structure. Previously we store it using {noteId}/note.json, we’d like to
> change it into {note_name}_{note_id}.zpln. There are several reasons for
> this change.
>
>
>
>    1. {noteId}/note.json is not scalable. We put all notes in one root
>    folder in flat structure. And when zeppelin server starts, we need to read
>    all note.json to get the note file name and build the note folder structure
>    (Because we need to get the note name which is stored in note.json to build
>    the notebook menu). This would be a nightmare when you have large amounts
>    of notes.
>    2. {noteId}/note.json is not maintainable. It is difficult for a
>    developer/administrator to find note file based on note name.
>    3. {noteId}/note.json has no folder structure. Currently zeppelin have
>    to build the folder structure internally in memory according note name
>    which is a big overhead.
>
>
> New Approach
>
>    As I mentioned above, I propose to change the note storage structure to
> {note_name}_{note_id}.zpln.  note_name could contains folders, e.g.
> folder_1/mynote_abcd.zpln
>
> This kind of note storage structure could bring several benefits.
>
>    1. We don’t need to load all notes when zeppelin starts. We just need
>    to list each folder to get the note name and note_id.
>    2. It is much maintainable so that it is easy to find the note file
>    based on note name.
>    3. It has the folder structure already. That can be mapped to the note
>    folder structure.
>
>
> Side Effect
>
> This approach only works for file system storage, so that means we have to
> drop support for MongoNotebookRepo. I think it is ok because I didn’t see
> any users talk about this in community, so I assume no one is using it.
>
>
>
> This is overall design, welcome any comments and feedback. Thanks.
>
>
>
> Here's the google docs, you can also comment it here.
>
>
> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
>
>
>
>
>
>

[DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by "Partridge, Lucas (GE Aviation)" <Lu...@ge.com>.
Hi Jeff,
I have some questions about this proposal (I can’t edit the design doc):


  1.  Do we need the note id in the file name at all? What’s wrong with just note_name.zpln?

  2.  If the file content is json then why not use note_name.json instead of .zpln? That would make it easier for editors to know how to load/highlight the file contents.

  3.  Is there any reason for not using real folders or directories for organising the notebooks rather than embedding the folder hierarchy in the names of the notebooks?  If someone wants to ‘move’ the notebooks to another folder they’d have to manually rename all the files/notebooks at present.  That’s not very user-friendly.

Thanks, Lucas.
From: Jeff Zhang <zj...@gmail.com>
Sent: 13 August 2018 09:06
To: users@zeppelin.apache.org
Cc: dev <de...@zeppelin.apache.org>
Subject: EXT: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

In that case, zeppelin should fail to create note.

Felix Cheung <fe...@hotmail.com> wrote on Mon, Aug 13, 2018 at 3:47 PM:
Perhaps one concern is users having characters in note name that are invalid for file name/file path?


________________________________
From: Mohit Jaggi <mo...@gmail.com>
Sent: Sunday, August 12, 2018 6:02 PM
To: users@zeppelin.apache.org
Cc: dev
Subject: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

sounds like a good idea!

On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zj...@gmail.com> wrote:
Motivation

   The motivation of ZEPPELIN-2619 is to change the notes storage structure. Previously we store it using {noteId}/note.json, we’d like to change it into {note_name}_{note_id}.zpln. There are several reasons for this change.


  1.  {noteId}/note.json is not scalable. We put all notes in one root folder in flat structure. And when zeppelin server starts, we need to read all note.json to get the note file name and build the note folder structure (Because we need to get the note name which is stored in note.json to build the notebook menu). This would be a nightmare when you have large amounts of notes.
  2.  {noteId}/note.json is not maintainable. It is difficult for a developer/administrator to find note file based on note name.
  3.  {noteId}/note.json has no folder structure. Currently zeppelin have to build the folder structure internally in memory according note name which is a big overhead.

New Approach

   As I mentioned above, I propose to change the note storage structure to {note_name}_{note_id}.zpln.  note_name could contains folders, e.g. folder_1/mynote_abcd.zpln

This kind of note storage structure could bring several benefits.

  1.  We don’t need to load all notes when zeppelin starts. We just need to list each folder to get the note name and note_id.
  2.  It is much maintainable so that it is easy to find the note file based on note name.
  3.  It has the folder structure already. That can be mapped to the note folder structure.

Side Effect

This approach only works for file system storage, so that means we have to drop support for MongoNotebookRepo. I think it is ok because I didn’t see any users talk about this in community, so I assume no one is using it.



This is overall design, welcome any comments and feedback. Thanks.



Here's the google docs, you can also comment it here.

https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing




Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by Jeff Zhang <zj...@gmail.com>.
In that case, Zeppelin should fail to create the note.
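
A minimal sketch of what "fail to create the note" could look like,
assuming the check happens before any file is written. The function
validate_note_name and the FORBIDDEN set are hypothetical: the set below
is simply the characters Windows reserves in file names, plus control
characters, and is not a rule taken from Zeppelin. Non-Latin letters are
deliberately allowed.

```python
import unicodedata

# Assumed forbidden set: the characters Windows reserves in file names.
# This is an illustration, not a rule defined by Zeppelin.
FORBIDDEN = set('<>:"\\|?*')


def validate_note_name(note_name):
    """Raise ValueError if note_name cannot be used as a relative file path."""
    for part in note_name.split("/"):  # "/" separates folders in a note name
        if part in ("", ".", ".."):
            raise ValueError(f"empty or relative path segment in {note_name!r}")
        for ch in part:
            # Category "Cc" covers control characters such as NUL and newline.
            if ch in FORBIDDEN or unicodedata.category(ch) == "Cc":
                raise ValueError(f"invalid character {ch!r} in {note_name!r}")
    return note_name


validate_note_name("Отчёты/Продажи 2018")  # accepted: non-Latin letters are fine
try:
    validate_note_name("sales: 2018")
except ValueError as err:
    print("rejected:", err)
```

Whether a given file system can actually store an accepted name still
depends on its own encoding and length limits, which this check does not
cover.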

Felix Cheung <fe...@hotmail.com> wrote on Mon, Aug 13, 2018 at 3:47 PM:

> Perhaps one concern is users having characters in note name that are
> invalid for file name/file path?
>
>
> ------------------------------
> *From:* Mohit Jaggi <mo...@gmail.com>
> *Sent:* Sunday, August 12, 2018 6:02 PM
> *To:* users@zeppelin.apache.org
> *Cc:* dev
> *Subject:* Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead
> of [NOTEID]/note.json
>
> sounds like a good idea!
>
> On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zj...@gmail.com> wrote:
>
>> Motivation
>>
>>    The motivation of ZEPPELIN-2619 is to change the notes storage
>> structure. Previously we store it using {noteId}/note.json, we’d like to
>> change it into {note_name}_{note_id}.zpln. There are several reasons for
>> this change.
>>
>>
>>    1.
>>
>>    {noteId}/note.json is not scalable. We put all notes in one root
>>    folder in flat structure. And when zeppelin server starts, we need to read
>>    all note.json to get the note file name and build the note folder structure
>>    (Because we need to get the note name which is stored in note.json to build
>>    the notebook menu). This would be a nightmare when you have large amounts
>>    of notes.
>>    2.
>>
>>    {noteId}/note.json is not maintainable. It is difficult for a
>>    developer/administrator to find note file based on note name.
>>    3.
>>
>>    {noteId}/note.json has no folder structure. Currently zeppelin have
>>    to build the folder structure internally in memory according note name
>>    which is a big overhead.
>>
>>
>> New Approach
>>
>>    As I mentioned above, I propose to change the note storage structure
>> to {note_name}_{note_id}.zpln.  note_name could contains folders, e.g.
>> folder_1/mynote_abcd.zpln
>>
>> This kind of note storage structure could bring several benefits.
>>
>>    1.
>>
>>    We don’t need to load all notes when zeppelin starts. We just need to
>>    list each folder to get the note name and note_id.
>>    2.
>>
>>    It is much maintainable so that it is easy to find the note file
>>    based on note name.
>>    3.
>>
>>    It has the folder structure already. That can be mapped to the note
>>    folder structure.
>>
>>
>> Side Effect
>>
>> This approach only works for file system storage, so that means we have
>> to drop support for MongoNotebookRepo. I think it is ok because I didn’t
>> see any users talk about this in community, so I assume no one is using it.
>>
>>
>> This is overall design, welcome any comments and feedback. Thanks.
>>
>>
>> Here's the google docs, you can also comment it here.
>>
>>
>> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
>>
>>
>>
>>

Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by Felix Cheung <fe...@hotmail.com>.
Perhaps one concern is users having characters in the note name that are invalid in a file name or file path?


________________________________
From: Mohit Jaggi <mo...@gmail.com>
Sent: Sunday, August 12, 2018 6:02 PM
To: users@zeppelin.apache.org
Cc: dev
Subject: Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

sounds like a good idea!

On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zj...@gmail.com> wrote:
Motivation

   The motivation of ZEPPELIN-2619 is to change the notes storage structure. Previously we store it using {noteId}/note.json, we’d like to change it into {note_name}_{note_id}.zpln. There are several reasons for this change.


  1.  {noteId}/note.json is not scalable. We put all notes in one root folder in flat structure. And when zeppelin server starts, we need to read all note.json to get the note file name and build the note folder structure (Because we need to get the note name which is stored in note.json to build the notebook menu). This would be a nightmare when you have large amounts of notes.

  2.  {noteId}/note.json is not maintainable. It is difficult for a developer/administrator to find note file based on note name.

  3.  {noteId}/note.json has no folder structure. Currently zeppelin have to build the folder structure internally in memory according note name which is a big overhead.

New Approach

   As I mentioned above, I propose to change the note storage structure to {note_name}_{note_id}.zpln.  note_name could contains folders, e.g. folder_1/mynote_abcd.zpln

This kind of note storage structure could bring several benefits.

  1.  We don’t need to load all notes when zeppelin starts. We just need to list each folder to get the note name and note_id.

  2.  It is much maintainable so that it is easy to find the note file based on note name.

  3.  It has the folder structure already. That can be mapped to the note folder structure.

Side Effect

This approach only works for file system storage, so that means we have to drop support for MongoNotebookRepo. I think it is ok because I didn’t see any users talk about this in community, so I assume no one is using it.


This is overall design, welcome any comments and feedback. Thanks.


Here's the google docs, you can also comment it here.

https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing



Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by Mohit Jaggi <mo...@gmail.com>.
sounds like a good idea!

On Sun, Aug 12, 2018 at 5:34 PM Jeff Zhang <zj...@gmail.com> wrote:

> Motivation
>
>    The motivation of ZEPPELIN-2619 is to change the notes storage
> structure. Previously we store it using {noteId}/note.json, we’d like to
> change it into {note_name}_{note_id}.zpln. There are several reasons for
> this change.
>
>
>    1.
>
>    {noteId}/note.json is not scalable. We put all notes in one root
>    folder in flat structure. And when zeppelin server starts, we need to read
>    all note.json to get the note file name and build the note folder structure
>    (Because we need to get the note name which is stored in note.json to build
>    the notebook menu). This would be a nightmare when you have large amounts
>    of notes.
>    2.
>
>    {noteId}/note.json is not maintainable. It is difficult for a
>    developer/administrator to find note file based on note name.
>    3.
>
>    {noteId}/note.json has no folder structure. Currently zeppelin have to
>    build the folder structure internally in memory according note name which
>    is a big overhead.
>
>
> New Approach
>
>    As I mentioned above, I propose to change the note storage structure to
> {note_name}_{note_id}.zpln.  note_name could contains folders, e.g.
> folder_1/mynote_abcd.zpln
>
> This kind of note storage structure could bring several benefits.
>
>    1.
>
>    We don’t need to load all notes when zeppelin starts. We just need to
>    list each folder to get the note name and note_id.
>    2.
>
>    It is much maintainable so that it is easy to find the note file based
>    on note name.
>    3.
>
>    It has the folder structure already. That can be mapped to the note
>    folder structure.
>
>
> Side Effect
>
> This approach only works for file system storage, so that means we have to
> drop support for MongoNotebookRepo. I think it is ok because I didn’t see
> any users talk about this in community, so I assume no one is using it.
>
>
> This is overall design, welcome any comments and feedback. Thanks.
>
>
> Here's the google docs, you can also comment it here.
>
>
> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
>
>
>
>

Re: [DISCUSS] ZEPPELIN-2619. Save note in [Title].zpln instead of [NOTEID]/note.json

Posted by Ricardo Martinelli de Oliveira <rm...@redhat.com>.
I totally agree with the change. As I created an image for Zeppelin, it's
been painful to store the notebooks in GitHub in a directory named with some
random hash, with the note.json inside it. Having <name>.zpln is even better,
since I only need to store a file in the root of my repo and copy it to the
right location.

Given this scenario, I'd say this is an important feature to add to
Zeppelin. I'll be glad to help with anything, including code.

On Sun, Aug 12, 2018 at 20:34, Jeff Zhang <zj...@gmail.com> wrote:

> Motivation
>
>    The motivation of ZEPPELIN-2619 is to change the notes storage
> structure. Previously we store it using {noteId}/note.json, we’d like to
> change it into {note_name}_{note_id}.zpln. There are several reasons for
> this change.
>
>
>    1.
>
>    {noteId}/note.json is not scalable. We put all notes in one root
>    folder in flat structure. And when zeppelin server starts, we need to read
>    all note.json to get the note file name and build the note folder structure
>    (Because we need to get the note name which is stored in note.json to build
>    the notebook menu). This would be a nightmare when you have large amounts
>    of notes.
>    2.
>
>    {noteId}/note.json is not maintainable. It is difficult for a
>    developer/administrator to find note file based on note name.
>    3.
>
>    {noteId}/note.json has no folder structure. Currently zeppelin have to
>    build the folder structure internally in memory according note name which
>    is a big overhead.
>
>
> New Approach
>
>    As I mentioned above, I propose to change the note storage structure to
> {note_name}_{note_id}.zpln.  note_name could contains folders, e.g.
> folder_1/mynote_abcd.zpln
>
> This kind of note storage structure could bring several benefits.
>
>    1.
>
>    We don’t need to load all notes when zeppelin starts. We just need to
>    list each folder to get the note name and note_id.
>    2.
>
>    It is much maintainable so that it is easy to find the note file based
>    on note name.
>    3.
>
>    It has the folder structure already. That can be mapped to the note
>    folder structure.
>
>
> Side Effect
>
> This approach only works for file system storage, so that means we have to
> drop support for MongoNotebookRepo. I think it is ok because I didn’t see
> any users talk about this in community, so I assume no one is using it.
>
>
> This is overall design, welcome any comments and feedback. Thanks.
>
>
> Here's the google docs, you can also comment it here.
>
>
> https://docs.google.com/document/d/126egAQmhQOL4ynxJ3AQJQRBBLdW8TATYcGkDL1DNZoE/edit?usp=sharing
>
>
>
>