Posted to oak-issues@jackrabbit.apache.org by "Francesco Mari (JIRA)" <ji...@apache.org> on 2018/10/17 16:35:00 UTC

[jira] [Commented] (OAK-7846) Add a tool to export the tree pointed to by a node record

    [ https://issues.apache.org/jira/browse/OAK-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653807#comment-16653807 ] 

Francesco Mari commented on OAK-7846:
-------------------------------------

The tool receives as input one or more IDs of node records and produces a representation of the subtrees rooted at those nodes.

I propose a line-based representation for the export that could be easily consumed in a streaming fashion. For example:

{noformat}
#
b
p TYPE NAME
v VALUE
c NAME
u
e
{noformat}

where
* {{#}} is the beginning of a comment. The rest of the line will be ignored. This is useful to add textual information that does not belong to the export but might still be useful for debugging purposes.
* {{b}} and {{e}} mark the beginning and the end of an export, respectively. Given that the tool might receive more than one node record ID as input, it might produce more than one export in a single stream.
* {{p}} represents a property for the current node. For each property a {{NAME}} and a {{TYPE}} are always provided.
* {{v}} represents a value for the current property. {{VALUE}} spans until the end of the line. More than one {{v}} line can be produced for multi-value properties. If {{VALUE}} contains a newline character, it has to be escaped to {{\n}}. It follows that any backslash character will need to be escaped too.
* {{c}} is the beginning of a child node named {{NAME}}. When such a line is processed, the following lines are supposed to be consumed in the context of this new node.
* {{u}} marks the end of the current node. When such a line is processed, the context will switch back to the one of the parent node. Every {{c}} line has a corresponding {{u}} line.
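
To make the line types above concrete, here is a minimal sketch of a writer, in Python for brevity. The nested-dict tree layout and the names {{escape_value}}, {{unescape_value}} and {{export_lines}} are assumptions made for this illustration only; they are not part of Oak or of the proposed tool. The sketch also assumes that a backslash is escaped as a double backslash and that names contain no newlines.

```python
def escape_value(value):
    """Escape backslashes first, then newlines, so a value fits on one line."""
    return value.replace("\\", "\\\\").replace("\n", "\\n")


def unescape_value(text):
    """Reverse escape_value by scanning left to right, one escape at a time."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def _node_lines(node):
    # Hypothetical node layout:
    #   {"properties": {name: (type, [values])}, "children": {name: node}}
    for name, (ptype, values) in node["properties"].items():
        yield f"p {ptype} {name}"
        for value in values:
            yield f"v {escape_value(value)}"
    for name, child in node["children"].items():
        yield f"c {name}"
        yield from _node_lines(child)
        yield "u"  # every c line gets a matching u line


def export_lines(root):
    """Yield the lines of one export, wrapped in b and e."""
    yield "b"
    yield from _node_lines(root)
    yield "e"
```

Escaping the backslash before the newline is what keeps the encoding reversible: a value that literally contains a backslash followed by {{n}} stays distinguishable from an escaped newline.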

The format is designed in such a way that it can be consumed by a finite state automaton processing one line at a time. This idea was heavily inspired by some work by [~ahanikel], which I hope he will contribute soon!
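
As an illustration of the line-at-a-time processing, here is a rough sketch of a consumer, again in Python. It keeps only a stack of open nodes and a reference to the current property, and rejects any line that is not valid in the current state. The name {{parse_exports}}, the dict-based tree layout, the handling of blank lines, and the rule that {{NAME}} spans to the end of the line are all assumptions of this sketch, not something the proposal specifies.

```python
def unescape_value(text):
    """Turn the escaped form back into the raw value (backslash-n is a newline)."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def new_node():
    # Hypothetical in-memory layout, used only by this sketch.
    return {"properties": {}, "children": {}}


def parse_exports(lines):
    """Yield one tree per b/e pair, reading a single line at a time."""
    stack = []   # open nodes, from the export root down to the current node
    prop = None  # value list of the current property, if any
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # comments (and, by assumption, blank lines) are ignored
        tag, _, rest = line.partition(" ")
        if tag == "b" and not stack:
            stack, prop = [new_node()], None
        elif tag == "p" and stack:
            ptype, _, name = rest.partition(" ")
            prop = []
            stack[-1]["properties"][name] = (ptype, prop)
        elif tag == "v" and prop is not None:
            prop.append(unescape_value(rest))
        elif tag == "c" and stack:
            child = new_node()
            stack[-1]["children"][rest] = child
            stack.append(child)
            prop = None
        elif tag == "u" and len(stack) > 1:
            stack.pop()  # switch back to the parent node
            prop = None
        elif tag == "e" and len(stack) == 1:
            prop = None
            yield stack.pop()
        else:
            raise ValueError(f"line {number}: unexpected {line!r}")
```

Because the function is a generator over an iterable of lines, it can be fed directly from a file object or from standard input, and it yields each tree as soon as its {{e}} line is seen.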

There are some alternatives to this proposal:
* reuse the JSON export like the {{export}} command does. I don't like it because the produced JSON is incorrect. Child node names and property names are conflated as keys in the same JSON object. Moreover, property types are encoded as part of the property values, which makes the import of such values non-deterministic.
* use the CND export generated by the {{export}} command. That's simply not an adequate format.
* write the nodes directly into a Segment Store. An export is eventually going to be imported, so why not import it directly? I think that having a text-based format that you can zip and send around is just too valuable to forgo, especially if the format is both lossless and easy to parse. Processing the format defined above and writing the forest into a repository is quite trivial and should be the responsibility of yet another tool.

[~mduerig], [~dulceanu], [~ahanikel], what is your take on this?

> Add a tool to export the tree pointed to by a node record
> ---------------------------------------------------------
>
>                 Key: OAK-7846
>                 URL: https://issues.apache.org/jira/browse/OAK-7846
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>    Affects Versions: 1.10
>            Reporter: Francesco Mari
>            Assignee: Francesco Mari
>            Priority: Major
>
> oak-segment-tar should have a tool that allows exporting a tree pointed to by a node record. The tool must be written in a way that plays along with existing Oak tools (see OAK-7834) and conventional UNIX ones.


