Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2017/08/11 10:33:00 UTC

[jira] [Comment Edited] (OAK-6545) Tooling to serialize NodeState as json along with blobs

    [ https://issues.apache.org/jira/browse/OAK-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123155#comment-16123155 ] 

Chetan Mehrotra edited comment on OAK-6545 at 8/11/17 10:32 AM:
----------------------------------------------------------------

Done the implementation in 1804763-1804770.

The implementation supports the following:

* Supports exporting NodeState in json and cnd formats
* Export can be done via an explicit {{export}} command and via a groovy console command
* Supports serializing blobs in FileDataStore format, i.e. blobs are stored in a local FDS
* Blob serialization can skip problematic binaries by writing a marker blobId. Such blobs would fail on deserialization and are marked as "*ERROR*-<blob id>" in the serialized form
* json is written in a streaming way, so serializing large trees is supported
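
To illustrate what "written in a streaming way" means in the last point, here is a minimal Python sketch. It is not the Oak implementation: the function name {{write_tree}}, the use of nested dicts as a stand-in for a NodeState, and the exact meaning of the depth limit are all assumptions made for this example. The point is only that each token goes straight to the output stream, so the whole json document is never held in memory.

```python
import io
import json


def write_tree(node, out, max_depth=None, depth=0):
    """Write a nested dict to `out` as JSON, token by token.

    `node` is a stand-in for a NodeState: dict values are treated as
    child nodes, everything else as a property value. Child nodes that
    would sit deeper than `max_depth` are skipped (assumed semantics).
    """
    out.write("{")
    first = True
    for name, value in node.items():
        is_child = isinstance(value, dict)
        if is_child and max_depth is not None and depth >= max_depth:
            continue  # depth limit reached: leave this subtree out
        if not first:
            out.write(",")
        first = False
        out.write(json.dumps(name))
        out.write(":")
        if is_child:
            # Recurse directly into the stream instead of building a string.
            write_tree(value, out, max_depth, depth + 1)
        else:
            out.write(json.dumps(value))
    out.write("}")


if __name__ == "__main__":
    tree = {
        "jcr:primaryType": "nam:sling:Folder",
        "rep:policy": {"jcr:primaryType": "nam:rep:ACL"},
    }
    buf = io.StringIO()
    write_tree(tree, buf, max_depth=1)
    print(buf.getvalue())
```

In a real run {{out}} would be a file handle rather than a {{StringIO}}, which is what keeps memory use independent of tree size.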

*Export Command*

Refer to [Oak Run NodeStore Connection|https://jackrabbit.apache.org/oak/docs/features/oak-run-nodestore-connection-options.html] for details on how to connect to the various NodeStores and BlobStores

{noformat}
$ java -jar oak-run-*.jar export -p /path/in/repo /path/of/segmentstore -o /path/of/output/dir
$ java -jar oak-run-*.jar export -h
Exports NodeState as json                                 


The export command supports exporting nodes from a repository in json. It also provide options to export the blobs
  which are stored in FileDataStore format                                                                        

Option                           Description                                                                       
------                           -----------                                                                       
-b, --blobs [Boolean]            Export blobs also. By default blobs are not exported (default: false)             
-d, --depth [Integer]            Max depth to include in output (default: 2147483647)                              
-f, --filter <String>            Filter expression as json to filter out which nodes and properties are included in
                                   exported file (default: {"properties":["*", "-:childOrder"]})                   
--filter-file <File>             Filter file which contains the filter json expression                             
--format <String>                Export format 'json' or 'txt' (default: json)                                     
-n, --max-child-nodes [Integer]  Maximum number of child nodes to include for a any parent (default: 2147483647)   
-o, --out <File>                 Output directory where the exported json and blobs are stored (default: .)        
-p, --path <String>              Repository path to export (default: /)                                            
--pretty [Boolean]               Pretty print the json output (default: true)   
{noformat}
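
The default filter {{{"properties":["*", "-:childOrder"]}}} shown in the help output reads as "include every property except {{:childOrder}}". The Python sketch below shows one way such a pattern list could be evaluated. The rules it uses are assumptions for illustration and are not taken from the Oak source: a leading {{-}} marks an exclusion, {{*}} is a glob wildcard, and later patterns override earlier ones. The function name {{property_included}} is hypothetical.

```python
import fnmatch

# Default from the help output above.
DEFAULT_PATTERNS = ["*", "-:childOrder"]


def property_included(name, patterns=DEFAULT_PATTERNS):
    """Return True if property `name` passes the include/exclude patterns.

    Assumed rules: patterns are applied in order, a '-' prefix turns a
    pattern into an exclusion, and the last matching pattern wins.
    """
    included = False
    for pattern in patterns:
        exclude = pattern.startswith("-")
        glob = pattern[1:] if exclude else pattern
        if fnmatch.fnmatchcase(name, glob):
            included = not exclude
    return included


if __name__ == "__main__":
    for prop in ("jcr:primaryType", ":childOrder", "jcr:created"):
        print(prop, property_included(prop))
```

With the default patterns, this sketch keeps {{jcr:primaryType}} and {{jcr:created}} and drops {{:childOrder}}, which matches the console output in this comment, where no {{:childOrder}} property appears.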

*Export in Groovy Console*
{noformat}
$  java -jar oak-run-*.jar console /path/of/segmentstore
Apache Jackrabbit Oak 1.8-SNAPSHOT
Repository connected in read-only mode. Use '--read-write' for write operations
Jackrabbit Oak Shell (Apache Jackrabbit Oak 1.8-SNAPSHOT, JVM: 1.8.0_66)
Type ':help' or ':h' for help.
----------------------------------------------------------------------------------------------------------------------------
/> cd /var/reports
/var/reports> export -c
{
 "jcr:primaryType": "nam:sling:Folder",
 "jcr:mixinTypes": [
  "nam:rep:AccessControllable"
 ],
 "jcr:createdBy": "admin",
 "jcr:created": "dat:2017-01-26T08:02:24.122+05:30",
 "rep:policy": {
  "jcr:primaryType": "nam:rep:ACL",
  "allow": {
   "jcr:primaryType": "nam:rep:GrantACE",
   "rep:principalName": "snapshotservice",
   "rep:privileges": [
    "nam:jcr:read",
    "nam:rep:write"
   ]
  }
 }
}
/var/reports> export -h
usage: export-nodes [-h] [-p <repo_path_to_export>] [-o <dir_name>]
Export nodes and its children as json
 -b,--blobs                   Serialize blob contents also
 -c,--console                 Output to console
 -d,--depth <arg>             Maximum tree depth to write out. Default to
                              all
 -f,--filter <arg>            Filter for nodes and properties to include
                              in json format. Default {"properties":["*",
                              "-:childOrder"]}
 -h,--help                    Print usage
 -n,--max-child-nodes <arg>   maximum number of child nodes to include
 -o,--out <out>               Directory name to store json and blobs
                              (default: .)
 -p,--path <path>             Repository path to export (default: current
                              node)

{noformat}
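
In the console output above, some string values carry a type prefix: {{nam:}} on name values such as {{nam:sling:Folder}} and {{dat:}} on the date {{dat:2017-01-26T08:02:24.122+05:30}}. When post-processing an exported file it can help to split these apart. The Python sketch below does that for the two prefixes visible in this sample only. Whether other prefixes exist, and how a plain string that happens to start with one of them is written, is not shown here, so treat both as open assumptions. {{decode_value}} is a hypothetical helper name.

```python
from datetime import datetime


def decode_value(value):
    """Tag exported values by the type prefix seen in the sample output.

    Strings become (kind, payload) tuples; lists and dicts are decoded
    recursively; anything else is returned tagged as 'raw'.
    """
    if isinstance(value, str):
        if value.startswith("dat:"):
            # ISO-8601 timestamp with offset, as in the sample.
            return ("date", datetime.fromisoformat(value[4:]))
        if value.startswith("nam:"):
            return ("name", value[4:])
        return ("string", value)
    if isinstance(value, list):
        return [decode_value(v) for v in value]
    if isinstance(value, dict):
        return {k: decode_value(v) for k, v in value.items()}
    return ("raw", value)


if __name__ == "__main__":
    sample = {
        "jcr:primaryType": "nam:sling:Folder",
        "jcr:createdBy": "admin",
        "jcr:created": "dat:2017-01-26T08:02:24.122+05:30",
    }
    for key, decoded in decode_value(sample).items():
        print(key, decoded)
```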


> Tooling to serialize NodeState as json along with blobs
> -------------------------------------------------------
>
>                 Key: OAK-6545
>                 URL: https://issues.apache.org/jira/browse/OAK-6545
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: run
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.8
>
>
> For debugging certain cases like OAK-6525 we need a way to analyze the hidden NodeState structure used by indexes. To simplify the effort I would like to add some tooling to oak-run which allows dumping the NodeState and its children for a given path, along with the blob contents



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)