Posted to dev@usergrid.apache.org by "David Johnson (JIRA)" <ji...@apache.org> on 2015/07/15 13:56:04 UTC

[jira] [Updated] (USERGRID-788) New ExportApp tool

     [ https://issues.apache.org/jira/browse/USERGRID-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Johnson updated USERGRID-788:
-----------------------------------
    Description: 
Provide a new tool that exports an app:
1. Writes two types of files: entities and connections
2. Writes each line of the output files as one complete JSON object
3. Uses RxJava to multithread both reads and writes
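A minimal sketch of the one-JSON-object-per-line format, written in Java. It writes an entity and a connection to two separate writers, one object per line. The {{JsonLinesSketch}} class, the hand-rolled escaper, and the sample entity fields ({{uuid}}, {{type}}, {{name}}) are illustrative assumptions; a real exporter would use a JSON library. The connection fields (source, sourceType, target, targetType) are taken from the proposal in this issue.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JsonLinesSketch {

    // Minimal JSON string escaper for illustration only (hypothetical helper, not Usergrid code).
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c)); // other control chars
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Serializes a flat map of string fields as one JSON object with no raw newlines.
    static String toJsonLine(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(',');
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            first = false;
        }
        return sb.append('}').toString();
    }

    // Writes each record as exactly one line, so a reader can process the file line by line.
    static void writeLines(Writer out, List<Map<String, String>> records) throws IOException {
        for (Map<String, String> r : records) {
            out.write(toJsonLine(r));
            out.write('\n');
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> user = new LinkedHashMap<>();
        user.put("uuid", "u1");
        user.put("type", "user");
        user.put("name", "Ann \"A\" Lee");

        Map<String, String> conn = new LinkedHashMap<>();
        conn.put("source", "u1");
        conn.put("sourceType", "user");
        conn.put("target", "d1");
        conn.put("targetType", "device");

        StringWriter entities = new StringWriter();     // stands in for the entities file
        StringWriter connections = new StringWriter();  // stands in for the connections file
        writeLines(entities, List.of(user));
        writeLines(connections, List.of(conn));
        System.out.print(entities);
        System.out.print(connections);
    }
}
```

Because the connection lives in its own file, an entity with many connections stays small, and each connection is just one more line.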

  was:
The idea is to use multiple files to make the export-app tool (in the MigrationTool branch) run faster and to support entities with a huge number of connections. Here are some questions to consider, followed by a proposal.

h3. Should application be saved as multiple files?

One advantage of saving to multiple files is that we can use multiple threads to write them, which will make the export faster. For example, we could start a thread to write out each collection of an app as its own file, or set of files.

h3. Should each collection be saved as multiple files?

Each collection must be written out serially if we want to preserve order. If so, splitting a collection across multiple files won't help much, because its entities are still written one at a time.
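A sketch of one thread per collection, with each collection's entities written serially by a single task so their order is preserved. It uses java.util.concurrent ({{ExecutorService}}, {{Future}}) rather than RxJava, and a {{StringBuilder}} stands in for each output file. {{PerCollectionExport}} and {{exportAll}} are hypothetical names, not part of the existing tool.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerCollectionExport {

    // Exports each collection on its own thread. Within a collection, one task writes
    // the entities in iteration order, so per-collection order is preserved.
    static Map<String, String> exportAll(Map<String, List<String>> collections) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, collections.size()));
        try {
            Map<String, Future<String>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, List<String>> e : collections.entrySet()) {
                List<String> entities = e.getValue();
                futures.put(e.getKey(), pool.submit(() -> {
                    StringBuilder file = new StringBuilder(); // stands in for one output file
                    for (String entityJson : entities) {
                        file.append(entityJson).append('\n');
                    }
                    return file.toString();
                }));
            }
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> f : futures.entrySet()) {
                results.put(f.getKey(), f.getValue().get()); // waits for that collection's task
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> app = new LinkedHashMap<>();
        app.put("users", List.of("{\"name\":\"a\"}", "{\"name\":\"b\"}"));
        app.put("devices", List.of("{\"name\":\"d1\"}"));
        exportAll(app).forEach((coll, body) -> System.out.print(coll + ":\n" + body));
    }
}
```

Because each collection gets its own task, a slow collection does not block the others, while results are still gathered in the original collection order.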

h3. Should connections be separated out from entities in collections?

Currently, we write an entity's connections directly into the entity itself. This will be a problem for entities with a huge number of connections: it will bloat the entity's size and could cause an import program to fail. Connections should be stored in a separate file.

h3. Should result be one large file for the sake of convenience?

We could concatenate the multiple files together, or tar and gzip them at the end of the process.
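The JDK has gzip support ({{java.util.zip.GZIPOutputStream}}) but no tar support, so this sketch covers only the concatenate-and-gzip option. It assumes each part file holds newline-terminated JSON objects, so concatenating the parts in order still yields one valid file. {{ConcatGzip}}, {{concatToGzip}}, {{readGzip}}, and the sample file names ({{myorg}}, {{myapp}}) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ConcatGzip {

    // Concatenates the part files, in the given order, into a single gzip-compressed file.
    static void concatToGzip(List<Path> parts, Path target) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            for (Path part : parts) {
                Files.copy(part, out); // streams each part without loading it all into memory
            }
        }
    }

    // Reads a gzip file back into a string (used here only to check the result).
    static String readGzip(Path gz) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gz))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("export");
        Path a = Files.writeString(dir.resolve("myorg_myapp_users_collection_0.json"), "{\"name\":\"a\"}\n");
        Path b = Files.writeString(dir.resolve("myorg_myapp_users_collection_1.json"), "{\"name\":\"b\"}\n");
        Path target = dir.resolve("myorg_myapp_users.json.gz");
        concatToGzip(List.of(a, b), target);
        System.out.print(readGzip(target));
    }
}
```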

h3. Multiple files proposal

1. Each collection will be written out to a set of files named like this:

   {{<orgname>_<appname>_<collname>_collection_N.json}}


2. For each collection, outgoing connections will be written to a set of files named like this:

   {{<orgname>_<appname>_<collname>_connections_N.json}}


Each connection will be a JSON object with fields: 

   {{source, sourceType, target, targetType}}


3. A command-line parameter specifies the maximum size of each output file.

4. The implementation should use a thread for each collection of an application. Currently, we have only one write thread, which limits our throughput.

5. The implementation should use RxJava, as we do in 2.0, to simplify the multithreading and eliminate the use of blocking queues.
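A sketch of the size-limited, numbered output files from items 1 to 3, assuming one JSON object per line and a byte limit taken from the max-file-size command-line parameter. The writer opens a new {{<prefix>_N.json}} file before a line would push the current file past the limit and never splits a line across files, so a single line larger than the limit still gets a file of its own. {{RollingJsonWriter}} and its methods are hypothetical names.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical writer: JSON lines go to numbered files, rolling over before maxBytes is exceeded.
public class RollingJsonWriter implements Closeable {
    private final Path dir;
    private final String prefix;   // e.g. "<orgname>_<appname>_<collname>_collection"
    private final long maxBytes;   // assumed to come from the max-file-size parameter
    private final List<Path> written = new ArrayList<>();
    private BufferedWriter current;
    private long currentBytes;
    private int fileIndex = -1;

    public RollingJsonWriter(Path dir, String prefix, long maxBytes) {
        this.dir = dir;
        this.prefix = prefix;
        this.maxBytes = maxBytes;
    }

    public void writeLine(String json) throws IOException {
        long lineBytes = json.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        // Roll over unless the current file is still empty; an oversized line is written whole.
        if (current == null || (currentBytes > 0 && currentBytes + lineBytes > maxBytes)) {
            roll();
        }
        current.write(json);
        current.write('\n');
        currentBytes += lineBytes;
    }

    private void roll() throws IOException {
        close();
        fileIndex++;
        Path next = dir.resolve(prefix + "_" + fileIndex + ".json");
        current = Files.newBufferedWriter(next, StandardCharsets.UTF_8);
        currentBytes = 0;
        written.add(next);
    }

    public List<Path> files() {
        return written;
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("export");
        try (RollingJsonWriter w = new RollingJsonWriter(dir, "myorg_myapp_users_collection", 30)) {
            w.writeLine("{\"name\":\"alice\"}"); // 17 bytes with newline
            w.writeLine("{\"name\":\"bob\"}");   // 15 bytes: 17 + 15 > 30, so a new file starts
            w.writeLine("{\"name\":\"cy\"}");    // 14 bytes: 15 + 14 <= 30, stays in the second file
            System.out.println(w.files());
        }
    }
}
```

With the 30-byte limit in {{main}}, the three lines end up in two files: the first line alone, then the other two. Counting bytes in UTF-8 rather than characters keeps the limit accurate for non-ASCII data.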



        Summary: New ExportApp tool  (was: Better use of multithreading via RxJava in ExportApp tool)

> New ExportApp tool
> ------------------
>
>                 Key: USERGRID-788
>                 URL: https://issues.apache.org/jira/browse/USERGRID-788
>             Project: Usergrid
>          Issue Type: Story
>            Reporter: David Johnson
>            Assignee: David Johnson
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)