Posted to dev@usergrid.apache.org by "David Johnson (JIRA)" <ji...@apache.org> on 2015/07/14 22:59:04 UTC

[jira] [Updated] (USERGRID-788) Better use of multithreading via RxJava in ExportApp tool

     [ https://issues.apache.org/jira/browse/USERGRID-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Johnson updated USERGRID-788:
-----------------------------------
    Assignee: David Johnson
     Summary: Better use of multithreading via RxJava in ExportApp tool  (was: Use multiple output files in Migration/export tool)

> Better use of multithreading via RxJava in ExportApp tool
> ---------------------------------------------------------
>
>                 Key: USERGRID-788
>                 URL: https://issues.apache.org/jira/browse/USERGRID-788
>             Project: Usergrid
>          Issue Type: Story
>            Reporter: David Johnson
>            Assignee: David Johnson
>
> The idea is to use multiple files to make the Migration tool export run faster and to support entities with a huge number of connections. Here are some questions to consider and a proposal.
> h3. Should application be saved as multiple files?
> One advantage of saving to multiple files is that we can use multiple threads to write the files, which will make the export faster. For example, we could start a thread to write out each collection of an app as its own file, or set of files.
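A minimal sketch of the one-thread-per-collection idea, assuming each collection's entities are already available as serialized JSON lines. It uses plain `java.util.concurrent` rather than RxJava, and the class and method names (`ParallelCollectionExport`, `exportAll`, `writeCollection`) are hypothetical, not part of the ExportApp tool:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCollectionExport {

    // Hypothetical sketch: one worker thread per collection, each writing its own file.
    // A single task writes all entities of one collection, so per-collection order is kept.
    public static List<Path> exportAll(Path outDir, String org, String app,
                                       Map<String, List<String>> collections) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, collections.size()));
        try {
            List<Future<Path>> futures = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : collections.entrySet()) {
                String collName = e.getKey();
                List<String> entities = e.getValue();
                futures.add(pool.submit(() -> writeCollection(outDir, org, app, collName, entities)));
            }
            List<Path> written = new ArrayList<>();
            for (Future<Path> f : futures) {
                // get() blocks until the writer finishes and rethrows its failure, if any
                written.add(f.get());
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }

    // Writes one collection as newline-delimited JSON, in the order given.
    private static Path writeCollection(Path outDir, String org, String app,
                                        String collName, List<String> entities) throws IOException {
        Path file = outDir.resolve(org + "_" + app + "_" + collName + "_collection_0.json");
        Files.write(file, entities, StandardCharsets.UTF_8);
        return file;
    }
}
```

Throughput then scales with the number of collections rather than being bounded by a single write thread; an app with one huge collection would still be written serially.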
> h3. Should each collection be saved as multiple files?
> Each collection must be written out serially if we want to preserve order. If that is the case, then splitting a collection across multiple files won't make its export much faster.
> h3. Should connections be separated out from entities in collections?
> Currently, we write an entity's connections directly into the entity itself. This will be a problem for entities with a huge number of connections: it will bloat the entity size and could cause an import program to fail. Connections should be stored in a separate file.
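A minimal sketch of separating connections from entities during export, assuming a simplified in-memory model. The `Entity` and `Connection` records here are hypothetical stand-ins, not Usergrid's entity classes. Connection fields follow the source/target fields listed in the proposal, and the JSON rendering is deliberately naive:

```java
import java.util.List;

public class ConnectionSplitter {

    // Hypothetical connection model using the source/target fields from the proposal.
    public record Connection(String source, String sourceType, String target, String targetType) {}

    // Hypothetical exported entity: its JSON body plus its outgoing connections.
    public record Entity(String json, List<Connection> connections) {}

    // Sends each entity body to one sink and each outgoing connection to another,
    // so entity records stay small no matter how many connections an entity has.
    public static void split(List<Entity> entities, List<String> entityOut, List<String> connectionOut) {
        for (Entity e : entities) {
            entityOut.add(e.json());
            for (Connection c : e.connections()) {
                connectionOut.add(toJson(c));
            }
        }
    }

    // Naive JSON rendering; assumes field values never need escaping.
    static String toJson(Connection c) {
        return String.format(
                "{\"source\":\"%s\",\"sourceType\":\"%s\",\"target\":\"%s\",\"targetType\":\"%s\"}",
                c.source(), c.sourceType(), c.target(), c.targetType());
    }
}
```

In a real exporter the two sinks would be the collection file and the connections file; lists are used here only to keep the sketch self-contained.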
> h3. Should result be one large file for the sake of convenience?
> We could concatenate the multiple files together, or tar and gzip them at the end of the process.
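A minimal sketch of the "one large file" option, assuming the parts are newline-delimited JSON files that each end with a newline. The JDK has no tar writer, so this swaps in a simpler technique: concatenating the parts in order into a single gzip-compressed file with `java.util.zip.GZIPOutputStream`. Unlike a tarball, file boundaries are not preserved. `ExportBundler` and `concatAndGzip` are hypothetical names:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.GZIPOutputStream;

public class ExportBundler {

    // Concatenates the given export files, in order, into one gzip-compressed file.
    // Only the newline-delimited records survive; which part each record came from does not.
    public static void concatAndGzip(List<Path> parts, Path target) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
    }
}
```

If per-file boundaries matter for import, a tar (or `java.util.zip.ZipOutputStream`) archive would be the better fit.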
> h3. Multiple files proposal
> 1. Each collection will be written out to a set of files named like this:
>    {{<orgname>_<appname>_<collname>_collection_N.json}}
> 2. For each collection, outgoing connections will be written to a set of files named like this:
>    {{<orgname>_<appname>_<collname>_connections_N.json}}
> Each connection will be a JSON object with fields: 
>    {{source, sourceType, target, targetType}}
> 3. A command-line parameter specifies max size of each output file.
> 4. Implementation should use a thread for each collection of an application. Currently, we have only one write thread, which limits our throughput.
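Items 1 and 3 together imply a writer that rolls over to a new numbered file once a size limit is reached. A minimal sketch, assuming records are newline-delimited JSON and the limit is in bytes. `RollingJsonWriter` and its `maxBytes` parameter are hypothetical, with `maxBytes` standing in for the proposed command-line parameter:

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: writes newline-delimited JSON records to <prefix>_N.json,
// starting file N+1 when the next record would push file N past maxBytes.
public class RollingJsonWriter implements Closeable {
    private final Path dir;
    private final String prefix;   // e.g. "myorg_myapp_users_collection"
    private final long maxBytes;
    private final List<Path> files = new ArrayList<>();
    private BufferedWriter current;
    private long currentBytes;

    public RollingJsonWriter(Path dir, String prefix, long maxBytes) {
        this.dir = dir;
        this.prefix = prefix;
        this.maxBytes = maxBytes;
    }

    public void write(String jsonRecord) throws IOException {
        long size = (jsonRecord + "\n").getBytes(StandardCharsets.UTF_8).length;
        // Roll only if the current file already holds data, so a single record
        // larger than maxBytes is still written, alone in its own file.
        if (current == null || (currentBytes > 0 && currentBytes + size > maxBytes)) {
            roll();
        }
        current.write(jsonRecord);
        current.write('\n');
        currentBytes += size;
    }

    // Closes the current file (if any) and opens the next numbered one.
    private void roll() throws IOException {
        if (current != null) {
            current.close();
        }
        Path next = dir.resolve(prefix + "_" + files.size() + ".json");
        files.add(next);
        current = Files.newBufferedWriter(next, StandardCharsets.UTF_8);
        currentBytes = 0;
    }

    public List<Path> files() {
        return files;
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
        }
    }
}
```

With a prefix of `myorg_myapp_users_collection` this yields `myorg_myapp_users_collection_0.json`, `myorg_myapp_users_collection_1.json`, and so on. Giving each per-collection thread its own writer keeps the threads from sharing any file.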



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)