Posted to dev@usergrid.apache.org by "David Johnson (JIRA)" <ji...@apache.org> on 2015/07/08 17:55:04 UTC

[jira] [Updated] (USERGRID-788) Use multiple output files in Migration/export tool

     [ https://issues.apache.org/jira/browse/USERGRID-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Johnson updated USERGRID-788:
-----------------------------------
    Description: 
The idea is to use multiple files to make the Migration tool export run faster and to support entities with a huge number of connections. 

Here are some questions to consider:

h3. Should an application be saved as multiple files?

One advantage of saving to multiple files is that multiple threads can write them in parallel, which should make the export faster. For example, we could start a thread to write out each collection of an app as its own file or set of files.
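
As a rough illustration of the thread-per-collection idea, here is a minimal Java sketch. It is not the export tool's actual code: the class {{ParallelCollectionExport}}, the {{CollectionWriter}} interface and the {{exportAll}} method are hypothetical names invented for this example, and the writer is a stand-in for whatever really serializes a collection. Setting the thread count equal to the number of collections gives one thread per collection.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one export task per collection, run on a thread pool.
class ParallelCollectionExport {

    /** Stand-in for the code that writes one collection; returns the number of entities written. */
    interface CollectionWriter {
        int write(String collectionName) throws IOException;
    }

    /** Submits one task per collection, waits for all of them, and returns the total written. */
    static int exportAll(List<String> collectionNames, CollectionWriter writer, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (String name : collectionNames) {
                Callable<Integer> task = () -> writer.write(name);
                pending.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : pending) {
                total += result.get(); // blocks until that collection is done
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("export interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a collection export failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Each collection is still written by a single task, so ordering within a collection is unaffected; only separate collections run concurrently.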

h3. Should each collection be saved as multiple files?

Each collection must be written out serially if we want to preserve order. If that is the case, then saving each collection to multiple files won't help much there.

h3. Should connections be separated out from entities in collections?

Currently, we write an entity's connections directly into the entity itself. This becomes a problem for entities with a huge number of connections: the entity size bloats, which could cause an import program to fail. Connections should be stored in a separate file.
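
To make the separation concrete, here is a small Java sketch that flattens an entity's outgoing connections into standalone JSON records instead of nesting them in the entity. The class {{ConnectionRecords}} and its {{Ref}} type are hypothetical and exist only for this example. The field names source, sourceType, target and targetType are taken from the proposal in this description; the proposal's list names targetType twice, so only four fields appear here. The hand-rolled JSON escaping is deliberately minimal, and a real implementation would presumably use a JSON library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: flatten an entity's outgoing connections into
// standalone records instead of nesting them inside the entity JSON.
class ConnectionRecords {

    /** Minimal stand-in for an entity reference: a type plus an id. */
    static final class Ref {
        final String type;
        final String id;

        Ref(String type, String id) {
            this.type = type;
            this.id = id;
        }
    }

    /** Renders one connection as a single-line JSON object. */
    static String toJson(Ref source, Ref target) {
        return "{\"source\":\"" + escape(source.id) + "\","
             + "\"sourceType\":\"" + escape(source.type) + "\","
             + "\"target\":\"" + escape(target.id) + "\","
             + "\"targetType\":\"" + escape(target.type) + "\"}";
    }

    /** Produces one record per outgoing connection, in the order given. */
    static List<String> flatten(Ref source, List<Ref> targets) {
        List<String> lines = new ArrayList<>();
        for (Ref target : targets) {
            lines.add(toJson(source, target));
        }
        return lines;
    }

    /** Escapes backslashes and double quotes only; not a complete JSON escaper. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Because each connection becomes its own small record, the size of an entity no longer grows with its number of connections, and an importer can stream the connection records one at a time.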

h3. Multiple files proposal

1. Each collection will be written out to a set of files named like this:
{noformat}
   <orgname>_<appname>_<collname>_collection_N.json
{noformat}

2. For each collection, outgoing connections will be written to a set of files named like this:
{noformat}
   <orgname>_<appname>_<collname>_connections.N.json
{noformat}

Each connection will be a JSON object with fields: 
{{monospaced}}
   source, sourceType, target, targetType, targetType
{{monospaced}}

3. A command-line parameter specifies the maximum size of each output file.

4. The implementation should use a thread for each collection of an application. Currently, we have only one write thread, which limits our throughput.
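
The naming scheme in item 1 and the size limit in item 3 could fit together roughly as in the Java sketch below. It is an illustration under stated assumptions, not the tool's implementation: the class {{RollingFilePlan}} and its methods are hypothetical, file numbering is assumed to start at 1, and the limit is assumed to be measured in bytes of serialized entity data. Entities are assigned to files strictly in order, so the ordering within a collection is preserved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: assign serialized entities to numbered files, in order,
// starting a new file whenever the size limit would be exceeded.
class RollingFilePlan {

    /** Builds <orgname>_<appname>_<collname>_collection_N.json for file number n. */
    static String collectionFileName(String org, String app, String collection, int n) {
        return org + "_" + app + "_" + collection + "_collection_" + n + ".json";
    }

    /**
     * Returns, for each entity size in order, the number of the file it goes to.
     * An entity larger than maxBytes is still written, in a file of its own.
     */
    static List<Integer> assignFiles(List<Long> entitySizes, long maxBytes) {
        List<Integer> fileNumbers = new ArrayList<>();
        int current = 1; // assumption: numbering starts at 1
        long used = 0;
        for (long size : entitySizes) {
            // Roll over only if the current file already holds something.
            if (used > 0 && used + size > maxBytes) {
                current++;
                used = 0;
            }
            used += size;
            fileNumbers.add(current);
        }
        return fileNumbers;
    }
}
```

The same rollover logic would apply to the connections files; only the file name pattern differs.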



  was:
The idea is to use multiple files to make the Migration tool export run faster and to support entities with a huge number of connections. 

Here are some questions to consider:

Should an application be saved as multiple files?

One advantage of saving to multiple files is that multiple threads can write them in parallel, which should make the export faster. For example, we could start a thread to write out each collection of an app as its own file or set of files.

Should each collection be saved as multiple files?

Each collection must be written out serially if we want to preserve order. If that is the case, then saving each collection to multiple files won't help much there.

Should connections be separated out from entities in collections?

Currently, we write an entity's connections directly into the entity itself. This becomes a problem for entities with a huge number of connections: the entity size bloats, which could cause an import program to fail. Connections should be stored in a separate file.

Multiple files proposal

1. Each collection will be written out to a set of files named like this:

   <orgname>_<appname>_<collname>_collection_N.json

2. For each collection, outgoing connections will be written to a set of files named like this:

   <orgname>_<appname>_<collname>_connections.N.json

Each connection will be a JSON object with fields: 

   source, sourceType, target, targetType, targetType

3. A command-line parameter specifies the maximum size of each output file.

4. The implementation should use a thread for each collection of an application. Currently, we have only one write thread, which limits our throughput.




> Use multiple output files in Migration/export tool
> --------------------------------------------------
>
>                 Key: USERGRID-788
>                 URL: https://issues.apache.org/jira/browse/USERGRID-788
>             Project: Usergrid
>          Issue Type: Story
>            Reporter: David Johnson


