Posted to dev@atlas.apache.org by Ashutosh Mestry via Review Board <no...@reviews.apache.org> on 2020/03/02 18:57:16 UTC

Re: Review Request 71025: Import Service: Support Concurrent Ingest

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71025/
-----------------------------------------------------------

(Updated March 2, 2020, 6:57 p.m.)


Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.


Changes
-------

Updates include:
- Reduced the size of the patch by splitting it into smaller implementations.


Bugs: ATLAS-3320
    https://issues.apache.org/jira/browse/ATLAS-3320


Repository: atlas


Description
-------

**Approach**
- Use existing producer-consumer (PC) framework.
- Modify _BulkImporterImpl_ to implement _WorkItemConsumer_.
- Add support for configuring the number of workers and the batch size within _AtlasImportRequest_.
- The existing import implementation continues to function as before, for backward compatibility.
- The new implementation also supports a more memory-efficient ZIP format (_ZipDirect_), which drastically reduces memory requirements during import.
- The new import strategy, _MigrationImport_, uses the _bulkLoading_ mode of _JanusGraph_, thereby achieving high ingest rates.
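
A minimal sketch of the producer-consumer idea, written with plain `java.util.concurrent` rather than Atlas's own PC framework (whose classes are not shown in this review): one producer enqueues entity GUIDs, `numWorkers` consumer threads drain the queue, and each consumer "commits" every `batchSize` entities. The class `BatchingImportSketch`, the `commit` placeholder, and the end-of-input marker are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a producer-consumer import; not Atlas's PC framework.
class BatchingImportSketch {
    // End-of-input marker; the producer enqueues one per worker.
    private static final String END_OF_INPUT = "__END_OF_INPUT__";

    static int run(List<String> entityGuids, int numWorkers, int batchSize) throws InterruptedException {
        // Bounded queue: the producer blocks instead of buffering the whole input.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(numWorkers * batchSize);
        AtomicInteger committed = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(numWorkers);

        for (int i = 0; i < numWorkers; i++) {
            workers.submit(() -> {
                List<String> batch = new ArrayList<>(batchSize);
                try {
                    while (true) {
                        String guid = queue.take();
                        if (END_OF_INPUT.equals(guid)) {
                            break;
                        }
                        batch.add(guid);
                        if (batch.size() == batchSize) {
                            committed.addAndGet(commit(batch));
                        }
                    }
                    committed.addAndGet(commit(batch)); // flush the last partial batch
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Producer: enqueue every entity, then one end marker per worker.
        for (String guid : entityGuids) {
            queue.put(guid);
        }
        for (int i = 0; i < numWorkers; i++) {
            queue.put(END_OF_INPUT);
        }

        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        return committed.get();
    }

    // Placeholder for committing one transaction: counts the batch and empties it.
    private static int commit(List<String> batch) {
        int size = batch.size();
        batch.clear();
        return size;
    }
}
```

The bounded queue caps how many entities are held in memory at once, and sending one end marker per worker lets every consumer flush its final partial batch and exit.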

_AtlasImportRequest_
```
{
    "options": {
        "numWorkers": 8,
        "batchSize": 25
    }
}
```
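
The request above writes the options as JSON numbers. This sketch assumes they reach the server as a string-valued options map, which is an assumption and is not confirmed by this review. The hypothetical helper `getIntOption` reads an integer option and falls back to a caller-supplied default when the key is missing, malformed, or not positive. The defaults Atlas actually uses are not stated here.

```java
import java.util.Map;

// Hypothetical helper; not part of AtlasImportRequest.
class ImportOptionsSketch {
    // Returns a positive integer option, or defaultValue if it is missing, malformed, or not positive.
    static int getIntOption(Map<String, String> options, String key, int defaultValue) {
        if (options == null) {
            return defaultValue;
        }
        String value = options.get(key);
        if (value == null) {
            return defaultValue;
        }
        try {
            int parsed = Integer.parseInt(value.trim());
            return parsed > 0 ? parsed : defaultValue;
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }
}
```

For the request above, `getIntOption(options, "numWorkers", 1)` would return 8 under this assumption.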
Support for ZipDirect format:
_AtlasImportRequest_
```
{
    "options": {
        "numWorkers": 8,
        "batchSize": 25,
        "format": "zipDirect",
        "migration": "true"
    }
}
```
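
This review does not describe how _ZipDirect_ works internally. A common way to cut memory use when importing from a ZIP archive is to stream entries one at a time instead of loading the whole archive. The sketch below shows that general technique with `java.util.zip.ZipInputStream`. It is not Atlas's `ZipSourceDirect`, and `StreamingZipSketch`, `EntryHandler`, and `buildZip` are illustrative names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Streams ZIP entries one at a time; illustrates the technique only, not ZipSourceDirect.
class StreamingZipSketch {
    interface EntryHandler {
        void handle(String name, byte[] content);
    }

    // Calls the handler for each file entry; only the current entry is held in memory.
    static int forEachEntry(InputStream zipBytes, EntryHandler handler) throws IOException {
        int count = 0;
        try (ZipInputStream zis = new ZipInputStream(zipBytes)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    handler.handle(entry.getName(), readEntry(zis));
                    count++;
                }
                zis.closeEntry();
            }
        }
        return count;
    }

    // Reads the current entry until ZipInputStream signals its end with -1.
    private static byte[] readEntry(ZipInputStream zis) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = zis.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Builds a small in-memory archive (name -> UTF-8 text) for demonstration.
    static byte[] buildZip(Map<String, String> entries) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> e : entries.entrySet()) {
                zos.putNextEntry(new ZipEntry(e.getKey()));
                zos.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }
}
```

Peak memory in this approach is bounded by the largest single entry rather than by the whole archive.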


**CURL**
```
curl -v -X POST -u admin:admin -H "Content-Type: multipart/form-data" -H "Cache-Control: no-cache" -F request=@./import-options.json -F data=@./Default-3-pre.zip http://localhost:21000/api/atlas/admin/import
```


Diffs (updated)
-----

  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java 4acb371f1 
  intg/src/main/java/org/apache/atlas/model/impexp/AtlasImportRequest.java 3362bf158 
  repository/src/main/java/org/apache/atlas/GraphTransactionInterceptor.java bbe0dc5ba 
  repository/src/main/java/org/apache/atlas/repository/impexp/AuditsWriter.java 55990f780 
  repository/src/main/java/org/apache/atlas/repository/impexp/ImportService.java 1964ade9a 
  repository/src/main/java/org/apache/atlas/repository/migration/ZipFileMigrationImporter.java f552525a4 
  repository/src/main/java/org/apache/atlas/repository/store/graph/AtlasEntityStore.java 39ea3f82e 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java 30f5e5a7c 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java fdf117a25 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/BulkImporterImpl.java 54c32c5e8 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java 2f3aad06b 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/ImportStrategy.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/MigrationImport.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/RegularImport.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumer.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumerBuilder.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityCreationManager.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/StatusReporter.java PRE-CREATION 


Diff: https://reviews.apache.org/r/71025/diff/7/

Changes: https://reviews.apache.org/r/71025/diff/6-7/


Testing
-------

**Unit tests**
Existing tests.

**Functional tests**
- Verified import for pre-1.0 and post-1.0 exported ZIP files.

**Pre-commit**
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1292

**Volume tests**
- Measured import performance with large data sets.

+----------+----------+----------+------------------------+
| File     | Before   | After    | Configuration          |
+----------+----------+----------+------------------------+
| smalldb  |   6 min  |    2 min | Shards: 4, Threads: 8  |
| (2.2 MB) |          |          |                        |
+----------+----------+----------+------------------------+
| largedb  |    3 hrs |  10 mins | Shards: 4, Threads: 16 |
| (40 MB)  |          |          |                        |
+----------+----------+----------+------------------------+


Thanks,

Ashutosh Mestry


Re: Review Request 71025: Import Service: Support Concurrent Ingest

Posted by Sarath Subramanian <sa...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71025/#review219749
-----------------------------------------------------------




repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java
Lines 384 (patched)
<https://reviews.apache.org/r/71025/#comment307948>

    Can you avoid this null check? Consider initializing 'entityChangeNotifier' to a no-op implementation.
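
The suggestion above is the Null Object pattern: assign a do-nothing implementation once, so call sites never need a null check. The following is a minimal sketch with hypothetical names (`ChangeNotifier`, `NopChangeNotifier`, `RecordingChangeNotifier`, `GraphMapperSketch`). Later revisions of this patch list `IAtlasEntityChangeNotifier` and `EntityChangeNotifierNop`, but their actual methods are not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical notifier interface; the real Atlas interface has different methods.
interface ChangeNotifier {
    void onEntityCreated(String guid);
}

// No-op implementation for cases where notifications are not wanted.
final class NopChangeNotifier implements ChangeNotifier {
    @Override
    public void onEntityCreated(String guid) {
        // intentionally does nothing
    }
}

// Test double that records notifications.
final class RecordingChangeNotifier implements ChangeNotifier {
    final List<String> created = new ArrayList<>();

    @Override
    public void onEntityCreated(String guid) {
        created.add(guid);
    }
}

final class GraphMapperSketch {
    private final ChangeNotifier entityChangeNotifier;

    GraphMapperSketch(ChangeNotifier notifier) {
        // Substitute the no-op once, at construction time.
        this.entityChangeNotifier = notifier != null ? notifier : new NopChangeNotifier();
    }

    void createEntity(String guid) {
        // ... persist the entity ...
        entityChangeNotifier.onEntityCreated(guid); // no null check needed here
    }
}
```

Moving the decision to construction time keeps every call site unconditional.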


- Sarath Subramanian


On March 2, 2020, 9:13 p.m., Ashutosh Mestry wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71025/
> -----------------------------------------------------------
> 
> (Updated March 2, 2020, 9:13 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-3320
>     https://issues.apache.org/jira/browse/ATLAS-3320
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> **Approach**
> - Use existing producer-consumer (PC) framework.
> - Modify _BulkImporterImpl_ to implement _WorkItemConsumer_.
> - Add support for configuring the number of workers and the batch size within _AtlasImportRequest_.
> - The existing import implementation continues to function as before, for backward compatibility.
> - The new implementation also supports a more memory-efficient ZIP format (_ZipDirect_), which drastically reduces memory requirements during import.
> - The new import strategy, _MigrationImport_, uses the _bulkLoading_ mode of _JanusGraph_, thereby achieving high ingest rates.
> 
> _AtlasImportRequest_
> ```
> {
>     "options": {
>         "numWorkers": 8,
>         "batchSize": 25
>     }
> }
> ```
> Support for ZipDirect format:
> _AtlasImportRequest_
> ```
> {
>     "options": {
>         "numWorkers": 8,
>         "batchSize": 25,
>         "format": "zipDirect",
>         "migration": "true"
>     }
> }
> ```
> 
> 
> **CURL**
> ```
> curl -v -X POST -u admin:admin -H "Content-Type: multipart/form-data" -H "Cache-Control: no-cache" -F request=@./import-options.json -F data=@./Default-3-pre.zip http://localhost:21000/api/atlas/admin/import
> ```
> 
> 
> Diffs
> -----
> 
>   graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java 4acb371f1 
>   intg/src/main/java/org/apache/atlas/model/impexp/AtlasImportRequest.java 3362bf158 
>   repository/src/main/java/org/apache/atlas/GraphTransactionInterceptor.java bbe0dc5ba 
>   repository/src/main/java/org/apache/atlas/repository/impexp/AuditsWriter.java 55990f780 
>   repository/src/main/java/org/apache/atlas/repository/impexp/ImportService.java 1964ade9a 
>   repository/src/main/java/org/apache/atlas/repository/impexp/ZipSourceDirect.java cb5a7acd0 
>   repository/src/main/java/org/apache/atlas/repository/migration/ZipFileMigrationImporter.java f552525a4 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/AtlasEntityStore.java 39ea3f82e 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java 30f5e5a7c 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java fdf117a25 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/BulkImporterImpl.java 54c32c5e8 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java 2f3aad06b 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/ImportStrategy.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/MigrationImport.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/RegularImport.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumer.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumerBuilder.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityCreationManager.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/71025/diff/9/
> 
> 
> Testing
> -------
> 
> **Unit tests**
> Existing tests.
> 
> **Functional tests**
> - Verified import for pre-1.0 and post-1.0 exported ZIP files.
> 
> **Pre-commit**
> https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1292
> 
> **Volume tests**
> - Measured import performance with large data sets.
> 
> +----------+----------+----------+------------------------+
> | File     | Before   | After    | Configuration          |
> +----------+----------+----------+------------------------+
> | smalldb  |   6 min  |    2 min | Shards: 4, Threads: 8  |
> | (2.2 MB) |          |          |                        |
> +----------+----------+----------+------------------------+
> | largedb  |    3 hrs |  10 mins | Shards: 4, Threads: 16 |
> | (40 MB)  |          |          |                        |
> +----------+----------+----------+------------------------+
> 
> 
> Thanks,
> 
> Ashutosh Mestry
> 
>


Re: Review Request 71025: Import Service: Support Concurrent Ingest

Posted by Nikhil Bonte <ni...@freestoneinfotech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71025/#review219746
-----------------------------------------------------------


Ship it!

- Nikhil Bonte




Re: Review Request 71025: Import Service: Support Concurrent Ingest

Posted by Nixon Rodrigues <ni...@freestoneinfotech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71025/#review219732
-----------------------------------------------------------


Ship it!

- Nixon Rodrigues




Re: Review Request 71025: Import Service: Support Concurrent Ingest

Posted by Sarath Subramanian <sa...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71025/#review219784
-----------------------------------------------------------




intg/src/main/java/org/apache/atlas/model/impexp/AtlasImportRequest.java
Line 114 (original), 117 (patched)
<https://reviews.apache.org/r/71025/#comment307980>

    nit: casting to String is not needed.



repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapperV2.java
Line 56 (original), 56 (patched)
<https://reviews.apache.org/r/71025/#comment307982>

    Add the '@Override' annotation to methods that override interface methods.



repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java
Line 69 (original), 69 (patched)
<https://reviews.apache.org/r/71025/#comment307983>

    Add the '@Override' annotation to methods that override interface methods.



repository/src/main/java/org/apache/atlas/repository/store/graph/v2/BulkImporterImpl.java
Line 73 (original), 64 (patched)
<https://reviews.apache.org/r/71025/#comment307981>

    The ternary expression here is long and not intuitive. Consider refactoring it into a method:
    
    ImportStrategy importStrategy = initImportStrategy(importResult);
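
One way to follow this suggestion is sketched below. The names `ImportStrategy`, `RegularImport`, and `MigrationImport` appear in the patch's file list, but their shapes here are hypothetical. In particular, this version selects the strategy from the request options, keyed on the `"migration"` option shown in the ZipDirect request example, rather than from `importResult` as in the comment.

```java
import java.util.Map;

// Hypothetical shapes; the patch's real classes take different arguments and expose other methods.
interface ImportStrategy {
    String name();
}

final class RegularImport implements ImportStrategy {
    @Override
    public String name() {
        return "regular";
    }
}

final class MigrationImport implements ImportStrategy {
    @Override
    public String name() {
        return "migration";
    }
}

final class StrategySelector {
    // A named method replaces a long inline ternary at the call site.
    static ImportStrategy initImportStrategy(Map<String, String> options) {
        boolean migration = options != null && Boolean.parseBoolean(options.get("migration"));
        if (migration) {
            return new MigrationImport();
        }
        return new RegularImport();
    }
}
```

`Boolean.parseBoolean` returns `false` for a missing (null) value, so requests without the option fall back to the regular strategy.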


- Sarath Subramanian


On March 4, 2020, 10:09 p.m., Ashutosh Mestry wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71025/
> -----------------------------------------------------------
> 
> (Updated March 4, 2020, 10:09 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-3320
>     https://issues.apache.org/jira/browse/ATLAS-3320
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> **Approach**
> - Use existing producer-consumer (PC) framework.
> - Modify _BulkImporterImpl_ to implement _WorkItemConsumer_.
> - Add support for configuring the number of workers and the batch size within _AtlasImportRequest_.
> - The existing import implementation continues to function as before, for backward compatibility.
> - The new implementation also supports a more memory-efficient ZIP format (_ZipDirect_), which drastically reduces memory requirements during import.
> - The new import strategy, _MigrationImport_, uses the _bulkLoading_ mode of _JanusGraph_, thereby achieving high ingest rates.
> 
> _AtlasImportRequest_
> ```
> {
>     "options": {
>         "numWorkers": 8,
>         "batchSize": 25
>     }
> }
> ```
> Support for ZipDirect format:
> _AtlasImportRequest_
> ```
> {
>     "options": {
>         "numWorkers": 8,
>         "batchSize": 25,
>         "format": "zipDirect",
>         "migration": "true"
>     }
> }
> ```
> 
> 
> **CURL**
> ```
> curl -v -X POST -u admin:admin -H "Content-Type: multipart/form-data" -H "Cache-Control: no-cache" -F request=@./import-options.json -F data=@./Default-3-pre.zip http://localhost:21000/api/atlas/admin/import
> ```
> 
> 
> Diffs
> -----
> 
>   graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java 4acb371f1 
>   intg/src/main/java/org/apache/atlas/model/impexp/AtlasImportRequest.java 3362bf158 
>   repository/src/main/java/org/apache/atlas/GraphTransactionInterceptor.java bbe0dc5ba 
>   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapperV2.java 0f2b4bfae 
>   repository/src/main/java/org/apache/atlas/repository/graph/IFullTextMapper.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/impexp/ImportService.java 1964ade9a 
>   repository/src/main/java/org/apache/atlas/repository/impexp/ZipSourceDirect.java cb5a7acd0 
>   repository/src/main/java/org/apache/atlas/repository/migration/ZipFileMigrationImporter.java f552525a4 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/AtlasEntityStore.java 39ea3f82e 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java d7020a702 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java 30f5e5a7c 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/BulkImporterImpl.java 54c32c5e8 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java 2f3aad06b 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/IAtlasEntityChangeNotifier.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/EntityChangeNotifierNop.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/FullTextMapperV2Nop.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/ImportStrategy.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/MigrationImport.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/RegularImport.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumer.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumerBuilder.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityCreationManager.java PRE-CREATION 
>   repository/src/test/java/org/apache/atlas/TestModules.java 06e0ebc6c 
> 
> 
> Diff: https://reviews.apache.org/r/71025/diff/12/
> 
> 
> Testing
> -------
> 
> **Unit tests**
> Existing tests.
> 
> **Functional tests**
> - Verified import for pre-1.0 and post-1.0 exported ZIP files.
> 
> **Pre-commit**
> https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1712/
> 
> **Volume tests**
> - Measured import performance with large data sets.
> 
> +----------+----------+----------+------------------------+
> | File     | Before   | After    | Configuration          |
> +----------+----------+----------+------------------------+
> | smalldb  |   6 min  |    2 min | Shards: 4, Threads: 8  |
> | (2.2 MB) |          |          |                        |
> +----------+----------+----------+------------------------+
> | largedb  |    3 hrs |  10 mins | Shards: 4, Threads: 16 |
> | (40 MB)  |          |          |                        |
> +----------+----------+----------+------------------------+
> 
> 
> Thanks,
> 
> Ashutosh Mestry
> 
>


Re: Review Request 71025: Import Service: Support Concurrent Ingest

Posted by Ashutosh Mestry via Review Board <no...@reviews.apache.org>.

> On March 5, 2020, 9:30 a.m., Sarath Subramanian wrote:
> > repository/src/main/java/org/apache/atlas/repository/graph/IFullTextMapper.java
> > Lines 34 (patched)
> > <https://reviews.apache.org/r/71025/diff/12/?file=2212930#file2212930line34>
> >
> >     The methods defined here look more like helper methods than interface methods.

Since this is a drop-in replacement intended to limit the impact of the change, it needs to have the same signatures as the original concrete implementation. Changing them would involve refactoring the original code; I can take that up after this commit.


- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71025/#review219785
-----------------------------------------------------------


On March 5, 2020, 5:43 p.m., Ashutosh Mestry wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71025/
> -----------------------------------------------------------
> 
> (Updated March 5, 2020, 5:43 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-3320
>     https://issues.apache.org/jira/browse/ATLAS-3320
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> **Approach**
> - Use existing producer-consumer (PC) framework.
> - Modify _BulkImporterImpl_ to implement _WorkItemConsumer_.
> - Add support for configuring the number of workers and the batch size within _AtlasImportRequest_.
> - The existing import implementation continues to function as before, for backward compatibility.
> - The new implementation also supports a more memory-efficient ZIP format (_ZipDirect_), which drastically reduces memory requirements during import.
> - The new import strategy, _MigrationImport_, uses the _bulkLoading_ mode of _JanusGraph_, thereby achieving high ingest rates.
> 
> _AtlasImportRequest_
> ```
> {
>     "options": {
>         "numWorkers": 8,
>         "batchSize": 25
>     }
> }
> ```
> Support for ZipDirect format:
> _AtlasImportRequest_
> ```
> {
>     "options": {
>         "numWorkers": 8,
>         "batchSize": 25,
>         "format": "zipDirect",
>         "migration": "true"
>     }
> }
> ```
> 
> 
> **CURL**
> ```
> curl -v -X POST -u admin:admin -H "Content-Type: multipart/form-data" -H "Cache-Control: no-cache" -F request=@./import-options.json -F data=@./Default-3-pre.zip http://localhost:21000/api/atlas/admin/import
> ```
> 
> 
> Diffs
> -----
> 
>   graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java 4acb371f1 
>   intg/src/main/java/org/apache/atlas/model/impexp/AtlasImportRequest.java 3362bf158 
>   repository/src/main/java/org/apache/atlas/GraphTransactionInterceptor.java bbe0dc5ba 
>   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapperV2.java 0f2b4bfae 
>   repository/src/main/java/org/apache/atlas/repository/graph/IFullTextMapper.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/impexp/ImportService.java 1964ade9a 
>   repository/src/main/java/org/apache/atlas/repository/impexp/ZipSourceDirect.java cb5a7acd0 
>   repository/src/main/java/org/apache/atlas/repository/migration/ZipFileMigrationImporter.java f552525a4 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/AtlasEntityStore.java 39ea3f82e 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java d7020a702 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java 30f5e5a7c 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/BulkImporterImpl.java 54c32c5e8 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java 2f3aad06b 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/IAtlasEntityChangeNotifier.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/EntityChangeNotifierNop.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/FullTextMapperV2Nop.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/ImportStrategy.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/MigrationImport.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/RegularImport.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumer.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumerBuilder.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityCreationManager.java PRE-CREATION 
>   repository/src/test/java/org/apache/atlas/TestModules.java 06e0ebc6c 
> 
> 
> Diff: https://reviews.apache.org/r/71025/diff/13/
> 
> 
> Testing
> -------
> 
> **Unit tests**
> Existing tests.
> 
> **Functional tests**
> - Verified import for pre-1.0 and post-1.0 exported ZIP files.
> 
> **Pre-commit**
> https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1712/
> 
> **Volume tests**
> - Measured import performance with large data sets.
> 
> +----------+----------+----------+------------------------+
> | File     | Before   | After    | Configuration          |
> +----------+----------+----------+------------------------+
> | smalldb  |   6 min  |    2 min | Shards: 4, Threads: 8  |
> | (2.2 MB) |          |          |                        |
> +----------+----------+----------+------------------------+
> | largedb  |    3 hrs |  10 mins | Shards: 4, Threads: 16 |
> | (40 MB)  |          |          |                        |
> +----------+----------+----------+------------------------+
> 
> 
> Thanks,
> 
> Ashutosh Mestry
> 
>


Re: Review Request 71025: Import Service: Support Concurrent Ingest

Posted by Sarath Subramanian <sa...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71025/#review219785
-----------------------------------------------------------




repository/src/main/java/org/apache/atlas/repository/graph/IFullTextMapper.java
Lines 34 (patched)
<https://reviews.apache.org/r/71025/#comment307984>

    The methods defined here look more like helper methods than interface methods.


- Sarath Subramanian


On March 4, 2020, 10:09 p.m., Ashutosh Mestry wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71025/
> -----------------------------------------------------------
> 
> (Updated March 4, 2020, 10:09 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-3320
>     https://issues.apache.org/jira/browse/ATLAS-3320
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> **Approach**
> - Use the existing producer-consumer (PC) framework.
> - Modify _BulkImporterImpl_ to implement _WorkItemConsumer_.
> - Add support for configuring the number of workers and the batch size within _AtlasImportRequest_.
> - The existing import implementation continues to function as before, for backward compatibility.
> - The new implementation additionally supports a more memory-efficient zip format (_ZipDirect_), which drastically reduces memory requirements during import.
> - The new import strategy, _MigrationImport_, uses the _bulkLoading_ mode of _JanusGraph_, thereby achieving high ingest rates.
> 
> _AtlasImportRequest_
> ```
> {
>     "options": {
>         "numWorkers": 8,
>         "batchSize": 25
>     }
> }
> ```
> Support for ZipDirect format:
> _AtlasImportRequest_
> ```
> {
>     "options": {
>         "numWorkers": 8,
>         "batchSize": 25,
>         "format": "zipDirect",
>         "migration": "true"
>     }
> }
> ```
> 
> 
> **CURL**
> ```
> curl -v -X POST -u admin:admin -H "Content-Type: multipart/form-data" -H "Cache-Control: no-cache" -F request=@./import-options.json -F data=@./Default-3-pre.zip http://localhost:21000/api/atlas/admin/import
> ```
> 
> 
> Diffs
> -----
> 
>   graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java 4acb371f1 
>   intg/src/main/java/org/apache/atlas/model/impexp/AtlasImportRequest.java 3362bf158 
>   repository/src/main/java/org/apache/atlas/GraphTransactionInterceptor.java bbe0dc5ba 
>   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapperV2.java 0f2b4bfae 
>   repository/src/main/java/org/apache/atlas/repository/graph/IFullTextMapper.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/impexp/ImportService.java 1964ade9a 
>   repository/src/main/java/org/apache/atlas/repository/impexp/ZipSourceDirect.java cb5a7acd0 
>   repository/src/main/java/org/apache/atlas/repository/migration/ZipFileMigrationImporter.java f552525a4 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/AtlasEntityStore.java 39ea3f82e 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java d7020a702 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java 30f5e5a7c 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/BulkImporterImpl.java 54c32c5e8 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java 2f3aad06b 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/IAtlasEntityChangeNotifier.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/EntityChangeNotifierNop.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/FullTextMapperV2Nop.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/ImportStrategy.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/MigrationImport.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/RegularImport.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumer.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumerBuilder.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityCreationManager.java PRE-CREATION 
>   repository/src/test/java/org/apache/atlas/TestModules.java 06e0ebc6c 
> 
> 
> Diff: https://reviews.apache.org/r/71025/diff/12/
> 
> 
> Testing
> -------
> 
> **Unit tests**
> Existing tests.
> 
> **Functional tests**
> - Verified import for pre-1.0 and post-1.0 exported ZIP files.
> 
> **Pre-commit**
> https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1712/
> 
> **Volume tests**
> - Measured import performance with large data sets.
> 
> +----------+----------+----------+------------------------+
> | File     | Before   | After    | Configuration          |
> +----------+----------+----------+------------------------+
> | smalldb  |   6 min  |    2 min | Shards: 4, Threads: 8  |
> | (2.2 MB) |          |          |                        |
> +----------+----------+----------+------------------------+
> | largedb  |    3 hrs |  10 mins | Shards: 4, Threads: 16 |
> | (40 MB)  |          |          |                        |
> +----------+----------+----------+------------------------+
> 
> 
> Thanks,
> 
> Ashutosh Mestry
> 
>


Re: Review Request 71025: Import Service: Support Concurrent Ingest

Posted by Sarath Subramanian <sa...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71025/#review219859
-----------------------------------------------------------


Ship it!




Ship It!

- Sarath Subramanian


On March 5, 2020, 9:43 a.m., Ashutosh Mestry wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71025/
> -----------------------------------------------------------
> 
> (Updated March 5, 2020, 9:43 a.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-3320
>     https://issues.apache.org/jira/browse/ATLAS-3320
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> **Approach**
> - Use the existing producer-consumer (PC) framework.
> - Modify _BulkImporterImpl_ to implement _WorkItemConsumer_.
> - Add support for configuring the number of workers and the batch size within _AtlasImportRequest_.
> - The existing import implementation continues to function as before, for backward compatibility.
> - The new implementation additionally supports a more memory-efficient zip format (_ZipDirect_), which drastically reduces memory requirements during import.
> - The new import strategy, _MigrationImport_, uses the _bulkLoading_ mode of _JanusGraph_, thereby achieving high ingest rates.
> 
> _AtlasImportRequest_
> ```
> {
>     "options": {
>         "numWorkers": 8,
>         "batchSize": 25
>     }
> }
> ```
> Support for ZipDirect format:
> _AtlasImportRequest_
> ```
> {
>     "options": {
>         "numWorkers": 8,
>         "batchSize": 25,
>         "format": "zipDirect",
>         "migration": "true"
>     }
> }
> ```
> 
> 
> **CURL**
> ```
> curl -v -X POST -u admin:admin -H "Content-Type: multipart/form-data" -H "Cache-Control: no-cache" -F request=@./import-options.json -F data=@./Default-3-pre.zip http://localhost:21000/api/atlas/admin/import
> ```
> 
> 
> Diffs
> -----
> 
>   graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java 4acb371f1 
>   intg/src/main/java/org/apache/atlas/model/impexp/AtlasImportRequest.java 3362bf158 
>   repository/src/main/java/org/apache/atlas/GraphTransactionInterceptor.java bbe0dc5ba 
>   repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapperV2.java 0f2b4bfae 
>   repository/src/main/java/org/apache/atlas/repository/graph/IFullTextMapper.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/impexp/ImportService.java 1964ade9a 
>   repository/src/main/java/org/apache/atlas/repository/impexp/ZipSourceDirect.java cb5a7acd0 
>   repository/src/main/java/org/apache/atlas/repository/migration/ZipFileMigrationImporter.java f552525a4 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/AtlasEntityStore.java 39ea3f82e 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java d7020a702 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java 30f5e5a7c 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java fdf117a25 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/BulkImporterImpl.java 54c32c5e8 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java 2f3aad06b 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/IAtlasEntityChangeNotifier.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/EntityChangeNotifierNop.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/FullTextMapperV2Nop.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/ImportStrategy.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/MigrationImport.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/RegularImport.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumer.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumerBuilder.java PRE-CREATION 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityCreationManager.java PRE-CREATION 
>   repository/src/test/java/org/apache/atlas/TestModules.java 06e0ebc6c 
> 
> 
> Diff: https://reviews.apache.org/r/71025/diff/14/
> 
> 
> Testing
> -------
> 
> **Unit tests**
> Existing tests.
> 
> **Functional tests**
> - Verified import for pre-1.0 and post-1.0 exported ZIP files.
> 
> **Pre-commit**
> https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1712/
> 
> **Volume tests**
> - Measured import performance with large data sets.
> 
> +----------+----------+----------+------------------------+
> | File     | Before   | After    | Configuration          |
> +----------+----------+----------+------------------------+
> | smalldb  |   6 min  |    2 min | Shards: 4, Threads: 8  |
> | (2.2 MB) |          |          |                        |
> +----------+----------+----------+------------------------+
> | largedb  |    3 hrs |  10 mins | Shards: 4, Threads: 16 |
> | (40 MB)  |          |          |                        |
> +----------+----------+----------+------------------------+
> 
> 
> Thanks,
> 
> Ashutosh Mestry
> 
>


Re: Review Request 71025: Import Service: Support Concurrent Ingest

Posted by Ashutosh Mestry via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71025/
-----------------------------------------------------------

(Updated March 5, 2020, 5:43 p.m.)


Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.


Changes
-------

Updates include: 
- Addressed review comments.


Bugs: ATLAS-3320
    https://issues.apache.org/jira/browse/ATLAS-3320


Repository: atlas


Description
-------

**Approach**
- Use the existing producer-consumer (PC) framework.
- Modify _BulkImporterImpl_ to implement _WorkItemConsumer_.
- Add support for configuring the number of workers and the batch size within _AtlasImportRequest_.
- The existing import implementation continues to function as before, for backward compatibility.
- The new implementation additionally supports a more memory-efficient zip format (_ZipDirect_), which drastically reduces memory requirements during import.
- The new import strategy, _MigrationImport_, uses the _bulkLoading_ mode of _JanusGraph_, thereby achieving high ingest rates.
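
The producer-consumer approach described above can be illustrated with a minimal, self-contained sketch. It assumes plain `java.util.concurrent` primitives rather than Atlas's own PC framework, and the class name `BatchingPipeline` is hypothetical: one producer enqueues items, `numWorkers` consumer threads drain a bounded queue, and each worker hands items off in batches of at most `batchSize`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch, not Atlas's WorkItemManager/WorkItemConsumer API.
final class BatchingPipeline<T> {
    private static final Object POISON = new Object(); // end-of-input marker, one per worker

    private final BlockingQueue<Object> queue;
    private final List<Thread> workers = new ArrayList<>();

    BatchingPipeline(int numWorkers, int batchSize, Consumer<List<T>> commitBatch) {
        // Bounded queue: the producer blocks when workers fall behind (back-pressure).
        this.queue = new LinkedBlockingQueue<>(Math.max(1, numWorkers * batchSize * 2));
        for (int i = 0; i < numWorkers; i++) {
            Thread t = new Thread(() -> runWorker(batchSize, commitBatch), "import-worker-" + i);
            workers.add(t);
            t.start();
        }
    }

    @SuppressWarnings("unchecked")
    private void runWorker(int batchSize, Consumer<List<T>> commitBatch) {
        List<T> batch = new ArrayList<>(batchSize);
        try {
            for (Object item = queue.take(); item != POISON; item = queue.take()) {
                batch.add((T) item);
                if (batch.size() == batchSize) {
                    commitBatch.accept(batch); // e.g. one commit per full batch
                    batch = new ArrayList<>(batchSize);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!batch.isEmpty()) {
            commitBatch.accept(batch); // flush the final partial batch
        }
    }

    void produce(T item) {
        try {
            queue.put(item);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while enqueuing", e);
        }
    }

    // Signals end of input and waits until every worker has flushed and exited.
    void finish() {
        try {
            for (int i = 0; i < workers.size(); i++) {
                queue.put(POISON);
            }
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while finishing", e);
        }
    }
}
```

Because the queue is bounded, the number of in-flight items stays roughly proportional to `numWorkers * batchSize` no matter how large the input is; that is a property of this sketch, not a claim about the patch. The `commitBatch` callback runs on several threads at once, so it must be thread-safe.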

_AtlasImportRequest_
```
{
    "options": {
        "numWorkers": 8,
        "batchSize": 25
    }
}
```
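
A sketch of how a server might read these options with fallbacks, assuming they arrive as a `Map<String, String>` (string values, as in `"migration": "true"`). The `ImportOptions` class and its default values are hypothetical and are not taken from the patch.

```java
import java.util.Map;

// Hypothetical helper, not part of Atlas: parses the concurrency options of an import request.
final class ImportOptions {
    static final int DEFAULT_NUM_WORKERS = 1; // assumed default: serial import
    static final int DEFAULT_BATCH_SIZE  = 1; // assumed default

    final int numWorkers;
    final int batchSize;
    final boolean migration;
    final String format; // null when the option is absent

    ImportOptions(Map<String, String> options) {
        this.numWorkers = positiveInt(options, "numWorkers", DEFAULT_NUM_WORKERS);
        this.batchSize  = positiveInt(options, "batchSize", DEFAULT_BATCH_SIZE);
        this.migration  = Boolean.parseBoolean(options.getOrDefault("migration", "false"));
        this.format     = options.get("format");
    }

    // Returns the option as a positive int, or the fallback when missing, malformed, or <= 0.
    private static int positiveInt(Map<String, String> options, String key, int fallback) {
        String raw = options.get(key);
        if (raw == null || raw.isBlank()) {
            return fallback;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

Falling back rather than failing keeps a request without these options on the serial path, which matches the backward-compatibility goal stated above; rejecting malformed values with an error would be an equally reasonable choice.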
Support for ZipDirect format:
_AtlasImportRequest_
```
{
    "options": {
        "numWorkers": 8,
        "batchSize": 25,
        "format": "zipDirect",
        "migration": "true"
    }
}
```
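
The `migration` option is what selects the bulk-loading path. The following sketch shows that selection as a simple strategy factory; the type names only echo `ImportStrategy`, `RegularImport`, and `MigrationImport` from the diff, and their method is invented for illustration.

```java
import java.util.Map;

// Hypothetical sketch; not the actual ImportStrategy API from the patch.
interface ImportStrategySketch {
    String describe();
}

final class RegularImportSketch implements ImportStrategySketch {
    public String describe() { return "regular import"; }
}

final class MigrationImportSketch implements ImportStrategySketch {
    public String describe() { return "migration import (JanusGraph bulkLoading mode)"; }
}

final class ImportStrategyFactory {
    // "migration": "true" selects the bulk-loading strategy; anything else keeps the regular path.
    static ImportStrategySketch select(Map<String, String> options) {
        boolean migration = Boolean.parseBoolean(options.getOrDefault("migration", "false"));
        return migration ? new MigrationImportSketch() : new RegularImportSketch();
    }
}
```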


**CURL**
```
curl -v -X POST -u admin:admin -H "Content-Type: multipart/form-data" -H "Cache-Control: no-cache" -F request=@./import-options.json -F data=@./Default-3-pre.zip http://localhost:21000/api/atlas/admin/import
```
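
For callers that prefer Java over curl, the same request could be built with `java.net.http` (Java 11+). This is a hedged sketch: the multipart body is assembled by hand because `HttpClient` has no built-in multipart encoder, the part names `request` and `data` come from the `-F` flags above, and the part content types (`application/json`, `application/zip`) are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper mirroring the curl command; not part of Atlas.
final class ImportRequestBuilder {
    static byte[] multipartBody(String boundary, byte[] optionsJson, byte[] zipBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writePart(out, boundary, "request", "import-options.json", "application/json", optionsJson);
        writePart(out, boundary, "data", "Default-3-pre.zip", "application/zip", zipBytes);
        write(out, "--" + boundary + "--\r\n"); // closing delimiter
        return out.toByteArray();
    }

    private static void writePart(ByteArrayOutputStream out, String boundary, String name,
                                  String filename, String contentType, byte[] content) {
        write(out, "--" + boundary + "\r\n");
        write(out, "Content-Disposition: form-data; name=\"" + name + "\"; filename=\"" + filename + "\"\r\n");
        write(out, "Content-Type: " + contentType + "\r\n\r\n");
        out.writeBytes(content);
        write(out, "\r\n");
    }

    private static void write(ByteArrayOutputStream out, String s) {
        out.writeBytes(s.getBytes(StandardCharsets.UTF_8));
    }

    // Equivalent of: curl -X POST -u user:password -H "Cache-Control: no-cache" -F request=@... -F data=@...
    static HttpRequest build(String baseUrl, String user, String password,
                             byte[] optionsJson, byte[] zipBytes) {
        String boundary = "atlas-" + UUID.randomUUID();
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/atlas/admin/import"))
                .header("Authorization", "Basic " + auth)
                .header("Cache-Control", "no-cache")
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(multipartBody(boundary, optionsJson, zipBytes)))
                .build();
    }
}
```

Sending it against a running Atlas server would then be `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.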


Diffs (updated)
-----

  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java 4acb371f1 
  intg/src/main/java/org/apache/atlas/model/impexp/AtlasImportRequest.java 3362bf158 
  repository/src/main/java/org/apache/atlas/GraphTransactionInterceptor.java bbe0dc5ba 
  repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapperV2.java 0f2b4bfae 
  repository/src/main/java/org/apache/atlas/repository/graph/IFullTextMapper.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/impexp/ImportService.java 1964ade9a 
  repository/src/main/java/org/apache/atlas/repository/impexp/ZipSourceDirect.java cb5a7acd0 
  repository/src/main/java/org/apache/atlas/repository/migration/ZipFileMigrationImporter.java f552525a4 
  repository/src/main/java/org/apache/atlas/repository/store/graph/AtlasEntityStore.java 39ea3f82e 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java d7020a702 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java 30f5e5a7c 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/BulkImporterImpl.java 54c32c5e8 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java 2f3aad06b 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/IAtlasEntityChangeNotifier.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/EntityChangeNotifierNop.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/FullTextMapperV2Nop.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/ImportStrategy.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/MigrationImport.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/RegularImport.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumer.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumerBuilder.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityCreationManager.java PRE-CREATION 
  repository/src/test/java/org/apache/atlas/TestModules.java 06e0ebc6c 
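
The diff adds no-op classes (`EntityChangeNotifierNop`, `FullTextMapperV2Nop`) alongside new interfaces (`IAtlasEntityChangeNotifier`, `IFullTextMapper`), which suggests a null-object pattern. A minimal sketch of that pattern follows; the `ChangeNotifier` interface and its method are hypothetical and do not reproduce Atlas's signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface; not Atlas's IAtlasEntityChangeNotifier.
interface ChangeNotifier {
    void onEntitiesMutated(List<String> guids);
}

// Stands in for a real notifier that would publish change notifications.
final class RecordingNotifier implements ChangeNotifier {
    final List<String> sent = new ArrayList<>();
    public void onEntitiesMutated(List<String> guids) { sent.addAll(guids); }
}

// Null object: deliberately does nothing, so callers need no "if (migration)" checks.
final class NopNotifier implements ChangeNotifier {
    public void onEntitiesMutated(List<String> guids) { }
}

final class EntityWriter {
    private final ChangeNotifier notifier;
    int written;

    EntityWriter(ChangeNotifier notifier) { this.notifier = notifier; }

    void write(List<String> guids) {
        written += guids.size();           // placeholder for the actual graph writes
        notifier.onEntitiesMutated(guids); // identical call for every strategy
    }
}
```

Choosing the implementation once, at construction time, keeps the write path identical for both import strategies.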


Diff: https://reviews.apache.org/r/71025/diff/13/

Changes: https://reviews.apache.org/r/71025/diff/12-13/


Testing
-------

**Unit tests**
Existing tests.

**Functional tests**
- Verified import for pre-1.0 and post-1.0 exported ZIP files.

**Pre-commit**
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1712/

**Volume tests**
- Measured import performance with large data sets.

+----------+----------+----------+------------------------+
| File     | Before   | After    | Configuration          |
+----------+----------+----------+------------------------+
| smalldb  |   6 min  |    2 min | Shards: 4, Threads: 8  |
| (2.2 MB) |          |          |                        |
+----------+----------+----------+------------------------+
| largedb  |    3 hrs |  10 mins | Shards: 4, Threads: 16 |
| (40 MB)  |          |          |                        |
+----------+----------+----------+------------------------+
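
The speed-ups implied by the table, with both times converted to minutes:

```latex
\text{speed-up}_{\text{smalldb}} = \frac{6\ \text{min}}{2\ \text{min}} = 3\times,
\qquad
\text{speed-up}_{\text{largedb}} = \frac{3\ \text{hr}}{10\ \text{min}} = \frac{180\ \text{min}}{10\ \text{min}} = 18\times
```

The two rows use different thread counts (8 and 16), so the ratios are not directly comparable with each other.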


Thanks,

Ashutosh Mestry


Re: Review Request 71025: Import Service: Support Concurrent Ingest

Posted by Ashutosh Mestry via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71025/
-----------------------------------------------------------

(Updated March 5, 2020, 6:09 a.m.)


Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.


Changes
-------

Updates include: 
- Fixed a failing unit test.


Bugs: ATLAS-3320
    https://issues.apache.org/jira/browse/ATLAS-3320


Repository: atlas


Description
-------

**Approach**
- Use the existing producer-consumer (PC) framework.
- Modify _BulkImporterImpl_ to implement _WorkItemConsumer_.
- Add support for configuring the number of workers and the batch size within _AtlasImportRequest_.
- The existing import implementation continues to function as before, for backward compatibility.
- The new implementation additionally supports a more memory-efficient zip format (_ZipDirect_), which drastically reduces memory requirements during import.
- The new import strategy, _MigrationImport_, uses the _bulkLoading_ mode of _JanusGraph_, thereby achieving high ingest rates.
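
Presumably the memory saving from _ZipDirect_ comes from not holding the whole archive in memory. A hedged sketch of that general idea streams entries one at a time with `java.util.zip.ZipInputStream`. This is not `ZipSourceDirect` itself, and the `ZipStreamReader` helper (including `zipOf`, added only to make the example self-contained) is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: only the current entry is materialized, never the whole archive.
final class ZipStreamReader {
    // Calls handler(name, content) for each file entry and returns how many entries were read.
    static int forEachEntry(InputStream in, BiConsumer<String, String> handler) {
        int count = 0;
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // ZipInputStream signals end-of-stream at the end of the current entry,
                // so readAllBytes() returns just this entry's bytes.
                String content = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                handler.accept(entry.getName(), content);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // Example helper: builds an in-memory archive from name -> content pairs.
    static byte[] zipOf(Map<String, String> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> f : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(f.getKey()));
                zip.write(f.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // read after close(), which writes the central directory
    }
}
```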

_AtlasImportRequest_
```
{
    "options": {
        "numWorkers": 8,
        "batchSize": 25
    }
}
```
Support for ZipDirect format:
_AtlasImportRequest_
```
{
    "options": {
        "numWorkers": 8,
        "batchSize": 25,
        "format": "zipDirect",
        "migration": "true"
    }
}
```


**CURL**
```
curl -v -X POST -u admin:admin -H "Content-Type: multipart/form-data" -H "Cache-Control: no-cache" -F request=@./import-options.json -F data=@./Default-3-pre.zip http://localhost:21000/api/atlas/admin/import
```


Diffs (updated)
-----

  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java 4acb371f1 
  intg/src/main/java/org/apache/atlas/model/impexp/AtlasImportRequest.java 3362bf158 
  repository/src/main/java/org/apache/atlas/GraphTransactionInterceptor.java bbe0dc5ba 
  repository/src/main/java/org/apache/atlas/repository/graph/FullTextMapperV2.java 0f2b4bfae 
  repository/src/main/java/org/apache/atlas/repository/graph/IFullTextMapper.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/impexp/ImportService.java 1964ade9a 
  repository/src/main/java/org/apache/atlas/repository/impexp/ZipSourceDirect.java cb5a7acd0 
  repository/src/main/java/org/apache/atlas/repository/migration/ZipFileMigrationImporter.java f552525a4 
  repository/src/main/java/org/apache/atlas/repository/store/graph/AtlasEntityStore.java 39ea3f82e 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java d7020a702 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java 30f5e5a7c 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/BulkImporterImpl.java 54c32c5e8 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java 2f3aad06b 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/IAtlasEntityChangeNotifier.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/EntityChangeNotifierNop.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/FullTextMapperV2Nop.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/ImportStrategy.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/MigrationImport.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/RegularImport.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumer.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumerBuilder.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityCreationManager.java PRE-CREATION 
  repository/src/test/java/org/apache/atlas/TestModules.java 06e0ebc6c 


Diff: https://reviews.apache.org/r/71025/diff/12/

Changes: https://reviews.apache.org/r/71025/diff/11-12/


Testing (updated)
-------

**Unit tests**
Existing tests.

**Functional tests**
- Verified import for pre-1.0 and post-1.0 exported ZIP files.

**Pre-commit**
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1712/

**Volume tests**
- Measured import performance with large data sets.

+----------+----------+----------+------------------------+
| File     | Before   | After    | Configuration          |
+----------+----------+----------+------------------------+
| smalldb  |   6 min  |    2 min | Shards: 4, Threads: 8  |
| (2.2 MB) |          |          |                        |
+----------+----------+----------+------------------------+
| largedb  |    3 hrs |  10 mins | Shards: 4, Threads: 16 |
| (40 MB)  |          |          |                        |
+----------+----------+----------+------------------------+


Thanks,

Ashutosh Mestry


Re: Review Request 71025: Import Service: Support Concurrent Ingest

Posted by Ashutosh Mestry via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71025/
-----------------------------------------------------------

(Updated March 4, 2020, 6:30 a.m.)


Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.


Changes
-------

Updates include:
- Addressed review comments.


Bugs: ATLAS-3320
    https://issues.apache.org/jira/browse/ATLAS-3320


Repository: atlas


Description
-------

**Approach**
- Use the existing producer-consumer (PC) framework.
- Modify _BulkImporterImpl_ to implement _WorkItemConsumer_.
- Add support for configuring the number of workers and the batch size within _AtlasImportRequest_.
- The existing import implementation continues to function as before, for backward compatibility.
- The new implementation additionally supports a more memory-efficient zip format (_ZipDirect_), which drastically reduces memory requirements during import.
- The new import strategy, _MigrationImport_, uses the _bulkLoading_ mode of _JanusGraph_, thereby achieving high ingest rates.

_AtlasImportRequest_
```
{
    "options": {
        "numWorkers": 8,
        "batchSize": 25
    }
}
```
Support for ZipDirect format:
_AtlasImportRequest_
```
{
    "options": {
        "numWorkers": 8,
        "batchSize": 25,
        "format": "zipDirect",
        "migration": "true"
    }
}
```


**CURL**
```
curl -v -X POST -u admin:admin -H "Content-Type: multipart/form-data" -H "Cache-Control: no-cache" -F request=@./import-options.json -F data=@./Default-3-pre.zip http://localhost:21000/api/atlas/admin/import
```


Diffs (updated)
-----

  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java 4acb371f1 
  intg/src/main/java/org/apache/atlas/model/impexp/AtlasImportRequest.java 3362bf158 
  repository/src/main/java/org/apache/atlas/GraphTransactionInterceptor.java bbe0dc5ba 
  repository/src/main/java/org/apache/atlas/repository/impexp/ImportService.java 1964ade9a 
  repository/src/main/java/org/apache/atlas/repository/impexp/ZipSourceDirect.java cb5a7acd0 
  repository/src/main/java/org/apache/atlas/repository/migration/ZipFileMigrationImporter.java f552525a4 
  repository/src/main/java/org/apache/atlas/repository/store/graph/AtlasEntityStore.java 39ea3f82e 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java 30f5e5a7c 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java fdf117a25 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/BulkImporterImpl.java 54c32c5e8 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/EntityChangeNotifierNop.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/FullTextMapperV2Nop.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/ImportStrategy.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/MigrationImport.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/RegularImport.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumer.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumerBuilder.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityCreationManager.java PRE-CREATION 


Diff: https://reviews.apache.org/r/71025/diff/10/

Changes: https://reviews.apache.org/r/71025/diff/9-10/


Testing
-------

**Unit tests**
Existing tests.

**Functional tests**
- Verified import for pre-1.0 and post-1.0 exported ZIP files.

**Pre-commit**
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1292

**Volume tests**
- Measured import performance with large data sets.

+----------+----------+----------+------------------------+
| File     | Before   | After    | Configuration          |
+----------+----------+----------+------------------------+
| smalldb  |   6 min  |    2 min | Shards: 4, Threads: 8  |
| (2.2 MB) |          |          |                        |
+----------+----------+----------+------------------------+
| largedb  |    3 hrs |  10 mins | Shards: 4, Threads: 16 |
| (40 MB)  |          |          |                        |
+----------+----------+----------+------------------------+


Thanks,

Ashutosh Mestry


Re: Review Request 71025: Import Service: Support Concurrent Ingest

Posted by Ashutosh Mestry via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71025/
-----------------------------------------------------------

(Updated March 3, 2020, 5:13 a.m.)


Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.


Changes
-------

Updates include:
- Modified the approach for obtaining the zip file size during migration import.


Bugs: ATLAS-3320
    https://issues.apache.org/jira/browse/ATLAS-3320


Repository: atlas


Description
-------

**Approach**
- Use the existing producer-consumer (PC) framework.
- Modify _BulkImporterImpl_ to implement _WorkItemConsumer_.
- Add support for configuring the number of workers and the batch size within _AtlasImportRequest_.
- The existing import implementation continues to function as before, for backward compatibility.
- The new implementation additionally supports a more memory-efficient zip format (_ZipDirect_), which drastically reduces memory requirements during import.
- The new import strategy, _MigrationImport_, uses the _bulkLoading_ mode of _JanusGraph_, thereby achieving high ingest rates.

_AtlasImportRequest_
```
{
    "options": {
        "numWorkers": 8,
        "batchSize": 25
    }
}
```
Support for ZipDirect format:
_AtlasImportRequest_
```
{
    "options": {
        "numWorkers": 8,
        "batchSize": 25,
        "format": "zipDirect",
        "migration": "true"
    }
}
```


**CURL**
```
curl -v -X POST -u admin:admin -H "Content-Type: multipart/form-data" -H "Cache-Control: no-cache" -F request=@./import-options.json -F data=@./Default-3-pre.zip http://localhost:21000/api/atlas/admin/import
```


Diffs (updated)
-----

  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraph.java 4acb371f1 
  intg/src/main/java/org/apache/atlas/model/impexp/AtlasImportRequest.java 3362bf158 
  repository/src/main/java/org/apache/atlas/GraphTransactionInterceptor.java bbe0dc5ba 
  repository/src/main/java/org/apache/atlas/repository/impexp/AuditsWriter.java 55990f780 
  repository/src/main/java/org/apache/atlas/repository/impexp/ImportService.java 1964ade9a 
  repository/src/main/java/org/apache/atlas/repository/impexp/ZipSourceDirect.java cb5a7acd0 
  repository/src/main/java/org/apache/atlas/repository/migration/ZipFileMigrationImporter.java f552525a4 
  repository/src/main/java/org/apache/atlas/repository/store/graph/AtlasEntityStore.java 39ea3f82e 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java 30f5e5a7c 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasRelationshipStoreV2.java fdf117a25 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/BulkImporterImpl.java 54c32c5e8 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java 2f3aad06b 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/ImportStrategy.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/MigrationImport.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/RegularImport.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumer.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityConsumerBuilder.java PRE-CREATION 
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/pc/EntityCreationManager.java PRE-CREATION 


Diff: https://reviews.apache.org/r/71025/diff/9/

Changes: https://reviews.apache.org/r/71025/diff/8-9/


Testing
-------

**Unit tests**
Existing tests.

**Functional tests**
- Verified import for pre-1.0 and post-1.0 exported ZIP files.

**Pre-commit**
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1292

**Volume tests**
- Measured import performance with large data sets.

+----------+----------+----------+------------------------+
| File     | Before   | After    | Configuration          |
+----------+----------+----------+------------------------+
| smalldb  |   6 min  |    2 min | Shards: 4, Threads: 8  |
| (2.2 MB) |          |          |                        |
+----------+----------+----------+------------------------+
| largedb  |    3 hrs |  10 mins | Shards: 4, Threads: 16 |
| (40 MB)  |          |          |                        |
+----------+----------+----------+------------------------+


Thanks,

Ashutosh Mestry