Posted to dev@atlas.apache.org by Ashutosh Mestry <am...@hortonworks.com> on 2018/03/20 23:14:01 UTC

Review Request 66184: Migration Utility: Branch 0.8: Performance Improvement

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66184/
-----------------------------------------------------------

Review request for atlas, Madhan Neethiraj, Ruchi Solani, and Sarath Subramanian.


Bugs: ATLAS-2461
    https://issues.apache.org/jira/browse/ATLAS-2461


Repository: atlas


Description
-------

**Background** 
The migration utility committed earlier has a couple of shortcomings:
- Relies on the Export service.
  - Needs _export-options.json_ to be specified.
  - Exporting everything means meticulously updating the options file. It is easy to miss a specification, which results in less data being migrated.
- Suffers from performance problems for large data sets.

**Approach**
The new approach uses _Titan's_ _GraphSON_ writer. This is configured to export all data in _EXTENDED_ format.

The _EXTENDED_ format separates _vertices_ and _edges_. This opens up other interesting avenues for import.
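To illustrate what "separates vertices and edges" means, here is a minimal Python sketch that builds a document in the general shape of Blueprints GraphSON _EXTENDED_ output: a top-level `mode`, a `vertices` array and an `edges` array, with every property value wrapped as `{"type": ..., "value": ...}` and edges referring to vertices through `_outV`/`_inV`. This is not the _Exporter_ code (which delegates to Titan's _GraphSONWriter_); the function name `to_extended_graphson`, the sample properties and the simplified type mapping are assumptions for illustration only.

```python
import json


def to_extended_graphson(vertices, edges):
    """Build a GraphSON-like EXTENDED document from a toy graph (illustrative only).

    vertices: dict mapping vertex id -> dict of properties
    edges:    list of (edge_id, out_vertex_id, in_vertex_id, label, properties)
    """

    def typed(value):
        # EXTENDED mode annotates each property value with its type.
        # Simplified mapping (assumption): bool check comes first because
        # bool is a subclass of int; all Python ints are reported as "long".
        if isinstance(value, bool):
            return {"type": "boolean", "value": value}
        if isinstance(value, int):
            return {"type": "long", "value": value}
        if isinstance(value, float):
            return {"type": "double", "value": value}
        return {"type": "string", "value": str(value)}

    doc = {"mode": "EXTENDED", "vertices": [], "edges": []}

    # All vertices go into their own array...
    for vid, props in vertices.items():
        vertex = {key: typed(val) for key, val in props.items()}
        vertex.update({"_id": vid, "_type": "vertex"})
        doc["vertices"].append(vertex)

    # ...and all edges into a separate array, referencing vertices by id.
    for eid, out_id, in_id, label, props in edges:
        edge = {key: typed(val) for key, val in props.items()}
        edge.update({"_id": eid, "_type": "edge",
                     "_outV": out_id, "_inV": in_id, "_label": label})
        doc["edges"].append(edge)

    return doc


# Example with hypothetical data: a table vertex linked to a database vertex.
sample = to_extended_graphson(
    {"1": {"name": "sales"}, "2": {"name": "default"}},
    [("3", "1", "2", "db", {})],
)
print(json.dumps(sample, indent=2))
```

Because vertices and edges live in separate arrays, an importer could, for example, create all vertices in one pass and then wire up edges in a second pass.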

**Implementation**
- Modified _Exporter_ to use _AtlasTypeRegistry_ and _GraphSONWriter_.
- Produced files: 
   - _atlas-typedef.json_: Contains type definitions of all types.
   - _atlas-migration-data.json_: Contains data from the database.
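A quick way to sanity-check the two produced files is to count what they contain. The sketch below assumes, without confirmation from this review, that _atlas-typedef.json_ holds an AtlasTypesDef-style JSON object whose list-valued keys (such as `entityDefs`) are the type-definition groups, and that _atlas-migration-data.json_ follows the GraphSON layout with top-level `vertices` and `edges` arrays. The helper `summarize_export` is hypothetical and not part of the patch.

```python
import json


def summarize_export(typedef_path, data_path):
    """Count type definitions and graph elements in the two export files.

    Assumes (hypothetically) an AtlasTypesDef-style typedef file and a
    GraphSON-style data file with top-level "vertices" and "edges" arrays.
    """
    with open(typedef_path, encoding="utf-8") as f:
        typedefs = json.load(f)
    # Note: json.load reads the whole file into memory; for very large
    # exports a streaming JSON parser would be a better fit.
    with open(data_path, encoding="utf-8") as f:
        data = json.load(f)

    # Count entries in every list-valued key, e.g. enumDefs, entityDefs.
    type_counts = {key: len(value) for key, value in typedefs.items()
                   if isinstance(value, list)}
    return {
        "typedefs": type_counts,
        "vertices": len(data.get("vertices", [])),
        "edges": len(data.get("edges", [])),
    }


# Example usage (assumes both files are in the current directory):
# print(summarize_export("atlas-typedef.json", "atlas-migration-data.json"))
```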


Diffs
-----

  tools/atlas-migration-exporter/pom.xml 5c6c61ee 
  tools/atlas-migration-exporter/src/main/java/org/apache/atlas/migration/Exporter.java a9873df0 


Diff: https://reviews.apache.org/r/66184/diff/1/


Testing
-------

**Functional tests**
Export from repositories with:
- Custom types.
- Complex lineages.
- Hive entities created via Beeline.
- Imported data.
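One additional check that could complement the functional tests above (hypothetical; not described in this review) is verifying that every exported edge points at vertices that are also present in the export. The sketch below assumes the GraphSON layout with `_id` on vertices and `_outV`/`_inV` on edges, and that ids are serialized consistently (all strings or all numbers) so they compare directly. The helper name `find_dangling_edges` is an assumption.

```python
import json


def find_dangling_edges(graphson):
    """Return ids of edges whose _outV or _inV is not an exported vertex id."""
    vertex_ids = {v["_id"] for v in graphson.get("vertices", [])}
    return [
        e["_id"]
        for e in graphson.get("edges", [])
        if e["_outV"] not in vertex_ids or e["_inV"] not in vertex_ids
    ]


# Example usage (assumes the data file is in the current directory):
# with open("atlas-migration-data.json", encoding="utf-8") as f:
#     print("dangling edges:", len(find_dangling_edges(json.load(f))))
```

An empty result suggests the export did not drop vertices that edges depend on; a non-empty one points at data that would be lost or broken on import.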

**Gremlin Shell**
- Used the _Gremlin_ shell to perform the export operation.


Thanks,

Ashutosh Mestry


Re: Review Request 66184: Migration Utility: Branch 0.8: Performance Improvement

Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66184/#review199992
-----------------------------------------------------------


Ship it!

- Madhan Neethiraj


On March 26, 2018, 7:38 p.m., Ashutosh Mestry wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66184/
> -----------------------------------------------------------
> 
> (Updated March 26, 2018, 7:38 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj, Ruchi Solani, and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-2461
>     https://issues.apache.org/jira/browse/ATLAS-2461
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> **Background** 
> The migration utility committed earlier has a couple of shortcomings:
> - Relies on the Export service.
>   - Needs _export-options.json_ to be specified.
>   - Exporting everything means meticulously updating the options file. It is easy to miss a specification, which results in less data being migrated.
> - Suffers from performance problems for large data sets.
> 
> **Approach**
> The new approach uses _Titan's_ _GraphSON_ writer. This is configured to export all data in _EXTENDED_ format.
> 
> The _EXTENDED_ format separates _vertices_ and _edges_. This opens up other interesting avenues for import.
> 
> **Implementation**
> - Modified _Exporter_ to use _AtlasTypeRegistry_ and _GraphSONWriter_.
> - Produced files: 
>    - _atlas-typedef.json_: Contains type definitions of all types.
>    - _atlas-migration-data.json_: Contains data from the database.
> 
> 
> Diffs
> -----
> 
>   distro/src/main/assemblies/migration-exporter.xml 8f751ff9 
>   tools/atlas-migration-exporter/pom.xml 5c6c61ee 
>   tools/atlas-migration-exporter/src/main/java/org/apache/atlas/migration/Exporter.java a9873df0 
>   tools/atlas-migration-exporter/src/main/resources/README 2f2bf3e1 
>   tools/atlas-migration-exporter/src/main/resources/atlas-log4j.xml PRE-CREATION 
>   tools/atlas-migration-exporter/src/main/resources/atlas_migration.py 199cde28 
>   tools/atlas-migration-exporter/src/main/resources/migration-export-request.json 64002aff 
> 
> 
> Diff: https://reviews.apache.org/r/66184/diff/3/
> 
> 
> Testing
> -------
> 
> **Functional tests**
> Export from repositories with:
> - Custom types.
> - Complex lineages.
> - Hive entities created via Beeline.
> - Imported data.
> 
> **Gremlin Shell**
> - Used the _Gremlin_ shell to perform the export operation.
> 
> 
> Thanks,
> 
> Ashutosh Mestry
> 
>


Re: Review Request 66184: Migration Utility: Branch 0.8: Performance Improvement

Posted by Ashutosh Mestry <am...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66184/
-----------------------------------------------------------

(Updated March 26, 2018, 7:38 p.m.)


Review request for atlas, Madhan Neethiraj, Ruchi Solani, and Sarath Subramanian.


Changes
-------

Updates include: 
- Separate log4j xml for migration log configuration.
- Updated atlas_migration.py to display the log on screen.
- Minor changes to README.


Bugs: ATLAS-2461
    https://issues.apache.org/jira/browse/ATLAS-2461


Repository: atlas


Description
-------

**Background** 
The migration utility committed earlier has a couple of shortcomings:
- Relies on the Export service.
  - Needs _export-options.json_ to be specified.
  - Exporting everything means meticulously updating the options file. It is easy to miss a specification, which results in less data being migrated.
- Suffers from performance problems for large data sets.

**Approach**
The new approach uses _Titan's_ _GraphSON_ writer. This is configured to export all data in _EXTENDED_ format.

The _EXTENDED_ format separates _vertices_ and _edges_. This opens up other interesting avenues for import.

**Implementation**
- Modified _Exporter_ to use _AtlasTypeRegistry_ and _GraphSONWriter_.
- Produced files: 
   - _atlas-typedef.json_: Contains type definitions of all types.
   - _atlas-migration-data.json_: Contains data from the database.


Diffs (updated)
-----

  distro/src/main/assemblies/migration-exporter.xml 8f751ff9 
  tools/atlas-migration-exporter/pom.xml 5c6c61ee 
  tools/atlas-migration-exporter/src/main/java/org/apache/atlas/migration/Exporter.java a9873df0 
  tools/atlas-migration-exporter/src/main/resources/README 2f2bf3e1 
  tools/atlas-migration-exporter/src/main/resources/atlas-log4j.xml PRE-CREATION 
  tools/atlas-migration-exporter/src/main/resources/atlas_migration.py 199cde28 
  tools/atlas-migration-exporter/src/main/resources/migration-export-request.json 64002aff 


Diff: https://reviews.apache.org/r/66184/diff/3/

Changes: https://reviews.apache.org/r/66184/diff/2-3/


Testing
-------

**Functional tests**
Export from repositories with:
- Custom types.
- Complex lineages.
- Hive entities created via Beeline.
- Imported data.

**Gremlin Shell**
- Used the _Gremlin_ shell to perform the export operation.


Thanks,

Ashutosh Mestry


Re: Review Request 66184: Migration Utility: Branch 0.8: Performance Improvement

Posted by Ashutosh Mestry <am...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66184/
-----------------------------------------------------------

(Updated March 20, 2018, 11:15 p.m.)


Review request for atlas, Madhan Neethiraj, Ruchi Solani, and Sarath Subramanian.


Changes
-------

Updates include:
- Updated README file.
- Removed _migration-export-options.json_ file.


Bugs: ATLAS-2461
    https://issues.apache.org/jira/browse/ATLAS-2461


Repository: atlas


Description
-------

**Background** 
The migration utility committed earlier has a couple of shortcomings:
- Relies on the Export service.
  - Needs _export-options.json_ to be specified.
  - Exporting everything means meticulously updating the options file. It is easy to miss a specification, which results in less data being migrated.
- Suffers from performance problems for large data sets.

**Approach**
The new approach uses _Titan's_ _GraphSON_ writer. This is configured to export all data in _EXTENDED_ format.

The _EXTENDED_ format separates _vertices_ and _edges_. This opens up other interesting avenues for import.

**Implementation**
- Modified _Exporter_ to use _AtlasTypeRegistry_ and _GraphSONWriter_.
- Produced files: 
   - _atlas-typedef.json_: Contains type definitions of all types.
   - _atlas-migration-data.json_: Contains data from the database.


Diffs (updated)
-----

  tools/atlas-migration-exporter/pom.xml 5c6c61ee 
  tools/atlas-migration-exporter/src/main/java/org/apache/atlas/migration/Exporter.java a9873df0 
  tools/atlas-migration-exporter/src/main/resources/README 2f2bf3e1 
  tools/atlas-migration-exporter/src/main/resources/migration-export-request.json 64002aff 


Diff: https://reviews.apache.org/r/66184/diff/2/

Changes: https://reviews.apache.org/r/66184/diff/1-2/


Testing
-------

**Functional tests**
Export from repositories with:
- Custom types.
- Complex lineages.
- Hive entities created via Beeline.
- Imported data.

**Gremlin Shell**
- Used the _Gremlin_ shell to perform the export operation.


Thanks,

Ashutosh Mestry