You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by Gwen Shapira <gs...@cloudera.com> on 2014/05/25 20:56:53 UTC

Review Request 21898: Patch for SQOOP-1255 - https://issues.apache.org/jira/browse/SQOOP-1255

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/
-----------------------------------------------------------

Review request for Sqoop.


Repository: sqoop-sqoop2


Description
-------

Added tool for dumping user-generated data - connections, jobs and submissions. There's an option to dump sensitive data (i.e. passwords) as well. 


Diffs
-----

  tools/pom.xml 31eda1c 
  tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35 
  tools/src/main/java/org/apache/sqoop/tools/tool/DataDumpTool.java PRE-CREATION 

Diff: https://reviews.apache.org/r/21898/diff/


Testing
-------

Manual testing. Dumping repository with and without sensitive data. Validating resulting JSON.


Thanks,

Gwen Shapira


Re: Review Request 21898: Patch for SQOOP-1255 - https://issues.apache.org/jira/browse/SQOOP-1255

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/#review44727
-----------------------------------------------------------


I've tried the patch on real cluster and I need to think about the export format a bit further. Otherwise I do have two small nits:


tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java
<https://reviews.apache.org/r/21898/#comment79198>

    It seems that the "verify" is there now twice?



tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java
<https://reviews.apache.org/r/21898/#comment79201>

    Shouldn't we return false at this point?


Jarcec

- Jarek Cecho


On May 30, 2014, 5:40 p.m., Gwen Shapira wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21898/
> -----------------------------------------------------------
> 
> (Updated May 30, 2014, 5:40 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Added tool for dumping user-generated data - connections, jobs and submissions. There's an option to dump sensitive data (i.e. passwords) as well. 
> 
> 
> Diffs
> -----
> 
>   docs/src/site/sphinx/Tools.rst ad72cd1 
>   pom.xml 1e2f005 
>   tools/pom.xml 31eda1c 
>   tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35 
>   tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/21898/diff/
> 
> 
> Testing
> -------
> 
> Manual testing. Dumping repository with and without sensitive data. Validating resulting JSON.
> 
> 
> Thanks,
> 
> Gwen Shapira
> 
>


Re: Review Request 21898: Patch for SQOOP-1255 - https://issues.apache.org/jira/browse/SQOOP-1255

Posted by Abraham Elmahrek <ab...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/#review53072
-----------------------------------------------------------

Ship it!


Ship It!

- Abraham Elmahrek


On July 21, 2014, 8:55 p.m., Gwen Shapira wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21898/
> -----------------------------------------------------------
> 
> (Updated July 21, 2014, 8:55 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Added tool for dumping user-generated data - connections, jobs and submissions. There's an option to dump sensitive data (i.e. passwords) as well. 
> 
> 
> Diffs
> -----
> 
>   docs/src/site/sphinx/Tools.rst ad72cd1 
>   pom.xml 1e2f005 
>   tools/pom.xml 31eda1c 
>   tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35 
>   tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java PRE-CREATION 
>   tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java PRE-CREATION 
>   tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/21898/diff/
> 
> 
> Testing
> -------
> 
> Manual testing. Dumping repository with and without sensitive data. Validating resulting JSON.
> 
> 
> Thanks,
> 
> Gwen Shapira
> 
>


Re: Review Request 21898: Patch for SQOOP-1255 - https://issues.apache.org/jira/browse/SQOOP-1255

Posted by Gwen Shapira <gs...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/
-----------------------------------------------------------

(Updated July 21, 2014, 8:55 p.m.)


Review request for Sqoop.


Changes
-------

Fixed most of the issues raised by Jarcec and Abe. Also added "include-sensitive" flag to the metadata dump, so we can warn when loading a repository missing some information.


Repository: sqoop-sqoop2


Description
-------

Added tool for dumping user-generated data - connections, jobs and submissions. There's an option to dump sensitive data (i.e. passwords) as well. 


Diffs (updated)
-----

  docs/src/site/sphinx/Tools.rst ad72cd1 
  pom.xml 1e2f005 
  tools/pom.xml 31eda1c 
  tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35 
  tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java PRE-CREATION 
  tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java PRE-CREATION 
  tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java PRE-CREATION 

Diff: https://reviews.apache.org/r/21898/diff/


Testing
-------

Manual testing. Dumping repository with and without sensitive data. Validating resulting JSON.


Thanks,

Gwen Shapira


Re: Review Request 21898: Patch for SQOOP-1255 - https://issues.apache.org/jira/browse/SQOOP-1255

Posted by Gwen Shapira <gs...@cloudera.com>.

> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > Good work Gwen! Couple of high level notes:
> > 
> > 1) Please always put the patch on JIRA as well, we do have pre-commit build that will test your changes. Also we can't commit your changes unless they are attached to JIRA.
> > 2) The loader tool will load entire repository dump into memory which seems fine for now, but we might need to think about file format that would enable us process the dump in streaming fashion in the future.

2) I seriously seriously doubt we'll ever run into an issue here. Even large Sqoop users have around 50 jobs, we can easily have a year of daily submissions in less than 0.5G RAM. 
But in the unlikely case we'll need it - Jackson has streaming JSON Parser. The only part that may not fit into memory is the "submissions" records, and we can load them one-by-one easily. 


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java, line 21
> > <https://reviews.apache.org/r/21898/diff/6/?file=635434#file635434line21>
> >
> >     I don't think that interface is the correct type here, what about final class?

Makes sense. 

Since most of those JSON constants are used all over the code, I'm wondering if it makes sense to collect all of them into a class in org.apache.sqoop.json.util?

(As a separate Jira)


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java, lines 143-147
> > <https://reviews.apache.org/r/21898/diff/6/?file=635435#file635435line143>
> >
> >     Wouldn't be cleaner to just use ConnectorManager.getConnector API to get the connector name?
> >     
> >     https://github.com/apache/sqoop/blob/sqoop2/core/src/main/java/org/apache/sqoop/connector/ConnectorManager.java#L135

It looks like there are two interfaces to get connector metadata - one in ConnectorManager and one in RepositoryManager. I was using the RepositoryManager everywhere here and didn't notice that ConnectorManager actually provides better API.

I'm wondering if there's a good reason for the redundancy or whether we should clean this up.


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java, lines 137-144
> > <https://reviews.apache.org/r/21898/diff/6/?file=635436#file635436line137>
> >
> >     What about creating another parent tool that will initialize entire Sqoop infrastructure? Something like the ConfiguredTool - we should be able to use the same tool for dump and load.

I'm not sure its a good idea - all the initialize methods take parameters, which may be different for different tools.

In this case I'm initializing RepositoryManager as immutable in both dump and load tools, but it should probably be mutable in load.


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java, line 231
> > <https://reviews.apache.org/r/21898/diff/6/?file=635436#file635436line231>
> >
> >     I'm thinking if this check is indeed necessary. User might want to load older backup to a different Sqoop version right?

Users may indeed want to (although it shouldn't be common - backup best practices include a fresh backup immediately following a successful upgrade), it should work (we don't plan on making backward incompatible changes) and its trivial to work around this check for any sufficiently determined user (or support).

The reason its there is that otherwise we are implicitly promising that dump and load will work between any two versions. I doubt we want to promise (and test!) that.

If you think its important, I can check that the repository is newer than the json. Importing new dump into an old repo is much more likely to break (new fields in forms).


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java, line 243
> > <https://reviews.apache.org/r/21898/diff/6/?file=635436#file635436line243>
> >
> >     Similar update code is already in the code base, so I'm wondering if it would make sense to abstract and reuse it rather then have similar code on two places?
> >     
> >     https://github.com/apache/sqoop/blob/sqoop2/core/src/main/java/org/apache/sqoop/repository/Repository.java#L386

Lets do it in a separate Jira for refactoring? 
I don't want to add changes to the Repository into the scope of this patch.


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java, line 18
> > <https://reviews.apache.org/r/21898/diff/6/?file=635434#file635434line18>
> >
> >     I can't help by my OCD is complaining about the location of the file - it's in generic package, but only two tools are using it. What about creating sub-package "repository" and put repository related tools there?

See my comment below. I agree this isn't a good place, but I think the right place is outside Tools completely, so same constants can be used by clients, and other parts of the framework that need to work with JSON.


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java, lines 361-366
> > <https://reviews.apache.org/r/21898/diff/6/?file=635436#file635436line361>
> >
> >     Seems like good use case to use ConnectorManager.getConnector() API?
> >     
> >     https://github.com/apache/sqoop/blob/sqoop2/core/src/main/java/org/apache/sqoop/connector/ConnectorManager.java#L140

It would be, if it was possible. 

ConnectorManager.getConnector returns a connector. I need the connector PersistenceID, which is part of the connector metadata.
ConnectorManager.getConnectorMetadata takes an ID as a parameter, but not a name. The repository API has the same issue.

I'll open a separate JIRA for the cleanup. 


- Gwen


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/#review48179
-----------------------------------------------------------


On July 18, 2014, 4:11 p.m., Gwen Shapira wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21898/
> -----------------------------------------------------------
> 
> (Updated July 18, 2014, 4:11 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Added tool for dumping user-generated data - connections, jobs and submissions. There's an option to dump sensitive data (i.e. passwords) as well. 
> 
> 
> Diffs
> -----
> 
>   docs/src/site/sphinx/Tools.rst ad72cd1 
>   pom.xml 1e2f005 
>   tools/pom.xml 31eda1c 
>   tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35 
>   tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java PRE-CREATION 
>   tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java PRE-CREATION 
>   tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/21898/diff/
> 
> 
> Testing
> -------
> 
> Manual testing. Dumping repository with and without sensitive data. Validating resulting JSON.
> 
> 
> Thanks,
> 
> Gwen Shapira
> 
>


Re: Review Request 21898: Patch for SQOOP-1255 - https://issues.apache.org/jira/browse/SQOOP-1255

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/#review48179
-----------------------------------------------------------


Good work Gwen! Couple of high level notes:

1) Please always put the patch on JIRA as well, we do have pre-commit build that will test your changes. Also we can't commit your changes unless they are attached to JIRA.
2) The loader tool will load entire repository dump into memory which seems fine for now, but we might need to think about file format that would enable us process the dump in streaming fashion in the future.


tools/pom.xml
<https://reviews.apache.org/r/21898/#comment84503>

    Super nit: The formatting seems to be off here.



tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java
<https://reviews.apache.org/r/21898/#comment84504>

    Super nit: Please put space between command and the "RepositoryXTool.class..."



tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java
<https://reviews.apache.org/r/21898/#comment84511>

    I can't help by my OCD is complaining about the location of the file - it's in generic package, but only two tools are using it. What about creating sub-package "repository" and put repository related tools there?



tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java
<https://reviews.apache.org/r/21898/#comment84510>

    I don't think that interface is the correct type here, what about final class?



tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java
<https://reviews.apache.org/r/21898/#comment84512>

    Nit: Unused import.



tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java
<https://reviews.apache.org/r/21898/#comment84513>

    Nit: Unused import.



tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java
<https://reviews.apache.org/r/21898/#comment84514>

    Nit: Unused import.



tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java
<https://reviews.apache.org/r/21898/#comment84505>

    Shouldn't the default value be NULL?



tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java
<https://reviews.apache.org/r/21898/#comment84506>

    Shouldn't we on parse error return false to indicate failure?



tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java
<https://reviews.apache.org/r/21898/#comment84507>

    Super nit: Please put spaces after the commas...



tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java
<https://reviews.apache.org/r/21898/#comment84515>

    Wouldn't be cleaner to just use ConnectorManager.getConnector API to get the connector name?
    
    https://github.com/apache/sqoop/blob/sqoop2/core/src/main/java/org/apache/sqoop/connector/ConnectorManager.java#L135



tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java
<https://reviews.apache.org/r/21898/#comment84508>

    Not sure if this is indeed needed import.



tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java
<https://reviews.apache.org/r/21898/#comment84516>

    Nit: Please use "//" for one line comments.



tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java
<https://reviews.apache.org/r/21898/#comment84509>

    What about creating another parent tool that will initialize entire Sqoop infrastructure? Something like the ConfiguredTool - we should be able to use the same tool for dump and load.



tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java
<https://reviews.apache.org/r/21898/#comment84517>

    Nit: Please do convert this to the standard Java doc comment.



tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java
<https://reviews.apache.org/r/21898/#comment84518>

    I'm thinking if this check is indeed necessary. User might want to load older backup to a different Sqoop version right?



tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java
<https://reviews.apache.org/r/21898/#comment84520>

    Similar update code is already in the code base, so I'm wondering if it would make sense to abstract and reuse it rather then have similar code on two places?
    
    https://github.com/apache/sqoop/blob/sqoop2/core/src/main/java/org/apache/sqoop/repository/Repository.java#L386



tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java
<https://reviews.apache.org/r/21898/#comment84519>

    Seems like good use case to use ConnectorManager.getConnector() API?
    
    https://github.com/apache/sqoop/blob/sqoop2/core/src/main/java/org/apache/sqoop/connector/ConnectorManager.java#L140


Jarcec

- Jarek Cecho


On July 18, 2014, 4:11 p.m., Gwen Shapira wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21898/
> -----------------------------------------------------------
> 
> (Updated July 18, 2014, 4:11 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Added tool for dumping user-generated data - connections, jobs and submissions. There's an option to dump sensitive data (i.e. passwords) as well. 
> 
> 
> Diffs
> -----
> 
>   docs/src/site/sphinx/Tools.rst ad72cd1 
>   pom.xml 1e2f005 
>   tools/pom.xml 31eda1c 
>   tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35 
>   tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java PRE-CREATION 
>   tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java PRE-CREATION 
>   tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/21898/diff/
> 
> 
> Testing
> -------
> 
> Manual testing. Dumping repository with and without sensitive data. Validating resulting JSON.
> 
> 
> Thanks,
> 
> Gwen Shapira
> 
>


Re: Review Request 21898: Patch for SQOOP-1255 - https://issues.apache.org/jira/browse/SQOOP-1255

Posted by Gwen Shapira <gs...@cloudera.com>.

> On July 18, 2014, 7:50 p.m., Abraham Elmahrek wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java, lines 62-69
> > <https://reviews.apache.org/r/21898/diff/6/?file=635435#file635435line62>
> >
> >     Many of these strings seem frequently used. Why not pull them out into constants? Though, this might not matter with String interning.

Probably doesn't matter either way since each is used twice. I'll add a constants class (to share between dump and load tools) if this gets out of control.


- Gwen


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/#review48144
-----------------------------------------------------------


On July 18, 2014, 4:11 p.m., Gwen Shapira wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21898/
> -----------------------------------------------------------
> 
> (Updated July 18, 2014, 4:11 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Added tool for dumping user-generated data - connections, jobs and submissions. There's an option to dump sensitive data (i.e. passwords) as well. 
> 
> 
> Diffs
> -----
> 
>   docs/src/site/sphinx/Tools.rst ad72cd1 
>   pom.xml 1e2f005 
>   tools/pom.xml 31eda1c 
>   tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35 
>   tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java PRE-CREATION 
>   tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java PRE-CREATION 
>   tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/21898/diff/
> 
> 
> Testing
> -------
> 
> Manual testing. Dumping repository with and without sensitive data. Validating resulting JSON.
> 
> 
> Thanks,
> 
> Gwen Shapira
> 
>


Re: Review Request 21898: Patch for SQOOP-1255 - https://issues.apache.org/jira/browse/SQOOP-1255

Posted by Abraham Elmahrek <ab...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/#review48144
-----------------------------------------------------------


Good stuff! A couple of thoughts:
- Are connections or jobs ever overwritten? I think that connections and jobs with the same name should be overwritten.
- An ID map is sensible, but could we also use "name" to uniquely identify connections.


tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java
<https://reviews.apache.org/r/21898/#comment84442>

    NIT: Does this need a value?



tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java
<https://reviews.apache.org/r/21898/#comment84441>

    Many of these strings seem frequently used. Why not pull them out into constants? Though, this might not matter with String interning.


- Abraham Elmahrek


On July 18, 2014, 4:11 p.m., Gwen Shapira wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21898/
> -----------------------------------------------------------
> 
> (Updated July 18, 2014, 4:11 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Added tool for dumping user-generated data - connections, jobs and submissions. There's an option to dump sensitive data (i.e. passwords) as well. 
> 
> 
> Diffs
> -----
> 
>   docs/src/site/sphinx/Tools.rst ad72cd1 
>   pom.xml 1e2f005 
>   tools/pom.xml 31eda1c 
>   tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35 
>   tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java PRE-CREATION 
>   tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java PRE-CREATION 
>   tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/21898/diff/
> 
> 
> Testing
> -------
> 
> Manual testing. Dumping repository with and without sensitive data. Validating resulting JSON.
> 
> 
> Thanks,
> 
> Gwen Shapira
> 
>


Re: Review Request 21898: Patch for SQOOP-1255 - https://issues.apache.org/jira/browse/SQOOP-1255

Posted by Gwen Shapira <gs...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/
-----------------------------------------------------------

(Updated July 18, 2014, 4:11 p.m.)


Review request for Sqoop.


Changes
-------

Cleaned up the constants to a separate interface. 
Added some error handling and logging.


Repository: sqoop-sqoop2


Description
-------

Added tool for dumping user-generated data - connections, jobs and submissions. There's an option to dump sensitive data (i.e. passwords) as well. 


Diffs (updated)
-----

  docs/src/site/sphinx/Tools.rst ad72cd1 
  pom.xml 1e2f005 
  tools/pom.xml 31eda1c 
  tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35 
  tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java PRE-CREATION 
  tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java PRE-CREATION 
  tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java PRE-CREATION 

Diff: https://reviews.apache.org/r/21898/diff/


Testing
-------

Manual testing. Dumping repository with and without sensitive data. Validating resulting JSON.


Thanks,

Gwen Shapira


Re: Review Request 21898: Patch for SQOOP-1255 - https://issues.apache.org/jira/browse/SQOOP-1255

Posted by Gwen Shapira <gs...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/
-----------------------------------------------------------

(Updated July 18, 2014, 1:22 a.m.)


Review request for Sqoop.


Changes
-------

This version contains both dump and load (SQOOP-1256). I think it makes more sense to review and test them together.

I am aware that once the new TO/FROM interfaces arrive this will require some re-write (we are assuming job types, one connector per job/connection, etc). I think its better to commit the feature as is and revise later (just in case we'll decide to support MySQL repositories before we move to TO/FROM).


Repository: sqoop-sqoop2


Description
-------

Added tool for dumping user-generated data - connections, jobs and submissions. There's an option to dump sensitive data (i.e. passwords) as well. 


Diffs (updated)
-----

  docs/src/site/sphinx/Tools.rst ad72cd1 
  pom.xml 1e2f005 
  tools/pom.xml 31eda1c 
  tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35 
  tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java PRE-CREATION 
  tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java PRE-CREATION 

Diff: https://reviews.apache.org/r/21898/diff/


Testing
-------

Manual testing. Dumping repository with and without sensitive data. Validating resulting JSON.


Thanks,

Gwen Shapira


Re: Review Request 21898: Patch for SQOOP-1255 - https://issues.apache.org/jira/browse/SQOOP-1255

Posted by Gwen Shapira <gs...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/
-----------------------------------------------------------

(Updated May 30, 2014, 5:40 p.m.)


Review request for Sqoop.


Changes
-------

Accidentally uploaded and published a patch from unclean branch (it included some extra files that don't belong here).

Please ignore r3 and use r4 :)


Repository: sqoop-sqoop2


Description
-------

Added tool for dumping user-generated data - connections, jobs and submissions. There's an option to dump sensitive data (i.e. passwords) as well. 


Diffs (updated)
-----

  docs/src/site/sphinx/Tools.rst ad72cd1 
  pom.xml 1e2f005 
  tools/pom.xml 31eda1c 
  tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35 
  tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java PRE-CREATION 

Diff: https://reviews.apache.org/r/21898/diff/


Testing
-------

Manual testing. Dumping repository with and without sensitive data. Validating resulting JSON.


Thanks,

Gwen Shapira


Re: Review Request 21898: Patch for SQOOP-1255 - https://issues.apache.org/jira/browse/SQOOP-1255

Posted by Gwen Shapira <gs...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/
-----------------------------------------------------------

(Updated May 30, 2014, 5:37 p.m.)


Review request for Sqoop.


Changes
-------

Added version information to the dump, to allow the load function to verify compatibility.


Repository: sqoop-sqoop2


Description
-------

Added tool for dumping user-generated data - connections, jobs and submissions. There's an option to dump sensitive data (i.e. passwords) as well. 


Diffs (updated)
-----

  core/src/main/java/org/apache/sqoop/repository/JdbcRepository.java 7768b13 
  core/src/main/java/org/apache/sqoop/repository/JdbcRepositoryHandler.java 9299484 
  core/src/main/java/org/apache/sqoop/repository/Repository.java 4c7b866 
  docs/src/site/sphinx/Tools.rst ad72cd1 
  pom.xml 1e2f005 
  tools/pom.xml 31eda1c 
  tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35 
  tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java PRE-CREATION 

Diff: https://reviews.apache.org/r/21898/diff/


Testing
-------

Manual testing. Dumping repository with and without sensitive data. Validating resulting JSON.


Thanks,

Gwen Shapira


Re: Review Request 21898: Patch for SQOOP-1255 - https://issues.apache.org/jira/browse/SQOOP-1255

Posted by Gwen Shapira <gs...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/
-----------------------------------------------------------

(Updated May 29, 2014, 10:53 p.m.)


Review request for Sqoop.


Repository: sqoop-sqoop2


Description
-------

Added tool for dumping user-generated data - connections, jobs and submissions. There's an option to dump sensitive data (i.e. passwords) as well. 


Diffs
-----

  docs/src/site/sphinx/Tools.rst ad72cd1 
  pom.xml 1e2f005 
  tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35 
  tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java PRE-CREATION 

Diff: https://reviews.apache.org/r/21898/diff/


Testing
-------

Manual testing. Dumping repository with and without sensitive data. Validating resulting JSON.


Thanks,

Gwen Shapira


Re: Review Request 21898: Patch for SQOOP-1255 - https://issues.apache.org/jira/browse/SQOOP-1255

Posted by Gwen Shapira <gs...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/
-----------------------------------------------------------

(Updated May 29, 2014, 10:48 p.m.)


Review request for Sqoop.


Changes
-------

Fixed issues below.
* changed tool name to RepositoryDump
* changed argument to --include-sensivite
* added output file argument so the dump will not go to stdout
* added connector name to output
* added dependency to root pom
* added documentation


Repository: sqoop-sqoop2


Description
-------

Added tool for dumping user-generated data - connections, jobs and submissions. There's an option to dump sensitive data (i.e. passwords) as well. 


Diffs (updated)
-----

  docs/src/site/sphinx/Tools.rst ad72cd1 
  pom.xml 1e2f005 
  tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35 
  tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java PRE-CREATION 

Diff: https://reviews.apache.org/r/21898/diff/


Testing
-------

Manual testing. Dumping repository with and without sensitive data. Validating resulting JSON.


Thanks,

Gwen Shapira


Re: Review Request 21898: Patch for SQOOP-1255 - https://issues.apache.org/jira/browse/SQOOP-1255

Posted by Gwen Shapira <gs...@cloudera.com>.

> On May 26, 2014, 1:50 a.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/DataDumpTool.java, line 57
> > <https://reviews.apache.org/r/21898/diff/1/?file=594245#file594245line57>
> >
> >     I think that the idea is that user should be able to use the dump to load data back to repository (e.g. for backup purpose or during migration from one repository to another one). Something like mysqldump & mysqlimport. Hence two high level notes:
> >     
> >     * I would assume that the extra messages "dumping ..." will cause issues when parsing the text. 
> >     * We do need to output some information about the connectors though. The connection will contain connector ID without specifying what exact connector has been there - this might be a trouble as different connectors might have different IDs on different repositories. We don't necessary have to print out entire connector info, but at least the associated unique identification.

Re, first note:
The tools framework itself is writing messages to stdout, and I think they are pretty helpful (version, etc).
I think it will be cleanest to take an extra parameter with output filename and write the JSON to a file. 


> On May 26, 2014, 1:50 a.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java, line 39
> > <https://reviews.apache.org/r/21898/diff/1/?file=594244#file594244line39>
> >
> >     The "datadump" might suggest that Sqoop will do some data transfer. What about using word "repo" or "repository" instead? Something like "repositorydump", "repodump" or "repositoryexport", "repoexport"?

RepositoryDump sounds good to me.
The word "dump" is staying. Its a pretty standard ORM term for doing exactly what we are doing here, and the word "Export" is already used in Sqoop context.


- Gwen


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/#review43910
-----------------------------------------------------------


On July 18, 2014, 4:11 p.m., Gwen Shapira wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21898/
> -----------------------------------------------------------
> 
> (Updated July 18, 2014, 4:11 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Added tool for dumping user-generated data - connections, jobs and submissions. There's an option to dump sensitive data (i.e. passwords) as well. 
> 
> 
> Diffs
> -----
> 
>   docs/src/site/sphinx/Tools.rst ad72cd1 
>   pom.xml 1e2f005 
>   tools/pom.xml 31eda1c 
>   tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35 
>   tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java PRE-CREATION 
>   tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java PRE-CREATION 
>   tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/21898/diff/
> 
> 
> Testing
> -------
> 
> Manual testing. Dumping repository with and without sensitive data. Validating resulting JSON.
> 
> 
> Thanks,
> 
> Gwen Shapira
> 
>


Re: Review Request 21898: Patch for SQOOP-1255 - https://issues.apache.org/jira/browse/SQOOP-1255

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/#review43910
-----------------------------------------------------------


Good work Gwen! I do have couple of high level notes:

1) Can you please add documentation? https://github.com/apache/sqoop/blob/sqoop2/docs/src/site/sphinx/Tools.rst

2) Absence of tests is currently expected as we don't have testing infrastructure for tools.


tools/pom.xml
<https://reviews.apache.org/r/21898/#comment78199>

    Let's put all the dependencies into root pom.xml file, including the versions. We do have <dependencyManagement> section for that.



tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java
<https://reviews.apache.org/r/21898/#comment78200>

    The "datadump" might suggest that Sqoop will do some data transfer. What about using word "repo" or "repository" instead? Something like "repositorydump", "repodump" or "repositoryexport", "repoexport"?



tools/src/main/java/org/apache/sqoop/tools/tool/DataDumpTool.java
<https://reviews.apache.org/r/21898/#comment78201>

    This file is missing Apache license header.



tools/src/main/java/org/apache/sqoop/tools/tool/DataDumpTool.java
<https://reviews.apache.org/r/21898/#comment78202>

    Please don't use "*" imports, always iterate all used symbols.



tools/src/main/java/org/apache/sqoop/tools/tool/DataDumpTool.java
<https://reviews.apache.org/r/21898/#comment78203>

    We are recommending to shut down server for all tools:
    
    https://github.com/apache/sqoop/blob/sqoop2/docs/src/site/sphinx/Tools.rst



tools/src/main/java/org/apache/sqoop/tools/tool/DataDumpTool.java
<https://reviews.apache.org/r/21898/#comment78204>

    Nit: "include-sensitive" might be a bit better name? Yeah I kind of don't like the word "dump" :-(



tools/src/main/java/org/apache/sqoop/tools/tool/DataDumpTool.java
<https://reviews.apache.org/r/21898/#comment78205>

    I think that the idea is that user should be able to use the dump to load data back to repository (e.g. for backup purpose or during migration from one repository to another one). Something like mysqldump & mysqlimport. Hence two high level notes:
    
    * I would assume that the extra messages "dumping ..." will cause issues when parsing the text. 
    * We do need to output some information about the connectors though. The connection will contain connector ID without specifying what exact connector has been there - this might be a trouble as different connectors might have different IDs on different repositories. We don't necessary have to print out entire connector info, but at least the associated unique identification.


- Jarek Cecho


On May 25, 2014, 6:56 p.m., Gwen Shapira wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21898/
> -----------------------------------------------------------
> 
> (Updated May 25, 2014, 6:56 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Added tool for dumping user-generated data - connections, jobs and submissions. There's an option to dump sensitive data (i.e. passwords) as well. 
> 
> 
> Diffs
> -----
> 
>   tools/pom.xml 31eda1c 
>   tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35 
>   tools/src/main/java/org/apache/sqoop/tools/tool/DataDumpTool.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/21898/diff/
> 
> 
> Testing
> -------
> 
> Manual testing. Dumping repository with and without sensitive data. Validating resulting JSON.
> 
> 
> Thanks,
> 
> Gwen Shapira
> 
>