You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by christeoh <gi...@git.apache.org> on 2017/12/12 13:41:19 UTC

[GitHub] sqoop pull request #41: Sqoop-3224: Binary ftp transfer mode

GitHub user christeoh opened a pull request:

    https://github.com/apache/sqoop/pull/41

    Sqoop-3224: Binary ftp transfer mode

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/christeoh/sqoop SQOOP-3224

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/sqoop/pull/41.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #41
    
----
commit 3efb449f132d366150d118282d5e1d69e3585cde
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-10-25T10:20:07Z

    Added support for binary ftp transfers

commit 045ef06ca94fe9b1f0f330c07b605f2ca23bbe18
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-15T02:36:07Z

    Moved mainframe FTP transfermode default setting to initDefaults()

commit 2f47f54da74a75be42ece621f2faeb7bcb640622
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-15T02:38:02Z

    Replaced import java.io.* with single class imports

commit e6c9dc582aa41dd88bb354dff5f807169cce8b10
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-16T02:07:07Z

    Removed excessive logging per record to improve performance

commit cad3f010d673ced5cbe9df3e9f6c78ea3917ac3c
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-16T02:07:42Z

    Added comment to document why we need to add custom class for binary transfers

commit f08fad3cefb219ba14317d151b18081079cdcdef
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-16T03:27:48Z

    Converted to use BufferedInputStream instead of InputStream

commit 2ccf3f077287011928383a9992b7472cfd4d0358
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-17T00:57:48Z

    Added unit tests for MainframeDatasetFTPRecordReader.getNextBinaryRecord

commit 8dd951b88f1ea3647e023b42e75fd687b14aff29
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-17T01:19:04Z

    Updated unit tests and used helper classes

commit ee5d4b27b804f5d9933ce859dfb2bcc289a001bd
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-17T01:22:06Z

    Updated unit tests to use a method of org.junit.Assert

commit 0ec15ebc35eb7839929b3c1c0063ecad9d874a00
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-17T01:35:04Z

    Updated unit test for compilation

commit cff44a3082a2ec52f072fd5f35d31606c551992a
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-17T05:28:57Z

    Used StringUtils to do comparisons and corrected bulk imports

commit 6feb6b60a54e16ddfd7d3f3457a47d10816ab81a
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-28T03:51:07Z

    Replaced star import with specific class import

commit a1a479146fed0f518609f1deb1f2c3e425d90ae2
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-28T03:51:33Z

    Updated to use current class instead of deprecated class

commit 55cdb8f54e62fa8dc24acdcbee68d1e2541d9c0c
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-28T03:52:10Z

    Refactored common functionality to another function

commit c6a80348d4202eb7b0a47e41c472042294550b50
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-28T03:52:30Z

    Adjusted comment

commit e4e923c3d011d215b64dd099ba5931b98fa96b78
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-29T04:36:44Z

    Moved tests from TestMainframeDatasetFTPRecordReader to separate class

commit eb598fc7919ffb493cbdf03ff23e467f84bdf617
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-29T04:37:15Z

    Adjusted class for unit test support:

commit d48601cd27b216826775bd4974a6e2088600c4ae
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-29T04:38:43Z

    Adjusted exceptions to print full stack

commit f0d440cb02b4648e55bf699170dfd27c6f6ae903
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-29T04:39:05Z

    Moved unit tests to another class

commit cef76eda1271b78d3a577cffdd285f93ea218b22
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-29T06:17:48Z

    Updated unit tests

commit 6a86f8e2f9c35e29493c744273da50d88d0ccc0d
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-29T11:27:58Z

    Tidied up unit tests

commit 775105ce5adb2905c48e5ada271566a7ff6c5176
Author: Chris Teoh <ct...@cloudera.com>
Date:   2017-11-29T11:43:28Z

    Updated getNextBinaryRecord logic to be simpler

----


---

[GitHub] sqoop pull request #41: Sqoop-3224: Binary ftp transfer mode

Posted by christeoh <gi...@git.apache.org>.
Github user christeoh commented on a diff in the pull request:

    https://github.com/apache/sqoop/pull/41#discussion_r161689494
  
    --- Diff: src/java/org/apache/sqoop/mapreduce/RawKeyTextOutputFormat.java ---
    @@ -61,6 +62,9 @@ private void writeObject(Object o) throws IOException {
           if (o instanceof Text) {
             Text to = (Text) o;
             out.write(to.getBytes(), 0, to.getLength());
    +      } else if (o instanceof BytesWritable) {
    --- End diff --
    
    Thanks @szvasas . I need a bit of help with applying the technique on the OutputFormats as they contain inner classes.


---

[GitHub] sqoop pull request #41: Sqoop-3224: Binary ftp transfer mode

Posted by szvasas <gi...@git.apache.org>.
Github user szvasas commented on a diff in the pull request:

    https://github.com/apache/sqoop/pull/41#discussion_r158502339
  
    --- Diff: src/java/org/apache/sqoop/SqoopOptions.java ---
    @@ -308,7 +308,9 @@ public String toString() {
       // Indicates if the data set is on tape to use different FTP parser
       @StoredAsProperty("mainframe.input.dataset.tape")
       private String mainframeInputDatasetTape;
    -
    +  // Indicates if binary or ascii FTP transfer mode should be used
    +  @StoredAsProperty("mainframe.ftp.transfermode")
    +  private String mainframeFtpTransferMode;
    --- End diff --
    
    So my idea is to add a new field to com.cloudera.sqoop.SqoopOptions.FileLayout, let's say BinaryFile which would only be supported for mainframe imports (we should add validator to check this) and I would use this new field instead of introducing the new mainframeFtpTransferMode Sqoop option. This would not make the Sqoop interface more complex and would let us reuse already existing concepts. What do you think?


---

[GitHub] sqoop pull request #41: Sqoop-3224: Binary ftp transfer mode

Posted by szvasas <gi...@git.apache.org>.
Github user szvasas commented on a diff in the pull request:

    https://github.com/apache/sqoop/pull/41#discussion_r158502479
  
    --- Diff: src/java/org/apache/sqoop/mapreduce/RawKeyTextOutputFormat.java ---
    @@ -61,6 +62,9 @@ private void writeObject(Object o) throws IOException {
           if (o instanceof Text) {
             Text to = (Text) o;
             out.write(to.getBytes(), 0, to.getLength());
    +      } else if (o instanceof BytesWritable) {
    --- End diff --
    
    I think it would be a good idea to keep the BinaryKeyOutputFormat because we change the behavior of RawKeyTextOutputFormat now which may lead to issues in other connectors and its name suggests that it handles text and not binary. You can use a technique to elimiate the code duplication similar I used in MainframeDatasetBinaryImportMapper.


---

[GitHub] sqoop pull request #41: Sqoop-3224: Binary ftp transfer mode

Posted by szvasas <gi...@git.apache.org>.
Github user szvasas commented on a diff in the pull request:

    https://github.com/apache/sqoop/pull/41#discussion_r158502914
  
    --- Diff: src/java/org/apache/sqoop/mapreduce/mainframe/MainframeDatasetBinaryRecord.java ---
    @@ -0,0 +1,128 @@
    +/**
    --- End diff --
    
    I am still not sure we will need this new class.
    I understand that when a SqoopRecord is generated runtime it will always have a text field and not binary but that might be solved by changing org.apache.sqoop.manager.MainframeManager#getColumnTypes
    
    My guess is that if we changed that to return column type based on the transfer mode or file type in the SqoopOptions object we could just use the generated class.


---