Posted to dev@sqoop.apache.org by "Joey Echeverria (JIRA)" <ji...@apache.org> on 2011/08/19 15:36:27 UTC
[jira] [Created] (SQOOP-319) The --hive-drop-import-delims option
should accept a replacement string
The --hive-drop-import-delims option should accept a replacement string
-----------------------------------------------------------------------
Key: SQOOP-319
URL: https://issues.apache.org/jira/browse/SQOOP-319
Project: Sqoop
Issue Type: Bug
Components: hive-integration
Affects Versions: 1.3.0
Reporter: Joey Echeverria
Priority: Minor
When importing data into Hive, you have the option of dropping the Hive delimiters in data fields. It would be more useful to replace the delimiters with a user-defined string. Often the dropped delimiters (like \n) are separating words, so if I want to split on whitespace in my Hive queries, I'll now get two words merged together. A more desirable behavior would be to replace each delimiter with a space. Making the replacement user-configurable will give the most flexibility.
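To make the merged-words problem concrete, here is a minimal Java sketch. It assumes the delimiters in question are \n, \r, and \001 (Ctrl-A); the class name HiveDelimsDemo and the method words() are hypothetical and are not taken from the Sqoop source.

```java
import java.util.Arrays;
import java.util.regex.Matcher;

// Hypothetical demo class; not part of Sqoop.
final class HiveDelimsDemo {
  // Assumption: the delimiters to strip are \n, \r and \001 (Ctrl-A).
  private static final String HIVE_DELIMS = "[\\n\\r\\x01]";

  // Replaces every delimiter with `replacement`, then splits on whitespace,
  // roughly the way a query that splits on whitespace would see the field.
  static String[] words(String field, String replacement) {
    String cleaned =
        field.replaceAll(HIVE_DELIMS, Matcher.quoteReplacement(replacement));
    return cleaned.split("\\s+");
  }

  public static void main(String[] args) {
    String field = "first line\nsecond line";
    // Dropping the newline merges "line" and "second".
    System.out.println(Arrays.toString(words(field, "")));
    // prints [first, linesecond, line]

    // Replacing it with a space keeps the words apart.
    System.out.println(Arrays.toString(words(field, " ")));
    // prints [first, line, second, line]
  }
}
```

Matcher.quoteReplacement() is used so that a user-supplied replacement containing "$" or "\" is inserted literally instead of being read as a group reference.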
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (SQOOP-319) The --hive-drop-import-delims option
should accept a replacement string
Posted by "Joey Echeverria (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joey Echeverria updated SQOOP-319:
----------------------------------
Attachment: SQOOP-319-1.patch
I added a new option, --hive-delims-replacement, which lets you pass in a replacement string. I did it with a new option to remain backwards compatible with the existing interface. I added a test for the new option.
[jira] [Commented] (SQOOP-319) The --hive-drop-import-delims option
should accept a replacement string
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089845#comment-13089845 ]
jiraposter@reviews.apache.org commented on SQOOP-319:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1598/
-----------------------------------------------------------
(Updated 2011-08-23 23:01:05.651698)
Review request for Sqoop.
Changes
-------
I added a hiveStringReplaceDelims() method and implemented hiveStringDropDelims() by calling that method. I added validation to throw an error if both --hive-drop-import-delims and --hive-delims-replacement are used. I also fixed the checkstyle issues that you found.
I added a test case for the validation code and also did manual testing of the feature.
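The validation described above could be sketched roughly as follows. This is an illustration only, not the code from the patch: the class name HiveDelimOptionCheck, the method signature, the exception type, and the message text are all assumptions.

```java
// Hypothetical sketch; not the code from the patch.
final class HiveDelimOptionCheck {
  /**
   * Rejects the combination of the two options.
   *
   * @param dropDelims true when --hive-drop-import-delims was given
   * @param delimsReplacement value of --hive-delims-replacement,
   *     or null when the option was not given
   */
  static void validate(boolean dropDelims, String delimsReplacement) {
    if (dropDelims && delimsReplacement != null) {
      // Assumption: an unchecked exception stands in for whatever
      // error type the tool really raises on invalid options.
      throw new IllegalArgumentException(
          "--hive-drop-import-delims and --hive-delims-replacement "
          + "cannot be used together.");
    }
  }
}
```

Either option on its own passes the check; only supplying both is treated as an error, since dropping and replacing the same characters cannot both apply.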
Summary
-------
I added a new option, --hive-delims-replacement, which lets you pass in a replacement string. I did it with a new option to remain backwards compatible with the existing interface.
This addresses bug SQOOP-319.
https://issues.apache.org/jira/browse/SQOOP-319
Diffs (updated)
-----
src/docs/user/hive-args.txt 7e6b7a0
src/docs/user/hive.txt 059d7cb
src/java/com/cloudera/sqoop/SqoopOptions.java d760d39
src/java/com/cloudera/sqoop/lib/FieldFormatter.java 41536e1
src/java/com/cloudera/sqoop/orm/ClassWriter.java dd3994e
src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 8f629f1
src/java/com/cloudera/sqoop/tool/ImportTool.java 66e60bd
src/test/com/cloudera/sqoop/hive/TestHiveImport.java 35de2fd
testdata/hive/scripts/fieldWithNewlineReplacementImport.q PRE-CREATION
Diff: https://reviews.apache.org/r/1598/diff
Testing
-------
I added a unit test for the new option. I also tested the feature by hand. It works, but I found a bug when using --direct (at least with MySQL): the direct-mode import doesn't end up calling the hiveStringDropDelims() function. Some other kind of escaping is going on. I'll file that as a separate JIRA.
Thanks,
Joey
[jira] [Updated] (SQOOP-319) The --hive-drop-import-delims option
should accept a replacement string
Posted by "Joey Echeverria (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joey Echeverria updated SQOOP-319:
----------------------------------
Attachment: SQOOP-319-2.patch
[jira] [Assigned] (SQOOP-319) The --hive-drop-import-delims option
should accept a replacement string
Posted by "Arvind Prabhakar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arvind Prabhakar reassigned SQOOP-319:
--------------------------------------
Assignee: Joey Echeverria
[jira] [Commented] (SQOOP-319) The --hive-drop-import-delims option
should accept a replacement string
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090503#comment-13090503 ]
jiraposter@reviews.apache.org commented on SQOOP-319:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1598/#review1619
-----------------------------------------------------------
Ship it!
+1
- Arvind
[jira] [Updated] (SQOOP-319) The --hive-drop-import-delims option
should accept a replacement string
Posted by "Joey Echeverria (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joey Echeverria updated SQOOP-319:
----------------------------------
Status: Patch Available (was: Open)
I also tested the feature by hand. It works, but I found a bug when using --direct (at least with MySQL): the direct-mode import doesn't end up calling the hiveStringDropDelims() function. Some other kind of escaping is going on. I'll file that as a separate JIRA.
[jira] [Updated] (SQOOP-319) The --hive-drop-import-delims option
should accept a replacement string
Posted by "Arvind Prabhakar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arvind Prabhakar updated SQOOP-319:
-----------------------------------
Resolution: Fixed
Fix Version/s: 1.4.0
Status: Resolved (was: Patch Available)
Patch committed. Thanks Joey!
[jira] [Commented] (SQOOP-319) The --hive-drop-import-delims option
should accept a replacement string
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088083#comment-13088083 ]
jiraposter@reviews.apache.org commented on SQOOP-319:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1598/#review1579
-----------------------------------------------------------
Thanks for the patch, Joey. A high-level suggestion: please add validation that prevents users from combining --hive-drop-import-delims with the option you are introducing, since the two are logically incompatible.
A refactoring suggestion and some minor checkstyle comments follow.
src/java/com/cloudera/sqoop/lib/FieldFormatter.java
<https://reviews.apache.org/r/1598/#comment3565>
It would be better to add a second method, hiveStringReplaceDelims(String, String), and have the original method call it with the replacement set to the empty string.
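The suggested refactoring could look roughly like the sketch below. Only the two method names come from this thread; the class name FieldFormatterSketch, the one-argument signature of the drop method, the delimiter set (\n, \r, \001), and the null handling are assumptions made for illustration.

```java
import java.util.regex.Matcher;

// Hypothetical stand-in for the formatter class; not the Sqoop source.
final class FieldFormatterSketch {
  // Assumption: the delimiters to handle are \n, \r and \001 (Ctrl-A).
  private static final String HIVE_DELIMS = "[\\n\\r\\x01]";

  // General case: swap every delimiter for the given replacement.
  static String hiveStringReplaceDelims(String str, String replacement) {
    if (str == null) {
      return null; // assumed: null fields pass through unchanged
    }
    return str.replaceAll(HIVE_DELIMS, Matcher.quoteReplacement(replacement));
  }

  // Dropping is the special case where the replacement is empty.
  static String hiveStringDropDelims(String str) {
    return hiveStringReplaceDelims(str, "");
  }
}
```

With this shape there is a single place where the delimiter pattern is applied, so the drop and replace behaviors cannot drift apart.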
src/java/com/cloudera/sqoop/orm/ClassWriter.java
<https://reviews.apache.org/r/1598/#comment3566>
Line is longer than 80 characters.
src/java/com/cloudera/sqoop/orm/ClassWriter.java
<https://reviews.apache.org/r/1598/#comment3567>
Line is longer than 80 characters.
src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java
<https://reviews.apache.org/r/1598/#comment3569>
Line is longer than 80 characters.
src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java
<https://reviews.apache.org/r/1598/#comment3568>
Line is longer than 80 characters.
src/test/com/cloudera/sqoop/hive/TestHiveImport.java
<https://reviews.apache.org/r/1598/#comment3570>
Line is longer than 80 characters.
src/test/com/cloudera/sqoop/hive/TestHiveImport.java
<https://reviews.apache.org/r/1598/#comment3571>
Line is longer than 80 characters.
- Arvind
(In reply to the review request as updated by Joey Echeverria on 2011-08-19 18:52:15.)
[jira] [Updated] (SQOOP-319) The --hive-drop-import-delims option
should accept a replacement string
Posted by "Arvind Prabhakar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arvind Prabhakar updated SQOOP-319:
-----------------------------------
Status: Open (was: Patch Available)
[jira] [Updated] (SQOOP-319) The --hive-drop-import-delims option
should accept a replacement string
Posted by "Joey Echeverria (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joey Echeverria updated SQOOP-319:
----------------------------------
Status: Patch Available (was: Open)
Updated patch based on review board feedback.
[jira] [Commented] (SQOOP-319) The --hive-drop-import-delims option
should accept a replacement string
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090778#comment-13090778 ]
Hudson commented on SQOOP-319:
------------------------------
Integrated in Sqoop-jdk-1.6 #17 (See [https://builds.apache.org/job/Sqoop-jdk-1.6/17/])
SQOOP-319. Support for replacing Hive delimiters.
(Joey Echeverria via Arvind Prabhakar)
arvind : http://svn.apache.org/viewvc/?view=rev&rev=1161382
Files :
* /incubator/sqoop/trunk/src/docs/user/hive.txt
* /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/SqoopOptions.java
* /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/lib/FieldFormatter.java
* /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java
* /incubator/sqoop/trunk/src/docs/user/hive-args.txt
* /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/orm/ClassWriter.java
* /incubator/sqoop/trunk/src/test/com/cloudera/sqoop/hive/TestHiveImport.java
* /incubator/sqoop/trunk/testdata/hive/scripts/fieldWithNewlineReplacementImport.q
* /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/tool/ImportTool.java
[jira] [Commented] (SQOOP-319) The --hive-drop-import-delims option
should accept a replacement string
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087888#comment-13087888 ]
jiraposter@reviews.apache.org commented on SQOOP-319:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1598/
-----------------------------------------------------------
Review request for Sqoop.
Summary
-------
I added a new option, --hive-delims-replacement, which lets you pass in a replacement string. I did it with a new option to remain backwards compatible with the existing interface.
This addresses bug SQOOP-319.
https://issues.apache.org/jira/browse/SQOOP-319
Diffs
-----
src/docs/user/hive-args.txt 7e6b7a0
src/docs/user/hive.txt 059d7cb
src/java/com/cloudera/sqoop/SqoopOptions.java d760d39
src/java/com/cloudera/sqoop/lib/FieldFormatter.java 41536e1
src/java/com/cloudera/sqoop/orm/ClassWriter.java dd3994e
src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 8f629f1
src/test/com/cloudera/sqoop/hive/TestHiveImport.java 35de2fd
testdata/hive/scripts/fieldWithNewlineReplacementImport.q PRE-CREATION
Diff: https://reviews.apache.org/r/1598/diff
Testing
-------
I added a unit test for the new option. I also tested the feature by hand. It works, but I found a bug when using --direct (at least with MySQL): the direct-mode import doesn't end up calling the hiveStringDropDelims() function. Some other kind of escaping is going on. I'll file that as a separate JIRA.
Thanks,
Joey