You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Marta Kuczora <ku...@cloudera.com> on 2016/08/08 15:13:28 UTC

Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/
-----------------------------------------------------------

Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu Zhang.


Repository: hive-git


Description
-------

Introduced a new outputformat (dsv2) which supports multiple characters as delimiter.
For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is used. This library doesn\u2019t support multiple characters as delimiter. Since the same logic is used for generating csv2, tsv2 and dsv outputformats, I decided not to change this logic, rather introduce a new outputformat (dsv2) which supports multiple characters as delimiter. 
The new dsv2 outputformat has the same escaping logic as the dsv outputformat if the quoting is not disabled.
Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.

Main changes in the code:
 - Changed the SeparatedValuesOutputFormat class to be an abstract class and created two new child classes to separate the logic for single-character and multi-character delimiters: SingleCharSeparatedValuesOutputFormat and MultiCharSeparatedValuesOutputFormat

 - Kept the methods which are used by both children in the SeparatedValuesOutputFormat and moved the methods specific to the single-character case to the SingleCharSeparatedValuesOutputFormat class.

 - Didn\u2019t change the logic which was in the SeparatedValuesOutputFormat, only moved some parts to the child class.

 - Implemented the value escaping and concatenation with the delimiter string in the MultiCharSeparatedValuesOutputFormat.


Diffs
-----

  beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
  beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
  beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java PRE-CREATION 
  beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
  beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java PRE-CREATION 
  itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 892c733 

Diff: https://reviews.apache.org/r/50896/diff/


Testing
-------

- Tested manually in BeeLine.
- Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.


Thanks,

Marta Kuczora


Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

Posted by Peter Vary <pv...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/#review145993
-----------------------------------------------------------


Ship it!




Hi Marta,

Thanks. LGTM (non binding)

Peter

- Peter Vary


On Aug. 17, 2016, 2:14 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50896/
> -----------------------------------------------------------
> 
> (Updated Aug. 17, 2016, 2:14 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-14404
>     https://issues.apache.org/jira/browse/HIVE-14404
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Introduced a new outputformat (dsv2) which supports multiple characters as delimiter.
> For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is used. This library doesn\u2019t support multiple characters as delimiter. Since the same logic is used for generating csv2, tsv2 and dsv outputformats, I decided not to change this logic, rather introduce a new outputformat (dsv2) which supports multiple characters as delimiter. 
> The new dsv2 outputformat has the same escaping logic as the dsv outputformat if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> Main changes in the code:
>  - Changed the SeparatedValuesOutputFormat class to be an abstract class and created two new child classes to separate the logic for single-character and multi-character delimiters: SingleCharSeparatedValuesOutputFormat and MultiCharSeparatedValuesOutputFormat
> 
>  - Kept the methods which are used by both children in the SeparatedValuesOutputFormat and moved the methods specific to the single-character case to the SingleCharSeparatedValuesOutputFormat class.
> 
>  - Didn\u2019t change the logic which was in the SeparatedValuesOutputFormat, only moved some parts to the child class.
> 
>  - Implemented the value escaping and concatenation with the delimiter string in the MultiCharSeparatedValuesOutputFormat.
> 
> 
> Diffs
> -----
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
>   beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java PRE-CREATION 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
>   beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java PRE-CREATION 
>   beeline/src/main/resources/BeeLine.properties 95b8fa1 
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 892c733 
> 
> Diff: https://reviews.apache.org/r/50896/diff/
> 
> 
> Testing
> -------
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

Posted by Marta Kuczora <ku...@cloudera.com>.

> On Aug. 18, 2016, 12:11 a.m., Sergio Pena wrote:
> > What about stop using the superCSV so that we can keep the 'dsv' format that can support singler and multiple characters?
> > I don't like the use of another 'dsv2' format for multiple ones. It might be confusing for users.

Sure, I can change the patch to use the "new logic" instead of superCSV. I was thinking about this approach when I started to work on this issue. I was just not sure which one would be preferable: leave the existing dsv format unchanged and create a new one or change the existing one not to use superCSV any more.

What do you mean exactly by stop using superCSV? Only for dsv outputformat (the formats tsv2 and csv2 will still use it) or completely remove it from the project?


- Marta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/#review146056
-----------------------------------------------------------


On Aug. 17, 2016, 2:14 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50896/
> -----------------------------------------------------------
> 
> (Updated Aug. 17, 2016, 2:14 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-14404
>     https://issues.apache.org/jira/browse/HIVE-14404
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Introduced a new outputformat (dsv2) which supports multiple characters as delimiter.
> For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is used. This library doesn\u2019t support multiple characters as delimiter. Since the same logic is used for generating csv2, tsv2 and dsv outputformats, I decided not to change this logic, rather introduce a new outputformat (dsv2) which supports multiple characters as delimiter. 
> The new dsv2 outputformat has the same escaping logic as the dsv outputformat if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> Main changes in the code:
>  - Changed the SeparatedValuesOutputFormat class to be an abstract class and created two new child classes to separate the logic for single-character and multi-character delimiters: SingleCharSeparatedValuesOutputFormat and MultiCharSeparatedValuesOutputFormat
> 
>  - Kept the methods which are used by both children in the SeparatedValuesOutputFormat and moved the methods specific to the single-character case to the SingleCharSeparatedValuesOutputFormat class.
> 
>  - Didn\u2019t change the logic which was in the SeparatedValuesOutputFormat, only moved some parts to the child class.
> 
>  - Implemented the value escaping and concatenation with the delimiter string in the MultiCharSeparatedValuesOutputFormat.
> 
> 
> Diffs
> -----
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
>   beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java PRE-CREATION 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
>   beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java PRE-CREATION 
>   beeline/src/main/resources/BeeLine.properties 95b8fa1 
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 892c733 
> 
> Diff: https://reviews.apache.org/r/50896/diff/
> 
> 
> Testing
> -------
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

Posted by Sergio Pena <se...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/#review146056
-----------------------------------------------------------



What about stop using the superCSV so that we can keep the 'dsv' format that can support singler and multiple characters?
I don't like the use of another 'dsv2' format for multiple ones. It might be confusing for users.

- Sergio Pena


On Aug. 17, 2016, 2:14 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50896/
> -----------------------------------------------------------
> 
> (Updated Aug. 17, 2016, 2:14 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-14404
>     https://issues.apache.org/jira/browse/HIVE-14404
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Introduced a new outputformat (dsv2) which supports multiple characters as delimiter.
> For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is used. This library doesn\u2019t support multiple characters as delimiter. Since the same logic is used for generating csv2, tsv2 and dsv outputformats, I decided not to change this logic, rather introduce a new outputformat (dsv2) which supports multiple characters as delimiter. 
> The new dsv2 outputformat has the same escaping logic as the dsv outputformat if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> Main changes in the code:
>  - Changed the SeparatedValuesOutputFormat class to be an abstract class and created two new child classes to separate the logic for single-character and multi-character delimiters: SingleCharSeparatedValuesOutputFormat and MultiCharSeparatedValuesOutputFormat
> 
>  - Kept the methods which are used by both children in the SeparatedValuesOutputFormat and moved the methods specific to the single-character case to the SingleCharSeparatedValuesOutputFormat class.
> 
>  - Didn\u2019t change the logic which was in the SeparatedValuesOutputFormat, only moved some parts to the child class.
> 
>  - Implemented the value escaping and concatenation with the delimiter string in the MultiCharSeparatedValuesOutputFormat.
> 
> 
> Diffs
> -----
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
>   beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java PRE-CREATION 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
>   beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java PRE-CREATION 
>   beeline/src/main/resources/BeeLine.properties 95b8fa1 
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 892c733 
> 
> Diff: https://reviews.apache.org/r/50896/diff/
> 
> 
> Testing
> -------
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

Posted by Marta Kuczora <ku...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/
-----------------------------------------------------------

(Updated Aug. 17, 2016, 2:14 p.m.)


Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu Zhang.


Changes
-------

Patch is fixed according to the review:
- Display an error message if multiple character delimiter is set with dsv output format. In this case it will fall back to the default dsv delimiter.
- Introduced new constant for default dsv2 delimiter to avoid the String<->char conversions.


Bugs: HIVE-14404
    https://issues.apache.org/jira/browse/HIVE-14404


Repository: hive-git


Description
-------

Introduced a new outputformat (dsv2) which supports multiple characters as delimiter.
For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is used. This library doesn\u2019t support multiple characters as delimiter. Since the same logic is used for generating csv2, tsv2 and dsv outputformats, I decided not to change this logic, rather introduce a new outputformat (dsv2) which supports multiple characters as delimiter. 
The new dsv2 outputformat has the same escaping logic as the dsv outputformat if the quoting is not disabled.
Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.

Main changes in the code:
 - Changed the SeparatedValuesOutputFormat class to be an abstract class and created two new child classes to separate the logic for single-character and multi-character delimiters: SingleCharSeparatedValuesOutputFormat and MultiCharSeparatedValuesOutputFormat

 - Kept the methods which are used by both children in the SeparatedValuesOutputFormat and moved the methods specific to the single-character case to the SingleCharSeparatedValuesOutputFormat class.

 - Didn\u2019t change the logic which was in the SeparatedValuesOutputFormat, only moved some parts to the child class.

 - Implemented the value escaping and concatenation with the delimiter string in the MultiCharSeparatedValuesOutputFormat.


Diffs (updated)
-----

  beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
  beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
  beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java PRE-CREATION 
  beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
  beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java PRE-CREATION 
  beeline/src/main/resources/BeeLine.properties 95b8fa1 
  itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 892c733 

Diff: https://reviews.apache.org/r/50896/diff/


Testing
-------

- Tested manually in BeeLine.
- Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.


Thanks,

Marta Kuczora


Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

Posted by Marta Kuczora <ku...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/
-----------------------------------------------------------

(Updated Aug. 11, 2016, 3:50 p.m.)


Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-14404
    https://issues.apache.org/jira/browse/HIVE-14404


Repository: hive-git


Description
-------

Introduced a new outputformat (dsv2) which supports multiple characters as delimiter.
For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is used. This library doesn\u2019t support multiple characters as delimiter. Since the same logic is used for generating csv2, tsv2 and dsv outputformats, I decided not to change this logic, rather introduce a new outputformat (dsv2) which supports multiple characters as delimiter. 
The new dsv2 outputformat has the same escaping logic as the dsv outputformat if the quoting is not disabled.
Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.

Main changes in the code:
 - Changed the SeparatedValuesOutputFormat class to be an abstract class and created two new child classes to separate the logic for single-character and multi-character delimiters: SingleCharSeparatedValuesOutputFormat and MultiCharSeparatedValuesOutputFormat

 - Kept the methods which are used by both children in the SeparatedValuesOutputFormat and moved the methods specific to the single-character case to the SingleCharSeparatedValuesOutputFormat class.

 - Didn\u2019t change the logic which was in the SeparatedValuesOutputFormat, only moved some parts to the child class.

 - Implemented the value escaping and concatenation with the delimiter string in the MultiCharSeparatedValuesOutputFormat.


Diffs
-----

  beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
  beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
  beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java PRE-CREATION 
  beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
  beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java PRE-CREATION 
  itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 892c733 

Diff: https://reviews.apache.org/r/50896/diff/


Testing
-------

- Tested manually in BeeLine.
- Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.


Thanks,

Marta Kuczora


Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

Posted by Marta Kuczora <ku...@cloudera.com>.

> On Aug. 9, 2016, 5 a.m., Peter Vary wrote:
> > Hi Marta,
> > 
> > Thanks for the patch, it is nice, and clean.
> > It might be a good idea to have the inputs checked, so if the user provides a multicharacter separator with a dsv format, then instead of using the first character of the string, an error might be printed.
> > 
> > Otherwise looks good.
> > 
> > Thanks,
> > Peter

Hi Peter,

thanks a lot for the review.
Sure, I can change the patch to check the input and print an error if the format is dsv and the separator contains multiple characters.

Regards,
Marta


> On Aug. 9, 2016, 5 a.m., Peter Vary wrote:
> > beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java, line 106
> > <https://reviews.apache.org/r/50896/diff/1/?file=1466246#file1466246line106>
> >
> >     nit Would not be better to change the type of the DEFAULT_DELIMITER_DSV to String?

If we change the type of the DEFAULT_DELIMITER_DSV to String, then we would need to do the converting for the single-character delimiter case. What would be better I think is to create a new default for dsv2, something like "String DEFAULT_DELIMITER_DSV2".


- Marta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/#review145167
-----------------------------------------------------------


On Aug. 8, 2016, 3:13 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50896/
> -----------------------------------------------------------
> 
> (Updated Aug. 8, 2016, 3:13 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Introduced a new outputformat (dsv2) which supports multiple characters as delimiter.
> For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is used. This library doesn\u2019t support multiple characters as delimiter. Since the same logic is used for generating csv2, tsv2 and dsv outputformats, I decided not to change this logic, rather introduce a new outputformat (dsv2) which supports multiple characters as delimiter. 
> The new dsv2 outputformat has the same escaping logic as the dsv outputformat if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> Main changes in the code:
>  - Changed the SeparatedValuesOutputFormat class to be an abstract class and created two new child classes to separate the logic for single-character and multi-character delimiters: SingleCharSeparatedValuesOutputFormat and MultiCharSeparatedValuesOutputFormat
> 
>  - Kept the methods which are used by both children in the SeparatedValuesOutputFormat and moved the methods specific to the single-character case to the SingleCharSeparatedValuesOutputFormat class.
> 
>  - Didn\u2019t change the logic which was in the SeparatedValuesOutputFormat, only moved some parts to the child class.
> 
>  - Implemented the value escaping and concatenation with the delimiter string in the MultiCharSeparatedValuesOutputFormat.
> 
> 
> Diffs
> -----
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
>   beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java PRE-CREATION 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
>   beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java PRE-CREATION 
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 892c733 
> 
> Diff: https://reviews.apache.org/r/50896/diff/
> 
> 
> Testing
> -------
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

Posted by Peter Vary <pv...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/#review145167
-----------------------------------------------------------



Hi Marta,

Thanks for the patch, it is nice, and clean.
It might be a good idea to have the inputs checked, so if the user provides a multicharacter separator with a dsv format, then instead of using the first character of the string, an error might be printed.

Otherwise looks good.

Thanks,
Peter


beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java (line 106)
<https://reviews.apache.org/r/50896/#comment211330>

    nit Would not be better to change the type of the DEFAULT_DELIMITER_DSV to String?


- Peter Vary


On Aug. 8, 2016, 3:13 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50896/
> -----------------------------------------------------------
> 
> (Updated Aug. 8, 2016, 3:13 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Introduced a new outputformat (dsv2) which supports multiple characters as delimiter.
> For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is used. This library doesn\u2019t support multiple characters as delimiter. Since the same logic is used for generating csv2, tsv2 and dsv outputformats, I decided not to change this logic, rather introduce a new outputformat (dsv2) which supports multiple characters as delimiter. 
> The new dsv2 outputformat has the same escaping logic as the dsv outputformat if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> Main changes in the code:
>  - Changed the SeparatedValuesOutputFormat class to be an abstract class and created two new child classes to separate the logic for single-character and multi-character delimiters: SingleCharSeparatedValuesOutputFormat and MultiCharSeparatedValuesOutputFormat
> 
>  - Kept the methods which are used by both children in the SeparatedValuesOutputFormat and moved the methods specific to the single-character case to the SingleCharSeparatedValuesOutputFormat class.
> 
>  - Didn\u2019t change the logic which was in the SeparatedValuesOutputFormat, only moved some parts to the child class.
> 
>  - Implemented the value escaping and concatenation with the delimiter string in the MultiCharSeparatedValuesOutputFormat.
> 
> 
> Diffs
> -----
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
>   beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java PRE-CREATION 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
>   beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java PRE-CREATION 
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 892c733 
> 
> Diff: https://reviews.apache.org/r/50896/diff/
> 
> 
> Testing
> -------
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

Posted by Marta Kuczora <ku...@cloudera.com>.

> On Aug. 9, 2016, 5:35 p.m., Szehon Ho wrote:
> > I'm ambivilent, I would rather have pursued the change to make it in superCSV to be better in long run.  But I do see it might not move very fast (did you guys try contacting them?).   The patch itself looks mostly fine though.
> > 
> > My only question is, does it need to be a 2nd version of the format?  That is, is there anything that is actually backward incompatibie other than adding a new flag?  Thanks.

No, I didn\u2019t contacted the SuperCSV team. I asked around what would be the better way to go and I mostly got the answer to fix the issue in Hive, so I went in that direction. But I can try to contact them if fixing the issue there would be preferable.

Actually the issue can be fixed without introducing the 2nd dsv output format as well. In this case I saw two possible solutions:
- Separate the logic for the dsv output format from the csv2 and tsv2 formats and for dsv format always use the new logic which supports the string delimiter. In this case the Super CSV library wouldn\u2019t be used for the dsv format.

- Change the logic in the SeparatedValuesOutputFormat class to use the SuperCSV library just like now if the delimiter is a single character and the use the new logic if the delimiter is a string.  This solution would leave the single-character case unchanged, and would support the multi-character delimiter for the same dsv format, but the code in the SeparatedValuesOutputFormat class would not be that nice.

I was thinking about which solution to choose. The reason why I introduced the new format is that I didn\u2019t wan\u2019t to change the logic for the single-character delimiter case, because I think using Super CSV makes the code a lot cleaner. But I also wanted to the keep the code clean, so I chose to separate the single-, and multi-character delimiter cases and introduce a new format.
If it is preferable to have only one dsv format, I can change the patch easily.


- Marta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/#review145227
-----------------------------------------------------------


On Aug. 8, 2016, 3:13 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50896/
> -----------------------------------------------------------
> 
> (Updated Aug. 8, 2016, 3:13 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Introduced a new outputformat (dsv2) which supports multiple characters as delimiter.
> For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is used. This library doesn\u2019t support multiple characters as delimiter. Since the same logic is used for generating csv2, tsv2 and dsv outputformats, I decided not to change this logic, rather introduce a new outputformat (dsv2) which supports multiple characters as delimiter. 
> The new dsv2 outputformat has the same escaping logic as the dsv outputformat if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> Main changes in the code:
>  - Changed the SeparatedValuesOutputFormat class to be an abstract class and created two new child classes to separate the logic for single-character and multi-character delimiters: SingleCharSeparatedValuesOutputFormat and MultiCharSeparatedValuesOutputFormat
> 
>  - Kept the methods which are used by both children in the SeparatedValuesOutputFormat and moved the methods specific to the single-character case to the SingleCharSeparatedValuesOutputFormat class.
> 
>  - Didn\u2019t change the logic which was in the SeparatedValuesOutputFormat, only moved some parts to the child class.
> 
>  - Implemented the value escaping and concatenation with the delimiter string in the MultiCharSeparatedValuesOutputFormat.
> 
> 
> Diffs
> -----
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
>   beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java PRE-CREATION 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
>   beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java PRE-CREATION 
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 892c733 
> 
> Diff: https://reviews.apache.org/r/50896/diff/
> 
> 
> Testing
> -------
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

Posted by Szehon Ho <sz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/#review145227
-----------------------------------------------------------



I'm ambivilent, I would rather have pursued the change to make it in superCSV to be better in long run.  But I do see it might not move very fast (did you guys try contacting them?).   The patch itself looks mostly fine though.

My only question is, does it need to be a 2nd version of the format?  That is, is there anything that is actually backward incompatibie other than adding a new flag?  Thanks.

- Szehon Ho


On Aug. 8, 2016, 3:13 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50896/
> -----------------------------------------------------------
> 
> (Updated Aug. 8, 2016, 3:13 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Introduced a new outputformat (dsv2) which supports multiple characters as delimiter.
> For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is used. This library doesn\u2019t support multiple characters as delimiter. Since the same logic is used for generating csv2, tsv2 and dsv outputformats, I decided not to change this logic, rather introduce a new outputformat (dsv2) which supports multiple characters as delimiter. 
> The new dsv2 outputformat has the same escaping logic as the dsv outputformat if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> Main changes in the code:
>  - Changed the SeparatedValuesOutputFormat class to be an abstract class and created two new child classes to separate the logic for single-character and multi-character delimiters: SingleCharSeparatedValuesOutputFormat and MultiCharSeparatedValuesOutputFormat
> 
>  - Kept the methods which are used by both children in the SeparatedValuesOutputFormat and moved the methods specific to the single-character case to the SingleCharSeparatedValuesOutputFormat class.
> 
>  - Didn\u2019t change the logic which was in the SeparatedValuesOutputFormat, only moved some parts to the child class.
> 
>  - Implemented the value escaping and concatenation with the delimiter string in the MultiCharSeparatedValuesOutputFormat.
> 
> 
> Diffs
> -----
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
>   beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java PRE-CREATION 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
>   beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java PRE-CREATION 
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 892c733 
> 
> Diff: https://reviews.apache.org/r/50896/diff/
> 
> 
> Testing
> -------
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>