You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Marta Kuczora <ku...@cloudera.com> on 2016/08/23 14:44:39 UTC

Review Request 51334: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51334/
-----------------------------------------------------------

Review request for hive, Naveen Gangam, Peter Vary, Sergio Pena, and Szehon Ho.


Bugs: HIVE-14404
    https://issues.apache.org/jira/browse/HIVE-14404


Repository: hive-git


Description
-------

Changed the class which is used to generate the output for the formats 'csv2', 'tsv2' and 'dsv' not to use SuperCSV any more and support multiple characters as delimiter.
The class implements the same escaping logic as it had with SuperCSV if the quoting is not disabled.
Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.

This review is a follow-up for the review 50896.


Diffs
-----

  beeline/pom.xml dc89e81 
  beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
  beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
  beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
  itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 49c1120 

Diff: https://reviews.apache.org/r/51334/diff/


Testing
-------

- Tested manually in BeeLine.
- Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.


Thanks,

Marta Kuczora


Re: Review Request 51334: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters - second version

Posted by Gabor Szadovszky <ga...@cloudera.com>.

> On Sept. 6, 2016, 1:21 p.m., Gabor Szadovszky wrote:
> > LGTM
> 
> Marta Kuczora wrote:
>     Thanks a lot for the review by the way. I forgot to write it earlier. :)

You're very welcome. :)


- Gabor


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51334/#review147822
-----------------------------------------------------------


On Sept. 6, 2016, 1:10 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51334/
> -----------------------------------------------------------
> 
> (Updated Sept. 6, 2016, 1:10 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Peter Vary, Sergio Pena, and Szehon Ho.
> 
> 
> Bugs: HIVE-14404
>     https://issues.apache.org/jira/browse/HIVE-14404
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Changed the class which is used to generate the output for the formats 'csv2', 'tsv2' and 'dsv' not to use SuperCSV any more and support multiple characters as delimiter.
> The class implements the same escaping logic as it had with SuperCSV if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> This review is a follow-up for the review 50896.
> 
> 
> Diffs
> -----
> 
>   beeline/pom.xml d03f770 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java 8e65e39 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java 59fbca3 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
>   bin/beeline.cmd 971e20b 
>   bin/ext/beeline.sh 8052c45 
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 1ca7623 
> 
> Diff: https://reviews.apache.org/r/51334/diff/
> 
> 
> Testing
> -------
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 51334: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters - second version

Posted by Marta Kuczora <ku...@cloudera.com>.

> On Sept. 6, 2016, 1:21 p.m., Gabor Szadovszky wrote:
> > LGTM

Thanks a lot for the review by the way. I forgot to write it earlier. :)


- Marta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51334/#review147822
-----------------------------------------------------------


On Sept. 6, 2016, 1:10 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51334/
> -----------------------------------------------------------
> 
> (Updated Sept. 6, 2016, 1:10 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Peter Vary, Sergio Pena, and Szehon Ho.
> 
> 
> Bugs: HIVE-14404
>     https://issues.apache.org/jira/browse/HIVE-14404
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Changed the class which is used to generate the output for the formats 'csv2', 'tsv2' and 'dsv' not to use SuperCSV any more and support multiple characters as delimiter.
> The class implements the same escaping logic as it had with SuperCSV if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> This review is a follow-up for the review 50896.
> 
> 
> Diffs
> -----
> 
>   beeline/pom.xml d03f770 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java 8e65e39 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java 59fbca3 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
>   bin/beeline.cmd 971e20b 
>   bin/ext/beeline.sh 8052c45 
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 1ca7623 
> 
> Diff: https://reviews.apache.org/r/51334/diff/
> 
> 
> Testing
> -------
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 51334: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters - second version

Posted by Gabor Szadovszky <ga...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51334/#review147822
-----------------------------------------------------------


Ship it!




LGTM

- Gabor Szadovszky


On Sept. 6, 2016, 1:10 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51334/
> -----------------------------------------------------------
> 
> (Updated Sept. 6, 2016, 1:10 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Peter Vary, Sergio Pena, and Szehon Ho.
> 
> 
> Bugs: HIVE-14404
>     https://issues.apache.org/jira/browse/HIVE-14404
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Changed the class which is used to generate the output for the formats 'csv2', 'tsv2' and 'dsv' not to use SuperCSV any more and support multiple characters as delimiter.
> The class implements the same escaping logic as it had with SuperCSV if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> This review is a follow-up for the review 50896.
> 
> 
> Diffs
> -----
> 
>   beeline/pom.xml d03f770 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java 8e65e39 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java 59fbca3 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
>   bin/beeline.cmd 971e20b 
>   bin/ext/beeline.sh 8052c45 
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 1ca7623 
> 
> Diff: https://reviews.apache.org/r/51334/diff/
> 
> 
> Testing
> -------
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 51334: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters - second version

Posted by Marta Kuczora <ku...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51334/
-----------------------------------------------------------

(Updated Sept. 6, 2016, 1:10 p.m.)


Review request for hive, Naveen Gangam, Peter Vary, Sergio Pena, and Szehon Ho.


Changes
-------

- Fixed the findings from the review.
- Removed the super csv from the beeline.cmd and beeline.sh files since it is not needed any more.
- Fixed the output format for cases when the result contains empty or null values. Also added some test cases with empty and null values.


Bugs: HIVE-14404
    https://issues.apache.org/jira/browse/HIVE-14404


Repository: hive-git


Description
-------

Changed the class which is used to generate the output for the formats 'csv2', 'tsv2' and 'dsv' not to use SuperCSV any more and support multiple characters as delimiter.
The class implements the same escaping logic as it had with SuperCSV if the quoting is not disabled.
Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.

This review is a follow-up for the review 50896.


Diffs (updated)
-----

  beeline/pom.xml d03f770 
  beeline/src/java/org/apache/hive/beeline/BeeLine.java 8e65e39 
  beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java 59fbca3 
  beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
  bin/beeline.cmd 971e20b 
  bin/ext/beeline.sh 8052c45 
  itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 1ca7623 

Diff: https://reviews.apache.org/r/51334/diff/


Testing
-------

- Tested manually in BeeLine.
- Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.


Thanks,

Marta Kuczora


Re: Review Request 51334: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters - second version

Posted by Gabor Szadovszky <ga...@cloudera.com>.

> On Aug. 26, 2016, 5:39 p.m., Gabor Szadovszky wrote:
> > Some minor findings/questions. LGTM, otherwise.

Thanks for the patch. :)


- Gabor


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51334/#review146985
-----------------------------------------------------------


On Aug. 23, 2016, 2:45 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51334/
> -----------------------------------------------------------
> 
> (Updated Aug. 23, 2016, 2:45 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Peter Vary, Sergio Pena, and Szehon Ho.
> 
> 
> Bugs: HIVE-14404
>     https://issues.apache.org/jira/browse/HIVE-14404
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Changed the class which is used to generate the output for the formats 'csv2', 'tsv2' and 'dsv' not to use SuperCSV any more and support multiple characters as delimiter.
> The class implements the same escaping logic as it had with SuperCSV if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> This review is a follow-up for the review 50896.
> 
> 
> Diffs
> -----
> 
>   beeline/pom.xml dc89e81 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 49c1120 
> 
> Diff: https://reviews.apache.org/r/51334/diff/
> 
> 
> Testing
> -------
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 51334: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters - second version

Posted by Marta Kuczora <ku...@cloudera.com>.

> On Aug. 26, 2016, 5:39 p.m., Gabor Szadovszky wrote:
> > beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java, line 101
> > <https://reviews.apache.org/r/51334/diff/1/?file=1482090#file1482090line101>
> >
> >     Does it make sense to add an empty string to the StringBuilder?

No, it is not really necessary. You only have to add a separator if the value is null.


- Marta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51334/#review146985
-----------------------------------------------------------


On Aug. 23, 2016, 2:45 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51334/
> -----------------------------------------------------------
> 
> (Updated Aug. 23, 2016, 2:45 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Peter Vary, Sergio Pena, and Szehon Ho.
> 
> 
> Bugs: HIVE-14404
>     https://issues.apache.org/jira/browse/HIVE-14404
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Changed the class which is used to generate the output for the formats 'csv2', 'tsv2' and 'dsv' not to use SuperCSV any more and support multiple characters as delimiter.
> The class implements the same escaping logic as it had with SuperCSV if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> This review is a follow-up for the review 50896.
> 
> 
> Diffs
> -----
> 
>   beeline/pom.xml dc89e81 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 49c1120 
> 
> Diff: https://reviews.apache.org/r/51334/diff/
> 
> 
> Testing
> -------
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 51334: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters - second version

Posted by Marta Kuczora <ku...@cloudera.com>.

> On Aug. 26, 2016, 5:39 p.m., Gabor Szadovszky wrote:
> > beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java, line 113
> > <https://reviews.apache.org/r/51334/diff/1/?file=1482090#file1482090line113>
> >
> >     From performance point of view it might be a good idea to use something like appendEscapedValue(StringBuilder, String) instead to avoid creating several StringBuilder and String objects.

Sure, it can be done like this.


- Marta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51334/#review146985
-----------------------------------------------------------


On Aug. 23, 2016, 2:45 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51334/
> -----------------------------------------------------------
> 
> (Updated Aug. 23, 2016, 2:45 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Peter Vary, Sergio Pena, and Szehon Ho.
> 
> 
> Bugs: HIVE-14404
>     https://issues.apache.org/jira/browse/HIVE-14404
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Changed the class which is used to generate the output for the formats 'csv2', 'tsv2' and 'dsv' not to use SuperCSV any more and support multiple characters as delimiter.
> The class implements the same escaping logic as it had with SuperCSV if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> This review is a follow-up for the review 50896.
> 
> 
> Diffs
> -----
> 
>   beeline/pom.xml dc89e81 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 49c1120 
> 
> Diff: https://reviews.apache.org/r/51334/diff/
> 
> 
> Testing
> -------
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 51334: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters - second version

Posted by Gabor Szadovszky <ga...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51334/#review146985
-----------------------------------------------------------



Some minor findings/questions. LGTM, otherwise.


beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java (line 84)
<https://reviews.apache.org/r/51334/#comment213934>

    Does it make sense to add an empty string to the StringBuilder?



beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java (line 96)
<https://reviews.apache.org/r/51334/#comment213939>

    From performance point of view it might be a good idea to use something like appendEscapedValue(StringBuilder, String) instead to avoid creating several StringBuilder and String objects.


- Gabor Szadovszky


On Aug. 23, 2016, 2:45 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51334/
> -----------------------------------------------------------
> 
> (Updated Aug. 23, 2016, 2:45 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Peter Vary, Sergio Pena, and Szehon Ho.
> 
> 
> Bugs: HIVE-14404
>     https://issues.apache.org/jira/browse/HIVE-14404
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Changed the class which is used to generate the output for the formats 'csv2', 'tsv2' and 'dsv' not to use SuperCSV any more and support multiple characters as delimiter.
> The class implements the same escaping logic as it had with SuperCSV if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> This review is a follow-up for the review 50896.
> 
> 
> Diffs
> -----
> 
>   beeline/pom.xml dc89e81 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 49c1120 
> 
> Diff: https://reviews.apache.org/r/51334/diff/
> 
> 
> Testing
> -------
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 51334: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters - second version

Posted by Marta Kuczora <ku...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51334/
-----------------------------------------------------------

(Updated Aug. 23, 2016, 2:45 p.m.)


Review request for hive, Naveen Gangam, Peter Vary, Sergio Pena, and Szehon Ho.


Summary (updated)
-----------------

HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters - second version


Bugs: HIVE-14404
    https://issues.apache.org/jira/browse/HIVE-14404


Repository: hive-git


Description
-------

Changed the class which is used to generate the output for the formats 'csv2', 'tsv2' and 'dsv' not to use SuperCSV any more and support multiple characters as delimiter.
The class implements the same escaping logic as it had with SuperCSV if the quoting is not disabled.
Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.

This review is a follow-up for the review 50896.


Diffs
-----

  beeline/pom.xml dc89e81 
  beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
  beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
  beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
  itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 49c1120 

Diff: https://reviews.apache.org/r/51334/diff/


Testing
-------

- Tested manually in BeeLine.
- Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter.


Thanks,

Marta Kuczora