You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Carl Steinbach <cw...@apache.org> on 2018/08/07 04:31:26 UTC

Re: Review Request 68099: SerDe to support Teradata Binary Format

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68099/#review206929
-----------------------------------------------------------



Please note that while I have not commented on every occurrence of the following issues, I would still like them all to be fixed:
 * Unnecessary 'else' clauses
 * Unnecessary uses of 'this'
 * RuntimeExceptions which should be replaced with checked exceptions.


contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileInputFormat.java
Lines 1 (patched)
<https://reviews.apache.org/r/68099/#comment290075>

    I think this code and the other files in this patch belong in the serde module. Please move the code to serde/src/java/org/apache/hadoop/hive/serde2/teradata



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileOutputFormat.java
Lines 71 (patched)
<https://reviews.apache.org/r/68099/#comment290076>

    This should be a static final class variable, i.e:
    
    static final byte RECORD_END_BYTE = ...



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 74 (patched)
<https://reviews.apache.org/r/68099/#comment290081>

    Please avoid unnecessary uses of "this", both in this file and others in the patch.



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 89 (patched)
<https://reviews.apache.org/r/68099/#comment290083>

    Change message to "Input file is compressed. Using compression code %s"



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 92 (patched)
<https://reviews.apache.org/r/68099/#comment290082>

    Please remove or change message to "The input file is not compressed".



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 116 (patched)
<https://reviews.apache.org/r/68099/#comment290077>

    static import String.format() in order to avoid constantly using the "String." prefix.



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 155 (patched)
<https://reviews.apache.org/r/68099/#comment290078>

    Magic constants (e.g. "0x0a") should be defined in one place (e.g. TeradataConstants.java) as a static final variable.



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 197 (patched)
<https://reviews.apache.org/r/68099/#comment290079>

    Remove



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 211 (patched)
<https://reviews.apache.org/r/68099/#comment290080>

    Please make this more readable by replace the ?: operator with equivalent if(){} else {} code.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataInputStream.java
Lines 101 (patched)
<https://reviews.apache.org/r/68099/#comment290085>

    Unnecessary else clause.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataInputStream.java
Lines 122 (patched)
<https://reviews.apache.org/r/68099/#comment290084>

    This else clause is unnecessary.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 105 (patched)
<https://reviews.apache.org/r/68099/#comment290086>

    This else clause is unnecessary if you explicitly return from the previous block.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 148 (patched)
<https://reviews.apache.org/r/68099/#comment290087>

    Consider breaking this into multiple lines for improved readability:
    
    int toWrite = date.get().getYear() * 10000 +
                  date.get().getMonth() * 100 +
                  ...



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 178 (patched)
<https://reviews.apache.org/r/68099/#comment290089>

    Add explicit return and remove unnecessary else clause.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 184 (patched)
<https://reviews.apache.org/r/68099/#comment290088>

    Instead of logging this info separately, I think it would make more sense to include this in the exception message.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 191 (patched)
<https://reviews.apache.org/r/68099/#comment290093>

    Unnecessary else clause.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java
Lines 190 (patched)
<https://reviews.apache.org/r/68099/#comment290090>

    Is this worth logging? If so, consider changing the log level to DEBUG.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java
Lines 217 (patched)
<https://reviews.apache.org/r/68099/#comment290092>

    remove line



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java
Lines 363 (patched)
<https://reviews.apache.org/r/68099/#comment290091>

    Throwing a SerdeException



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java
Lines 363 (patched)
<https://reviews.apache.org/r/68099/#comment290094>

    Replace RuntimeExceptions with SerdeExceptions.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java
Lines 409 (patched)
<https://reviews.apache.org/r/68099/#comment290095>

    Combine this into one statement using warn(String message, Throwable ex)



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java
Lines 578 (patched)
<https://reviews.apache.org/r/68099/#comment290096>

    convert these into if statements and return the value instead of savintg it to byteNum



contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestGeneralFunctions.java
Lines 31 (patched)
<https://reviews.apache.org/r/68099/#comment290097>

    This belongs in the common module since it's testing functionality that is defined in the common module.


- Carl Steinbach


On July 31, 2018, 12:22 a.m., Lu Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68099/
> -----------------------------------------------------------
> 
> (Updated July 31, 2018, 12:22 a.m.)
> 
> 
> Review request for hive, Carl Steinbach and Daniel Dai.
> 
> 
> Bugs: HIVE-20225
>     https://issues.apache.org/jira/browse/HIVE-20225
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> When using TPT/BTEQ to export Data from Teradata, Teradata will export binary files based on the schema.
> 
> A Customized SerDe is needed in order to directly read these files from Hive.
> 
> CREATE EXTERNAL TABLE `TABLE1`(
> ...)
> PARTITIONED BY (
> ...)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.contrib.serde2.TeradataBinarySerde'
> STORED AS INPUTFORMAT
>  'org.apache.hadoop.hive.contrib.fileformat.teradata.TeradataBinaryFileInputFormat'
> OUTPUTFORMAT
>  'org.apache.hadoop.hive.contrib.fileformat.teradata.TeradataBinaryFileOutputFormat'
> LOCATION ...;
> 
> SELECT * FROM `TABLE1`;
> Problem Statement:
> 
> Right now the fast way to export data from Teradata is using TPT. However, the Hive could not directly utilize these exported binary format because it doesn't have a SerDe for these files.
> 
> Result:
> 
> Provided with the SerDe, Hive can operate upon the exported Teradata Binary Format file transparently.
> 
> 
> Diffs
> -----
> 
>   contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileInputFormat.java PRE-CREATION 
>   contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileOutputFormat.java PRE-CREATION 
>   contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java PRE-CREATION 
>   contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataInputStream.java PRE-CREATION 
>   contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java PRE-CREATION 
>   contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java PRE-CREATION 
>   contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestGeneralFunctions.java PRE-CREATION 
>   contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeForDate.java PRE-CREATION 
>   contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeForDecimal.java PRE-CREATION 
>   contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeForTimeStamp.java PRE-CREATION 
>   contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeGeneral.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68099/diff/2/
> 
> 
> Testing
> -------
> 
> Junit tests have been added for Serialization and Deserialization functions
> 
> 
> Thanks,
> 
> Lu Li
> 
>