You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Carl Steinbach <cw...@apache.org> on 2018/08/07 04:31:26 UTC
Re: Review Request 68099: SerDe to support Teradata Binary Format
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68099/#review206929
-----------------------------------------------------------
Please note that while I have not commented on every occurrence of the following issues, I would still like them all to be fixed:
* Unnecessary 'else' clauses
* Unnecessary uses of 'this'
* RuntimeExceptions which should be replaced with checked exceptions.
contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileInputFormat.java
Lines 1 (patched)
<https://reviews.apache.org/r/68099/#comment290075>
I think this code and the other files in this patch belong in the serde module. Please move the code to serde/src/java/org/apache/hadoop/hive/serde2/teradata
contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileOutputFormat.java
Lines 71 (patched)
<https://reviews.apache.org/r/68099/#comment290076>
This should be a static final class variable, i.e:
static final byte RECORD_END_BYTE = ...
contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 74 (patched)
<https://reviews.apache.org/r/68099/#comment290081>
Please avoid unnecessary uses of "this", both in this file and others in the patch.
contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 89 (patched)
<https://reviews.apache.org/r/68099/#comment290083>
Change message to "Input file is compressed. Using compression code %s"
contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 92 (patched)
<https://reviews.apache.org/r/68099/#comment290082>
Please remove or change message to "The input file is not compressed".
contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 116 (patched)
<https://reviews.apache.org/r/68099/#comment290077>
static import String.format() in order to avoid constantly using the "String." prefix.
contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 155 (patched)
<https://reviews.apache.org/r/68099/#comment290078>
Magic constants (e.g. "0x0a") should be defined in one place (e.g. TeradataConstants.java) as a static final variable.
contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 197 (patched)
<https://reviews.apache.org/r/68099/#comment290079>
Remove
contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 211 (patched)
<https://reviews.apache.org/r/68099/#comment290080>
Please make this more readable by replace the ?: operator with equivalent if(){} else {} code.
contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataInputStream.java
Lines 101 (patched)
<https://reviews.apache.org/r/68099/#comment290085>
Unnecessary else clause.
contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataInputStream.java
Lines 122 (patched)
<https://reviews.apache.org/r/68099/#comment290084>
This else clause is unnecessary.
contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 105 (patched)
<https://reviews.apache.org/r/68099/#comment290086>
This else clause is unnecessary if you explicitly return from the previous block.
contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 148 (patched)
<https://reviews.apache.org/r/68099/#comment290087>
Consider breaking this into multiple lines for improved readability:
int toWrite = date.get().getYear() * 10000 +
date.get().getMonth() * 100 +
...
contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 178 (patched)
<https://reviews.apache.org/r/68099/#comment290089>
Add explicit return and remove unnecessary else clause.
contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 184 (patched)
<https://reviews.apache.org/r/68099/#comment290088>
Instead of logging this info separately, I think it would make more sense to include this in the exception message.
contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 191 (patched)
<https://reviews.apache.org/r/68099/#comment290093>
Unnecessary else clause.
contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java
Lines 190 (patched)
<https://reviews.apache.org/r/68099/#comment290090>
Is this worth logging? If so, consider changing the log level to DEBUG.
contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java
Lines 217 (patched)
<https://reviews.apache.org/r/68099/#comment290092>
remove line
contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java
Lines 363 (patched)
<https://reviews.apache.org/r/68099/#comment290091>
Throwing a SerdeException
contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java
Lines 363 (patched)
<https://reviews.apache.org/r/68099/#comment290094>
Replace RuntimeExceptions with SerdeExceptions.
contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java
Lines 409 (patched)
<https://reviews.apache.org/r/68099/#comment290095>
Combine this into one statement using warn(String message, Throwable ex)
contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java
Lines 578 (patched)
<https://reviews.apache.org/r/68099/#comment290096>
convert these into if statements and return the value instead of savintg it to byteNum
contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestGeneralFunctions.java
Lines 31 (patched)
<https://reviews.apache.org/r/68099/#comment290097>
This belongs in the common module since it's testing functionality that is defined in the common module.
- Carl Steinbach
On July 31, 2018, 12:22 a.m., Lu Li wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68099/
> -----------------------------------------------------------
>
> (Updated July 31, 2018, 12:22 a.m.)
>
>
> Review request for hive, Carl Steinbach and Daniel Dai.
>
>
> Bugs: HIVE-20225
> https://issues.apache.org/jira/browse/HIVE-20225
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> When using TPT/BTEQ to export Data from Teradata, Teradata will export binary files based on the schema.
>
> A Customized SerDe is needed in order to directly read these files from Hive.
>
> CREATE EXTERNAL TABLE `TABLE1`(
> ...)
> PARTITIONED BY (
> ...)
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.contrib.serde2.TeradataBinarySerde'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.hive.contrib.fileformat.teradata.TeradataBinaryFileInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.contrib.fileformat.teradata.TeradataBinaryFileOutputFormat'
> LOCATION ...;
>
> SELECT * FROM `TABLE1`;
> Problem Statement:
>
> Right now the fast way to export data from Teradata is using TPT. However, the Hive could not directly utilize these exported binary format because it doesn't have a SerDe for these files.
>
> Result:
>
> Provided with the SerDe, Hive can operate upon the exported Teradata Binary Format file transparently.
>
>
> Diffs
> -----
>
> contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileInputFormat.java PRE-CREATION
> contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileOutputFormat.java PRE-CREATION
> contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java PRE-CREATION
> contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataInputStream.java PRE-CREATION
> contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java PRE-CREATION
> contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java PRE-CREATION
> contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestGeneralFunctions.java PRE-CREATION
> contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeForDate.java PRE-CREATION
> contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeForDecimal.java PRE-CREATION
> contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeForTimeStamp.java PRE-CREATION
> contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeGeneral.java PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/68099/diff/2/
>
>
> Testing
> -------
>
> Junit tests have been added for Serialization and Deserialization functions
>
>
> Thanks,
>
> Lu Li
>
>