You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Venki Korukanti <ve...@gmail.com> on 2013/10/30 23:05:35 UTC

Review Request 15105: HIVE-3844: Unix timestamps don't seem to be read correctly from HDFS as Timestamp column

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15105/
-----------------------------------------------------------

Review request for hive and Mark Grover.


Bugs: HIVE-3844
    https://issues.apache.org/jira/browse/HIVE-3844


Repository: hive-git


Description
-------

Currently LazyTimestamp can only interpret timestamps in JDBC format. If there are UNIX epoch styles timestamps it treats them as invalid and output NULL. This patch is to support all three types of formats mentioned here https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-Timestamps.

BigDecimal is used to avoid rounding off errors with atoi/atof type of conversions. 


Diffs
-----

  data/files/timestamp_data.txt PRE-CREATION 
  ql/src/test/queries/clientpositive/timestamp_4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/timestamp_null.q efd5bc4 
  ql/src/test/results/clientpositive/timestamp_4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/timestamp_null.q.out d21b880 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestamp.java 27895c5 

Diff: https://reviews.apache.org/r/15105/diff/


Testing
-------

Fix includes a unittest which reads from a file that has most types of timestamp formats.


Thanks,

Venki Korukanti


Re: Review Request 15105: HIVE-3844: Unix timestamps don't seem to be read correctly from HDFS as Timestamp column

Posted by Venki Korukanti <ve...@gmail.com>.

> On Nov. 14, 2013, 1:44 a.m., Thejas Nair wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestamp.java, line 89
> > <https://reviews.apache.org/r/15105/diff/2/?file=374513#file374513line89>
> >
> >     Catching an exception and converting it will be extremely slow.
> >

Changed to use Pattern/Matcher to detect the timestamp format.


- Venki


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15105/#review28853
-----------------------------------------------------------


On Feb. 6, 2014, 10:37 p.m., Venki Korukanti wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15105/
> -----------------------------------------------------------
> 
> (Updated Feb. 6, 2014, 10:37 p.m.)
> 
> 
> Review request for hive and Mark Grover.
> 
> 
> Bugs: HIVE-3844
>     https://issues.apache.org/jira/browse/HIVE-3844
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Currently LazyTimestamp can only interpret timestamps in JDBC format. If there are UNIX epoch styles timestamps it treats them as invalid and output NULL. This patch is to support all three types of formats mentioned here https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-Timestamps.
> 
> BigDecimal is used to avoid rounding off errors with atoi/atof type of conversions. 
> 
> 
> Diffs
> -----
> 
>   data/files/timestamp_data.txt PRE-CREATION 
>   ql/src/test/queries/clientpositive/timestamp_4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/timestamp_null.q 36f3541 
>   ql/src/test/results/clientpositive/timestamp_4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/timestamp_null.q.out 57269d7 
>   serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java 435d6c6 
>   serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestamp.java 27895c5 
> 
> Diff: https://reviews.apache.org/r/15105/diff/
> 
> 
> Testing
> -------
> 
> Fix includes a unittest which reads from a file that has most types of timestamp formats.
> 
> 
> Thanks,
> 
> Venki Korukanti
> 
>


Re: Review Request 15105: HIVE-3844: Unix timestamps don't seem to be read correctly from HDFS as Timestamp column

Posted by Thejas Nair <th...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15105/#review28853
-----------------------------------------------------------



serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestamp.java
<https://reviews.apache.org/r/15105/#comment55898>

    Catching an exception and converting it will be extremely slow.
    


- Thejas Nair


On Oct. 30, 2013, 10:05 p.m., Venki Korukanti wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15105/
> -----------------------------------------------------------
> 
> (Updated Oct. 30, 2013, 10:05 p.m.)
> 
> 
> Review request for hive and Mark Grover.
> 
> 
> Bugs: HIVE-3844
>     https://issues.apache.org/jira/browse/HIVE-3844
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Currently LazyTimestamp can only interpret timestamps in JDBC format. If there are UNIX epoch styles timestamps it treats them as invalid and output NULL. This patch is to support all three types of formats mentioned here https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-Timestamps.
> 
> BigDecimal is used to avoid rounding off errors with atoi/atof type of conversions. 
> 
> 
> Diffs
> -----
> 
>   data/files/timestamp_data.txt PRE-CREATION 
>   ql/src/test/queries/clientpositive/timestamp_4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/timestamp_null.q efd5bc4 
>   ql/src/test/results/clientpositive/timestamp_4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/timestamp_null.q.out d21b880 
>   serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestamp.java 27895c5 
> 
> Diff: https://reviews.apache.org/r/15105/diff/
> 
> 
> Testing
> -------
> 
> Fix includes a unittest which reads from a file that has most types of timestamp formats.
> 
> 
> Thanks,
> 
> Venki Korukanti
> 
>


Re: Review Request 15105: HIVE-3844: Unix timestamps don't seem to be read correctly from HDFS as Timestamp column

Posted by Venki Korukanti <ve...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15105/
-----------------------------------------------------------

(Updated Feb. 6, 2014, 10:37 p.m.)


Review request for hive and Mark Grover.


Changes
-------

Addressed review comments


Bugs: HIVE-3844
    https://issues.apache.org/jira/browse/HIVE-3844


Repository: hive-git


Description
-------

Currently LazyTimestamp can only interpret timestamps in JDBC format. If there are UNIX epoch styles timestamps it treats them as invalid and output NULL. This patch is to support all three types of formats mentioned here https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-Timestamps.

BigDecimal is used to avoid rounding off errors with atoi/atof type of conversions. 


Diffs (updated)
-----

  data/files/timestamp_data.txt PRE-CREATION 
  ql/src/test/queries/clientpositive/timestamp_4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/timestamp_null.q 36f3541 
  ql/src/test/results/clientpositive/timestamp_4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/timestamp_null.q.out 57269d7 
  serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java 435d6c6 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestamp.java 27895c5 

Diff: https://reviews.apache.org/r/15105/diff/


Testing
-------

Fix includes a unittest which reads from a file that has most types of timestamp formats.


Thanks,

Venki Korukanti


Re: Review Request 15105: HIVE-3844: Unix timestamps don't seem to be read correctly from HDFS as Timestamp column

Posted by Venki Korukanti <ve...@gmail.com>.

> On Nov. 14, 2013, 9:28 a.m., Jason Dere wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestamp.java, line 92
> > <https://reviews.apache.org/r/15105/diff/2/?file=374513#file374513line92>
> >
> >     Maybe consider using TimestampWritable.decimalToTimestamp() here. Looks like this is what the cast operator ends up using.

Updated the patch to use existing code in TimestampWritable


> On Nov. 14, 2013, 9:28 a.m., Jason Dere wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestamp.java, line 96
> > <https://reviews.apache.org/r/15105/diff/2/?file=374513#file374513line96>
> >
> >     If you are allowing numeric value to timestamp conversion, not sure if it needs to be so strict that it returns null because of the nanos. Looks like TimestampWritable.decimalToTimestamp() behavior isn't so strict.

As the new patch is using TimestampWritable, this is not a problem anymore as the decimalToTimestamp is not so strict here.


- Venki


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15105/#review28864
-----------------------------------------------------------


On Feb. 6, 2014, 10:37 p.m., Venki Korukanti wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15105/
> -----------------------------------------------------------
> 
> (Updated Feb. 6, 2014, 10:37 p.m.)
> 
> 
> Review request for hive and Mark Grover.
> 
> 
> Bugs: HIVE-3844
>     https://issues.apache.org/jira/browse/HIVE-3844
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Currently LazyTimestamp can only interpret timestamps in JDBC format. If there are UNIX epoch styles timestamps it treats them as invalid and output NULL. This patch is to support all three types of formats mentioned here https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-Timestamps.
> 
> BigDecimal is used to avoid rounding off errors with atoi/atof type of conversions. 
> 
> 
> Diffs
> -----
> 
>   data/files/timestamp_data.txt PRE-CREATION 
>   ql/src/test/queries/clientpositive/timestamp_4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/timestamp_null.q 36f3541 
>   ql/src/test/results/clientpositive/timestamp_4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/timestamp_null.q.out 57269d7 
>   serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java 435d6c6 
>   serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestamp.java 27895c5 
> 
> Diff: https://reviews.apache.org/r/15105/diff/
> 
> 
> Testing
> -------
> 
> Fix includes a unittest which reads from a file that has most types of timestamp formats.
> 
> 
> Thanks,
> 
> Venki Korukanti
> 
>


Re: Review Request 15105: HIVE-3844: Unix timestamps don't seem to be read correctly from HDFS as Timestamp column

Posted by Jason Dere <jd...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15105/#review28864
-----------------------------------------------------------



serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestamp.java
<https://reviews.apache.org/r/15105/#comment55925>

    Maybe consider using TimestampWritable.decimalToTimestamp() here. Looks like this is what the cast operator ends up using.



serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestamp.java
<https://reviews.apache.org/r/15105/#comment55926>

    If you are allowing numeric value to timestamp conversion, not sure if it needs to be so strict that it returns null because of the nanos. Looks like TimestampWritable.decimalToTimestamp() behavior isn't so strict.


- Jason Dere


On Oct. 30, 2013, 10:05 p.m., Venki Korukanti wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15105/
> -----------------------------------------------------------
> 
> (Updated Oct. 30, 2013, 10:05 p.m.)
> 
> 
> Review request for hive and Mark Grover.
> 
> 
> Bugs: HIVE-3844
>     https://issues.apache.org/jira/browse/HIVE-3844
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Currently LazyTimestamp can only interpret timestamps in JDBC format. If there are UNIX epoch styles timestamps it treats them as invalid and output NULL. This patch is to support all three types of formats mentioned here https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-Timestamps.
> 
> BigDecimal is used to avoid rounding off errors with atoi/atof type of conversions. 
> 
> 
> Diffs
> -----
> 
>   data/files/timestamp_data.txt PRE-CREATION 
>   ql/src/test/queries/clientpositive/timestamp_4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/timestamp_null.q efd5bc4 
>   ql/src/test/results/clientpositive/timestamp_4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/timestamp_null.q.out d21b880 
>   serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyTimestamp.java 27895c5 
> 
> Diff: https://reviews.apache.org/r/15105/diff/
> 
> 
> Testing
> -------
> 
> Fix includes a unittest which reads from a file that has most types of timestamp formats.
> 
> 
> Thanks,
> 
> Venki Korukanti
> 
>