Posted to dev@hive.apache.org by Ujjwal Wadhawan <uj...@gmail.com> on 2015/06/01 18:01:44 UTC

Is the HCatRecord iterator ephemeral? (A usage scenario on Timestamp)

Hi,

I would like to check a behavior that is reproducible with timestamp
columns. It can be summarized as:

“When reading from a stored HCatRecord iterator, the value of a
*timestamp* column in the previous row gets reset to 1970-01-01 00:00:00.0
(or the locale-adjusted epoch time 0) when the same column in the current
row is *null*.

Columns of other data types in the previous row are not affected by a
*null* in the same column of the current row.”

Please see the forwarded mail below for details and steps to reproduce.

Regards,
Ujjwal

---------- Forwarded message ----------
From: Ujjwal <uj...@gmail.com>
Date: Fri, May 29, 2015 at 2:20 PM
Subject: Re: only timestamp column value of previous row gets reset
To: user@hive.apache.org


Hi all,

The issue can be reproduced with a simple Java program (code attached for
reference/use) in which I do not use the records from the iterator right
away after reading, but store them in a vector for later use. As per my
understanding, a record should not change once the iterator has given it to
the consumer. However, the timestamp object gets reset under the one
condition explained earlier.


Create a table
---------------------

create table if not exists sample (dtcol date, tscol timestamp, stcol
string) row format delimited fields terminated by ',' stored as textfile;
truncate table sample;



Input data (input)
------------------------

9779-11-21,2014-04-01 11:30:55,abc
9779-11-21,2014-04-04 11:30:55,def
,null,



Load the data
-------------------

hadoop fs -put input /apps/hive/warehouse/sample



Check
---------

hive> select * from sample;

OK

9779-11-21      2014-04-01 11:30:55     abc
9779-11-21      2014-04-04 11:30:55     def
NULL    NULL
Time taken: 0.029 seconds, Fetched: 3 row(s)
hive>



Execute
------------

export CLASSPATH=`hadoop classpath`:`hcat -classpath`
java -classpath SampleHCatReader.jar:$CLASSPATH \
    org.my.internal.SampleHCatReader



Output with the timestamp reset
------------------------------------------------

HCat record right after reading is  9779-11-21  2014-04-01 11:30:55.0   abc
HCat record right after reading is  9779-11-21  2014-04-04 11:30:55.0   def
HCat record right after reading is  null        null

HCat record later is 9779-11-21 2014-04-01 11:30:55.0   abc
HCat record later is 9779-11-21 1970-01-01 00:00:00.0   def
HCat record later is null       null



As the output above shows, the timestamp of the second row is reset to
1970-01-01 00:00:00.0 once the third row (with a null timestamp) has been
read.
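
To make the question concrete, here is a minimal Java sketch of one mechanism
that would produce exactly this pattern. It is only an assumption, not
something confirmed for HCatalog: the Holder class below is hypothetical and
is not a Hive class. It keeps a reference to the last non-null
java.sql.Timestamp it handed out and, when it is given null, zeroes that same
object in place. readCopied shows a possible consumer-side workaround (a
defensive copy of each value before advancing).

```java
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;

class TimestampAliasingSketch {

    // Hypothetical holder, NOT a Hive class: remembers the last non-null
    // value and zeroes that same object in place when it is given null.
    static final class Holder {
        private Timestamp current = new Timestamp(0L);

        Timestamp set(Timestamp t) {
            if (t == null) {
                current.setTime(0L); // mutates the object handed out earlier
                return null;
            }
            current = t;
            return current;
        }
    }

    // Stores the returned references as-is, like keeping records for later.
    static List<Timestamp> readShared(List<Timestamp> rows) {
        Holder holder = new Holder();
        List<Timestamp> stored = new ArrayList<>();
        for (Timestamp row : rows) {
            stored.add(holder.set(row));
        }
        return stored;
    }

    // Stores a defensive copy of every non-null value before moving on.
    static List<Timestamp> readCopied(List<Timestamp> rows) {
        Holder holder = new Holder();
        List<Timestamp> stored = new ArrayList<>();
        for (Timestamp row : rows) {
            Timestamp value = holder.set(row);
            if (value == null) {
                stored.add(null);
            } else {
                Timestamp copy = new Timestamp(value.getTime());
                copy.setNanos(value.getNanos());
                stored.add(copy);
            }
        }
        return stored;
    }

    // The tscol values from the sample input: two timestamps, then null.
    static List<Timestamp> sampleRows() {
        List<Timestamp> rows = new ArrayList<>();
        rows.add(Timestamp.valueOf("2014-04-01 11:30:55"));
        rows.add(Timestamp.valueOf("2014-04-04 11:30:55"));
        rows.add(null);
        return rows;
    }

    public static void main(String[] args) {
        System.out.println("shared: " + readShared(sampleRows()));
        System.out.println("copied: " + readCopied(sampleRows()));
    }
}
```

Under this model the first stored value survives, because it is no longer the
holder's current object when the null arrives, while the second one reads
back as epoch 0. With the defensive copy, both stored values keep what was
read. Whether HCatalog actually behaves this way internally is the open
question.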

Regards,
Ujjwal W