Posted to dev@hive.apache.org by "Jitendra Nath Pandey (JIRA)" <ji...@apache.org> on 2014/03/03 06:37:20 UTC

[jira] [Updated] (HIVE-6511) casting from decimal to tinyint,smallint, int and bigint generates different result when vectorization is on

     [ https://issues.apache.org/jira/browse/HIVE-6511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HIVE-6511:
---------------------------------------

    Attachment: HIVE-6511.1.patch

The longValue function in Decimal128 rounds the value, whereas HiveDecimal just discards the fractional part. This patch adds another method to Decimal128 that discards the fractional part, and the CastDecimalToLong expression now uses it.
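
To illustrate the mismatch, here is a minimal sketch that uses java.math.BigDecimal as a stand-in. It does not use the Decimal128 or HiveDecimal code, and the class and method names are hypothetical. It assumes the rounding behaves like RoundingMode.HALF_UP and that the narrowing to int, smallint and tinyint behaves like a Java primitive cast; both assumptions are consistent with the output in the issue description but are not taken from the patch.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical demo class; BigDecimal stands in for the Hive decimal types.
public class DecimalCastSketch {

    // Round to the nearest integer first (assumed HALF_UP), then convert to long.
    static long roundToLong(BigDecimal d) {
        return d.setScale(0, RoundingMode.HALF_UP).longValue();
    }

    // Discard the fractional part (round toward zero), then convert to long.
    static long truncateToLong(BigDecimal d) {
        return d.setScale(0, RoundingMode.DOWN).longValue();
    }

    // Narrow a long the way Java primitive casts do: keep only the low-order bits.
    static String narrow(long v) {
        return (int) v + "\t" + (short) v + "\t" + (byte) v;
    }

    public static void main(String[] args) {
        // Value taken from row 7 of the output in the issue description.
        BigDecimal dc = new BigDecimal("-493942492598.691406");

        // Rounding gives -493942492599, which narrows to: -21253559  -19895  73
        System.out.println(narrow(roundToLong(dc)));

        // Truncation gives -493942492598, which narrows to: -21253558  -19894  74
        System.out.println(narrow(truncateToLong(dc)));
    }
}
```

Under these assumptions, the first line matches the vectorized output for that row and the second matches the non-vectorized output. The two results differ only when the fractional part is at least 0.5.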

> casting from decimal to tinyint,smallint, int and bigint generates different result when vectorization is on
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-6511
>                 URL: https://issues.apache.org/jira/browse/HIVE-6511
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: HIVE-6511.1.patch
>
>
> select dc,cast(dc as int), cast(dc as smallint),cast(dc as tinyint) from vectortab10korc limit 20 generates the following result when vectorization is enabled:
> {code}
> 4619756289662.078125	-1628520834	-16770	126
> 1553532646710.316406	-1245514442	-2762	54
> 3367942487288.360352	688127224	-776	-8
> 4386447830839.337891	1286221623	12087	55
> -3234165331139.458008	-54957251	27453	61
> -488378613475.326172	1247658269	-16099	29
> -493942492598.691406	-21253559	-19895	73
> 3101852523586.039062	886135874	23618	66
> 2544105595941.381836	1484956709	-23515	37
> -3997512403067.0625	1102149509	30597	-123
> -1183754978977.589355	1655994718	31070	94
> 1408783849655.676758	34576568	-26440	-72
> -2993175106993.426758	417098319	27215	79
> 3004723551798.100586	-1753555402	-8650	54
> 1103792083527.786133	-14511544	-28088	72
> 469767055288.485352	1615620024	26552	-72
> -1263700791098.294434	-980406074	12486	-58
> -4244889766496.484375	-1462078048	30112	-96
> -3962729491139.782715	1525323068	-27332	60
> NULL	NULL	NULL	NULL
> {code}
> When vectorization is disabled, the result looks like this:
> {code}
> 4619756289662.078125	-1628520834	-16770	126
> 1553532646710.316406	-1245514442	-2762	54
> 3367942487288.360352	688127224	-776	-8
> 4386447830839.337891	1286221623	12087	55
> -3234165331139.458008	-54957251	27453	61
> -488378613475.326172	1247658269	-16099	29
> -493942492598.691406	-21253558	-19894	74
> 3101852523586.039062	886135874	23618	66
> 2544105595941.381836	1484956709	-23515	37
> -3997512403067.0625	1102149509	30597	-123
> -1183754978977.589355	1655994719	31071	95
> 1408783849655.676758	34576567	-26441	-73
> -2993175106993.426758	417098319	27215	79
> 3004723551798.100586	-1753555402	-8650	54
> 1103792083527.786133	-14511545	-28089	71
> 469767055288.485352	1615620024	26552	-72
> -1263700791098.294434	-980406074	12486	-58
> -4244889766496.484375	-1462078048	30112	-96
> -3962729491139.782715	1525323069	-27331	61
> NULL	NULL	NULL	NULL
> {code}
> This issue is visible only for certain decimal values. In the above example, rows 7, 11, 12, 15, and 19 generate different results.
> vectortab10korc table schema:
> {code}
> t                   	tinyint             	from deserializer   
> si                  	smallint            	from deserializer   
> i                   	int                 	from deserializer   
> b                   	bigint              	from deserializer   
> f                   	float               	from deserializer   
> d                   	double              	from deserializer   
> dc                  	decimal(38,18)      	from deserializer   
> bo                  	boolean             	from deserializer   
> s                   	string              	from deserializer   
> s2                  	string              	from deserializer   
> ts                  	timestamp           	from deserializer   
> 	 	 
> # Detailed Table Information	 	 
> Database:           	default             	 
> Owner:              	xyz              	 
> CreateTime:         	Tue Feb 25 21:54:28 UTC 2014	 
> LastAccessTime:     	UNKNOWN             	 
> Protect Mode:       	None                	 
> Retention:          	0                   	 
> Location:           	hdfs://host1.domain.com:8020/apps/hive/warehouse/vectortab10korc	 
> Table Type:         	MANAGED_TABLE       	 
> Table Parameters:	 	 
> 	COLUMN_STATS_ACCURATE	true                
> 	numFiles            	1                   
> 	numRows             	10000               
> 	rawDataSize         	0                   
> 	totalSize           	344748              
> 	transient_lastDdlTime	1393365281          
> 	 	 
> # Storage Information	 	 
> SerDe Library:      	org.apache.hadoop.hive.ql.io.orc.OrcSerde	 
> InputFormat:        	org.apache.hadoop.hive.ql.io.orc.OrcInputFormat	 
> OutputFormat:       	org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat	 
> Compressed:         	No                  	 
> Num Buckets:        	-1                  	 
> Bucket Columns:     	[]                  	 
> Sort Columns:       	[]                  	 
> Storage Desc Params:	 	 
> 	serialization.format	1                   
> Time taken: 0.196 seconds, Fetched: 41 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)