Posted to dev@hive.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2013/12/20 20:16:12 UTC

[jira] [Commented] (HIVE-6083) User provided table properties are not assigned to the TableDesc of the FileSinkDesc in a CTAS query

    [ https://issues.apache.org/jira/browse/HIVE-6083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854449#comment-13854449 ] 

Yin Huai commented on HIVE-6083:
--------------------------------

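With the .1 patch (HIVE-6083.1.patch.txt) applied, the data files are written with the compression requested in tblproperties: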
* Snappy compression
{code}
create table web_sales_correct_orc_snappy
stored as orc tblproperties ("orc.compress"="SNAPPY")
as select * from web_sales;
{code}
{code}
describe formatted web_sales_correct_orc_snappy;
....
Location:           	hdfs://localhost:54310/user/hive/warehouse/web_sales_correct_orc_snappy	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	COLUMN_STATS_ACCURATE	true                
	numFiles            	1                   
	numRows             	719384              
	orc.compress        	SNAPPY              
	rawDataSize         	97815412            
	totalSize           	51042245            
	transient_lastDdlTime	1387566737          
....   
{code}
{code}
bin/hive --orcfiledump /user/hive/warehouse/web_sales_correct_orc_snappy/000000_0
Rows: 719384
Compression: SNAPPY
Compression size: 262144
...
{code}
* No compression
{code}
create table web_sales_correct_orc_none
stored as orc tblproperties ("orc.compress"="NONE")
as select * from web_sales;
{code}
{code}
describe formatted web_sales_correct_orc_none;
....
Location:           	hdfs://localhost:54310/user/hive/warehouse/web_sales_correct_orc_none	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	COLUMN_STATS_ACCURATE	true                
	numFiles            	1                   
	numRows             	719384              
	orc.compress        	NONE                
	rawDataSize         	97815412            
	totalSize           	53968823            
	transient_lastDdlTime	1387566788     
....   
{code}
{code}
bin/hive --orcfiledump /user/hive/warehouse/web_sales_correct_orc_none/000000_0
Rows: 719384
Compression: NONE
...
{code}

> User provided table properties are not assigned to the TableDesc of the FileSinkDesc in a CTAS query
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-6083
>                 URL: https://issues.apache.org/jira/browse/HIVE-6083
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>         Attachments: HIVE-6083.1.patch.txt
>
>
> I was trying to use a CTAS query to create a table stored as ORC with orc.compress set to SNAPPY. However, the table was still compressed with ZLIB (although the output of DESCRIBE FORMATTED still shows orc.compress as SNAPPY). For a CTAS query, SemanticAnalyzer.genFileSinkPlan uses the CreateTableDesc to generate the TableDesc for the FileSinkDesc by calling PlanUtils.getTableDesc. However, PlanUtils.getTableDesc does not appear to assign the user-provided table properties to the returned TableDesc (CreateTableDesc.getTblProps is not called in this method).
> By the way, I only checked the code of 0.12 and trunk.
> Two examples:
> * Snappy compression
> {code}
> create table web_sales_wrong_orc_snappy
> stored as orc tblproperties ("orc.compress"="SNAPPY")
> as select * from web_sales;
> {code}
> {code}
> describe formatted web_sales_wrong_orc_snappy;
> ....
> Location:           	hdfs://localhost:54310/user/hive/warehouse/web_sales_wrong_orc_snappy	 
> Table Type:         	MANAGED_TABLE       	 
> Table Parameters:	 	 
> 	COLUMN_STATS_ACCURATE	true                
> 	numFiles            	1                   
> 	numRows             	719384              
> 	orc.compress        	SNAPPY              
> 	rawDataSize         	97815412            
> 	totalSize           	40625243            
> 	transient_lastDdlTime	1387566015       
> ....   
> {code}
> {code}
> bin/hive --orcfiledump /user/hive/warehouse/web_sales_wrong_orc_snappy/000000_0
> Rows: 719384
> Compression: ZLIB
> Compression size: 262144
> ...
> {code}
> * No compression
> {code}
> create table web_sales_wrong_orc_none
> stored as orc tblproperties ("orc.compress"="NONE")
> as select * from web_sales;
> {code}
> {code}
> describe formatted web_sales_wrong_orc_none;
> ....
> Location:           	hdfs://localhost:54310/user/hive/warehouse/web_sales_wrong_orc_none	 
> Table Type:         	MANAGED_TABLE       	 
> Table Parameters:	 	 
> 	COLUMN_STATS_ACCURATE	true                
> 	numFiles            	1                   
> 	numRows             	719384              
> 	orc.compress        	NONE                
> 	rawDataSize         	97815412            
> 	totalSize           	40625243            
> 	transient_lastDdlTime	1387566064       
> ....   
> {code}
> {code}
> bin/hive --orcfiledump /user/hive/warehouse/web_sales_wrong_orc_none/000000_0
> Rows: 719384
> Compression: ZLIB
> Compression size: 262144
> ...
> {code}
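
The gap described in the issue can be illustrated with a small, self-contained Java sketch. It does not use Hive's classes: CtasTablePropsSketch and all of its method names are hypothetical stand-ins, the "serialization.format" entry is only a placeholder for whatever the real TableDesc carries, and the writer is assumed to fall back to ZLIB when orc.compress is missing, which matches the orcfiledump output above. It is not the content of HIVE-6083.1.patch.txt, only a model of the behaviour before and after the user-provided properties are copied.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical, simplified model for illustration only; none of this is Hive code.
public class CtasTablePropsSketch {

    /** Placeholder for the properties a TableDesc would carry regardless of tblproperties. */
    public static Properties baseProperties() {
        Properties props = new Properties();
        props.setProperty("serialization.format", "1"); // placeholder entry
        return props;
    }

    /** Models the reported behaviour: the user-provided table properties are never consulted. */
    public static Properties withoutUserProps(Map<String, String> tblProps) {
        return baseProperties();
    }

    /** Models one possible fix: copy the user-provided table properties into the result. */
    public static Properties withUserProps(Map<String, String> tblProps) {
        Properties props = baseProperties();
        if (tblProps != null) {
            for (Map.Entry<String, String> entry : tblProps.entrySet()) {
                props.setProperty(entry.getKey(), entry.getValue());
            }
        }
        return props;
    }

    /** Models a writer that falls back to ZLIB when orc.compress is absent (assumption). */
    public static String effectiveCompression(Properties props) {
        return props.getProperty("orc.compress", "ZLIB");
    }

    public static void main(String[] args) {
        Map<String, String> tblProps = new LinkedHashMap<>();
        tblProps.put("orc.compress", "SNAPPY");

        // Properties dropped: the writer sees no orc.compress and uses the fallback.
        System.out.println(effectiveCompression(withoutUserProps(tblProps))); // prints ZLIB
        // Properties copied: the writer sees the requested codec.
        System.out.println(effectiveCompression(withUserProps(tblProps)));    // prints SNAPPY
    }
}
```

In this model the metastore-side view (what DESCRIBE FORMATTED reports) would be unaffected either way, since only the properties handed to the writer differ. That matches the mismatch between the DESCRIBE output and the orcfiledump output in the examples above.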


