Posted to issues@spark.apache.org by "Zhan Zhang (JIRA)" <ji...@apache.org> on 2015/10/13 22:17:05 UTC
[jira] [Commented] (SPARK-11087) spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate

    [ https://issues.apache.org/jira/browse/SPARK-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955559#comment-14955559 ] 

Zhan Zhang commented on SPARK-11087:
------------------------------------

Predicate pushdown should happen regardless of whether the table is sorted. We first need to add some debug messages on the driver side to confirm that it actually happens.
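
Until such debug messages exist, a rough driver-side check might look like the sketch below. It is only an illustration under stated assumptions, not the change proposed in this comment: the helper name debugOrcPushdown is hypothetical, the two logger names are guesses at the packages involved (the Spark ORC data source and the Hive ORC reader), and the HiveContext is assumed to be the one from the report. The query and the configuration key are taken from the issue description.

```scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical helper: runs the query from the report with extra logging
// enabled on the driver. Assumes an existing HiveContext, e.g. from spark-shell.
def debugOrcPushdown(hiveContext: HiveContext): Unit = {
  // Assumption: these are the packages whose log output is relevant.
  // Setting levels here affects the driver JVM only; executor logging
  // would have to be configured separately (e.g. via log4j.properties).
  Logger.getLogger("org.apache.spark.sql.hive.orc").setLevel(Level.DEBUG)
  Logger.getLogger("org.apache.hadoop.hive.ql.io.orc").setLevel(Level.DEBUG)

  // Same setting and query as in the issue description.
  hiveContext.setConf("spark.sql.orc.filterPushdown", "true")
  val df = hiveContext.sql(
    "select u, v from 4D where zone = 2 and x = 320 and y = 117")

  // Print the logical and physical plans so the scan operator and the
  // filters attached to it can be inspected on the driver.
  df.explain(true)

  // Force execution so that any log messages produced while the scan is
  // set up actually appear.
  df.count()
}
```

If neither the plans nor the driver log mention a pushed-down predicate for x and y, that would suggest the filter is lost before the ORC reader is configured, which is the point where driver-side debug messages would help.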

> spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate
> ---------------------------------------------------------------------
>
>                 Key: SPARK-11087
>                 URL: https://issues.apache.org/jira/browse/SPARK-11087
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.1
>         Environment: orc file version 0.12 with HIVE_8732
> hive version 1.2.1.2.3.0.0-2557
>            Reporter: patcharee
>            Priority: Minor
>
> I have an external Hive table stored as partitioned ORC files (see the table schema below). I tried to query the table with a where clause:
> hiveContext.setConf("spark.sql.orc.filterPushdown", "true")
> hiveContext.sql("select u, v from 4D where zone = 2 and x = 320 and y = 117")
> But according to the log file, with the debug logging level on, the ORC pushdown predicate was not generated.
> Unfortunately my table was not sorted when I inserted the data, but I still expected the ORC pushdown predicate to be generated because of the where clause.
> Table schema
> ================================
> hive> describe formatted 4D;
> OK
> # col_name            	data_type           	comment             
> 	 	 
> date                	int                 	                    
> hh                  	int                 	                    
> x                   	int                 	                    
> y                   	int                 	                    
> height              	float               	                    
> u                   	float               	                    
> v                   	float               	                    
> w                   	float               	                    
> ph                  	float               	                    
> phb                 	float               	                    
> t                   	float               	                    
> p                   	float               	                    
> pb                  	float               	                    
> qvapor              	float               	                    
> qgraup              	float               	                    
> qnice               	float               	                    
> qnrain              	float               	                    
> tke_pbl             	float               	                    
> el_pbl              	float               	                    
> qcloud              	float               	                    
> 	 	 
> # Partition Information	 	 
> # col_name            	data_type           	comment             
> 	 	 
> zone                	int                 	                    
> z                   	int                 	                    
> year                	int                 	                    
> month               	int                 	                    
> 	 	 
> # Detailed Table Information	 	 
> Database:           	default             	 
> Owner:              	patcharee           	 
> CreateTime:         	Thu Jul 09 16:46:54 CEST 2015	 
> LastAccessTime:     	UNKNOWN             	 
> Protect Mode:       	None                	 
> Retention:          	0                   	 
> Location:           	hdfs://helmhdfs/apps/hive/warehouse/wrf_tables/4D	 
> Table Type:         	EXTERNAL_TABLE      	 
> Table Parameters:	 	 
> 	EXTERNAL            	TRUE                
> 	comment             	this table is imported from rwf_data/*/wrf/*
> 	last_modified_by    	patcharee           
> 	last_modified_time  	1439806692          
> 	orc.compress        	ZLIB                
> 	transient_lastDdlTime	1439806692          
> 	 	 
> # Storage Information	 	 
> SerDe Library:      	org.apache.hadoop.hive.ql.io.orc.OrcSerde	 
> InputFormat:        	org.apache.hadoop.hive.ql.io.orc.OrcInputFormat	 
> OutputFormat:       	org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat	 
> Compressed:         	No                  	 
> Num Buckets:        	-1                  	 
> Bucket Columns:     	[]                  	 
> Sort Columns:       	[]                  	 
> Storage Desc Params:	 	 
> 	serialization.format	1                   
> Time taken: 0.388 seconds, Fetched: 58 row(s)
> ================================
> Data was inserted into this table by another Spark job:
> df.write.format("org.apache.spark.sql.hive.orc.DefaultSource").mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone","z","year","month").saveAsTable("4D")


