Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2019/01/26 23:05:00 UTC

[jira] [Commented] (SPARK-26663) Cannot query a Hive table with subdirectories

    [ https://issues.apache.org/jira/browse/SPARK-26663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753216#comment-16753216 ] 

Dongjoon Hyun commented on SPARK-26663:
---------------------------------------

Hi, [~pomptuintje]. Thank you for reporting. Actually, the given example has incorrect syntax like `creat table`. It would be better if you report the exact script you used. I did the following, but I cannot reproduce the issue.
{code:java}
Logging initialized using configuration in jar:file:/Users/dongjoon/APACHE/hive-release/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
hive> create table a(id int);
OK
Time taken: 1.299 seconds
hive> create table b(id int);
OK
Time taken: 0.046 seconds
hive> insert into a values(1);
Query ID = dongjoon_20190126145804_2c0252e1-d07c-4213-a387-90efe26d450b
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2019-01-26 14:58:06,272 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local2005651311_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: file:/user/hive/warehouse/a/.hive-staging_hive_2019-01-26_14-58-04_030_4426868381325183205-1/-ext-10000
Loading data to table default.a
Table default.a stats: [numFiles=1, numRows=1, totalSize=2, rawDataSize=1]
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 2.436 seconds
hive> insert into b values(1);
Query ID = dongjoon_20190126145810_034d9c36-0f23-42a6-ac0a-681839335bd6
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2019-01-26 14:58:11,941 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local966105199_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: file:/user/hive/warehouse/b/.hive-staging_hive_2019-01-26_14-58-10_554_693159949912597124-1/-ext-10000
Loading data to table default.b
Table default.b stats: [numFiles=1, numRows=1, totalSize=2, rawDataSize=1]
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.551 seconds
hive> create table c as select id from a union all select id from b;
Query ID = dongjoon_20190126145831_c2b31651-c88b-47ab-9081-2375cf064b15
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2019-01-26 14:58:33,130 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local1725928125_0003
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2019-01-26 14:58:34,449 Stage-3 map = 100%,  reduce = 0%
Ended Job = job_local1246940820_0004
Moving data to: file:/user/hive/warehouse/c
Table default.c stats: [numFiles=1, numRows=2, totalSize=4, rawDataSize=2]
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Stage-Stage-3:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 2.806 seconds
{code}
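Querying table `c` from spark-shell (Spark 2.4.0, against the same local warehouse) then returns both rows: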
{code:java}
19/01/26 14:58:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context available as 'sc' (master = local[*], app id = local-1548543527800).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sql("select * from c").show
19/01/26 14:58:57 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
+---+
| id|
+---+
|  1|
|  1|
+---+
{code}
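For context on the issue title: the subdirectory layout typically appears when Hive executes the UNION ALL on Tez, which writes each branch of the union into a numbered subdirectory under the table directory (for example `c/1/` and `c/2/`), whereas the local MapReduce run above merged the result into a single file (numFiles=1). A minimal diagnostic sketch from spark-shell, assuming the local warehouse path shown in the transcript above (adjust the table location for your environment):
{code:java}
// Diagnostic sketch: list the table directory and flag subdirectories.
// The warehouse path is taken from the transcript above; it is an assumption
// that your table lives in the same place.
import org.apache.hadoop.fs.{FileSystem, Path}

val tableDir = new Path("file:/user/hive/warehouse/c")
val fs = FileSystem.get(tableDir.toUri, spark.sparkContext.hadoopConfiguration)

// If Hive wrote the UNION ALL branches into subdirectories, this listing
// shows directories (d) such as .../c/1 instead of plain data files (-).
fs.listStatus(tableDir).foreach { s =>
  println(s"${if (s.isDirectory) "d" else "-"} ${s.getPath}")
}
{code}
If the listing does show subdirectories, a commonly cited workaround (not verified against this report) is to fall back from Spark's native ORC reader to the Hive SerDe path and to enable recursive input listing. The three configuration keys below are real Spark/Hadoop/Hive settings, but treating them as a fix for this particular report is an assumption:
{code:java}
// Hypothetical workaround sketch, unverified against this report.
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
spark.sparkContext.hadoopConfiguration
  .setBoolean("mapreduce.input.fileinputformat.input.dir.recursive", true)
spark.sparkContext.hadoopConfiguration
  .setBoolean("hive.mapred.supports.subdirectories", true)
sql("select * from c").show()
{code}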

> Cannot query a Hive table with subdirectories
> ---------------------------------------------
>
>                 Key: SPARK-26663
>                 URL: https://issues.apache.org/jira/browse/SPARK-26663
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Aäron
>            Priority: Major
>
> Hello,
>  
> I want to report the following issue (my first one :) )
> When I create a table in Hive based on a union all, Spark 2.4 is unable to query it.
> To reproduce:
> *Hive 1.2.1*
> {code:java}
> hive> creat table a(id int);
> insert into a values(1);
> hive> creat table b(id int);
> insert into b values(2);
> hive> create table c(id int) as select id from a union all select id from b;
> {code}
>  
> *Spark 2.3.1*
>  
> {code:java}
> scala> spark.table("c").show
> +---+
> | id|
> +---+
> | 1|
> | 2|
> +---+
> scala> spark.table("c").count
> res5: Long = 2
>  {code}
>  
> *Spark 2.4.0*
> {code:java}
> scala> spark.table("c").show
> 19/01/18 17:00:49 WARN HiveMetastoreCatalog: Unable to infer schema for table perftest_be.c from file format ORC (inference mode: INFER_AND_SAVE). Using metastore schema.
> +---+
> | id|
> +---+
> +---+
> scala> spark.table("c").count
> res3: Long = 0
> {code}
> I did not find an existing issue for this. It might be important to investigate.
>  
> +Extra info:+ Spark 2.3.1 and 2.4.0 use the same spark-defaults.conf.
>  
> Kind regards.
>  


