Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/07/27 06:30:00 UTC

[jira] [Resolved] (SPARK-28523) Can't SELECT from a partitioned table using the Hive Warehouse Connector

     [ https://issues.apache.org/jira/browse/SPARK-28523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-28523.
----------------------------------
    Resolution: Invalid

This looks like an issue in HWC, not in Spark itself.
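
For context: the stack trace in the quoted report fails inside HWC's LLAP read path rather than in Spark. spark.sql() plans and executes the query within Spark itself, whereas hive.executeQuery() submits the SQL to HiveServer2 Interactive and then streams results from LLAP daemons that the connector discovers through a ZooKeeper registry (see LlapZookeeperRegistryImpl in the trace). A minimal sketch of the two read paths, assuming the spark and hive objects from the report; a possible triage step follows the quoted stack trace below.

# Spark-native path: resolved and executed entirely by Spark.
spark.sql("SELECT * FROM test_table_2 WHERE d='partition_string'").show()

# HWC/LLAP path: HiveServer2 Interactive compiles the query, then the HWC
# reader fetches results from LLAP daemons located via the ZooKeeper
# registry, which is the lookup that raises ConnectionLoss in the report.
hive.executeQuery("SELECT * FROM test_table_2 WHERE d='partition_string'").show()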

> Can't SELECT from a partitioned table using the Hive Warehouse Connector
> -----------------------------------------------------------------------
>
>                 Key: SPARK-28523
>                 URL: https://issues.apache.org/jira/browse/SPARK-28523
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Varun Rao
>            Priority: Major
>
> I'm having an issue running a SELECT against a partition of a partitioned table through the Hive Warehouse Connector. The same statement executes without any problems through Spark SQL.
>  
> As a side note, I don't see a component for Hive Warehouse Connector issues on Spark's JIRA. Can someone point me to the right place to file them?
>  
> Below is the code that I ran to reproduce this issue.
>
> pyspark \
>   --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.0.21-1.jar \
>   --py-files /usr/hdp/current/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.0.21-1.zip \
>   --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://c372-node2.squadron-labs.com:2181,c372-node3.squadron-labs.com:2181,c372-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive" \
>   --conf spark.hadoop.hive.llap.daemon.service.hosts=@llap0 \
>   --conf spark.hadoop.hive.zookeeper.quorum="c372-node2.squadron-labs.com:2181;c372-node3.squadron-labs.com:2181;c372-node4.squadron-labs.com:2181"
>  
> from pyspark_llap import HiveWarehouseSession
> hive = HiveWarehouseSession.session(spark).build()
>
> # Create the table via both plain Spark SQL and HWC.
> query = "CREATE TABLE test_table_2 (k BIGINT, s SMALLINT) PARTITIONED BY (d STRING)"
> spark.sql(query)
> hive.executeUpdate(query)
>
> # Load one static partition via both engines.
> query2 = "INSERT OVERWRITE TABLE test_table_2 PARTITION(d='partition_string') VALUES (666, 42)"
> spark.sql(query2)
> hive.executeUpdate(query2)
>
> # Read the partition back: succeeds via spark.sql, fails via HWC.
> query3 = "SELECT * FROM test_table_2 WHERE d='partition_string'"
> spark.sql(query3).show()
> hive.executeQuery(query3).show()
>  
> The final hive.executeQuery call fails with the following stack trace:
>
> RuntimeException: java.io.IOException: shadecurator.org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
> at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReaderFactory.createDataReader(HiveWarehouseDataReaderFactory.java:66)
> at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD.compute(DataSourceRDD.scala:42)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:109)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: shadecurator.org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
> at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.ensureInstancesCache(ZkRegistryBase.java:619)
> at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:388)
> at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:55)
> at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:140)
> at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:136)
> at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getServiceInstanceForHost(LlapBaseInputFormat.java:391)
> at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getServiceInstance(LlapBaseInputFormat.java:373)
> at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getRecordReader(LlapBaseInputFormat.java:156)
> at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.getRecordReader(HiveWarehouseDataReader.java:71)
> at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.<init>(HiveWarehouseDataReader.java:49)
> at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReaderFactory.getDataReader(HiveWarehouseDataReaderFactory.java:72)
> at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReaderFactory.createDataReader(HiveWarehouseDataReaderFactory.java:64)
> ... 18 more
> Caused by: shadecurator.org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
> at shadecurator.org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:225)
> at shadecurator.org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:94)
> at shadecurator.org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:117)
> at shadecurator.org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:489)
> at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:199)
> at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:193)
> at shadecurator.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
> at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:190)
> at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:175)
> at shadecurator.org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:32)
> at shadecurator.org.apache.curator.framework.imps.CuratorFrameworkImpl.createContainers(CuratorFrameworkImpl.java:194)
> at shadecurator.org.apache.curator.framework.EnsureContainers.internalEnsure(EnsureContainers.java:61)
> at shadecurator.org.apache.curator.framework.EnsureContainers.ensure(EnsureContainers.java:53)
> at shadecurator.org.apache.curator.framework.recipes.cache.PathChildrenCache.ensurePath(PathChildrenCache.java:576)
> at shadecurator.org.apache.curator.framework.recipes.cache.PathChildrenCache.rebuild(PathChildrenCache.java:326)
> at shadecurator.org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:303)
> at org.apache.hadoop.hive.registry.impl.ZkRegistryBase.ensureInstancesCache(ZkRegistryBase.java:597)
> ... 29 more
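
A hedged triage sketch, assuming the cluster and table from the report: HWC also offers execute(), which runs a statement through HiveServer2 Interactive over JDBC without the executor-side LLAP daemon lookup. If execute() succeeds where executeQuery() fails, the problem is likely confined to the LLAP registry settings (spark.hadoop.hive.llap.daemon.service.hosts and spark.hadoop.hive.zookeeper.quorum) or to the LLAP daemons themselves, rather than to the ZooKeeper discovery used to reach HiveServer2. Whether execute() actually succeeds here is untested.

from pyspark_llap import HiveWarehouseSession

hive = HiveWarehouseSession.session(spark).build()
query3 = "SELECT * FROM test_table_2 WHERE d='partition_string'"

# JDBC path: goes through HiveServer2 Interactive only; no direct
# ZooKeeper lookup of LLAP daemons from the executors.
hive.execute(query3).show()

# LLAP path: the call that fails above with KeeperErrorCode = ConnectionLoss.
hive.executeQuery(query3).show()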



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org