Posted to user@spark.apache.org by linxi zeng <li...@gmail.com> on 2016/05/15 03:01:50 UTC

spark sql write orc table on viewFS throws exception

hi, all:
Recently we have encountered a problem while using Spark SQL to write an ORC
table, which is related to https://issues.apache.org/jira/browse/HIVE-10790.
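
For context, here is a minimal sketch (in Scala, for the Spark shell) of the
kind of statement that hits the exception; the source table name is
hypothetical, and custom.rank_less_orc_none stands for an ORC table whose
location resolves through a viewfs:// mount:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("orc-on-viewfs"))
val hc = new HiveContext(sc)

// Writing into an ORC table on a viewfs:// mount is what triggers the
// exception described in HIVE-10790 for us.
hc.sql("INSERT OVERWRITE TABLE custom.rank_less_orc_none " +
  "SELECT * FROM custom.rank_less_src")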
In order to fix this problem we decided to apply that PR as a patch to the
Hive branch that Spark 1.5 relies on.
We pulled the Hive branch
(https://github.com/pwendell/hive/tree/release-1.2.1-spark) and compiled it
with: mvn clean package -Phadoop-2,dist -DskipTests, then uploaded it to our
Nexus without any problem.
But when we compile Spark against this Hive (group: org.spark-project.hive,
version: 1.2.1.spark) with: ./make-distribution.sh --tgz -Phive
-Phive-thriftserver -Psparkr -Pyarn -Dhadoop.version=2.4.1
-Dprotobuf.version=2.5.0 -DskipTests
we get this error message:

[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @
spark-hive_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] Copying 3 resources
[INFO][INFO] --- scala-maven-plugin:3.2.2:compile (scala-compile-first) @
spark-hive_2.10 ---
[INFO] Using zinc server for incremental compilation
[info] Compiling 27 Scala sources and 1 Java source to
/home/sankuai/zenglinxi/spark/sql/hive/target/scala-2.10/classes...
[warn] Class org.apache.hadoop.hive.shims.HadoopShims not found -
continuing with a stub.
[error]
/home/sankuai/zenglinxi/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala:35:
object shims is not a member of package org.apache.hadoop.hive
[error] import org.apache.hadoop.hive.shims.{HadoopShims, ShimLoader}
[error]                               ^
[error]
/home/sankuai/zenglinxi/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala:114:
not found: value ShimLoader
[error]     val loadedShimsClassName =
ShimLoader.getHadoopShims.getClass.getCanonicalName
[error]                                ^
[error]
/home/sankuai/zenglinxi/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala:123:
not found: type ShimLoader
[error]       val shimsField =
classOf[ShimLoader].getDeclaredField("hadoopShims")
[error]                                ^
[error]
/home/sankuai/zenglinxi/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala:127:
not found: type HadoopShims
[error]       val shims =
classOf[HadoopShims].cast(shimsClass.newInstance())
[error]                           ^
[warn] Class org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge not found
- continuing with a stub.
[warn] Class org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge not found
- continuing with a stub.
[warn] Class org.apache.hadoop.hive.shims.HadoopShims not found -
continuing with a stub.
[warn] four warnings found
[error] four errors found
[error] Compile failed at 2016-5-13 16:34:44 [4.348s]
[INFO]
------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [
 3.105 s]
[INFO] Spark Project Launcher ............................. SUCCESS [
 8.360 s]
[INFO] Spark Project Networking ........................... SUCCESS [
 8.491 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
 5.110 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [
 6.854 s]
[INFO] Spark Project Core ................................. SUCCESS [02:33
min]
[INFO] Spark Project Bagel ................................ SUCCESS [
 5.183 s]
[INFO] Spark Project GraphX ............................... SUCCESS [
15.744 s]
[INFO] Spark Project Streaming ............................ SUCCESS [
39.070 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [
57.416 s]
[INFO] Spark Project SQL .................................. SUCCESS [01:11
min]
[INFO] Spark Project ML Library ........................... SUCCESS [01:28
min]
[INFO] Spark Project Tools ................................ SUCCESS [
 2.539 s]
[INFO] Spark Project Hive ................................. FAILURE [
13.273 s]
[INFO] Spark Project REPL ................................. SKIPPED
[INFO] Spark Project YARN ................................. SKIPPED

This compile error makes us doubt whether the fix for
https://issues.apache.org/jira/browse/HIVE-10790 can be applied this way.
Has anyone run into the same problem?
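
All four errors point at the same missing package,
org.apache.hadoop.hive.shims, which suggests the jars from our rebuild do not
bundle the shims classes that the published org.spark-project.hive artifacts
include. For reference, this is roughly what the failing ClientWrapper.scala
code does, reconstructed from the error fragments above (the concrete
Hadoop23Shims class name is an assumption for illustration):

import org.apache.hadoop.hive.shims.{HadoopShims, ShimLoader}

// Ask Hive's ShimLoader which shims implementation it picked for the
// running Hadoop version.
val loadedShimsClassName = ShimLoader.getHadoopShims.getClass.getCanonicalName

// If the wrong shims were loaded, swap in the right implementation by
// writing ShimLoader's private static hadoopShims field via reflection.
val shimsClass = Class.forName("org.apache.hadoop.hive.shims.Hadoop23Shims")
val shimsField = classOf[ShimLoader].getDeclaredField("hadoopShims")
shimsField.setAccessible(true)
val shims = classOf[HadoopShims].cast(shimsClass.newInstance())
shimsField.set(null, shims)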

Re: spark sql write orc table on viewFS throws exception

Posted by Mich Talebzadeh <mi...@gmail.com>.
I am not sure this is going to resolve the INSERT OVERWRITE into ORC table
issue. Can you go to Hive and run

show create table custom.rank_less_orc_none

and send the output.
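
If it is easier to capture, the same statement can be run from the Spark
shell through HiveContext (a sketch, assuming a HiveContext named hc as in
the first message; in Spark 1.5 this should pass through to Hive as a native
command):

hc.sql("SHOW CREATE TABLE custom.rank_less_orc_none").collect().foreach(println)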

Is that table defined as transactional?

Another alternative is to use Spark to insert into a normal text table and
then insert from the text table into the ORC table, either through
HiveContext or purely in Hive.
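
A minimal sketch of that two-step workaround, with a hypothetical staging
table name:

// Step 1: Spark writes into a plain TEXTFILE staging table, so Spark's
// ORC write path is never involved.
hc.sql("INSERT OVERWRITE TABLE custom.rank_less_text " +
  "SELECT * FROM custom.rank_less_src")

// Step 2: the text -> ORC copy runs through Hive's own ORC writer
// (via HiveContext here; the same statement also works in the Hive CLI).
hc.sql("INSERT OVERWRITE TABLE custom.rank_less_orc_none " +
  "SELECT * FROM custom.rank_less_text")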


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


