Posted to user@spark.apache.org by 道玉 <z....@qq.com> on 2017/05/28 13:52:21 UTC

Re: Spark sql 2.1.0 thrift jdbc server - create table xxx as select * from yyy sometimes get error

Hi,


Here's the SQL I'm using. Running these two statements in beeline works fine. But with Spring's org.springframework.jdbc.datasource.SimpleDriverDataSource, whether through JdbcTemplate or through dataSource.getConnection() and createStatement(), the second CREATE statement fails silently after these two statements are executed.


Even repeating the same "create table xxx as ..." statement fails the second time when going through Spring Data.


create table task.task_73 stored as orc as SELECT A1.col_3, A1.col_2, A2.col_1, A2.col_0 FROM ( SELECT col_3, col_2, col_1, col_0 FROM resource.res_3bc5c40101b048c199d5ced4dd72a562 ) A1 FULL OUTER JOIN ( SELECT col_3, col_2, col_1, col_0 FROM resource.res_98fd0cd226214cffa998b8205908c013 ) A2 ON A1.col_3 = A2.col_3


create table task.task_74 stored as orc as SELECT A1.col_3, A1.col_2, A2.col_1, A2.col_0 FROM ( SELECT col_3, col_2, col_1, col_0 FROM resource.res_3bc5c40101b048c199d5ced4dd72a562 ) A1 INNER JOIN ( SELECT col_3, col_2, col_1, col_0 FROM resource.res_98fd0cd226214cffa998b8205908c013 ) A2 ON A1.col_3 = A2.col_3
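For anyone trying to reproduce this outside of Spring, here is a minimal plain-JDBC sketch of the same flow: the two statements above issued back-to-back on one connection. The class name, dry-run behaviour, and JDBC URL handling are my assumptions, not something from the thread, and running against a real server requires the Hive JDBC driver on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.List;

public class CtasRepro {

    // The two CTAS statements from the thread, issued back-to-back on one connection.
    static final List<String> STATEMENTS = List.of(
        "create table task.task_73 stored as orc as "
            + "SELECT A1.col_3, A1.col_2, A2.col_1, A2.col_0 "
            + "FROM ( SELECT col_3, col_2, col_1, col_0 FROM resource.res_3bc5c40101b048c199d5ced4dd72a562 ) A1 "
            + "FULL OUTER JOIN ( SELECT col_3, col_2, col_1, col_0 FROM resource.res_98fd0cd226214cffa998b8205908c013 ) A2 "
            + "ON A1.col_3 = A2.col_3",
        "create table task.task_74 stored as orc as "
            + "SELECT A1.col_3, A1.col_2, A2.col_1, A2.col_0 "
            + "FROM ( SELECT col_3, col_2, col_1, col_0 FROM resource.res_3bc5c40101b048c199d5ced4dd72a562 ) A1 "
            + "INNER JOIN ( SELECT col_3, col_2, col_1, col_0 FROM resource.res_98fd0cd226214cffa998b8205908c013 ) A2 "
            + "ON A1.col_3 = A2.col_3");

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // Dry run: just print the statements that would be executed.
            STATEMENTS.forEach(System.out::println);
            return;
        }
        // args[0] is a Thrift server JDBC URL, e.g. jdbc:hive2://host:10000/default
        // (an assumed example; adjust host/port for your cluster).
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement stmt = conn.createStatement()) {
            for (String sql : STATEMENTS) {
                stmt.execute(sql); // in the reported failure mode, the second execute is the one that fails
            }
        }
    }
}
```

Running it with no arguments only prints the statements, which makes it safe to sanity-check before pointing it at a server.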





------------------ Original message ------------------
From: "道玉" <z....@qq.com>
Sent: Sunday, May 28, 2017, 9:16 PM
To: "user" <us...@spark.apache.org>

Subject: Spark sql 2.1.0 thrift jdbc server - create table xxx as select * from yyy sometimes get error



Hey guys,

I've posted a question here: https://stackoverflow.com/questions/44223024/spark-sql-2-1-0-create-table-xxx-as-select-from-yyy-sometimes-get-error


Sometimes the Thrift SQL server can't execute a "create table as select" statement until it is restarted. When this happens, the Spark job/stage reports no failure or error message, just success with a very short duration. The JDBC client gets an "uncategorized exception", and beeline in verbose mode shows "java.lang.reflect.InvocationTargetException (state=,code=0)".


Only the Spark UI's /sqlserver/ tab shows an error stack trace.


Please help! I can provide further logs; just tell me where to find them!


Thanks!
--
Daoyu

Re: Spark sql 2.1.0 thrift jdbc server - create table xxx as select * from yyy sometimes get error

Posted by 道玉 <z....@qq.com>.
Hi all,


Any ideas on this error stack trace?


17/05/29 08:44:53 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING, 
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_107/.hive-staging_hive_2017-05-29_08-44-50_607_2388239917764085229-3/-ext-10000/part-00000 to destination hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_107/part-00000;
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
	at org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
	at org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectCommand.run(CreateHiveTableAsSelectCommand.scala:92)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:231)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:184)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_107/.hive-staging_hive_2017-05-29_08-44-50_607_2388239917764085229-3/-ext-10000/part-00000 to destination hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_107/part-00000
	at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
	at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2892)
	at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1640)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:728)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply$mcV$sp(HiveClientImpl.scala:676)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:676)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:676)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:279)
	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:226)
	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:225)
	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:268)
	at org.apache.spark.sql.hive.client.HiveClientImpl.loadTable(HiveClientImpl.scala:675)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply$mcV$sp(HiveExternalCatalog.scala:768)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:766)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:766)
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
	... 40 more
Caused by: java.io.IOException: Filesystem closed
	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:798)
	at org.apache.hadoop.hdfs.DFSClient.getEZForPath(DFSClient.java:2966)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1906)
	at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
	at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1221)
	at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2607)
	... 59 more
17/05/29 08:44:53 ERROR thriftserver.SparkExecuteStatementOperation: Error running hive query: 
org.apache.hive.service.cli.HiveSQLException: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_107/.hive-staging_hive_2017-05-29_08-44-50_607_2388239917764085229-3/-ext-10000/part-00000 to destination hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_107/part-00000;
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:266)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:184)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
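A note on the root cause shown at the bottom of the first trace: "java.io.IOException: Filesystem closed" from DFSClient.checkOpen means the HDFS client the Hive move operation is using has already been closed. Hadoop's FileSystem.get() returns a cached instance shared across the JVM, so if any other code path (for example, cleanup of an earlier session) calls close() on it, every subsequent user of that cached instance fails this way until the process restarts, which matches the "works once, then fails until restart" symptom. One commonly used workaround (my suggestion, not something confirmed in this thread) is to disable the cache so each caller gets its own instance, at the cost of extra connections:

```xml
<!-- core-site.xml: give each FileSystem.get() caller its own HDFS client,
     so one component's close() cannot invalidate the instance others share. -->
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
```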






Re: Spark sql 2.1.0 thrift jdbc server - create table xxx as select * from yyy sometimes get error

Posted by 道玉 <z....@qq.com>.
Hi all,


After upgrading to Spark 2.1.1, I now get more error details. It looks like an HDFS permission error. Any ideas? Why does it work the first time, while the same statement afterwards fails with "Unable to move source"?


Error: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_107/.hive-staging_hive_2017-05-29_08-44-50_607_2388239917764085229-3/-ext-10000/part-00000 to destination hdfs://jzf-01:9000/user/hive/warehouse/task.db/task_107/part-00000; (state=,code=0)



