Posted to dev@spark.apache.org by linxi zeng <li...@gmail.com> on 2016/06/27 15:30:44 UTC

run spark sql with script transformation failed

Hi, all:
    Recently, we have been comparing Spark SQL with Hive on MR. When I
tried to run a Spark SQL (Spark 1.6 RC2) query with a script transformation,
the Spark job failed with an error message like:

16/06/26 11:01:28 INFO codegen.GenerateUnsafeProjection: Code generated in 19.054534 ms
16/06/26 11:01:28 ERROR execution.ScriptTransformationWriterThread: /bin/bash: test.py: command not found
16/06/26 11:01:28 ERROR util.Utils: Uncaught exception in thread Thread-ScriptTransformation-Feed
java.io.IOException: Stream closed
	at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:434)
	at java.io.OutputStream.write(OutputStream.java:116)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
	at java.io.DataOutputStream.write(DataOutputStream.java:107)
	at org.apache.hadoop.hive.ql.exec.TextRecordWriter.write(TextRecordWriter.java:53)
	at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(ScriptTransformation.scala:277)
	at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(ScriptTransformation.scala:255)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply$mcV$sp(ScriptTransformation.scala:255)
	at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply(ScriptTransformation.scala:244)
	at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply(ScriptTransformation.scala:244)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1801)
	at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread.run(ScriptTransformation.scala:244)
16/06/26 11:01:28 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Thread-ScriptTransformation-Feed,5,main]
java.io.IOException: Stream closed
	(same stack trace as above)


The command was:

> spark-1.6/bin/spark-sql -f transform.sql


The SQL and the Python script are below.
transform.sql (which ran successfully on Hive):

> add file /tmp/spark_sql_test/test.py;
> select transform(cityname) using 'test.py' as (new_cityname) from
> test.spark2_orc where dt='20160622' limit 5 ;
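
One alternative worth trying (an assumption on our side, not something verified here): name the interpreter explicitly in the USING clause, so the script needs neither the execute bit nor a PATH lookup of its own:

```sql
add file /tmp/spark_sql_test/test.py;
-- bash only has to resolve 'python' on PATH; test.py is then read as a
-- plain file from the task's working directory, where added files land.
select transform(cityname) using 'python test.py' as (new_cityname)
from test.spark2_orc where dt='20160622' limit 5;
```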

test.py:

> #!/usr/bin/env python
> #coding=utf-8
> import sys
> import string
> reload(sys)
> sys.setdefaultencoding('utf8')
> for line in sys.stdin:
>     cityname = line.strip("\n").split("\t")[0]
>     lt = []
>     lt.append(cityname + "_zlx")
>     print "\t".join(lt)
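
For reference, the script above is Python 2 (print statement, the reload/setdefaultencoding hack). A minimal Python 3 sketch of the same per-line transform, with the suffix `_zlx` kept from the original; no encoding hack is needed because sys.stdin already decodes to str:

```python
import sys

def transform(line):
    # Keep the first tab-separated field and append the "_zlx" suffix,
    # mirroring the Python 2 script above.
    cityname = line.rstrip("\n").split("\t")[0]
    return cityname + "_zlx"

if __name__ == "__main__":
    for line in sys.stdin:
        print(transform(line))
```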


After making two modifications:
(1) chmod +x test.py
(2) in transform.sql, changing using 'test.py' to using './test.py'
the SQL ran successfully.
I was wondering whether Spark SQL script transformations are supposed to be
run this way. Has anyone else run into the same problem?
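
For context, the "command not found" error matches how bash resolves commands: a bare name like test.py is looked up only on $PATH, while a name containing a slash (./test.py) is executed directly from the working directory, which in turn requires the execute bit. A small sketch outside Spark (paths here are illustrative):

```shell
# Demonstrate bash's lookup rules, independent of Spark (illustrative paths).
workdir=$(mktemp -d) && cd "$workdir"
printf '#!/bin/sh\necho hi\n' > test.py
chmod +x test.py                       # needed before ./test.py can run
out=$(./test.py)                       # slash present: no PATH lookup, prints "hi"
echo "$out"
# A bare name consults only $PATH, so with the working directory off PATH
# this fails with "command not found", just like in the Spark log above:
PATH=/usr/bin:/bin bash -c 'test.py' 2>&1 || true
```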