Posted to dev@hudi.apache.org by selvaraj periyasamy <se...@gmail.com> on 2020/04/21 23:34:46 UTC
Table read fails in spark-submit, whereas it succeeds in spark-shell
Folks,
I am using Apache Hudi 0.5.0. Our Hadoop cluster is a mix of Spark 2.3.0,
Scala 2.11.8, and Hive 1.2.2. There are multiple use cases already working
with Hudi.

I need to read a sequence table into which new partitions are continuously
inserted by another process using Hive, not Hudi, and then write the
resulting DataFrame into another COW (copy-on-write) table using Hudi.
When I run spark.sql in a spark-shell started with the Hudi jar, the
select works as shown below.
spark-shell --jars
/Users/seperiya/Downloads/hudi-spark-bundle-0.5.0-incubating.jar --conf
'spark.serializer=org.apache.spark.serializer.KryoSerializer'
scala> spark.sql("select * from poc.request_result__ct").show
2020-04-21 15:56:07 WARN ObjectStore:568 - Failed to get database
global_temp, returning NoSuchObjectException
+----------+---------------+------+---------+-------------------+-------------------+-------------------+-------------------+----------------------+
|request_id|prev_request_id|ref_no|type_code|   transaction_date|         process_ts|          commit_ts|header__change_oper|header__partition_name|
+----------+---------------+------+---------+-------------------+-------------------+-------------------+-------------------+----------------------+
|2020041011|           null|  null|       PA|2020-04-10 11:11:23|2020-04-10 11:11:30|2020-04-10 11:11:35|                  I|  20200117T235000_2...|
+----------+---------------+------+---------+-------------------+-------------------+-------------------+-------------------+----------------------+
Whereas when I put the same code into a Scala file and run it with
spark-submit, I get an error. The error logs are attached.
spark-submit --jars
/Users/seperiya/Downloads/hudi-spark-bundle-0.5.0-incubating.jar --conf
'spark.serializer=org.apache.spark.serializer.KryoSerializer' --class Test
/Test-1.0.0-SNAPSHOT-jar-with-dependencies.jar
import java.util.Calendar
import org.apache.spark.sql.SparkSession

object Test {
  def main(args: Array[String]): Unit = {
    val now = Calendar.getInstance() // `now` was used but not defined in the original snippet
    // SparkUtil.buildSession is an internal helper that builds the SparkSession
    implicit val sparkSession: SparkSession = SparkUtil.buildSession(
      "Test_" + now.get(Calendar.HOUR_OF_DAY) + now.get(Calendar.MINUTE) + now.get(Calendar.SECOND))
    sparkSession.sql("select * from poc.request_result__ct").show()
  }
}
When I remove the Hudi bundle jar and run the same job, it works.
spark-submit --class Test /Test-1.0.0-SNAPSHOT-jar-with-dependencies.jar
Even though the Hudi code should come into play only when I insert data
into the other table, the read fails for some reason. Could anyone shed
some light on this issue?
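Since the attached error logs are not included in the archive, the root cause is not visible here, but a spark-shell vs spark-submit difference often comes down to how the bundle reaches the driver classpath and how Spark reads Hive-registered parquet tables. As a hedged debugging sketch (both flags are real Spark options, but neither is confirmed as the fix for this particular thread), two things worth trying with the same command:

```shell
# Sketch only: a variant of the failing spark-submit command.
# --driver-class-path puts the bundle on the driver JVM's classpath at startup,
# which --jars alone does not guarantee in every deploy mode.
# spark.sql.hive.convertMetastoreParquet=false makes Spark use the Hive serde
# read path instead of its native parquet reader; the Hudi docs recommend this
# setting when querying through Spark SQL with the bundle on the classpath.
spark-submit \
  --jars /Users/seperiya/Downloads/hudi-spark-bundle-0.5.0-incubating.jar \
  --driver-class-path /Users/seperiya/Downloads/hudi-spark-bundle-0.5.0-incubating.jar \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.hive.convertMetastoreParquet=false' \
  --class Test /Test-1.0.0-SNAPSHOT-jar-with-dependencies.jar
```

If the failure persists, comparing the full stack trace against the classes shaded into the bundle (e.g. parquet or avro) would help narrow down the conflict.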
Re: Table read fails in spark-submit, whereas it succeeds in spark-shell
Posted by selvaraj periyasamy <se...@gmail.com>.
Sure. Issue https://github.com/apache/incubator-hudi/issues/1546 has been
raised.
Thanks,
Selva
Re: Table read fails in spark-submit, whereas it succeeds in spark-shell
Posted by cooper <li...@gmail.com>.
Hi periyasamy,
Thanks for asking. For questions or issue reports, please describe the
problem in detail in a GitHub issue:
https://github.com/apache/incubator-hudi/issues
cooper