Posted to dev@hudi.apache.org by selvaraj periyasamy <se...@gmail.com> on 2020/04/21 23:34:46 UTC

Table read fails in spark-submit, whereas it succeeds in spark-shell

Folks,

I am using Apache Hudi 0.5.0. Our Hadoop cluster is a mix of Spark 2.3.0,
Scala 2.11.8, and Hive 1.2.2. We already have multiple use cases working
with Hudi.

I need to read a sequence table into which new partitions are continuously
inserted by another process using Hive, not by Hudi, and then write the
resulting DataFrame into another COW (copy-on-write) table using Hudi.
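
Roughly, the intended pipeline looks like the sketch below (a minimal sketch
only; the target table name, base path, and the record-key/precombine field
choices are hypothetical placeholders, not our real settings):

import org.apache.spark.sql.SaveMode

// Read the Hive-managed source table ...
val df = sparkSession.sql("select * from poc.request_result__ct")

// ... and write it out as a Hudi COW table (COPY_ON_WRITE is the default table type).
df.write.format("org.apache.hudi")
  .option("hoodie.table.name", "request_result_cow")               // hypothetical name
  .option("hoodie.datasource.write.recordkey.field", "request_id") // hypothetical choice
  .option("hoodie.datasource.write.precombine.field", "commit_ts") // hypothetical choice
  .option("hoodie.datasource.write.partitionpath.field", "header__partition_name")
  .mode(SaveMode.Append)
  .save("/data/hudi/poc/request_result_cow")                       // hypothetical base path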

When I use spark.sql in a spark-shell started with the Hudi jar, the select
works as shown below.

spark-shell --jars
/Users/seperiya/Downloads/hudi-spark-bundle-0.5.0-incubating.jar --conf
'spark.serializer=org.apache.spark.serializer.KryoSerializer'


scala> spark.sql("select * from poc.request_result__ct").show

2020-04-21 15:56:07 WARN  ObjectStore:568 - Failed to get database
global_temp, returning NoSuchObjectException

+----------+---------------+------+---------+-------------------+-------------------+-------------------+-------------------+----------------------+
|request_id|prev_request_id|ref_no|type_code|   transaction_date|         process_ts|          commit_ts|header__change_oper|header__partition_name|
+----------+---------------+------+---------+-------------------+-------------------+-------------------+-------------------+----------------------+
|2020041011|           null|  null|       PA|2020-04-10 11:11:23|2020-04-10 11:11:30|2020-04-10 11:11:35|                  I|  20200117T235000_2...|
+----------+---------------+------+---------+-------------------+-------------------+-------------------+-------------------+----------------------+





Whereas when I put the same code into a Scala file and run it with
spark-submit, I get an error. I have attached the error logs.


spark-submit --jars
/Users/seperiya/Downloads/hudi-spark-bundle-0.5.0-incubating.jar --conf
'spark.serializer=org.apache.spark.serializer.KryoSerializer' --class Test
/Test-1.0.0-SNAPSHOT-jar-with-dependencies.jar


import java.util.Calendar

import org.apache.spark.sql.SparkSession

object Test {

  def main(args: Array[String]): Unit = {

    // SparkUtil.buildSession is our in-house helper that builds the SparkSession;
    // the session name is suffixed with the current time.
    val now = Calendar.getInstance()
    implicit val sparkSession: SparkSession = SparkUtil.buildSession("Test_" +
      now.get(Calendar.HOUR_OF_DAY) + now.get(Calendar.MINUTE) + now.get(Calendar.SECOND))

    sparkSession.sql("select * from poc.request_result__ct").show()
  }
}


When I remove the Hudi bundle jar and run the same code, it works.


spark-submit --class Test /Test-1.0.0-SNAPSHOT-jar-with-dependencies.jar


Even though the Hudi code should come into the picture only when I insert
data into the other table, the read fails for some reason. Could anyone
shed some light on this issue?
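
Also, since the failure happens at read time merely because the bundle is on
the classpath, I suspect a jar conflict between the Hudi bundle and our
cluster's Hive 1.2.2 jars. As a quick diagnostic (plain JVM introspection,
nothing Hudi-specific; the class names below are just examples), one can
print which jar a suspect class is actually loaded from, in both spark-shell
and the spark-submit job:

// Prints the jar a class was loaded from, to spot classes shadowed by the
// Hudi bundle versus the cluster's own Hive jars.
def whereIs(className: String): Unit = {
  val location = Option(Class.forName(className).getProtectionDomain.getCodeSource)
    .map(_.getLocation.toString)
    .getOrElse("bootstrap classpath / unknown")
  println(s"$className -> $location")
}

whereIs("org.apache.hadoop.hive.conf.HiveConf")        // example class name
whereIs("org.apache.hadoop.hive.metastore.api.Table")  // example class name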

Re: Table read fails in spark-submit, whereas it succeeds in spark-shell

Posted by selvaraj periyasamy <se...@gmail.com>.
Sure. Issue https://github.com/apache/incubator-hudi/issues/1546 has been
raised.

Thanks,
Selva



On Tue, Apr 21, 2020 at 6:25 PM cooper <li...@gmail.com> wrote:

> Hi periyasamy,
> Thanks for asking questions or reporting issues. Please describe the
> issue in detail in a GitHub issue:
> https://github.com/apache/incubator-hudi/issues
>
> cooper

Re: Table read fails in spark-submit, whereas it succeeds in spark-shell

Posted by cooper <li...@gmail.com>.
Hi periyasamy,
Thanks for asking questions or reporting issues. Please describe the issue
in detail in a GitHub issue:
https://github.com/apache/incubator-hudi/issues

cooper
