Posted to dev@hudi.apache.org by Gurudatt Kulkarni <gu...@gmail.com> on 2019/11/13 09:19:08 UTC

Unable to query Hudi Tables via Spark Shell

Hi All,

I am running into a strange issue where I am unable to query Hudi tables
via spark-shell. I followed the procedure described in the Hudi docs
<https://hudi.apache.org/querying_data.html#spark>.

I started the shell with this command:

spark-shell --jars hdfs:///jars/hudi-spark-bundle-0.5.1-SNAPSHOT.jar --master yarn

Then I added this config:

spark.sparkContext.hadoopConfiguration.setClass(
  "mapreduce.input.pathFilter.class",
  classOf[org.apache.hudi.hadoop.HoodieROTablePathFilter],
  classOf[org.apache.hadoop.fs.PathFilter])

I then ran a simple SELECT query on the Hive table via Spark SQL. It
throws a java.lang.ClassNotFoundException:
org.apache.hudi.hadoop.HoodieParquetInputFormat. I checked the
hudi-spark-bundle jar, and that class is present in it. The hudi-hadoop-mr
bundle is also available on the Hive classpath. Have I missed a step here?
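
For reference, the jar check can be done along these lines. This is only a sketch: the local /tmp path is an assumption, and the entry name searched for is derived from the class name in the exception.

```shell
# Copy the bundle out of HDFS to a local scratch directory (assumed path).
hdfs dfs -get hdfs:///jars/hudi-spark-bundle-0.5.1-SNAPSHOT.jar /tmp/

# List the jar contents and look for the class named in the exception.
unzip -l /tmp/hudi-spark-bundle-0.5.1-SNAPSHOT.jar \
  | grep 'org/apache/hudi/hadoop/HoodieParquetInputFormat.class'
```

A matching line shows only that the class is packaged in the bundle, not that it is visible to the classloader that runs the query.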

Regards,
Gurudatt

Re: Unable to query Hudi Tables via Spark Shell

Posted by Bhavani Sudha <bh...@gmail.com>.
Hi Gurudatt,

My guess is that client mode does not support extracting jars from HDFS.
Can you try changing the deploy mode to cluster (the default is client mode
if you have not specified one)?

You can also try specifying
```--packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating```
instead of --jars. This would pull the jars from Maven directly.
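
Put together, the invocation might look roughly like the sketch below. The master and the Maven coordinate are the ones already mentioned in this thread; it assumes the machine running spark-shell can reach a Maven repository, and it has not been verified against your cluster.

```shell
# Resolve the Hudi Spark bundle from Maven instead of loading it from HDFS.
spark-shell \
  --master yarn \
  --packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating
```

Note that this coordinate is the 0.5.0-incubating release, not the 0.5.1-SNAPSHOT build used in the original command.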

Thanks,
Sudha
