Posted to users@hudi.apache.org by Rui Li <li...@apache.org> on 2020/09/17 12:26:43 UTC

What's the recommended way to add dependencies

Hello,

I'm a newbie and just trying to run my very first example with Hudi -- ingesting
a CSV DFS source into a Hudi table. I hit the following problems in the
process.

1. ClassNotFoundException for HiveConf. I didn't enable Hive sync but
still got this error. I guess it's because the class is imported in
DeltaSync. I solved this by adding hive-common to the classpath. (I tried
hive-exec first, but that caused conflicts with Parquet.)

2. NoSuchMethodError for Jetty's SessionHandler::setHttpOnly. It's
caused by a jetty-server version conflict between Hudi and my Hadoop. I
worked around it by setting spark.driver.userClassPathFirst=true.
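For reference, a sketch of the spark-submit invocation this amounts to, with
both workarounds applied. The jar versions, paths, and DeltaStreamer options
below are placeholders for my local setup, not exact values:

```shell
# Sketch: run HoodieDeltaStreamer with hive-common added to the classpath
# (--jars) and userClassPathFirst enabled so Hudi's jetty-server wins over
# Hadoop's. Versions and paths are placeholders.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  --jars /path/to/hive-common-2.3.4.jar \
  --conf spark.driver.userClassPathFirst=true \
  /path/to/hudi-utilities-bundle_2.11-SNAPSHOT.jar \
  --table-type COPY_ON_WRITE \
  --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
  --source-ordering-field ts \
  --target-base-path /tmp/hudi_table \
  --target-table hudi_table
```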

Although I've managed to make the program run successfully, I wonder
whether I'm doing it right and what the recommended way to add
dependencies is.

The components I'm using:
Spark 2.4.6 w/o Hadoop
Hadoop 3.0.3
Hive 2.3.4
Hudi latest master code

Thanks in advance!

-- 
Cheers,
Rui Li

Re: What's the recommended way to add dependencies

Posted by Rui Li <li...@apache.org>.
Hi leesf,

Thanks for your reply and the instructions. I'll give it a try.


-- 
Cheers,
Rui Li

Re: What's the recommended way to add dependencies

Posted by leesf <le...@gmail.com>.
Hi Rui,

1. It is because utilities.bundle.hive.scope is 'provided' by default, so
the Hive classes are not included in hudi-utilities-bundle-xxx.jar. You can
set -Dutilities.bundle.hive.scope=runtime and re-compile from the master
branch to include the Hive dependencies. That said, adding the hive-common
jar to the classpath as you did should also be fine, without setting
-Dutilities.bundle.hive.scope=runtime. But as you pointed out, the error
occurs even when you don't sync to Hive, so I think we should at least
document this.
2. There is a PR to fix the jetty version conflict, please check it out:
https://github.com/apache/hudi/pull/1990.
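Concretely, the rebuild would look something like this. The module path is an
assumption based on the master branch layout at the time; adjust as needed:

```shell
# Rebuild hudi-utilities-bundle from master with the Hive dependencies
# included (scope flipped from 'provided' to 'runtime').
git clone https://github.com/apache/hudi.git && cd hudi
mvn clean package -DskipTests \
    -Dutilities.bundle.hive.scope=runtime \
    -pl packaging/hudi-utilities-bundle -am
# The rebuilt bundle jar lands under packaging/hudi-utilities-bundle/target/.
```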
