Posted to dev@kyuubi.apache.org by Cheng Pan <pa...@gmail.com> on 2022/05/18 05:45:00 UTC

[DISCUSS] Kyuubi Hive JDBC shaded client and development experience

Hi Kyuubi developers,

Recently, we have noticed that many new contributors are confused by
Kyuubi's development and debugging process, because they cannot run
`mvn test` or run tests in an IDE (e.g. JetBrains IDEA) directly, as
they can with most Maven-based Java projects. Instead, they have to
run `mvn install` first.

The root cause is the Kyuubi Hive JDBC shaded client. We introduced
this module because the upstream Hive JDBC client has lots of
transitive dependencies, which may pollute the user classpath and
cause class conflicts. The Kyuubi Hive JDBC shaded client shades and
relocates all of Hive's transitive classes to make it friendly to
downstream projects and users.
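
For readers unfamiliar with shading, a relocation rule in the
maven-shade-plugin might look roughly like the sketch below. It is not
taken from the Kyuubi build: the relocated library (Guava's
`com.google.common`) and the `org.example.shaded` prefix are
assumptions chosen only for illustration.

```xml
<!-- Illustrative POM fragment; goes under <build><plugins>. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- Shading runs at the package phase, so the relocated classes
           exist only in the packaged jar, not in target/classes. -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Hypothetical example: move a bundled transitive
                 dependency under a private package prefix. -->
            <pattern>com.google.common</pattern>
            <shadedPattern>org.example.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Because the relocation happens only when the jar is packaged, modules
that need the relocated class names have to consume the built jar
rather than the module's compiled classes.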

The problem is that Maven and IDEA[1] do not work well with shaded
modules, and the current workaround requires the developer to run
`mvn install` before testing.

I have 2 ideas to solve the problem.

1. Some Kyuubi modules use the Kyuubi Hive JDBC shaded client for
testing. We could let them depend on the latest released version of
the Kyuubi Hive JDBC shaded client rather than the project module, and
introduce a profile that switches back to the project module in the CI
environment for verification.

2. Move the Kyuubi Hive JDBC shaded client to an independent Git
repo, and give it a release cycle independent of the Kyuubi main
project.
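
As a rough sketch of option 1, a module's POM could pin the test
dependency to a released version through a property and let a CI
profile point it back at the in-tree module. The property name, the
profile id `ci-project-client`, the artifact coordinates, and the
placeholder version are all assumptions for illustration, not the
actual Kyuubi build configuration.

```xml
<!-- Illustrative POM fragment for a module that tests against the
     shaded client; all names here are hypothetical. -->
<project>
  <properties>
    <!-- Default: a released version, resolved from a repository, so
         `mvn test` and the IDE do not need a prior `mvn install`.
         Replace the placeholder with the latest released version. -->
    <hive.jdbc.shaded.version>RELEASED-VERSION-PLACEHOLDER</hive.jdbc.shaded.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.kyuubi</groupId>
      <artifactId>kyuubi-hive-jdbc-shaded</artifactId>
      <version>${hive.jdbc.shaded.version}</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <profiles>
    <profile>
      <!-- Enabled on CI to verify against the in-tree module. -->
      <id>ci-project-client</id>
      <properties>
        <hive.jdbc.shaded.version>${project.version}</hive.jdbc.shaded.version>
      </properties>
    </profile>
  </profiles>
</project>
```

With a layout like this, CI would run something like
`mvn install -Pci-project-client`, while local development uses the
default property value.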

Further, the Kyuubi Hive JDBC shaded client has room to improve.
For example, it's easy to deserialize the metadata and records from
the Thrift format to Java objects directly, so we don't need Hive
serde for that, and we can then drop those Hive dependencies rather
than shade them. Also, as Kyuubi proposes to support etcd as a
service discovery server, we need to support that in the Kyuubi Hive
JDBC client as well.

What do you guys think of that?

[1] https://youtrack.jetbrains.com/issue/IDEA-126596

Thanks,
Cheng Pan

Re: [DISCUSS] Kyuubi Hive JDBC shaded client and development experience

Posted by Kent Yao <ya...@apache.org>.
+1 for option 2

On 2022/05/18 13:30:57 zhaomin1423 wrote:
> +1

Re: [DISCUSS] Kyuubi Hive JDBC shaded client and development experience

Posted by zhaomin1423 <zh...@163.com>.
+1



---- Replied Message ----
| From | Fei Wang<fe...@apache.org> |
| Date | 05/18/2022 15:05 |
| To | dev@kyuubi.apache.org<de...@kyuubi.apache.org> |
| Cc | |
| Subject | Re: [DISCUSS] Kyuubi Hive JDBC shaded client and development experience |
+1 for separating Kyuubi Hive JDBC shaded client to an independent Git repo.

Regards,
Fei


Re: [DISCUSS] Kyuubi Hive JDBC shaded client and development experience

Posted by Fei Wang <fe...@apache.org>.
+1 for separating Kyuubi Hive JDBC shaded client to an independent Git repo.

Regards,
Fei


Re: [DISCUSS] Kyuubi Hive JDBC shaded client and development experience

Posted by Cheng Pan <pa...@gmail.com>.
Thanks for the feedback. As we all agree to move the Kyuubi Hive JDBC
shaded client to a separate repo, the next questions are:
1. Any suggestions for the repo name? e.g. `incubating-kyuubi-hive-jdbc`
2. We may need to shade more modules in the future, as in the examples
Paul mentioned. Should we put them together or in separate repos?

For question 2, I need to clarify that, as far as I know, we have at
least 2 kinds of artifacts to be shaded and separated:
1) Popular 3rd-party libraries, e.g. guava;
2) Kyuubi's own components, which should be light (no deps) and are
used by other components for verification, e.g. the Kyuubi Hive JDBC
shaded client and the Kyuubi Spark TPC-DS Connector.

Thanks,
Cheng Pan

On Thu, May 19, 2022 at 5:30 PM hongdd <jn...@163.com> wrote:
>
> +1 for option 2
>
>
> Thanks,
> hongdd

Re: [DISCUSS] Kyuubi Hive JDBC shaded client and development experience

Posted by hongdd <jn...@163.com>.
+1 for option 2


Thanks,
hongdd


---- Replied Message ----
| From | Paul Lam<pa...@gmail.com> |
| Date | 05/19/2022 14:42 |
| To | <de...@kyuubi.apache.org> |
| Subject | Re: [DISCUSS] Kyuubi Hive JDBC shaded client and development experience |
Good to see we have a plan to improve this situation!

I’m +1 to option 2. It may add some maintenance overhead, but it’s more intuitive.
I see lots of projects have their own shaded repo, e.g. Flink[1] and Presto[2].

[1] https://github.com/apache/flink-shaded
[2] https://github.com/prestodb/presto-hive-apache

Best,
Paul Lam



Re: [DISCUSS] Kyuubi Hive JDBC shaded client and development experience

Posted by Paul Lam <pa...@gmail.com>.
Good to see we have a plan to improve this situation!

I’m +1 to option 2. It may add some maintenance overhead, but it’s more intuitive.
I see lots of projects have their own shaded repo, e.g. Flink[1] and Presto[2].

[1] https://github.com/apache/flink-shaded
[2] https://github.com/prestodb/presto-hive-apache

Best,
Paul Lam
