Posted to dev@solr.apache.org by David Smiley <ds...@apache.org> on 2022/01/20 21:48:05 UTC
Moving out all Hadoop plugins into one module
The issue https://issues.apache.org/jira/browse/SOLR-14660 is about moving
the HDFS plugins out of core into a module. While a great thing, it still
leaves quite a few Hadoop-related dependencies in solr-core because Hadoop
is not there only for HDFS; it's also there for some exotic authentication &
authorization plugins. In that JIRA issue I proposed that this module be
named "hadoop" and contain all Hadoop-related plugins.
As a quick experiment, I commented out the hadoop-auth dependency and tried
to compile to see what the compiler caught. It exposed the following two
Solr plugins:
* HadoopAuthPlugin
* KerberosPlugin
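
A rough way to approximate the same experiment without touching the build is to scan the sources for imports from the library's packages. The sketch below is a hypothetical helper, not part of Solr's build; the name `files_importing` is made up for illustration, and a plain import scan will miss fully qualified or reflective references that the compiler would catch.

```python
import re
from pathlib import Path


def files_importing(src_root, package_prefix):
    """Return names of .java files under src_root that import from package_prefix.

    Hypothetical helper: it only looks at import statements, so it is an
    approximation of what removing the dependency and compiling would reveal.
    """
    pattern = re.compile(
        r"^\s*import\s+(?:static\s+)?" + re.escape(package_prefix) + r"\.[\w.*]+\s*;",
        re.MULTILINE,
    )
    hits = []
    for path in sorted(Path(src_root).rglob("*.java")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if pattern.search(text):
            hits.append(path.name)
    return hits
```

For example, `files_importing("solr/core/src/java", "org.apache.hadoop.security.authentication")` would list candidate files, assuming that source path and assuming hadoop-auth's classes live under that package.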
Are we okay with expanding the scope of SOLR-14660 to include these?
Note that SOLR-14660 *might* result in 9.0 not including this module in the
release distribution if we don't feel the module will be sufficiently ready
to release.
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
Re: Moving out all Hadoop plugins into one module
Posted by Jan Høydahl <ja...@cominvent.com>.
That's why a package system tends to be a complex beast, with a dependency tree between packages etc., so you'd have a hadoop-common package, with hadoop-auth and hadoop-hdfs packages that depend on it. But I don't know if we want to go there; package management is not Solr's core business.
Another thing to remember: once we factor out Hadoop as a module (contrib), we may want to upgrade the versions in solr-core of certain common dependencies that were locked to old versions because of Hadoop. But if a user then tries to drop the module JARs (and their dependencies) into SOLR_HOME/lib/ or similar, there will be JAR version conflicts between module and core. If the module is loaded through the package manager, however, there will be classloader isolation, and it is more likely to succeed.
I don't have a list of such potential crashes, and I hear that newer versions of Hadoop are better and use shading for some deps, but whoever prepares the module should do a thorough check of the resulting modules/<hadoop-foo>/lib/ folder and cross-check it against the JARs in WEB-INF/lib/ to look for trouble. Perhaps there are workarounds.
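
The cross-check described above could be sketched roughly as follows. This is an illustrative script, not an existing Solr tool; the function names are hypothetical, and it assumes JAR file names follow the usual `<artifact>-<version>.jar` pattern where the version starts at the first `-<digit>`.

```python
import re
from pathlib import Path

# Assumption: the version begins at the first "-<digit>" in the file name,
# e.g. "htrace-core4-4.1.0-incubating.jar" -> ("htrace-core4", "4.1.0-incubating").
_JAR_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d.*)\.jar$")


def parse_jar(filename):
    """Split a JAR file name into (artifact, version); version is None if absent."""
    match = _JAR_RE.match(filename)
    if match is None:
        name = filename[:-4] if filename.endswith(".jar") else filename
        return name, None
    return match.group("name"), match.group("version")


def jar_versions(lib_dir):
    """Map artifact name -> version for every *.jar directly inside lib_dir."""
    versions = {}
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        name, version = parse_jar(jar.name)
        versions[name] = version
    return versions


def find_conflicts(module_lib, core_lib):
    """Return {artifact: (module_version, core_version)} where the versions differ."""
    module = jar_versions(module_lib)
    core = jar_versions(core_lib)
    return {
        name: (module[name], core[name])
        for name in sorted(module.keys() & core.keys())
        if module[name] != core[name]
    }
```

Calling `find_conflicts` with the module's lib/ folder and the WEB-INF/lib/ folder would give a starting list of artifacts to inspect by hand; shaded or relocated classes inside a JAR would not be detected by a file-name comparison like this.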
Jan
> On 20 Jan 2022 at 23:12, Kevin Risden <co...@gmail.com> wrote:
>
> Yea it would be duplicate jars in both places. It is a shame both share the name "hadoop" since the two features - filesystem and authentication. They end up being two entirely different things both in Hadoop itself and inside of Solr.
>
> Kevin Risden
>
>
> On Thu, Jan 20, 2022 at 4:58 PM David Smiley <dsmiley@apache.org> wrote:
> Separate modules will mean our distro will end up duplicating hadoop-common and other related JARs for both modules. I was trying to be practical. But it's not important to me; ok.
> implementation ('org.apache.hadoop:hadoop-common') { transitive = false } // too many to ignore
> implementation ('org.apache.hadoop:hadoop-annotations')
> runtimeOnly 'org.apache.htrace:htrace-core4' // note: removed in Hadoop 3.3.2
> runtimeOnly "org.apache.commons:commons-configuration2"
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
> On Thu, Jan 20, 2022 at 4:55 PM Kevin Risden <krisden@apache.org> wrote:
> My preference would be as a separate HadoopAuthentication or something module. HDFS the filesystem / blockcache / etc support is unique and separate from the authentication part. It shouldn't all be in one module.
>
> Kevin Risden
>
>
> On Thu, Jan 20, 2022 at 4:48 PM David Smiley <dsmiley@apache.org> wrote:
> The issue https://issues.apache.org/jira/browse/SOLR-14660 is about moving the HDFS plugins out of core into a module. While a great thing, it still leaves quite a few Hadoop related dependencies in solr-core because Hadoop is not there only for HDFS; it's there for some exotic authentication & authorization plugins. In that JIRA issue I proposed that this module be "hadoop" and have any hadoop related plugins.
>
> As a quick experiment, I commented out the hadoop-auth dependency and tried to compile to see what the compiler caught. It exposed the following two Solr plugins:
> * HadoopAuthPlugin
> * KerberosPlugin
>
> Are we okay with expanding the scope of SOLR-14660 to include these?
>
> Note that SOLR-14660 *might* result in 9.0 not including this module in the release distribution if we don't feel the module will be sufficiently ready to release.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
Re: Moving out all Hadoop plugins into one module
Posted by Kevin Risden <co...@gmail.com>.
Yeah, it would mean duplicate JARs in both places. It is a shame that both
share the name "hadoop", since the two features, filesystem and
authentication, end up being two entirely different things both in Hadoop
itself and inside of Solr.
Kevin Risden
On Thu, Jan 20, 2022 at 4:58 PM David Smiley <ds...@apache.org> wrote:
> Separate modules will mean our distro will end up duplicating
> hadoop-common and other related JARs for both modules. I was trying to be
> practical. But it's not important to me; ok.
>
> implementation ('org.apache.hadoop:hadoop-common') { transitive = false } // too many to ignore
> implementation ('org.apache.hadoop:hadoop-annotations')
> runtimeOnly 'org.apache.htrace:htrace-core4' // note: removed in Hadoop 3.3.2
> runtimeOnly "org.apache.commons:commons-configuration2"
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Thu, Jan 20, 2022 at 4:55 PM Kevin Risden <kr...@apache.org> wrote:
>
>> My preference would be as a separate HadoopAuthentication or something
>> module. HDFS the filesystem / blockcache / etc support is unique and
>> separate from the authentication part. It shouldn't all be in one module.
>>
>> Kevin Risden
>>
>>
>> On Thu, Jan 20, 2022 at 4:48 PM David Smiley <ds...@apache.org> wrote:
>>
>>> The issue https://issues.apache.org/jira/browse/SOLR-14660 is about
>>> moving the HDFS plugins out of core into a module. While a great thing, it
>>> still leaves quite a few Hadoop related dependencies in solr-core because
>>> Hadoop is not there only for HDFS; it's there for some exotic
>>> authentication & authorization plugins. In that JIRA issue I proposed that
>>> this module be "hadoop" and have any hadoop related plugins.
>>>
>>> As a quick experiment, I commented out the hadoop-auth dependency and
>>> tried to compile to see what the compiler caught. It exposed the following
>>> two Solr plugins:
>>> * HadoopAuthPlugin
>>> * KerberosPlugin
>>>
>>> Are we okay with expanding the scope of SOLR-14660 to include these?
>>>
>>> Note that SOLR-14660 *might* result in 9.0 not including this module in
>>> the release distribution if we don't feel the module will be sufficiently
>>> ready to release.
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>
Re: Moving out all Hadoop plugins into one module
Posted by David Smiley <ds...@apache.org>.
Separate modules will mean our distro will end up duplicating hadoop-common
and other related JARs for both modules. I was trying to be practical.
But it's not important to me; ok.
implementation ('org.apache.hadoop:hadoop-common') { transitive = false } // too many to ignore
implementation ('org.apache.hadoop:hadoop-annotations')
runtimeOnly 'org.apache.htrace:htrace-core4' // note: removed in Hadoop 3.3.2
runtimeOnly "org.apache.commons:commons-configuration2"
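
To put a number on the duplication concern, one could list the JARs that would ship in more than one module's lib folder. The following is a minimal sketch under assumptions: the function name `duplicated_jars` is hypothetical, and it treats identical file names as duplicates without comparing contents.

```python
from pathlib import Path


def duplicated_jars(*lib_dirs):
    """Return {jar file name: size in bytes} for JARs present in every directory.

    Hypothetical helper: matches on file name only, and reports the size of the
    copy found in the first directory.
    """
    listings = [
        {jar.name: jar.stat().st_size for jar in Path(lib_dir).glob("*.jar")}
        for lib_dir in lib_dirs
    ]
    if not listings:
        return {}
    shared = set(listings[0]).intersection(*listings[1:])
    return {name: listings[0][name] for name in sorted(shared)}
```

Summing the values of the returned dictionary gives the number of bytes that each additional module would repeat in the distribution.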
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
On Thu, Jan 20, 2022 at 4:55 PM Kevin Risden <kr...@apache.org> wrote:
> My preference would be as a separate HadoopAuthentication or something
> module. HDFS the filesystem / blockcache / etc support is unique and
> separate from the authentication part. It shouldn't all be in one module.
>
> Kevin Risden
>
>
> On Thu, Jan 20, 2022 at 4:48 PM David Smiley <ds...@apache.org> wrote:
>
>> The issue https://issues.apache.org/jira/browse/SOLR-14660 is about
>> moving the HDFS plugins out of core into a module. While a great thing, it
>> still leaves quite a few Hadoop related dependencies in solr-core because
>> Hadoop is not there only for HDFS; it's there for some exotic
>> authentication & authorization plugins. In that JIRA issue I proposed that
>> this module be "hadoop" and have any hadoop related plugins.
>>
>> As a quick experiment, I commented out the hadoop-auth dependency and
>> tried to compile to see what the compiler caught. It exposed the following
>> two Solr plugins:
>> * HadoopAuthPlugin
>> * KerberosPlugin
>>
>> Are we okay with expanding the scope of SOLR-14660 to include these?
>>
>> Note that SOLR-14660 *might* result in 9.0 not including this module in
>> the release distribution if we don't feel the module will be sufficiently
>> ready to release.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>
Re: Moving out all Hadoop plugins into one module
Posted by Kevin Risden <kr...@apache.org>.
My preference would be a separate module, HadoopAuthentication or something
similar. The HDFS filesystem / blockcache / etc. support is unique and
separate from the authentication part. It shouldn't all be in one module.
Kevin Risden
On Thu, Jan 20, 2022 at 4:48 PM David Smiley <ds...@apache.org> wrote:
> The issue https://issues.apache.org/jira/browse/SOLR-14660 is about
> moving the HDFS plugins out of core into a module. While a great thing, it
> still leaves quite a few Hadoop related dependencies in solr-core because
> Hadoop is not there only for HDFS; it's there for some exotic
> authentication & authorization plugins. In that JIRA issue I proposed that
> this module be "hadoop" and have any hadoop related plugins.
>
> As a quick experiment, I commented out the hadoop-auth dependency and
> tried to compile to see what the compiler caught. It exposed the following
> two Solr plugins:
> * HadoopAuthPlugin
> * KerberosPlugin
>
> Are we okay with expanding the scope of SOLR-14660 to include these?
>
> Note that SOLR-14660 *might* result in 9.0 not including this module in
> the release distribution if we don't feel the module will be sufficiently
> ready to release.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>