Posted to dev@solr.apache.org by David Smiley <ds...@apache.org> on 2022/01/20 21:48:05 UTC

Moving out all Hadoop plugins into one module

The issue https://issues.apache.org/jira/browse/SOLR-14660 is about moving
the HDFS plugins out of core into a module.  While a great thing, it still
leaves quite a few Hadoop-related dependencies in solr-core because Hadoop
is not there only for HDFS; it's also there for some exotic authentication &
authorization plugins.  In that JIRA issue I proposed that this module be
named "hadoop" and hold all Hadoop-related plugins.

As a quick experiment, I commented out the hadoop-auth dependency and tried
to compile to see what the compiler caught. It exposed the following two
Solr plugins:
* HadoopAuthPlugin
* KerberosPlugin

Are we okay with expanding the scope of SOLR-14660 to include these?

Note that SOLR-14660 *might* result in 9.0 not including this module in the
release distribution if we don't feel the module will be sufficiently ready
to release.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

Re: Moving out all Hadoop plugins into one module

Posted by Jan Høydahl <ja...@cominvent.com>.
That's why a package system tends to be a complex beast, with a dependency tree between packages etc., so you'd have a hadoop-common package, plus hadoop-auth and hadoop-hdfs packages that depend on it. But I don't know if we want to go there; package management is not Solr's core business.

Another thing to remember: once we factor out Hadoop as a module (contrib), we may want to upgrade certain common dependencies in solr-core that were locked on old versions due to Hadoop. But if a user then tries to drop the module jars (and their dependencies) into SOLR_HOME/lib/ or similar, there will be jar version conflicts between module and core. If the module is loaded through the package manager, however, there will be classloader isolation, and it is more likely to succeed.

I don't have a list of such potential crashes, and I hear that newer versions of Hadoop are better and use shading for some deps, but whoever prepares the module should do a thorough check of the resulting modules/<hadoop-foo>/lib/ folder and cross-check it against the jars in WEB-INF/lib/ to look for trouble. Perhaps there are workarounds.
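
As a rough illustration of that cross-check, here is a minimal Python sketch. It is not part of the Solr build; the function names are hypothetical, and it assumes jars follow the usual "<artifact>-<version>.jar" naming with the version starting at the first "-<digit>". Jars that break that convention (for example, an artifact name that itself contains "-1.2-") would be split incorrectly and need a manual look.

```python
import re
from pathlib import Path

# Heuristic (assumption): "<artifact>-<version>.jar", where the version
# begins at the first "-" that is followed by a digit.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def parse_jar_name(file_name):
    """Return (artifact, version) for a jar file name, or None if it does not match."""
    match = JAR_NAME.match(file_name)
    if match is None:
        return None
    return match.group("artifact"), match.group("version")


def jar_versions(lib_dir):
    """Map artifact name to version for every parseable jar directly inside lib_dir."""
    versions = {}
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        parsed = parse_jar_name(jar.name)
        if parsed is not None:
            artifact, version = parsed
            versions[artifact] = version
    return versions


def find_conflicts(module_lib, core_lib):
    """Return {artifact: (module_version, core_version)} for artifacts present
    in both directories with different versions."""
    module_jars = jar_versions(module_lib)
    core_jars = jar_versions(core_lib)
    return {
        artifact: (module_jars[artifact], core_jars[artifact])
        for artifact in sorted(module_jars.keys() & core_jars.keys())
        if module_jars[artifact] != core_jars[artifact]
    }
```

Calling find_conflicts() with the module's lib/ folder and the WEB-INF/lib/ folder of an unpacked distribution would list the artifacts that exist in both places with differing versions. It only compares file names, so it cannot see shaded or relocated classes inside a jar.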

Jan


Re: Moving out all Hadoop plugins into one module

Posted by Kevin Risden <co...@gmail.com>.
Yea, it would be duplicate jars in both places. It is a shame both share the
name "hadoop", since the two features (filesystem and authentication) end up
being two entirely different things, both in Hadoop itself and inside of
Solr.

Kevin Risden



Re: Moving out all Hadoop plugins into one module

Posted by David Smiley <ds...@apache.org>.
Separate modules will mean our distro will end up duplicating hadoop-common
and other related JARs for both modules.  I was trying to be practical.
But it's not important to me; ok.

implementation ('org.apache.hadoop:hadoop-common') { transitive = false } // too many to ignore
implementation ('org.apache.hadoop:hadoop-annotations')
runtimeOnly 'org.apache.htrace:htrace-core4' // note: removed in Hadoop 3.3.2
runtimeOnly "org.apache.commons:commons-configuration2"

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley



Re: Moving out all Hadoop plugins into one module

Posted by Kevin Risden <kr...@apache.org>.
My preference would be a separate module, HadoopAuthentication or something
like that. The HDFS filesystem / blockcache / etc. support is unique and
separate from the authentication part. It shouldn't all be in one module.

Kevin Risden

