You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucenenet.apache.org by Shad Storhaug <sh...@shadstorhaug.com> on 2016/10/17 14:24:57 UTC

SPIClassIterator/NamedSPILoader Limitation Issue

I have completed implementing all but one of the missing Lucene.Net.Core tests, and have started getting some of the new failing tests to pass (was originally over 60, now is more like 35). But one issue I discovered along the way is that some of the test doubles that depended on the OLD_FORMAT_IMPERSONATION_IS_ACTIVE setting being static are not able to function.

After thinking this through, the reason why that variable was made non-static is because if we run multiple tests in parallel using xUnit they cannot use a static setting that changes for various tests. So we need another solution than going back to the original Java implementation that was static (that is, if we plan to go forward with xUnit...).

Also, as I previously mentioned (http://apache.markmail.org/search/?q=remaining+list%3Aorg.apache.incubator.lucene-net-dev+order%3Adate-backward+date%3A200603-201610#query:remaining%20list%3Aorg.apache.incubator.lucene-net-dev%20order%3Adate-backward%20date%3A200603-201610+page:1+mid:so67bfksvrebrhru+state:results), the fact that the SPIClassIterator just loads the types from the current AppDomain in random order is an issue for anyone who needs to extend it (as well as a problem for testing).

The Problem

Many of the RW codec files (test mocks) require the ability to read the OLD_FORMAT_IMPERSONATION_IS_ACTIVE variable from LuceneTestCase. In Lucene, they did so by reading the setting the variable directly because it was static, so they would have access to the setting in real time.

Connie solved part of the issue by adding a constructor to the codec test mocks, which allows you to inject this setting into the IndexWriterConfig (an example here: https://github.com/apache/lucenenet/blob/686b75113be7c4bae2daee20afb5f7a70d79a1d5/src/Lucene.Net.Tests/core/Index/TestNumericDocValuesUpdates.cs#L1067). That works for the IndexWriter, however, when using the IndexReader the codec names are read from the index file and loaded directly by NamedSPILoader without any way to intervene or pass anything to the constructor.

In addition, the ORDER of the classes that are currently loaded by SPIClassIterator is important for the tests. They require that the test codecs are loaded before the production codecs. This is also important if someone wanted to extend the codec for some reason - they will also need their customizations to load first before the internal Lucene classes.

Proposed Solution

We could add a correlation to the Java ClassLoader to Lucene.Net, called ITypeLoader. The constructor and method overloads from Lucene that accept ClassLoader were left out of Lucene.Net, which is one reason we are missing this functionality. The main purpose of ITypeLoader would be to load and ORDER the types that are iterated by the SPIClassIterator. ITypeLoader would be set statically at application startup (Lucene.Net.Support.TypeLoader.SetTypeLoader(ITypeLoader)) and will also have a default value (DefaultTypeLoader) if no customization is needed.

The existing SPIClassIterator would be refactored to only iterate and not cache the types. It will also be passed a parameter (as it was in Lucene) to filter for types that directly inherit the passed in type. The caching of types could be moved to a decorator ITypeLoader instead, but we should remove the constraints that we currently have on constructor signatures.

In addition, we could have an IInstanceFactory (abstract factory) that could be implemented in order to provide additional dependencies/configuration to types that are customized, similar to what is specified here: http://blog.ploeh.dk/2014/05/19/di-friendly-framework/.  Its purpose would be to replace the current Activator.CreateInstance() calls that only accept a default constructor (so we can pass in test variables that apply to the instance). We could also statically specify IInstanceFactory at application startup (Lucene.Net.Support.TypeLoader.SetInstanceFactory(IInstanceFactory)). One major benefit of this is that if one were so inclined, they could use a 3rd party dependency injection container to supply these instances to the NamedSPILoader (which is where they are cached until they are used).

The behavior of the default value of IInstanceFactory (DefaultInstanceFactory) is pretty clear - just wrap the same old call to Activator.CreateInstance(). We can then easily override this behavior in the test environment to inject whatever we need (I am thinking that we could just pass a Func<T> to the constructor so it can access the real-time value of the OLD_FORMAT_IMPERSONATION_IS_ACTIVE variable, which would most likely just be local to each test). We could even use this in other places in Lucene.Net to allow for injection of custom constructor parameters where we currently have fixed parameter signatures.

However, the behavior of the default value of ITypeLoader (DefaultTypeLoader) is less clear. Do we really need to load all of the types from the current AppDomain? If so, how do we control the order of the types and/or assemblies?

Option 1

A possible solution would be to just hard code - pure DI - the Lucene.Net assemblies (specified by type string) in DefaultTypeLoader (or possibly in a custom IComparer<Assembly> dependency?). We could easily control the order this way and if people don't need customization this would perform much better. If they do need customization, they could simply inherit DefaultTypeLoader, override its GetTypes() method, and call base.GetTypes() after specifying the customization types. The only downside to this is that new types are not automatically discovered.

Option 2

Another solution to this would be to load everything from the AppDomain as we do now, but ensure all of the built-in Lucene.Net assemblies are skipped until the very end of the process, which means all custom types are automatically loaded before all built-in types.

Option 3

Same as Option 1, but also have a configuration file of some sort (possibly JSON format...) that could, if supplied, be used to provide the names and order of the assemblies to load. This is actually the closest to how it was done in Java - there is a .classpath file that determines the order in which the packages are loaded, which made it configurable. However, the .classpath is apparently a closer analog to a .csproj file as this order is added to the compiled .jar package. It doesn't seem very reasonable to require someone to hand edit the .csproj file in order to control the dependency order (and I am not sure if the assembly/project reference order is preserved for a .NET compiled assembly anyway).

Option 4

Same as Option 1, but add a configuration setting "AutoDiscoverAssemblies" to plug in the default behavior from Option 2. Basically, default to best performance but allow automatic discovery of new types if the option is enabled.

Option 5

Same as Option 1, but have an alternative AssemblyTypeLoader ITypeLoader implementation with a constructor parameter that accepts an IEnumerable<Assembly> where the user can supply any assemblies to scan for Lucene.Net types (in the proper order). If the DefaultTypeLoader is used, no scanning will happen. But if the user sets the ITypeLoader to AssemblyTypeLoader, the constructor-specified assemblies will be scanned. The advantage here is that we don't need to mess with loading and then ignoring unwanted assemblies from the AppDomain, but we can still scan for types.


So, what would be the most logical default behavior for DefaultInstanceLoader? Is there an alternative way of ordering the assemblies and/or types that we should consider?


Shad Storhaug (NightOwl888)





Re: SPIClassIterator/NamedSPILoader Limitation Issue

Posted by Elad Margalit <el...@gmail.com>.
Option 5 seems to benefit both of the worlds, fast and customizable 

But I'm fine with option 1 which 95% of the users will use anyway



Sent from my iPhone

> On 17 Oct 2016, at 17:24, Shad Storhaug <sh...@shadstorhaug.com> wrote:
> 
> I have completed implementing all but one of the missing Lucene.Net.Core tests, and have started getting some of the new failing tests to pass (was originally over 60, now is more like 35). But one issue I discovered along the way is that some of the test doubles that depended on the OLD_FORMAT_IMPERSONATION_IS_ACTIVE setting being static are not able to function.
> 
> After thinking this through, the reason why that variable was made non-static is because if we run multiple tests in parallel using xUnit they cannot use a static setting that changes for various tests. So we need another solution than going back to the original Java implementation that was static (that is, if we plan to go forward with xUnit...).
> 
> Also, as I previously mentioned (http://apache.markmail.org/search/?q=remaining+list%3Aorg.apache.incubator.lucene-net-dev+order%3Adate-backward+date%3A200603-201610#query:remaining%20list%3Aorg.apache.incubator.lucene-net-dev%20order%3Adate-backward%20date%3A200603-201610+page:1+mid:so67bfksvrebrhru+state:results), the fact that the SPIClassIterator just loads the types from the current AppDomain in random order is an issue for anyone who needs to extend it (as well as a problem for testing).
> 
> The Problem
> 
> Many of the RW codec files (test mocks) require the ability to read the OLD_FORMAT_IMPERSONATION_IS_ACTIVE variable from LuceneTestCase. In Lucene, they did so by reading the setting the variable directly because it was static, so they would have access to the setting in real time.
> 
> Connie solved part of the issue by adding a constructor to the codec test mocks, which allows you to inject this setting into the IndexWriterConfig (an example here: https://github.com/apache/lucenenet/blob/686b75113be7c4bae2daee20afb5f7a70d79a1d5/src/Lucene.Net.Tests/core/Index/TestNumericDocValuesUpdates.cs#L1067). That works for the IndexWriter, however, when using the IndexReader the codec names are read from the index file and loaded directly by NamedSPILoader without any way to intervene or pass anything to the constructor.
> 
> In addition, the ORDER of the classes that are currently loaded by SPIClassIterator is important for the tests. They require that the test codecs are loaded before the production codecs. This is also important if someone wanted to extend the codec for some reason - they will also need their customizations to load first before the internal Lucene classes.
> 
> Proposed Solution
> 
> We could add a correlation to the Java ClassLoader to Lucene.Net, called ITypeLoader. The constructor and method overloads from Lucene that accept ClassLoader were left out of Lucene.Net, which is one reason we are missing this functionality. The main purpose of ITypeLoader would be to load and ORDER the types that are iterated by the SPIClassIterator. ITypeLoader would be set statically at application startup (Lucene.Net.Support.TypeLoader.SetTypeLoader(ITypeLoader)) and will also have a default value (DefaultTypeLoader) if no customization is needed.
> 
> The existing SPIClassIterator would be refactored to only iterate and not cache the types. It will also be passed a parameter (as it was in Lucene) to filter for types that directly inherit the passed in type. The caching of types could be moved to a decorator ITypeLoader instead, but we should remove the constraints that we currently have on constructor signatures.
> 
> In addition, we could have an IInstanceFactory (abstract factory) that could be implemented in order to provide additional dependencies/configuration to types that are customized, similar to what is specified here: http://blog.ploeh.dk/2014/05/19/di-friendly-framework/.  Its purpose would be to replace the current Activator.CreateInstance() calls that only accept a default constructor (so we can pass in test variables that apply to the instance). We could also statically specify IInstanceFactory at application startup (Lucene.Net.Support.TypeLoader.SetInstanceFactory(IInstanceFactory)). One major benefit of this is that if one were so inclined, they could use a 3rd party dependency injection container to supply these instances to the NamedSPILoader (which is where they are cached until they are used).
> 
> The behavior of the default value of IInstanceFactory (DefaultInstanceFactory) is pretty clear - just wrap the same old call to Activator.CreateInstance(). We can then easily override this behavior in the test environment to inject whatever we need (I am thinking that we could just pass a Func<T> to the constructor so it can access the real-time value of the OLD_FORMAT_IMPERSONATION_IS_ACTIVE variable, which would most likely just be local to each test). We could even use this in other places in Lucene.Net to allow for injection of custom constructor parameters where we currently have fixed parameter signatures.
> 
> However, the behavior of the default value of ITypeLoader (DefaultTypeLoader) is less clear. Do we really need to load all of the types from the current AppDomain? If so, how do we control the order of the types and/or assemblies?
> 
> Option 1
> 
> A possible solution would be to just hard code - pure DI - the Lucene.Net assemblies (specified by type string) in DefaultTypeLoader (or possibly in a custom IComparer<Assembly> dependency?). We could easily control the order this way and if people don't need customization this would perform much better. If they do need customization, they could simply inherit DefaultTypeLoader, override its GetTypes() method, and call base.GetTypes() after specifying the customization types. The only downside to this is that new types are not automatically discovered.
> 
> Option 2
> 
> Another solution to this would be to load everything from the AppDomain as we do now, but ensure all of the built-in Lucene.Net assemblies are skipped until the very end of the process, which means all custom types are automatically loaded before all built-in types.
> 
> Option 3
> 
> Same as Option 1, but also have a configuration file of some sort (possibly JSON format...) that could, if supplied, be used to provide the names and order of the assemblies to load. This is actually the closest to how it was done in Java - there is a .classpath file that determines the order in which the packages are loaded, which made it configurable. However, the .classpath is apparently a closer analog to a .csproj file as this order is added to the compiled .jar package. It doesn't seem very reasonable to require someone to hand edit the .csproj file in order to control the dependency order (and I am not sure if the assembly/project reference order is preserved for a .NET compiled assembly anyway).
> 
> Option 4
> 
> Same as Option 1, but add a configuration setting "AutoDiscoverAssemblies" to plug in the default behavior from Option 2. Basically, default to best performance but allow automatic discovery of new types if the option is enabled.
> 
> Option 5
> 
> Same as Option 1, but have an alternative AssemblyTypeLoader ITypeLoader implementation with a constructor parameter that accepts an IEnumerable<Assembly> where the user can supply any assemblies to scan for Lucene.Net types (in the proper order). If the DefaultTypeLoader is used, no scanning will happen. But if the user sets the ITypeLoader to AssemblyTypeLoader, the constructor-specified assemblies will be scanned. The advantage here is that we don't need to mess with loading and then ignoring unwanted assemblies from the AppDomain, but we can still scan for types.
> 
> 
> So, what would be the most logical default behavior for DefaultInstanceLoader? Is there an alternative way of ordering the assemblies and/or types that we should consider?
> 
> 
> Shad Storhaug (NightOwl888)
> 
> 
> 
>