Posted to user@tika.apache.org by Trevor Watson <tw...@datassimilate.com> on 2012/02/29 18:06:55 UTC

Tika/IKVM Crashing after Assembly.GetExportedTypes()

I'm not sure whether this is a Tika issue or an IKVM issue, but I'm 
posting it here as well, just in case.

Hello!

I've been trying to use Tika via IKVM to extract the contents of text 
files. With some help from this mailing list (thanks guys!) I've got it 
reading an MS Word (.doc) file that was renamed with an odd extension, 
which was the goal of using Tika over IFilters.

Our project includes the ability to add plug-ins (that we write) to 
process files that aren't handled by IFilters or Tika. These plug-ins 
are loaded at run time. We use Assembly.GetExportedTypes() to make sure 
that the DLLs we loaded are valid plug-ins. However, after calling 
asm.GetExportedTypes(), Tika/IKVM no longer works and crashes with an 
odd exception.
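
To make the validation step concrete, here is a minimal sketch of that 
kind of check. The names IFilePlugin, SamplePlugin and PluginValidator 
are hypothetical stand-ins (the real plug-in contract is not shown in 
this post), and the sketch inspects its own assembly instead of a DLL 
loaded with Assembly.LoadFrom.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical plug-in contract; the real interface is not shown in this post.
public interface IFilePlugin
{
    string Process(string filePath);
}

// Hypothetical sample plug-in so the sketch can run against its own assembly.
public class SamplePlugin : IFilePlugin
{
    public string Process(string filePath)
    {
        return "processed " + filePath;
    }
}

public static class PluginValidator
{
    // Returns the public, concrete classes in the assembly that implement IFilePlugin.
    public static List<Type> FindPluginTypes(Assembly asm)
    {
        var found = new List<Type>();
        foreach (Type t in asm.GetExportedTypes())
        {
            if (t.IsClass && !t.IsAbstract && typeof(IFilePlugin).IsAssignableFrom(t))
            {
                found.Add(t);
            }
        }
        return found;
    }

    public static void Main()
    {
        // Assumption: the real application gets this from Assembly.LoadFrom(file.FullName).
        Assembly asm = Assembly.GetExecutingAssembly();
        foreach (Type t in FindPluginTypes(asm))
        {
            Console.WriteLine("Valid plug-in type: " + t.FullName);
        }
    }
}
```

A DLL for which FindPluginTypes returns an empty list would be treated 
as not being a valid plug-in.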

We access Tika through TikaOnDotNet, a wrapper we found online.

The code (C# / .NET 4.0) is as follows:
--------------------------------------------------------------------------------------------------------- 

// Create and test the Tika extractor
TikaOnDotNet.TextExtractor _cut = new TikaOnDotNet.TextExtractor();
TikaOnDotNet.TextExtractionResult result =
    _cut.Extract(@"D:\Work\NamedWrong\What you need for distribution_was_doc.qrt");
// Extraction works here

System.Reflection.Assembly asm =
    System.Reflection.Assembly.LoadFrom(file.FullName);
// Extraction still works here, after LoadFrom
foreach (Type t in asm.GetExportedTypes())
// Once asm.GetExportedTypes() has been called, Tika breaks
--------------------------------------------------------------------------------------------------------- 


In the TextExtractor.cs file from TikaOnDotNet, the crash occurs when 
creating an AutoDetectParser. Stepping through that line shows it 
loading the class loader defined in MyClassLoader.cs.

--------------------------------------------------------------------------------------------------------- 

var parser = new AutoDetectParser(); // Crashes on this line
--------------------------------------------------------------------------------------------------------- 



The error is as follows:
--------------------------------------------------------------------------------------------------------- 



FactoryConfigurationError was unhandled

{"Provider ???\0\0\0?\0\0\0)System.Resources.ResourceReader, 
mscorlibsSystem.Resources.RuntimeResourceSet, mscorlib, 
Version=1.0.5000.0, Culture=neutral, 
PublicKeyToken=b77a5c561934e089\0\0\0\0\0\0\0\0\0]System.Byte[], 
mscorlib, Version=1.0.5000.0, Culture=neutral, 
PublicKeyToken=b77a5c561934e089PADP?nY\0\0\0\0\0-\0\0l\0z\0\0\0\0\0\0\0\0\0\0????\0\0\0\0\0\0\0\0\0\0)\0\0\0g??q?? 
not found"}


at javax.xml.parsers.DocumentBuilderFactory.newInstance()
at org.apache.tika.mime.MimeTypesReader.read(InputStream )
at org.apache.tika.mime.MimeTypesFactory.create(InputStream inputStream)
at org.apache.tika.mime.MimeTypesFactory.create(URL url)
at org.apache.tika.mime.MimeTypesFactory.create(String filePath)
at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes()
at org.apache.tika.config.TikaConfig..ctor(CompositeParser )
at org.apache.tika.config.TikaConfig..ctor()
at org.apache.tika.config.TikaConfig.getDefaultConfig()
at org.apache.tika.parser.AutoDetectParser..ctor()
at TikaOnDotNet.TextExtractor.Extract(String filePath) in C:\<project>\Tika\TextExtractor.cs:line 43
--------------------------------------------------------------------------------------------------------- 


Any assistance would be greatly appreciated.

Trevor Watson