Posted to user@tika.apache.org by Kevin Milburn <km...@gpslsolutions.com> on 2012/07/05 19:10:47 UTC
using tika with eclipse
Hi
I've been trying to add tika 1.1 support to an Eclipse RCP application
but am struggling to get the parsers loaded.
I have both tika-core-1.1.jar and tika-bundle-1.1.jar plugins added to
the target and selected within product and have confirmed both plugins
are present in the running program.
The fundamental problem appears to be that the TikaConfig is ultimately
reaching ServiceLoader.findServiceResources, looking for
META-INF/services/org.apache.tika.parser.Parser. While doing so, it
only appears to check the org.apache.tika.core plugin, which doesn't
contain that file, so no Parsers are available.
Any ideas where I may have gone wrong or how to get it working?
TIA
Kevin.
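One way to confirm the symptom described above is to ask a class loader directly which copies of the services file it can see. The following is a minimal sketch using only the JDK; the class name ServiceResourceProbe is hypothetical and not part of Tika. Inside an OSGi bundle each bundle has its own class loader, so the result depends on which class the loader is taken from.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper (not part of Tika).
final class ServiceResourceProbe {

    // Returns every location at which the given class loader can see the
    // provider-configuration file for the named service interface.
    static List<URL> find(ClassLoader loader, String serviceName) {
        String resource = "META-INF/services/" + serviceName;
        try {
            return new ArrayList<>(Collections.list(loader.getResources(resource)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In a bundle, use the loader of a class that lives in that bundle.
        ClassLoader loader = ServiceResourceProbe.class.getClassLoader();
        List<URL> hits = find(loader, "org.apache.tika.parser.Parser");
        System.out.println(hits.isEmpty()
                ? "No Parser services file visible to " + loader
                : "Found: " + hits);
    }
}
```

An empty result from the tika-core side would match what is reported here: the file lives in a different bundle, so the core bundle's class loader never sees it.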
Re: using tika with eclipse
Posted by Nick Burch <ni...@alfresco.com>.
On Wed, 18 Jul 2012, rodgersh wrote:
> And here is my custom-mimetypes.xml file:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <mime-info>
> <mime-type type="image/nitf">
> <alias type="image/ntf"/>
> <glob pattern="*.nitf"/>
> </mime-type>
> </mime-info>
I've no idea about OSGi, so I can't comment on what you need to do to have
it look at your extra file. Hopefully one of our OSGi experts can help you
with the appropriate incantation / jar file blessing / etc.
However, I do know about mimetypes in Tika, so I've fixed your problem
that way - see TIKA-957. As of r1363160 Tika should now know about NITF
files, and should have some mime magic for them (works on the few sample
files I tried).
Nick
Re: using tika with eclipse
Posted by rodgersh <hu...@lmco.com>.
I have a very similar issue, but using Tika on Karaf vs. eclipse.
I am using Tika v1.2 and Karaf v2.2.7 on Windows 7.
I have made an OSGi bundle that uses Tika and provides a
getFileExtensionForMimeType(...) method. I have added a
org/apache/tika/mime/custom-mimetypes.xml file to my src/main/resources
directory. I have made a custom parser and added a
META-INF/services/org.apache.tika.parser.Parser file that lists it (although
I am not trying to use the custom parser yet).
When another bundle invokes this bundle's getFileExtensionForMimeType(...)
method it works for mime types that Tika supports by default, but it does
not find the mime types in my custom-mimetypes.xml file.
It's like this custom mime types file is not found by the OSGi container.
Any help is appreciated.
Here is my method's code:
import java.io.IOException;
import org.apache.tika.config.TikaConfig;
import org.apache.tika.mime.MimeType;
import org.apache.tika.mime.MimeTypeException;
import org.apache.tika.mime.MimeTypes;
...
public String getFileExtensionForMimeType( String contentType ) throws MimeTypeException
{
    // TikaConfig.getDefaultConfig() did not work for custom mime types:
    // TikaConfig config = TikaConfig.getDefaultConfig();
    TikaConfig config = null;
    try
    {
        // Build the config from this bundle's own class loader
        config = new TikaConfig( this.getClass().getClassLoader() );
    }
    catch ( IOException e )
    {
        logger.warn( "Error creating TikaConfig with ClassLoader", e );
        return null;
    }
    MimeTypes mimeTypes = config.getMimeRepository();
    String extension = null;
    try
    {
        // Look up the registered type and ask for its preferred extension
        MimeType mimeType = mimeTypes.forName( contentType );
        extension = mimeType.getExtension();
    }
    catch ( Exception e )
    {
        logger.warn( "Exception caught getting file extension for mime type "
            + contentType, e );
    }
    logger.debug( "mimeType = " + contentType + ", file extension = ["
        + extension + "]" );
    return extension;
}
And here is my custom-mimetypes.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
<mime-type type="image/nitf">
<alias type="image/ntf"/>
<glob pattern="*.nitf"/>
</mime-type>
</mime-info>
I have verified my input is "image/nitf" mime type. This method worked when
the input was "application/octet-stream", it returned ".bin"
--
View this message in context: http://apache-tika-users.1629097.n2.nabble.com/using-tika-with-eclipse-tp7572799p7572828.html
Sent from the Apache Tika - Users mailing list archive at Nabble.com.
Re: using tika with eclipse
Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 6 Jul 2012, Kevin Milburn wrote:
> It appears my main mistake is trying to use Tika or TikaConfig, like
> every example I've found has done, which appears to be completely
> incompatible with using Tika in an OSGI environment! :(
>
> e.g. the following produces no output, despite the file containing text.
> Tika tika = new Tika();
> System.out.print(tika.parseToString(new FileInputStream(xmlFile)));
Once you work out the appropriate incantation, any chance you could write
something up for the Tika wiki about it? <http://wiki.apache.org/tika/>
(As you may have gathered, there aren't a lot of people using Tika with
OSGi yet, so the trail you blaze can hopefully help others later!)
Cheers
Nick
Re: using tika with eclipse
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On Mon, Jul 16, 2012 at 1:13 PM, Kevin Milburn
<km...@gpslsolutions.com> wrote:
> It would be nice if the Tika and TikaConfig classes had greater awareness of
> the OSGI environment as they currently perform redundant work trying to load
> the services files which they'll never find.
Note that there are cases where people embed the tika-core jar into a
larger bundle that also comes with some of the parser libraries. Or
when a client bundle uses Tika with parser services loaded from the
class loader of the client bundle. In such cases it's a good idea that
also the Java service provider mechanism is used to load services.
And in any case the static service loading is a fairly cheap operation
that's typically only done once during the lifetime of an application
or a bundle.
BR,
Jukka Zitting
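For reference, the Java service provider mechanism mentioned above can be sketched with the JDK's own java.util.ServiceLoader. This is an illustration only: Tika ships its own loader class in tika-core, and the names SpiLookup and DemoService below are hypothetical stand-ins. With Tika on the class path, one would pass org.apache.tika.parser.Parser.class instead of DemoService.class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical stand-in for a service interface such as Tika's Parser.
interface DemoService {
    String name();
}

// Hypothetical helper (not part of Tika).
final class SpiLookup {

    // Instantiates every provider listed in META-INF/services/<interface name>
    // that is visible to the given class loader.
    static <T> List<T> providers(Class<T> service, ClassLoader loader) {
        List<T> found = new ArrayList<>();
        for (T provider : ServiceLoader.load(service, loader)) {
            found.add(provider);
        }
        return found;
    }

    public static void main(String[] args) {
        // A client bundle would pass its own class loader here, which is the
        // case described above where SPI loading is still useful.
        List<DemoService> found =
                providers(DemoService.class, SpiLookup.class.getClassLoader());
        System.out.println(found.size() + " provider(s) found");
    }
}
```

The lookup is a one-off scan of the services files visible to that loader, which is consistent with the point that it is cheap and usually done once.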
Re: using tika with eclipse
Posted by Kevin Milburn <km...@gpslsolutions.com>.
On 2012/07/06 22:31, Jukka Zitting wrote:
> On Fri, Jul 6, 2012 at 7:47 PM, Kevin Milburn
> <km...@gpslsolutions.com> wrote:
>> I've tested this by modifying the tika-core/pom.xml (see attached), and
>> adding the following line:
>>
>> <Bundle-Activator>
>> org.apache.tika.config.TikaActivator
>> </Bundle-Activator>
>> + <Bundle-ActivationPolicy>lazy</Bundle-ActivationPolicy>
>>
>> Any chance of this for the 1.2 release?
> Sure, I just committed it, see https://issues.apache.org/jira/browse/TIKA-951.
>
Thanks for that, I've tested the latest snapshot (and RC1) and things
behave themselves a lot better.
It would be nice if the Tika and TikaConfig classes had greater
awareness of the OSGI environment as they currently perform redundant
work trying to load the services files which they'll never find.
Thanks again
Kevin.
p.s. For those trying to get Tika to work in Eclipse, you need to do
something along these lines.

Change the Target Definition (or create a new one):
  - On the Definition tab, add the location of the tika-bundle and
    tika-core jars.
  - On the Content tab, make sure the core and bundle plugins are
    selected.
  - Set as Target Platform.

In each plugin that needs Tika support, add org.apache.tika.core to the
plugin's dependencies.

Change the Product Configuration (or create a new one):
  - On the Dependencies tab, add org.apache.tika.core and o.a.t.bundle.
  - On the Configuration tab, add o.a.t.bundle to the Start levels,
    and set Auto-Start to true.
  - On the Overview tab, test the product by launching a runtime
    instance of it.
Re: using tika with eclipse
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On Fri, Jul 6, 2012 at 7:47 PM, Kevin Milburn
<km...@gpslsolutions.com> wrote:
> Eclipse is a finicky beast, even if a bundle has an Activator it won't be
> activated if the Bundle-ActivationPolicy is not set, unless the product is
> modified to explicitly auto start the bundle.
Interesting, I didn't know that.
> Ideally, it would be preferable to set the Bundle-ActivationPolicy to lazy
> to allow Eclipse (and others?) to do the right thing without needless
> complication.
Sounds like a good idea!
> I've tested this by modifying the tika-core/pom.xml (see attached), and
> adding the following line:
>
> <Bundle-Activator>
> org.apache.tika.config.TikaActivator
> </Bundle-Activator>
> + <Bundle-ActivationPolicy>lazy</Bundle-ActivationPolicy>
>
> Any chance of this for the 1.2 release?
Sure, I just committed it, see https://issues.apache.org/jira/browse/TIKA-951.
> p.s. an alternative method of obtaining access to the Detector and Parser
> involves something like this in your own bundle's activator:
The reason why we use ServiceTrackers instead is that we want to
support deployments where new parser and detector services can be
added or removed dynamically from the running system.
BR,
Jukka Zitting
Re: using tika with eclipse
Posted by Kevin Milburn <km...@gpslsolutions.com>.
On 2012/07/06 17:43, Jukka Zitting wrote:
> You'll want to make sure that both the tika-bundle and tika-core
> bundles are actually started/activated by the OSGi environment, as
> otherwise the relevant Activators that Tika uses to hook up with the
> available services won't get started.
Bingo, having spent much time on why the Parsers were not behaving,
it's actually the tika-core bundle that is not activating.
Eclipse is a finicky beast, even if a bundle has an Activator it won't
be activated if the Bundle-ActivationPolicy is not set, unless the
product is modified to explicitly auto start the bundle.
Ideally, it would be preferable to set the Bundle-ActivationPolicy to
lazy to allow Eclipse (and others?) to do the right thing without
needless complication.
I've tested this by modifying the tika-core/pom.xml (see attached), and
adding the following line:
<Bundle-Activator>
org.apache.tika.config.TikaActivator
</Bundle-Activator>
+ <Bundle-ActivationPolicy>lazy</Bundle-ActivationPolicy>
Any chance of this for the 1.2 release?
Thanks for the help.
Kevin.
p.s. an alternative method of obtaining access to the Detector and
Parser involves something like this in your own bundle's activator:

import org.apache.tika.detect.Detector;
import org.apache.tika.parser.Parser;
...
    @Override
    public void start(BundleContext context) throws Exception {
        super.start(context);
        // Note: getServiceReference returns null when no matching service is
        // registered yet, so tika-bundle must already be started at this point.
        detector = (Detector) context.getService(
                context.getServiceReference(Detector.class.getName()));
        parser = (Parser) context.getService(
                context.getServiceReference(Parser.class.getName()));
    }
Re: using tika with eclipse
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On Fri, Jul 6, 2012 at 6:27 PM, Kevin Milburn
<km...@gpslsolutions.com> wrote:
> It appears my main mistake is trying to use Tika or TikaConfig, like
> every example I've found has done, which appears to be completely
> incompatible with using Tika in an OSGI environment! :(
That shouldn't be the case. What's the code you're using?
You'll want to make sure that both the tika-bundle and tika-core
bundles are actually started/activated by the OSGi environment, as
otherwise the relevant Activators that Tika uses to hook up with the
available services won't get started.
Adding a breakpoint or a System.out print to the
o.a.t.config.TikaActivator class in tika-core and the
o.a.t.parser.internal.Activator class in tika-parsers/-bundle should
help make sure that these Activators really are being invoked by the
OSGi environment.
> e.g. the following produces no output, despite the file containing text.
> Tika tika = new Tika();
> System.out.print(tika.parseToString(new FileInputStream(xmlFile)));
See the BundleIT test case inside the tika-bundle component. That's a
pretty similar piece of code that works fine in an OSGi environment.
BR,
Jukka Zitting
Re: using tika with eclipse
Posted by Kevin Milburn <km...@gpslsolutions.com>.
On 2012/07/06 16:14, Jukka Zitting wrote:
> The tika-bundle should start up Parser and Detector services that
> tika-core will then access through the OSGi framework.
OK, I've done a bit more debugging, and think I know where I've gone
wrong.
Having got a breakpoint in the right place, I can see that the Parser
and Detector services are being generated correctly.
It appears my main mistake is trying to use Tika or TikaConfig, like
every example I've found has done, which appears to be completely
incompatible with using Tika in an OSGI environment! :(
e.g. the following produces no output, despite the file containing text.
Tika tika = new Tika();
System.out.print(tika.parseToString(new FileInputStream(xmlFile)));
Re: using tika with eclipse
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On Fri, Jul 6, 2012 at 5:00 PM, Kevin Milburn
<km...@gpslsolutions.com> wrote:
> On 2012/07/05 18:22, Jukka Zitting wrote:
>> upgrade to the latest 1.2 SNAPSHOT where declarative services is no longer
>> needed (see https://issues.apache.org/jira/browse/TIKA-896).
>
> I've built and installed the 1.2 SNAPSHOT, but it has made no difference.
Hmm, do you start/activate the bundles after deploying them to the
OSGi environment? I've seen some OSGi setups that only resolve bundles
by default, which only makes the contained classes available, but
doesn't start up the services provided by the bundles.
> It still suffers from the same fundamental problem that the ServiceLoader
> (in tika-core) cannot find "META-INF/services/org.apache.tika.parser.Parser"
> (in tika-bundle).
It's not supposed to. The tika-bundle should start up Parser and
Detector services that tika-core will then access through the OSGi
framework.
As you mentioned, OSGi and SPI don't work that well together, which is
why we're using the OSGi services when Tika gets deployed to an OSGi
environment.
BR,
Jukka Zitting
Re: using tika with eclipse
Posted by Kevin Milburn <km...@gpslsolutions.com>.
On 2012/07/05 18:22, Jukka Zitting wrote:
> upgrade to the latest 1.2 SNAPSHOT where declarative services is no
> longer needed (see https://issues.apache.org/jira/browse/TIKA-896).
I've built and installed the 1.2 SNAPSHOT, but it has made no difference.
It still suffers from the same fundamental problem that the
ServiceLoader (in tika-core) cannot find
"META-INF/services/org.apache.tika.parser.Parser" (in tika-bundle).
Is there any guidance anywhere on how to set up an Eclipse RCP
application to use the bundles?
Kevin.
Re: using tika with eclipse
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On Thu, Jul 5, 2012 at 7:10 PM, Kevin Milburn
<km...@gpslsolutions.com> wrote:
> Any ideas where I may have gone wrong or how to get it working?
In an OSGi environment Tika makes the Parser and Detector
implementations available as OSGi services that tika-core then
automatically picks up for use with things like AutoDetectParser and
the Tika facade.
In 1.1 you need declarative services support for that to happen, which
is probably why you don't see the parsers coming up in your
deployment. You can either deploy Tika 1.1 with declarative services,
or upgrade to the latest 1.2 SNAPSHOT where declarative services is no
longer needed (see https://issues.apache.org/jira/browse/TIKA-896).
BR,
Jukka Zitting
Re: using tika with eclipse
Posted by Kevin Milburn <km...@gpslsolutions.com>.
On 2012/07/05 18:26, Uwe Schindler wrote:
> Do you have the JAR files in classpath or do you extract them and merge all
> class files and resources? This happens, e.g. if you ask Eclipse to create
> one uber-jar containing everything. The problem that then appears is that
> the META-INF files coming from separate jar files overwrite each
> other. SPI is relying on actual jar packages as deployment units.
The JAR files (which are pulled from a Maven repository) have been added
to the plugins section of the RCP product and are both loaded (i.e. on
the app's classpath). The problem stems from the tika-bundle not being
on the classpath of the tika-core bundle.
I could repackage the tika-core and tika-bundle into a single
OSGI-Bundle, effectively replicating the bundle before the 1.0
release. However, this would seem to defeat the purpose of the
OSGi-bundles provided by the tika project.
Also, from what I can gather, SPI is the cause of the problem, as OSGI
and SPI are largely incompatible.
RE: using tika with eclipse
Posted by Uwe Schindler <uw...@thetaphi.de>.
Do you have the JAR files in classpath or do you extract them and merge all
class files and resources? This happens, e.g. if you ask Eclipse to create
one uber-jar containing everything. The problem that then appears is that
the META-INF files coming from separate jar files overwrite each
other. SPI is relying on actual jar packages as deployment units.
If you only add the unmodified jar files to classpath, this should work. The
same applies by the way for Solr and Lucene 4.0, which also use SPI for
their codec infrastructure.
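
To illustrate the uber-jar problem: when several jars each carry a META-INF/services file with the same name, a naive merge keeps only one of them, so the providers listed in the others disappear. A repackaging step has to concatenate those files instead. The following is a minimal sketch of that idea using only the JDK; the class ServiceFileMerger and the provider names in the usage note are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper showing how same-named services files should be combined.
final class ServiceFileMerger {

    // Takes the text of each same-named META-INF/services file and returns one
    // merged file body: comments and blank lines dropped, duplicates removed,
    // first-seen order kept.
    static String merge(List<String> fileContents) {
        Set<String> providers = new LinkedHashSet<>();
        for (String content : fileContents) {
            for (String line : content.split("\\R")) {
                // Everything after '#' is a comment in the services file format.
                int hash = line.indexOf('#');
                String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                if (!name.isEmpty()) {
                    providers.add(name);
                }
            }
        }
        return String.join("\n", providers);
    }
}
```

For example, merging one file that lists com.example.FooParser with another that lists com.example.BarParser gives a single file naming both, where overwriting would have kept only one. Maven users can get the same effect from the Shade plugin's ServicesResourceTransformer.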
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
> -----Original Message-----
> From: Kevin Milburn [mailto:kmilburn@gpslsolutions.com]
> Sent: Thursday, July 05, 2012 7:11 PM
> To: user@tika.apache.org
> Subject: using tika with eclipse
>
> Hi
>
> I've been trying to add tika 1.1 support to an Eclipse RCP application
> but am struggling to get the parsers loaded.
> I have both tika-core-1.1.jar and tika-bundle-1.1.jar plugins added to
> the target and selected within product and have confirmed both plugins
> are present in the running program.
>
> The fundamental problem appears to be that the TikaConfig is ultimately
> reaching ServiceLoader.findServiceResources, looking for
> META-INF/services/org.apache.tika.parser.Parser. While doing so, it
> only appears to check the org.apache.tika.core plugin, it doesn't
> contain it, so not Parsers are available.
>
> Any ideas where I may have gone wrong or how to get it working?
>
> TIA
> Kevin.