Posted to user@tika.apache.org by Steven White <sw...@gmail.com> on 2016/02/05 21:40:14 UTC

Detecting if a file type is supported or not

Hi everyone,

How do I detect if a file type is supported or not?  Also, how do I detect
if a file type is supported but it cannot be processed because the parser
for it is missing (the required JARs are missing)?

For the missing JAR part, when I tried to parse a JAR file, I got this
exception:



Thanks

Steve

Re: Detecting if a file type is supported or not

Posted by Steven White <sw...@gmail.com>.
*** Sorry, I hit "send" by accident, reposting ***

Hi everyone,

How do I detect if a file type is supported or not?  Also, how do I detect
if a file type is supported but it cannot be processed because the parser
for it is missing (the required JARs are missing)?

For the missing JAR part, when I tried to parse a JAR file, I got this
exception:

    Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
        at java.lang.reflect.Method.invoke(Method.java:619)
        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
    Caused by: java.lang.NoClassDefFoundError: org.objectweb.asm.ClassVisitor
        at java.lang.ClassLoader.defineClassImpl(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:306)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:154)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:777)

Is there a way for me to make an API call to find out if the file can be
handled or not instead of depending on the exception?  Depending on the
exception isn't reliable.
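
The trace above names org.objectweb.asm.ClassVisitor as the class that could
not be loaded. One possible stopgap, which is plain Java and not a Tika API,
is to probe the classpath for that class before handing a file to the parser.
The sketch below assumes the class name from the trace is the only missing
dependency; the ClassProbe class and its isClassPresent method are
hypothetical names used for illustration.

```java
public class ClassProbe {

    /**
     * Returns true if the named class can be loaded by the given loader.
     * The class is loaded without being initialized, so its static
     * initializers do not run.
     */
    public static boolean isClassPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class itself is absent.
            // LinkageError (includes NoClassDefFoundError): the class was
            // found but something it depends on could not be loaded.
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassProbe.class.getClassLoader();
        // Class name copied from the "Caused by" line of the stack trace.
        String asmClass = "org.objectweb.asm.ClassVisitor";
        System.out.println(asmClass + " present: " + isClassPresent(asmClass, loader));
    }
}
```

A probe like this still relies on an exception internally, but it moves the
check to startup, where the result can be logged once instead of surfacing in
the middle of a parse.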

Thanks

Steve

On Fri, Feb 5, 2016 at 3:40 PM, Steven White <sw...@gmail.com> wrote:

> Hi everyone,
>
> How do I detect if a file type is supported or not?  Also, how do I detect
> if a file type is supported but it cannot be processed because the parser
> for it is missing (the required JARs are missing)?
>
> For the missing JAR part, when I tried to parse a JAR file, I got this
> exception:
>
>
>
> Thanks
>
> Steve
>

Re: Detecting if a file type is supported or not

Posted by Steven White <sw...@gmail.com>.
To further clarify my question.

1) Why am I getting a NoClassDefFoundError exception from Tika when I ask
it to parse a JAR file?
2) If it is due to missing parser JAR, is there a way I can ask Tika to
tell me so without throwing an exception?

Steve


On Fri, Feb 5, 2016 at 8:40 PM, Steven White <sw...@gmail.com> wrote:

> Hi Nick
>
> I'm asking Tika to parse a JAR file but Tika is throwing a
> NoClassDefFoundError exception (see the full call stack from my original
> email).  Why am I asking Tika to parse a JAR file?  I have no control over
> the file types I will pass it and per
> https://tika.apache.org/1.11/formats.html#Java_class_files_and_archives
> the JAR format is supported.  But in my case, it looks like the issue is a
> missing parser JAR.
>
> As of now:
>
> 1) I want Tika to tell me which file types it supports (I got the
> answer for that)
> 2) I want Tika to tell me it cannot parse a file due to a missing parser JAR
> (I don't know how to do this)
>
> For #2, all that I'm getting now is a NoClassDefFoundError.  This is not
> good.  Is there a non-exception way of asking Tika to tell me if it can
> parse a file or not?
>
> Thanks
>
> Steve
>
> On Fri, Feb 5, 2016 at 6:17 PM, Nick Burch <ap...@gagravarr.org> wrote:
>
>> On Fri, 5 Feb 2016, Steven White wrote:
>>
>>>>> For the missing JAR part
>>>> Set your Load Error Handler to Warn or Error to find out about parsers
>>>> with missing classes or dependencies
>>>
>>> This won't do.  What's happening now is if I give Tika a JAR file to
>>> parse, it is throwing NoClassDefFoundError exception (see my original
>>> posting).
>>
>> Hang on - are you asking Tika to parse a Jar, or are you asking Tika to
>> use a parser in your jar?
>>
>>> Is there a way for me to know that Tika doesn't have the parser for this
>>> type and thus I will not bother to parse it?
>>
>> If Tika knows that a parser isn't available, it won't use it. If you ask
>> Tika what active parsers it has, it won't include it
>>
>> Nick
>>
>
>

Re: Detecting if a file type is supported or not

Posted by Steven White <sw...@gmail.com>.
Hi Nick

I'm asking Tika to parse a JAR file but Tika is throwing a
NoClassDefFoundError exception (see the full call stack from my original
email).  Why am I asking Tika to parse a JAR file?  I have no control over
the file types I will pass it and per
https://tika.apache.org/1.11/formats.html#Java_class_files_and_archives the
JAR format is supported.  But in my case, it looks like the issue is a
missing parser JAR.

As of now:

1) I want Tika to tell me which file types it supports (I got the
answer for that)
2) I want Tika to tell me it cannot parse a file due to a missing parser JAR
(I don't know how to do this)

For #2, all that I'm getting now is a NoClassDefFoundError.  This is not
good.  Is there a non-exception way of asking Tika to tell me if it can
parse a file or not?

Thanks

Steve

On Fri, Feb 5, 2016 at 6:17 PM, Nick Burch <ap...@gagravarr.org> wrote:

> On Fri, 5 Feb 2016, Steven White wrote:
>
>>>> For the missing JAR part
>>> Set your Load Error Handler to Warn or Error to find out about parsers
>>> with missing classes or dependencies
>>
>> This won't do.  What's happening now is if I give Tika a JAR file to
>> parse, it is throwing NoClassDefFoundError exception (see my original
>> posting).
>
> Hang on - are you asking Tika to parse a Jar, or are you asking Tika to
> use a parser in your jar?
>
>> Is there a way for me to know that Tika doesn't have the parser for this
>> type and thus I will not bother to parse it?
>
> If Tika knows that a parser isn't available, it won't use it. If you ask
> Tika what active parsers it has, it won't include it
>
> Nick
>

Re: Detecting if a file type is supported or not

Posted by Nick Burch <ap...@gagravarr.org>.
On Fri, 5 Feb 2016, Steven White wrote:
>>> For the missing JAR part
>> Set your Load Error Handler to Warn or Error to find out about parsers
>> with missing classes or dependencies
>
> This won't do.  What's happening now is if I give Tika a JAR file to 
> parse, it is throwing NoClassDefFoundError exception (see my original 
> posting).

Hang on - are you asking Tika to parse a Jar, or are you asking Tika to 
use a parser in your jar?

> Is there a way for me to know that Tika doesn't have the parser for this 
> type and thus I will not bother to parse it?

If Tika knows that a parser isn't available, it won't use it. If you ask 
Tika what active parsers it has, it won't include it
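
One case this does not cover is a parser that is registered but whose own
dependency is absent, which is what the trace earlier in this thread appears
to show. The failure then arrives as a NoClassDefFoundError, which is an
Error (a LinkageError), not an Exception, so a catch (Exception e) around the
parse call never sees it. A minimal sketch of a guard, in plain Java: the
LinkageGuard class and callOrEmpty method are hypothetical names, and the
lambda in main is only a stand-in for a real parse call.

```java
import java.util.Optional;
import java.util.function.Supplier;

public class LinkageGuard {

    /**
     * Runs the task and returns its result, or Optional.empty() if a class
     * needed by the task could not be loaded or linked.
     */
    public static <T> Optional<T> callOrEmpty(Supplier<T> task) {
        try {
            return Optional.ofNullable(task.get());
        } catch (LinkageError e) {
            // NoClassDefFoundError is a subclass of LinkageError.
            System.err.println("Skipping: class could not be loaded: " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a parse call; simulates the failure from the trace.
        Optional<String> text = callOrEmpty(() -> {
            throw new NoClassDefFoundError("org.objectweb.asm.ClassVisitor");
        });
        System.out.println("parsed: " + text.isPresent());
    }
}
```

Because Supplier cannot throw checked exceptions, a real parse call would
need to handle its own checked exceptions inside the lambda.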

Nick

Re: Detecting if a file type is supported or not

Posted by Steven White <sw...@gmail.com>.
Thanks Nick.

I was able to address all my other issues, except for this:

>> For the missing JAR part
> Set your Load Error Handler to Warn or Error to find out about parsers
> with missing classes or dependencies

This won't do.  What's happening now is if I give Tika a JAR file to parse,
it is throwing NoClassDefFoundError exception (see my original posting).
Is there a way for me to know that Tika doesn't have the parser for this
type and thus I will not bother to parse it?

Steve



On Fri, Feb 5, 2016 at 4:38 PM, Nick Burch <ap...@gagravarr.org> wrote:

> On Fri, 5 Feb 2016, Steven White wrote:
>
>> How do I detect if a file type is supported or not?
>
> Run the detection only. If you get anything other than
> application/octet-stream back, Tika was able to detect it
>
>> Also, how do I detect if a file type is supported but it cannot be
>> processed because the parser for it is missing (the required JARs are
>> missing)?
>
> Ask Tika what mime types it has parsers for. If it isn't one of those, no
> parser exists.
>
> See the troubleshooting guide for ways to do that from code, the webapp,
> the app etc
>
>> For the missing JAR part
>
> Set your Load Error Handler to Warn or Error to find out about parsers
> with missing classes or dependencies
>
> Nick
>

Re: Detecting if a file type is supported or not

Posted by Nick Burch <ap...@gagravarr.org>.
On Fri, 5 Feb 2016, Steven White wrote:
> How do I detect if a file type is supported or not?

Run the detection only. If you get anything other than 
application/octet-stream back, Tika was able to detect it

> Also, how do I detect if a file type is supported but it cannot be 
> processed because the parser for it is missing (the required JARs are 
> missing)?

Ask Tika what mime types it has parsers for. If it isn't one of those, no 
parser exists.

See the troubleshooting guide for ways to do that from code, the webapp, 
the app etc
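
The two checks above combine into a three-way decision, sketched here in
plain Java. The SupportCheck class, the Support enum and the sample MIME
types are hypothetical. In a real program the detected type would come from
something like Tika.detect(file), and the supported set from
Parser.getSupportedTypes(new ParseContext()) with each MediaType converted to
a string. The exact string match is a simplification that ignores MIME type
aliases and specializations.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SupportCheck {

    public enum Support { UNDETECTED, DETECTED_NO_PARSER, PARSEABLE }

    // The generic fallback type returned when detection finds nothing better.
    static final String GENERIC = "application/octet-stream";

    /**
     * Classifies a detected MIME type against the set of types that the
     * available parsers claim to handle.
     */
    public static Support classify(String detectedType, Set<String> parserTypes) {
        if (detectedType == null || GENERIC.equals(detectedType)) {
            return Support.UNDETECTED;
        }
        return parserTypes.contains(detectedType)
                ? Support.PARSEABLE
                : Support.DETECTED_NO_PARSER;
    }

    public static void main(String[] args) {
        // Hypothetical sample values standing in for what Tika would report.
        Set<String> parserTypes =
                new HashSet<>(Arrays.asList("application/pdf", "text/plain"));

        System.out.println(classify("application/pdf", parserTypes));
        System.out.println(classify("application/java-archive", parserTypes));
        System.out.println(classify("application/octet-stream", parserTypes));
    }
}
```

Note that PARSEABLE only means a parser is registered for the type. It does
not guarantee that every class the parser depends on is on the classpath.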

> For the missing JAR part

Set your Load Error Handler to Warn or Error to find out about parsers 
with missing classes or dependencies

Nick
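
For the Load Error Handler suggestion, one way that may work from code in
Tika 1.x is a JVM system property that the service loader reads when it is
created. The property name below is an assumption based on the Tika 1.x
ServiceLoader source and should be confirmed against the version in use; the
LoadErrorWarn class is a hypothetical name. The sketch uses only the JDK and
does not itself call Tika.

```java
public class LoadErrorWarn {

    // Assumed property name; verify it against your Tika version.
    static final String WARN_PROPERTY = "org.apache.tika.service.error.warn";

    /** Sets the property; call this before any Tika object is created. */
    public static void enableLoadWarnings() {
        System.setProperty(WARN_PROPERTY, "true");
    }

    public static void main(String[] args) {
        enableLoadWarnings();
        // Boolean.getBoolean reads the named system property as a boolean.
        System.out.println(WARN_PROPERTY + " = " + Boolean.getBoolean(WARN_PROPERTY));
        // Construct the Tika or TikaConfig instance only after this point.
    }
}
```

The same property can be passed on the command line with -D instead of being
set in code, which avoids any ordering concern.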