Posted to dev@tika.apache.org by "Stephan Mühlstrasser (Created JIRA)" <ji...@apache.org> on 2012/02/17 13:37:59 UTC

[jira] [Created] (TIKA-866) Incomplete configuration file causes OutOfMemoryException

Incomplete configuration file causes OutOfMemoryException
---------------------------------------------------------

                 Key: TIKA-866
                 URL: https://issues.apache.org/jira/browse/TIKA-866
             Project: Tika
          Issue Type: Bug
          Components: config
    Affects Versions: 1.0
            Reporter: Stephan Mühlstrasser
            Priority: Minor


I tried to override a built-in parser using the method described in issue TIKA-527. While testing this approach I used an incomplete configuration file (as far as I learned from a discussion on the mailing list, mimetypes and a detector should also be specified):

$ cat tika-config.xml
<properties>
<parsers>
<parser class="org.apache.tika.parser.DefaultParser"/>
</parsers>
</properties>

Using this configuration file causes an OutOfMemoryError:

$ java -Dtika.config=tika-config.xml -jar tika-app-1.0.jar --list-parsers
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOfRange(Arrays.java:3209)
        at java.lang.String.<init>(String.java:216)
        at java.lang.StringBuilder.toString(StringBuilder.java:430)
        at org.apache.tika.mime.MediaType.toString(MediaType.java:237)
        at org.apache.tika.detect.MagicDetector.<init>(MagicDetector.java:142)
        at org.apache.tika.mime.MimeTypesReader.readMatch(MimeTypesReader.java:254)
        at org.apache.tika.mime.MimeTypesReader.readMatches(MimeTypesReader.java:202)
        at org.apache.tika.mime.MimeTypesReader.readMagic(MimeTypesReader.java:186)
        at org.apache.tika.mime.MimeTypesReader.readMimeType(MimeTypesReader.java:152)
        at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:124)
        at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:107)
        at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:63)
        at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:91)
        at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:147)
        at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:455)
        at org.apache.tika.config.TikaConfig.typesFromDomElement(TikaConfig.java:273)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:161)
        at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:237)
        at org.apache.tika.mime.MediaTypeRegistry.getDefaultRegistry(MediaTypeRegistry.java:42)
        at org.apache.tika.parser.DefaultParser.<init>(DefaultParser.java:52)
        at sun.reflect.GeneratedConstructorAccessor4.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at java.lang.Class.newInstance0(Class.java:355)
        at java.lang.Class.newInstance(Class.java:308)
        at org.apache.tika.config.TikaConfig.parserFromDomElement(TikaConfig.java:288)
        at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:162)
        at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:237)
        at org.apache.tika.mime.MediaTypeRegistry.getDefaultRegistry(MediaTypeRegistry.java:42)
        at org.apache.tika.parser.DefaultParser.<init>(DefaultParser.java:52)
        at sun.reflect.GeneratedConstructorAccessor4.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) 

Expected behavior: If the configuration file is not valid, an appropriate exception should be produced.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (TIKA-866) Invalid configuration file causes OutOfMemoryException

Posted by "Jukka Zitting (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210429#comment-13210429 ] 

Jukka Zitting commented on TIKA-866:
------------------------------------

Actually, scrap the above rationale. The DefaultParser is OK for inclusion in a configuration file (that's actually what it was designed for, see TIKA-527), it's just AutoDetectParser that wouldn't work well with that mechanism. The infinite loop triggered by DefaultParser was rather a result of an unnecessary getDefaultConfig() call in MediaTypeRegistry.getDefaultRegistry().

I replaced that call and restored the ability to use DefaultParser in a configuration file in revision 1245692. And as discussed above, I also improved the config code to use the default parser or detector loading mechanism when no explicit <parser> or <detector> entries are present in a configuration file. A missing mimetypes entry was already being handled by loading the default settings, which was the original cause of the OOM as explained above.
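
To make the cycle concrete, here is a minimal, self-contained Java sketch. Config and CompositeParser are hypothetical stand-ins, not Tika's actual TikaConfig or DefaultParser code, and the depth guard exists only so the sketch terminates instead of running out of memory. The boolean flag models whether constructing the parser goes back through the default configuration (the cause described above) or avoids it (the behaviour after the fix).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; none of these names are Tika's real classes.
class ConfigCycleSketch {
    // Guard so the sketch stops after a few re-entries instead of exhausting memory.
    static final int MAX_DEPTH = 5;
    static int depth = 0;

    // Models a config whose constructor instantiates the parser named in the file.
    static class Config {
        final List<Object> parsers = new ArrayList<>();

        Config(boolean registryLoadsDefaultConfig) {
            if (depth >= MAX_DEPTH) {
                throw new IllegalStateException("config loading re-entered itself");
            }
            depth++;
            try {
                parsers.add(new CompositeParser(registryLoadsDefaultConfig));
            } finally {
                depth--;
            }
        }
    }

    // Models a composite parser that needs a media type registry when constructed.
    static class CompositeParser {
        CompositeParser(boolean registryLoadsDefaultConfig) {
            if (registryLoadsDefaultConfig) {
                // Asking for the registry loads the default config, which builds
                // another CompositeParser, and so on.
                new Config(true);
            }
            // Otherwise the registry is obtained without touching the config.
        }
    }

    // Returns true if the config could be built, false if the cycle was detected.
    static boolean loads(boolean registryLoadsDefaultConfig) {
        depth = 0;
        try {
            new Config(registryLoadsDefaultConfig);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("registry via default config: " + loads(true));
        System.out.println("registry without config:     " + loads(false));
    }
}
```

Without the guard, the first case would keep allocating new objects until the JVM gives up, which is consistent with the repeating frames in the reported stack trace.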
                

[jira] [Updated] (TIKA-866) Invalid configuration file causes OutOfMemoryException

Posted by "Jukka Zitting (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated TIKA-866:
-------------------------------

    Summary: Invalid configuration file causes OutOfMemoryException  (was: Incomplete configuration file causes OutOfMemoryException)

The problem here isn't the missing mimetypes and detector entries but rather the fact that you're specifying the composite DefaultParser class in a <parser> element. When a DefaultParser instance is created, it tries to load the *default* configuration, which in this case leads to the infinite loop. The <parser> elements are designed only to list the actual format-specific Parser implementations that composite parsers like DefaultParser or AutoDetectParser can then use as lower-level components.
                

[jira] [Updated] (TIKA-866) Incomplete configuration file causes OutOfMemoryException

Posted by "Stephan Mühlstrasser (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Mühlstrasser updated TIKA-866:
--------------------------------------

    Attachment: ConfigFile.java

Unit test to reproduce the problem.
                

[jira] [Resolved] (TIKA-866) Invalid configuration file causes OutOfMemoryException

Posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-866.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 1.1
         Assignee: Jukka Zitting

Fixed in revision 1245445 by adding an explicit check against composite parsers in <parser> entries.
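
As a rough illustration of what such a check could look like, here is a self-contained Java sketch. The Parser and Composite interfaces and both parser classes are hypothetical stand-ins defined only for this example; they are not Tika's types, and the actual change in revision 1245445 may be structured differently.

```java
// Hypothetical stand-ins; these are not Tika's Parser or CompositeParser types.
class CompositeCheckSketch {
    interface Parser {}

    // Marker for parsers that delegate to other parsers.
    interface Composite extends Parser {}

    static class TextParser implements Parser {}

    static class EverythingParser implements Composite {}

    // Instantiates a parser class named in a <parser> entry, refusing composites.
    static Parser instantiate(Class<? extends Parser> type) {
        if (Composite.class.isAssignableFrom(type)) {
            throw new IllegalArgumentException(
                    "Composite parser not allowed in a <parser> entry: " + type.getName());
        }
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + type.getName(), e);
        }
    }

    // Returns true if the class is accepted, false if it is rejected.
    static boolean accepted(Class<? extends Parser> type) {
        try {
            instantiate(type);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("TextParser accepted: " + accepted(TextParser.class));
        System.out.println("EverythingParser accepted: " + accepted(EverythingParser.class));
    }
}
```

Rejecting the class before it is instantiated means the user gets a descriptive exception rather than a runaway constructor chain.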
                

[jira] [Commented] (TIKA-866) Incomplete configuration file causes OutOfMemoryException

Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210219#comment-13210219 ] 

Nick Burch commented on TIKA-866:
---------------------------------

If the Tika Config file is missing elements (e.g. only has a parsers definition), then we have two choices:

 * Use the objects that TikaConfig.getDefaultConfig() would have used for the missing ones
 * Throw a helpful exception

I'm leaning towards the former, as that makes it easier for us to expand the config file without breaking things for existing users. Can anyone see a problem with that idea?
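
A small sketch of the first option, using only the JDK's DOM parser: for each section, check whether the config file contains an entry and otherwise fall back to the default. The class and method names are hypothetical, the element names "parser" and "detector" are taken from this thread, and the returned labels stand in for whatever objects a real implementation would load.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper; not part of Tika.
class ConfigDefaultsSketch {
    // Parses the XML text and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid configuration XML", e);
        }
    }

    // Reports where a section would come from: the file if at least one
    // element with that tag is present, otherwise the built-in default.
    static String sourceOf(Element root, String tag) {
        return root.getElementsByTagName(tag).getLength() > 0 ? "configured" : "default";
    }

    public static void main(String[] args) {
        Element root = parse(
                "<properties><parsers>"
                + "<parser class=\"org.apache.tika.parser.DefaultParser\"/>"
                + "</parsers></properties>");
        System.out.println("parser: " + sourceOf(root, "parser"));
        System.out.println("detector: " + sourceOf(root, "detector"));
    }
}
```

With this approach a config file that only lists parsers keeps working when new sections are added to the format later, which is the compatibility argument made above.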
                