Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2015/10/16 11:10:05 UTC

[jira] [Comment Edited] (SOLR-8166) Introduce possibility to configure ParseContext in ExtractingRequestHandler/ExtractingDocumentLoader

    [ https://issues.apache.org/jira/browse/SOLR-8166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960399#comment-14960399 ] 

Uwe Schindler edited comment on SOLR-8166 at 10/16/15 9:09 AM:
---------------------------------------------------------------

Hi,
we disallow calling setAccessible via reflection throughout Lucene/Solr (because Java 9 restricts it heavily), so your patch would not pass the code quality checks (forbidden-apis).
I would suggest adding a ParseContextFactory that you can specify in your config. The implementation has to be supplied by the user as plain Java code (using Solr's plugin mechanism).

Alternatively, add setters for all ParseContext methods in your parser.
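
To make the ParseContextFactory suggestion concrete, here is a minimal sketch of how such a plugin could look. It is an illustration under stated assumptions, not code from the patch or from Solr: the ParseContextFactory interface, the InlineImagePdfContextFactory class, and the loadFactory helper are hypothetical names, and ParseContext and PDFParserConfig are small stand-ins that model only what the sketch needs, so that it compiles without Tika on the classpath. The point is that the factory is instantiated through a public no-argument constructor and configures the context through public setters, so no setAccessible call is needed.

```java
import java.util.HashMap;
import java.util.Map;

public class ParseContextFactorySketch {

    // Stand-in for Tika's ParseContext (assumption): a registry keyed by type.
    static final class ParseContext {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            return key.cast(entries.get(key));
        }
    }

    // Stand-in for Tika's PDFParserConfig (assumption): only the one flag
    // mentioned in the issue description is modelled.
    static final class PDFParserConfig {
        private boolean extractInlineImages;

        void setExtractInlineImages(boolean extractInlineImages) {
            this.extractInlineImages = extractInlineImages;
        }

        boolean getExtractInlineImages() {
            return extractInlineImages;
        }
    }

    // Hypothetical plugin interface that a user would implement.
    interface ParseContextFactory {
        ParseContext create();
    }

    // Hypothetical user-supplied implementation, written as plain Java.
    public static final class InlineImagePdfContextFactory implements ParseContextFactory {
        public InlineImagePdfContextFactory() {
        }

        @Override
        public ParseContext create() {
            PDFParserConfig pdfConfig = new PDFParserConfig();
            pdfConfig.setExtractInlineImages(true); // public setter, no reflection hacks
            ParseContext context = new ParseContext();
            context.set(PDFParserConfig.class, pdfConfig);
            return context;
        }
    }

    // Hypothetical loader standing in for Solr's plugin mechanism: it only
    // uses the public no-arg constructor and never calls setAccessible.
    static ParseContextFactory loadFactory(String className) throws ReflectiveOperationException {
        return Class.forName(className)
                .asSubclass(ParseContextFactory.class)
                .getConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws Exception {
        // In a real setup the class name would come from the handler config.
        String configured = InlineImagePdfContextFactory.class.getName();
        ParseContext context = loadFactory(configured).create();
        System.out.println("extractInlineImages = "
                + context.get(PDFParserConfig.class).getExtractInlineImages());
    }
}
```

With this shape, the request handler config would only need to name the factory class, and everything that goes into the ParseContext stays in ordinary, type-checked Java.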


was (Author: thetaphi):
Hi,
we disallow calling setAccessible via reflection throughout Lucene/Solr (because Java 9 restricts it heavily), so your patch would not pass the code quality checks (forbidden-apis).
I would suggest adding a ParseContextFactory that you can specify in your config. The implementation has to be supplied by the user as plain Java code (using Solr's plugin mechanism).

> Introduce possibility to configure ParseContext in ExtractingRequestHandler/ExtractingDocumentLoader
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-8166
>                 URL: https://issues.apache.org/jira/browse/SOLR-8166
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 5.3
>            Reporter: Andriy Binetsky
>
> Currently there is no way to pass additional configuration to document extraction with ExtractingRequestHandler/ExtractingDocumentLoader.
> For example, I need to put an org.apache.tika.parser.pdf.PDFParserConfig with "extractInlineImages" set to true into the ParseContext to trigger extraction/OCR of images embedded in PDFs.
> It would be nice to be able to configure the created ParseContext through an XML config file, as TikaConfig does.
> I would suggest the following:
> solrconfig.xml:
>   <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>     <str name="parseContext.config">parseContext.config</str>
>   </requestHandler>
> parseContext.config:
> <entries>
>   <entry class="org.apache.tika.parser.pdf.PDFParserConfig" value="org.apache.tika.parser.pdf.PDFParserConfig">
>     <property name="extractInlineImages" value="true"/>
>   </entry>
> </entries>


