Posted to issues@lucene.apache.org by "Andrew (Jira)" <ji...@apache.org> on 2020/08/04 09:16:00 UTC

[jira] [Comment Edited] (SOLR-13973) Deprecate Tika

    [ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170668#comment-17170668 ] 

Andrew edited comment on SOLR-13973 at 8/4/20, 9:15 AM:
--------------------------------------------------------

We use Solr as a full-text indexing subsystem, and the key feature for us is its embedded Tika capabilities (Solr Cell). We have been running it in production for around 4 years, and so far it has been working fine.

It is very convenient to deploy and maintain a single service that can both extract and index the content of file attachments. (More precisely, because our main DB is MongoDB, we use mongo-connector[solr], so there are actually two additional services to maintain in order to keep the Solr index in sync with the DB.)

 

On the other hand, as software developers we understand the motivation behind this ticket: removing non-core functionality in order to move faster on core feature development, which also makes a lot of sense.

 

If you would like to hear the “voice of the customer”: we would very much like Solr to keep providing file indexing out of the box. Moving Cell functionality to a stand-alone solution like Tika Server sounds good, but for those who want to keep file indexing in place it would increase the number of sync integrations and services from 2 to 5:
 * Current setup:
 ** Source DB <-- sync connector (1) --> Solr (2)

_Number of services to maintain in order to enable Solr with file indexing: {color:#00875a}*2*{color}_

 
 * Potential future setup once Solr Cell (Tika) is removed:
 ** Source DB <-- sync connector (1) --> Solr (2)
 ** Source DB <-- sync connector (3) --> Tika Server (4)
 ** Tika Server <-- sync connector (5) --> Solr

_Number of services to maintain in order to enable Solr with file indexing: {color:#ff0000}*5*{color}_
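
The two counts above can be reproduced with a small sketch. This is only an illustration of the arithmetic, not a description of any real deployment tooling: the `count_maintained` helper and the link labels are hypothetical names, and the Source DB is assumed to exist anyway, so it is not counted.

```python
# Hypothetical model of the two topologies listed above.
# Each link is (source, connector label, target); all names are illustrative.

def count_maintained(links, external=("Source DB",)):
    """Count distinct connectors plus distinct services, excluding external ones."""
    connectors = {label for _, label, _ in links}
    services = {node for src, _, dst in links for node in (src, dst)}
    services -= set(external)  # the source DB exists regardless of Solr
    return len(connectors) + len(services)

# Current setup: one sync connector feeding Solr (with Solr Cell inside).
current = [("Source DB", "sync connector (1)", "Solr")]

# Potential future setup: Tika Server as a separate service with its own syncs.
future = [
    ("Source DB", "sync connector (1)", "Solr"),
    ("Source DB", "sync connector (3)", "Tika Server"),
    ("Tika Server", "sync connector (5)", "Solr"),
]

current_count = count_maintained(current)
future_count = count_maintained(future)
print(current_count, future_count)
```

Under these assumptions the current setup comes to one connector plus one service, and the future setup to three connectors plus two services.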

 

Being able to index files out of the box is a very nice feature of Solr. Once it is deprecated and eventually removed from the core, being able to plug it in easily, without having to maintain additional sync services, would be highly appreciated!

 

Thank you for your consideration!


was (Author: andrewgr):
We are using Solr as a full-text indexing sub-system and the key feature for us - its embedded Tika capabilities (Solr Cell). And yes, we even use it in production for around 4 years and so far, it has been working fine.

It is super convenient to deploy and maintain one single service which can extract and index content of the file-attachments (more technically because our main DB is MongoDB we do use mongo-connector[solr], so actually there are two additional services to be maintained for keeping Solr-index in sync with DB).

 

From other side, being a software developers we do understand the motivation behind this ticket for removing non-core functionality in order to move faster with the core features development, what also makes much sense.

 

If you would like to hear “voice of the customer” - we would highly like to see Solr keep providing such feature as files indexing out of the box. Moving Cell functionality to the stand-alone solution like Tika Server sounds good, but from the other side for those who would like to keep files indexing functionality in place that would mean increased number of sync-integrations & services from 2 to 5:
 * Current setup:
 ** Source DB <-- integration connector (1) --> Solr (2)

_Number of services to be maintained in order to enable Solr (with files indexing) functionality: {color:#00875a}*2*{color}_

 
 * Future potential setup once Solr Cell (Tika) removed:
 ** Source DB <-- integration connector (1) --> Solr (2)
 ** Source DB <-- integration connector (3) --> Tika Server (4)
 ** Tika Server <-- integration (5) --> Solr

_Number of services to be maintained in order to enable Solr (with files indexing) functionality: {color:#FF0000}*5*{color}_

 

It is super nice feature of Solr to be able to index files out-of-the-box. Once it is deprecated and eventually removed from the core - having possibility to easily plug-in it without need to maintain additional sync services – would be highly appreciated!

 

Thank you for consideration!

> Deprecate Tika
> --------------
>
>                 Key: SOLR-13973
>                 URL: https://issues.apache.org/jira/browse/SOLR-13973
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Ishan Chattopadhyaya
>            Priority: Blocker
>             Fix For: 8.7
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability. Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these, it should be possible to bring them into third-party packages and install them via the package manager.
> The plan is just to throw warnings in the logs and add deprecation notes to the reference guide for now. Removal can be done in 9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)