Posted to dev@lucene.apache.org by "Alexandre Rafalovitch (JIRA)" <ji...@apache.org> on 2017/03/09 03:56:38 UTC

[jira] [Updated] (SOLR-9601) DIH: Radically simplify Tika example to only show relevant configuration

     [ https://issues.apache.org/jira/browse/SOLR-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexandre Rafalovitch updated SOLR-9601:
----------------------------------------
    Attachment: tika2_20170308.tgz

It is a little hard to generate a readable diff between the original Tika example and the one I created. So, for ease of testing, I just created it as a separate *tika2* core that can be dropped next to the other DIH cores.

I removed all of the unused gunk, so the remaining files are tiny. I wish I could remove the infoStream section, but the default is false and I am not sure I should.

I've also added a prototype-oriented demo of a wildcard field, renamed and simplified the text field definition, and did other minor cleanup in what is left.

I am not sure if I need to worry about docValues here. 

Also, I have commented out the uniqueKey section, but the corresponding *id* field definition is missing. It was missing in the original example too, so I am not sure it is worth adding to the commented-out section.
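
To make the uniqueKey question concrete, here is a minimal sketch of what a uniqueKey declaration needs alongside it in a schema. It is not copied from the attachment: the schema name, the "string" type, and the field attributes are assumptions chosen for illustration.

```xml
<!-- Hypothetical sketch, not the contents of tika2_20170308.tgz. -->
<schema name="tika2-sketch" version="1.6">
  <!-- Assumed type: an untokenized string is the usual choice for ids. -->
  <fieldType name="string" class="solr.StrField"/>

  <!-- uniqueKey refers to a field by name, so that field must be declared. -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <uniqueKey>id</uniqueKey>
</schema>
```

If the uniqueKey element stays commented out, the matching field line would presumably sit inside the same comment so that both can be enabled together.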

This is a big change (even if the resulting files are tiny), so I would appreciate people commenting on it before I actually commit it.

> DIH: Radically simplify Tika example to only show relevant configuration
> ------------------------------------------------------------------------
>
>                 Key: SOLR-9601
>                 URL: https://issues.apache.org/jira/browse/SOLR-9601
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
>    Affects Versions: 6.x, master (7.0)
>            Reporter: Alexandre Rafalovitch
>            Assignee: Alexandre Rafalovitch
>              Labels: examples, usability
>         Attachments: tika2_20170308.tgz
>
>
> Solr DIH examples are legacy examples to show how DIH works. However, they include full configurations that may obscure teaching points. This is no longer needed as we have 3 full-blown examples in the configsets. 
> Specifically for Tika, the field type definitions were at some point simplified to have fewer support files in the configuration directory. This, however, means that we now have field definitions that have the same names as those in other examples, but different definitions. 
> Importantly, Tika does not use most (any?) of those modified definitions. They are there just for completeness. Similarly, the solrconfig.xml includes the extract handler even though we are demonstrating a different path of using Tika. Somebody grepping through config files may get confused about which configuration aspects contribute to which experience.
> I am planning to significantly simplify the configuration and schema of the Tika example to **only** show the DIH Tika extraction path. It will end up as a very short and focused example.
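
For readers unfamiliar with the DIH Tika extraction path mentioned in the description, a minimal DIH data config along those lines might look like the sketch below. It is an illustration only, not the file in the attachment: the baseDir path, the file name pattern, and the target field names (id, author, title, text) are assumptions.

```xml
<!-- Hypothetical DIH data config sketch; paths and field names are assumed. -->
<dataConfig>
  <!-- Tika needs a binary stream, hence a binary file data source. -->
  <dataSource type="BinFileDataSource"/>
  <document>
    <!-- Outer entity only lists files; rootEntity="false" means it
         does not create documents itself. -->
    <entity name="file" processor="FileListEntityProcessor" dataSource="null"
            baseDir="/path/to/docs" fileName=".*pdf" rootEntity="false">
      <field column="file" name="id"/>

      <!-- Inner entity runs Tika on each listed file. -->
      <entity name="pdf" processor="TikaEntityProcessor"
              url="${file.fileAbsolutePath}" format="text">
        <!-- meta="true" pulls the value from Tika metadata, not the body. -->
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Nothing in this path goes through the extract handler, which is why that handler can be dropped from a DIH-only example.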


