Posted to solr-user@lucene.apache.org by Shaun Campbell <ca...@gmail.com> on 2020/07/01 16:19:48 UTC

Searching document content and multi-valued fields

Hi

We've been using Solr on a project for a couple of years now and it's working well.
It's just a simple index of about 20-25 fields and 7,000 project records.

Now there's a requirement to be able to search on the content of documents
(web pages, Word, PDF, etc.) related to those projects.  My initial thought
was to just create a new index to store the Tika'd content and just search
on that. However, the requirement is to somehow search through both the
project records and the content records at the same time and list the main
project with perhaps some info on the matching content data. I tried to
explain that you may find matching main project records but no content, and
vice versa.
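
For the separate-index route, here is a minimal sketch of tagging each extracted file with its owning project so that content hits can be traced back to a project record. It assumes Solr Cell's /update/extract handler is enabled, a default base URL, and a hypothetical `project_content` collection with a `project_id` field. It only builds the URL; the file itself would be POSTed as the request body:

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed base URL
COLLECTION = "project_content"       # hypothetical collection for Tika'd content


def extract_url(doc_id: str, project_id: str) -> str:
    """Build a Solr Cell (/update/extract) URL whose literal.* parameters
    add fixed field values to the document Tika extracts."""
    params = {
        "literal.id": doc_id,
        "literal.project_id": project_id,  # hypothetical link back to the project record
        "commit": "true",
    }
    return f"{SOLR}/{COLLECTION}/update/extract?{urlencode(params)}"


url = extract_url("P-42-report.pdf", "P-42")
# The file is then sent as the body, e.g.: curl "<url>" -F "myfile=@report.pdf"
```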

The only solutions I can see are either to concatenate all the document
content into one field on the main project record, add that to my dismax
search and use boosting etc., or to use a multi-valued field to store the
content of each project document.  I'm a bit reluctant to do either, as the
application is running well and I'm a bit nervous about changing the schema
and the indexing process.  I just wondered what you thought about adding a
lot of content to an existing schema (single or multi-valued field) that
doesn't normally store large amounts of data.
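
A rough sketch of the multi-valued option follows. It assumes a managed schema editable through the Schema API (POST to /solr/<collection>/schema) and uses a hypothetical `doc_content` field; `description` is also just a stand-in for an existing project field. It only builds the request bodies and query string; nothing is sent to Solr:

```python
import json
from urllib.parse import urlencode

# Schema API command adding a multi-valued text field for extracted text.
# "doc_content" is a hypothetical field name.
add_field = {
    "add-field": {
        "name": "doc_content",
        "type": "text_general",
        "multiValued": True,
        "stored": True,  # stored so highlighting can return a matching snippet
    }
}

# One project record, with one value per related document.
project = {
    "id": "P-42",
    "title": "Solar farm feasibility study",
    "doc_content": [
        "Extracted text of report.pdf",
        "Extracted text of minutes.docx",
    ],
}

# Dismax query: project fields are boosted above the bulk document text,
# so a title hit still outranks a passing mention inside a PDF.
params = {
    "defType": "dismax",
    "q": "solar",
    "qf": "title^5 description^2 doc_content^0.5",
    "hl": "true",
    "hl.fl": "doc_content",
    "hl.snippets": "2",
}

schema_body = json.dumps(add_field)
update_body = json.dumps([project])
query_string = urlencode(params)
```

The low boost on doc_content is a judgment call to keep existing rankings mostly intact; storing the field makes the index larger but lets highlighting show which document text matched.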

Or does anyone know of a way to join two searches like this across two
separate indexes?

Thanks
Shaun

Re: Searching document content and multi-valued fields

Posted by Emir Arnautović <em...@sematext.com>.
Hi Shaun,
If project content is relatively static, you could use nested documents <https://lucene.apache.org/solr/guide/8_0/indexing-nested-documents.html>, or you could play with the join query parser <https://lucene.apache.org/solr/guide/7_3/other-parsers.html#join-query-parser>.
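
A sketch of what the two suggestions might look like, building request payloads only. The field names (`doc_type`, `project_id`, `content`), the `project_content` collection and the sample query are hypothetical. It assumes the schema has the `_root_` field that block join needs, and, for the cross-collection join, that `project_content` is reachable from the same node (in SolrCloud, typically a single-shard collection replicated alongside the projects):

```python
import json
from urllib.parse import urlencode

# Nested documents: each project is a parent and each extracted file is an
# anonymous child under "_childDocuments_". Field names are hypothetical.
nested_update = [
    {
        "id": "P-42",
        "doc_type": "project",
        "title": "Solar farm feasibility study",
        "_childDocuments_": [
            {"id": "P-42-d1", "doc_type": "content", "content": "Extracted text of report.pdf"},
            {"id": "P-42-d2", "doc_type": "content", "content": "Extracted text of minutes.docx"},
        ],
    }
]
nested_body = json.dumps(nested_update)

# Block join: return parent projects whose children match the text query,
# and attach only the matching children with the [child] doc transformer.
nested_params = {
    "q": '{!parent which="doc_type:project"}content:solar',
    "fl": "id,title,[child parentFilter=doc_type:project childFilter=content:solar]",
}

# Join query parser: run against the projects collection, matching content
# docs in another collection and mapping their project_id onto project id.
join_params = {
    "q": "{!join from=project_id to=id fromIndex=project_content}content:solar",
    "fl": "id,title",
}

nested_query_string = urlencode(nested_params)
join_query_string = urlencode(join_params)
```

With nested documents a project and its files form one block, so a project generally has to be reindexed together with its children when a file changes; the join keeps the indexes separate but returns only project fields, not the matching content.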

HTH,
Emir