Posted to solr-user@lucene.apache.org by Bayu Widyasanyata <bw...@gmail.com> on 2013/10/28 17:12:59 UTC

Replace document title with filename if it's empty

Hi,

I just found that some of the crawled PDF files have no (empty) 'title'
metadata.
How can I define or fetch the filename and use it to replace the empty
'title' field?

I didn't find a "filename" field in schema.xml, and I don't know how to
express the condition above (if title is empty then ....).

Thanks in advance.

-- 
wassalam,
[bayu]

Re: Replace document title with filename if it's empty

Posted by Bayu Widyasanyata <bw...@gmail.com>.
Hi Erick,

Thanks for the info.

Regards,


On Wed, Oct 30, 2013 at 8:01 AM, Erick Erickson <er...@gmail.com> wrote:

> You can write a bit of custom update code that lives on the Solr server
> and copies the filename field to title if title isn't present.
>
> Alternatively, you could write a SolrJ program that does the Tika
> processing and fills in the title before you send the doc, see:
> http://searchhub.org/2012/02/14/indexing-with-solrj/
>
> Best,
> Erick

-- 
wassalam,
[bayu]

Re: Replace document title with filename if it's empty

Posted by Erick Erickson <er...@gmail.com>.
You can write a bit of custom update code that lives on the Solr server and
copies the filename field to title if title isn't present.
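
As a rough sketch of that fallback logic only (plain Java, not Solr's API):
the document is modelled as a Map, and the class and method names are
made up for illustration. Since the question says there is no "filename"
field in the schema, this also assumes the "id" field holds the URL or path
of the crawled file, which may not be true for your setup. Inside Solr the
same check would live in the custom update code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the field names "title", "filename" and "id" are assumptions.
class TitleFallback {

    /** Returns the last path segment of a URL or file path, without query or fragment. */
    static String filenameFromId(String id) {
        // Drop anything after '?' or '#'.
        String path = id.split("[?#]", 2)[0];
        // Accept both '/' and '\' as separators.
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        return slash >= 0 ? path.substring(slash + 1) : path;
    }

    /** Sets "title" from "filename" (or from "id") when the title is missing or blank. */
    static void applyTitleFallback(Map<String, Object> doc) {
        Object title = doc.get("title");
        boolean missing = title == null || title.toString().trim().isEmpty();
        if (!missing) {
            return; // keep a real title untouched
        }
        Object filename = doc.get("filename");
        if (filename == null && doc.get("id") != null) {
            filename = filenameFromId(doc.get("id").toString());
        }
        if (filename != null && !filename.toString().isEmpty()) {
            doc.put("title", filename);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "http://example.com/docs/report.pdf");
        doc.put("title", "");
        applyTitleFallback(doc);
        System.out.println(doc.get("title")); // prints report.pdf
    }
}
```

If the id ends in a slash, no filename can be derived and the title is left
as it was.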

Alternatively, you could write a SolrJ program that does the Tika processing
and fills in the title before you send the doc, see:
http://searchhub.org/2012/02/14/indexing-with-solrj/
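
On the client side the decision could look roughly like the sketch below.
It is plain Java with no SolrJ or Tika calls: the "extracted" map is only a
stand-in for whatever metadata the extraction step returns, and the class
name, method name and field names are assumptions for illustration. In a
real program the resulting fields would be copied into the document you
send to Solr.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side helper; not part of SolrJ or Tika.
class ClientSideTitle {

    /**
     * Builds the fields for one file. "extracted" stands in for the metadata
     * produced by the extraction step (assumed shape: name -> value).
     */
    static Map<String, String> buildFields(Path file, Map<String, String> extracted) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", file.toString());

        String title = extracted.get("title");
        if (title == null || title.trim().isEmpty()) {
            // No usable title in the metadata: fall back to the file name.
            title = file.getFileName().toString();
        }
        fields.put("title", title.trim());
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> noTitle = new LinkedHashMap<>();
        Map<String, String> fields = buildFields(Paths.get("docs", "report.pdf"), noTitle);
        System.out.println(fields.get("title")); // prints report.pdf
    }
}
```

Here the client already knows the path of each file it reads, so the
filename is available without adding a field to the schema.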

Best,
Erick