Posted to solr-user@lucene.apache.org by Marcel Panse <ma...@gmail.com> on 2011/05/12 17:15:31 UTC

Show filename in search result using a FileListEntityProcessor

Hi Solr community,

I'm new to Solr and trying to index all PDF/DOC files in a directory. This
works fine, and I am able to scan all documents. The next thing I'm trying to
do is return the filename of each file in the search results. However, the
filename never shows up. I tried a couple of things, but the
documentation is not very helpful about how to do this.

This is my dataConfig:

<dataConfig>
    <dataSource type="BinFileDataSource" name="bin"/>
    <document>
        <entity name="f" processor="FileListEntityProcessor" recursive="true"
                rootEntity="false" dataSource="null" baseDir="H:/solrtestsmall"
                fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)" onError="skip">

            <entity name="tika-test" processor="TikaEntityProcessor"
                    url="${f.fileAbsolutePath}" format="text" dataSource="bin"
                    onError="skip">
                <field column="Author" name="author" meta="true"/>
                <field column="title" name="title" meta="true"/>
                <field column="text" name="text"/>
            </entity>
            <field column="fileName" name="fileName"/>
        </entity>
    </document>
</dataConfig>


Thanks,
Marcel Panse

Re: Show filename in search result using a FileListEntityProcessor

Posted by Erick Erickson <er...@gmail.com>.
You haven't specified that DIH should put the file name in the document
as it indexes it, e.g. <field column="file" name="${f.fileName}" /> or some
such...

Best
Erick

Re: Show filename in search result using a FileListEntityProcessor

Posted by Daniel Rijkhof <da...@gmail.com>.
You should use 'file' instead of 'fileName' as the column:

<field column="file" name="fileName"/>

Don't forget to add 'fileName' to the fields section of schema.xml:

<field name="fileName" type="string" indexed="true" stored="true" />
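
For reference, here is a minimal sketch of the original dataConfig with that one change applied, assuming the rest of the setup from the first message stays as posted. The only functional difference is that the field in the outer entity reads the 'file' column that FileListEntityProcessor exposes. The grouped form of the fileName regex is my own assumption about the intended pattern, not something suggested in this thread; keep the original pattern if it already selects the files you want.

```xml
<dataConfig>
    <dataSource type="BinFileDataSource" name="bin"/>
    <document>
        <!-- Outer entity: lists the files; rootEntity="false" means each
             row of the inner entity becomes the Solr document. -->
        <entity name="f" processor="FileListEntityProcessor" recursive="true"
                rootEntity="false" dataSource="null" baseDir="H:/solrtestsmall"
                fileName=".*\.(DOC|PDF|pdf|doc)" onError="skip">

            <!-- 'file' is the column name produced by FileListEntityProcessor;
                 'fileName' is the target field declared in schema.xml. -->
            <field column="file" name="fileName"/>

            <!-- Inner entity: extracts text and metadata with Tika. -->
            <entity name="tika-test" processor="TikaEntityProcessor"
                    url="${f.fileAbsolutePath}" format="text" dataSource="bin"
                    onError="skip">
                <field column="Author" name="author" meta="true"/>
                <field column="title" name="title" meta="true"/>
                <field column="text" name="text"/>
            </entity>
        </entity>
    </document>
</dataConfig>
```

After changing the config, a full re-import is needed before the stored fileName values can appear in results.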


Have fun,

Daniel Rijkhof
06 12 14 12 17



Re: Show filename in search result using a FileListEntityProcessor

Posted by Marcel Panse <ma...@gmail.com>.
Hi, thanks for the reply.

I tried a couple of things both in the tika-test entity and in the entity
named 'f'.
In the tika-test entity I tried:

<field column="fileName" name="${f.fileName}" />
<field column="fileName" name="${f.file}" />

and even

<field column="fileName" name="${f.fileAbsolutePath}" />

I also tried doing things in the entity 'f' like:

<field column="fileName" name="fileName"/>
<field column="fileName" name="file"/>

None of it works. I also added fileName to the fields section of the schema:

<field name="fileName" type="string" indexed="true" stored="true" />

That doesn't help either.

Can anyone provide me with a working example? I'm pretty stuck here on
something that seems really trivial and simple :-(



Re: Show filename in search result using a FileListEntityProcessor

Posted by kbootz <kb...@caci.com>.
There is a JIRA issue (I can't recall which one at the moment) that addresses
the problem with the docs. I'm running 3.1, and per your example you should be
able to get it using ${f.file}. I think it should also be in the entity desc.,
but I'm also new and that's just how I access it.
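
One way the ${f.file} variable could be turned into a field on the inner entity is sketched below. It relies on DIH's TemplateTransformer, which is not mentioned anywhere in this thread, so treat it as an assumption and an untested alternative rather than the method described above. The entity and field names are taken from the original dataConfig, and the fragment is meant to replace the 'tika-test' entity inside the 'f' entity.

```xml
<!-- Sketch only: copies the parent entity's 'file' value into the
     'fileName' field via a template. Assumes 'fileName' is declared
     in schema.xml as shown earlier in the thread. -->
<entity name="tika-test" processor="TikaEntityProcessor"
        transformer="TemplateTransformer"
        url="${f.fileAbsolutePath}" format="text" dataSource="bin"
        onError="skip">
    <field column="fileName" template="${f.file}"/>
    <field column="Author" name="author" meta="true"/>
    <field column="title" name="title" meta="true"/>
    <field column="text" name="text"/>
</entity>
```

Note that the variable goes in the template attribute, not in name: the name attribute identifies the target Solr field, so putting ${f.file} there (as in the attempts quoted earlier) would not set a field value.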

GL

--
View this message in context: http://lucene.472066.n3.nabble.com/Show-filename-in-search-result-using-a-FileListEntityProcessor-tp2939193p2941305.html
Sent from the Solr - User mailing list archive at Nabble.com.