Posted to solr-user@lucene.apache.org by Marcel Panse <ma...@gmail.com> on 2011/05/12 17:15:31 UTC
Show filename in search result using a FileListEntityProcessor
Hi Solr community,
I'm new to Solr and trying to scan all pdf/doc files in a directory. This
works fine and I am able to scan all documents. The next thing I'm trying to
do is to also get the filename of each file back in the search results. The
filename, however, never shows up. I tried a couple of things, but the
documentation is not very helpful about how to do this.
This is my dataConfig:
<dataConfig>
<dataSource type="BinFileDataSource" name="bin"/>
<document>
<entity name="f" processor="FileListEntityProcessor" recursive="true"
rootEntity="false"
dataSource="null" baseDir="H:/solrtestsmall"
fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)" onError="skip">
<entity name="tika-test" processor="TikaEntityProcessor"
url="${f.fileAbsolutePath}" format="text" dataSource="bin" onError="skip">
<field column="Author" name="author" meta="true"/>
<field column="title" name="title" meta="true"/>
<field column="text" name="text"/>
</entity>
<field column="fileName" name="fileName"/>
</entity>
</document>
</dataConfig>
Thanks,
Marcel Panse
Re: Show filename in search result using a FileListEntityProcessor
Posted by Erick Erickson <er...@gmail.com>.
You haven't specified that DIH should put the file name in the document
as it indexes it, i.e. <field column="file" name="${f.fileName}" /> or some
such...
Best
Erick
Re: Show filename in search result using a FileListEntityProcessor
Posted by Daniel Rijkhof <da...@gmail.com>.
You should use file instead of fileName as the column:
<field column="file" name="fileName"/>
Don't forget to add the fileName field to the fields section of schema.xml:
<field name="fileName" type="string" indexed="true" stored="true" />
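Applied to the dataConfig from the original post, the change could look like the sketch below. It is untested and rests on two assumptions: that FileListEntityProcessor exposes the bare file name in a column called file (as described in this thread), and that the fileName pattern was meant to match any of the four extensions, so the alternation is grouped here. Everything else is copied from the original config.

```xml
<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Outer entity: lists the files; it is not the document root. -->
    <entity name="f" processor="FileListEntityProcessor" recursive="true"
            rootEntity="false" dataSource="null" baseDir="H:/solrtestsmall"
            fileName=".*\.(DOC|PDF|pdf|doc)" onError="skip">
      <!-- column is the value produced by the processor, name is the Solr field. -->
      <field column="file" name="fileName"/>
      <!-- Inner entity: extracts text and metadata from each listed file. -->
      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text" dataSource="bin"
              onError="skip">
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

After a full import, fileName should come back with the other stored fields, provided the field is declared as stored in schema.xml.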
Have fun,
Daniel Rijkhof
On Mon, May 16, 2011 at 4:20 PM, Marcel Panse <ma...@gmail.com> wrote:
> Hi, thanks for the reply.
>
> I tried a couple of things both in the tika-test entity and in the entity
> named 'f'.
> In the tika-test entity I tried:
>
> <field column="fileName" name="${f.fileName}" />
> <field column="fileName" name="${f.file}" />
>
> even
>
> <field column="fileName" name="${f.fileAbsolutePath}" />
>
> I also tried doing things in the entity 'f' like:
>
> <field column="fileName" name="fileName"/>
> <field column="fileName" name="file"/>
>
> None of it works. I also added fileName to the schema like:
>
> <field name="fileName" type="string" indexed="true" stored="true" />
>
> In fields. Doesn't help.
>
> Can anyone provide me with a working example? I'm pretty stuck here on
> something that seems really trivial and simple :-(
Re: Show filename in search result using a FileListEntityProcessor
Posted by Marcel Panse <ma...@gmail.com>.
Hi, thanks for the reply.
I tried a couple of things both in the tika-test entity and in the entity
named 'f'.
In the tika-test entity I tried:
<field column="fileName" name="${f.fileName}" />
<field column="fileName" name="${f.file}" />
even
<field column="fileName" name="${f.fileAbsolutePath}" />
I also tried doing things in the entity 'f' like:
<field column="fileName" name="fileName"/>
<field column="fileName" name="file"/>
None of it works. I also added fileName to the schema like:
<field name="fileName" type="string" indexed="true" stored="true" />
In fields. Doesn't help.
Can anyone provide me with a working example? I'm pretty stuck here on
something that seems really trivial and simple :-(
On Sat, May 14, 2011 at 22:56, kbootz <kb...@caci.com> wrote:
> There is a JIRA item (can't recall it atm) that addresses the issue with the
> docs. I'm running 3.1 and per your example you should be able to get it
> using ${f.file}. I think it should also be in the entity desc. but I'm also
> new and that's just how I access it.
Re: Show filename in search result using a FileListEntityProcessor
Posted by kbootz <kb...@caci.com>.
There is a JIRA item (can't recall it atm) that addresses the issue with the
docs. I'm running 3.1 and, per your example, you should be able to get it
using ${f.file}. I think it should also be in the entity desc. but I'm also
new and that's just how I access it.
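One way to reference ${f.file} from the inner entity is DIH's TemplateTransformer. The fragment below is an untested sketch under that assumption: the thread only mentions the variable ${f.file}, and the choice of TemplateTransformer is an addition here. Note that in a field element, name is the target Solr field, so a variable placed in name does not supply a value, whereas template does.

```xml
<entity name="tika-test" processor="TikaEntityProcessor"
        transformer="TemplateTransformer"
        url="${f.fileAbsolutePath}" format="text" dataSource="bin"
        onError="skip">
  <field column="Author" name="author" meta="true"/>
  <field column="title" name="title" meta="true"/>
  <field column="text" name="text"/>
  <!-- Adds a fileName column to each row, filled from the parent entity's file value. -->
  <field column="fileName" template="${f.file}"/>
</entity>
```

Since no name attribute is given on the last field, the column name fileName is also used as the Solr field name, so the schema.xml entry for fileName is still needed.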
GL
--
View this message in context: http://lucene.472066.n3.nabble.com/Show-filename-in-search-result-using-a-FileListEntityProcessor-tp2939193p2941305.html
Sent from the Solr - User mailing list archive at Nabble.com.