Posted to solr-user@lucene.apache.org by eShard <zi...@yahoo.com> on 2013/05/15 20:40:35 UTC
How to store the document folder path in solr?
Good afternoon,
I'm using Solr 4.0 final with ManifoldCF v1.2dev on Tomcat 7.0.34.
Today, a user asked a great question: what if I only know the name of the
folder that the documents are in?
Can I just search on the folder name?
Currently, I'm only indexing documents; how do I capture the folder name (or
full path) and store it?
I have a variety of repositories: web, RSS, Livelink (I can get the folder
hierarchy for this). I guess indexing a file share would be straightforward,
with the path readily available, but I haven't been asked to index those yet.
I'll try to run some tests on network file shares...
Can anyone point me in the right direction?
Thanks,
--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-store-the-document-folder-path-in-solr-tp4063581.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to store the document folder path in solr?
Posted by Jack Krupansky <ja...@basetechnology.com>.
You'll have to consult with the ManifoldCF project on exactly what
parameters they send to SolrCell, but here's a raw SolrCell example:
curl "http://localhost:8983/solr/update/extract?literal.id=doc-1\
&commit=true&uprefix=attr_" -F "myfile=@HelloWorld.docx"
Query response:
...
<result name="response" numFound="1" start="0">
  <doc>
    <arr name="attr_meta">
      <str>stream_source_info</str>
      <str>myfile</str>
      <str>stream_content_type</str>
      <str>application/octet-stream</str>
      <str>stream_size</str>
      <str>10096</str>
      <str>stream_name</str>
      <str>HelloWorld.docx</str>
      <str>Content-Type</str>
      <str>application/vnd.openxmlformats-officedocument.wordprocessingml.document</str>
    </arr>
    <str name="id">doc-1</str>
    ...
    <arr name="attr_stream_name">
      <str>HelloWorld.docx</str>
    </arr>
...
The "meta" metadata element (indexed here as "attr_meta") lists the stream
name right after "stream_name"; in this case it is the file name, which
happens to be a relative path here. "attr_stream_name" holds the same value.
Because of the uprefix=attr_ parameter, Solr Cell uses "attr_stream_name"
only if a field named "stream_name" doesn't exist in your Solr schema.
If you just add a schema field named "stream_name", SolrCell should use it.
Or, with MCF, add a fmap.stream_name=your-stream-path-field-name parameter
to the SolrCell parameters.
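
As a rough illustration of the fmap.stream_name option, here is a minimal Python sketch that only builds the extract URL with the same parameters as the curl example plus the field mapping. The field name "doc_path" and the helper build_extract_url are made up for this example. It assumes your schema defines such a field, and it does not show how ManifoldCF itself passes parameters.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, path_field):
    """Build a SolrCell /update/extract URL that maps stream_name to path_field.

    path_field is a hypothetical schema field; it must exist in your schema.
    """
    params = {
        "literal.id": doc_id,            # same as the curl example
        "commit": "true",
        "uprefix": "attr_",              # prefix for fields not in the schema
        "fmap.stream_name": path_field,  # map stream_name to your own field
    }
    return base_url + "/update/extract?" + urlencode(params)


# "doc_path" is an assumed field name, used only for illustration.
url = build_extract_url("http://localhost:8983/solr", "doc-1", "doc_path")
print(url)
```

The file itself would still be posted to that URL as in the curl example (-F "myfile=@HelloWorld.docx"); whether the stream name carries a full folder path or just a file name depends on what the client sends.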
-- Jack Krupansky