Posted to solr-user@lucene.apache.org by eShard <zi...@yahoo.com> on 2013/05/15 20:40:35 UTC

How to store the document folder path in solr?

Good afternoon,
I'm using Solr 4.0 final with ManifoldCF v1.2dev on Tomcat 7.0.34.
Today, a user asked a great question: what if I only know the name of the
folder that the documents are in?
Can I just search on the folder name?
Currently, I'm only indexing documents; how do I capture the folder name (or
full path) and store it?
I have a variety of repositories: web, RSS, Livelink (I can get the folder
hierarchy for this). I guess indexing a file share would be straightforward,
with the path readily available, but I haven't been asked to index those yet.
I'll try to run some tests on network file shares...

Can anyone point me in the right direction?

Thanks,




--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-store-the-document-folder-path-in-solr-tp4063581.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to store the document folder path in solr?

Posted by Jack Krupansky <ja...@basetechnology.com>.
You'll have to consult with the ManifoldCF project on exactly what 
parameters they send SolrCell, but here's a raw SolrCell example:

  curl "http://localhost:8983/solr/update/extract?literal.id=doc-1&commit=true&uprefix=attr_" \
    -F "myfile=@HelloWorld.docx"

Query response:

  ...
  <result name="response" numFound="1" start="0">
    <doc>
      <arr name="attr_meta">
        <str>stream_source_info</str>
        <str>myfile</str>
        <str>stream_content_type</str>
        <str>application/octet-stream</str>
        <str>stream_size</str>
        <str>10096</str>
        <str>stream_name</str>
        <str>HelloWorld.docx</str>
        <str>Content-Type</str>
        <str>application/vnd.openxmlformats-officedocument.wordprocessingml.document</str>
      </arr>
      <str name="id">doc-1</str>
      ...
      <arr name="attr_stream_name">
        <str>HelloWorld.docx</str>
      </arr>
      ...

In the "attr_meta" array, names and values alternate, so the entry following
"stream_name" is the stream name, which is the file name in this case (it
happens to be a relative path here). "attr_stream_name" holds the same value.
SolrCell uses "attr_stream_name" only if a field named "stream_name" doesn't
exist in your Solr schema, since the request sets uprefix=attr_.
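
Since the metadata array is a flat list of alternating names and values, pairing them up makes the stream name easy to pull out on the client side. A minimal sketch in Python, using the values from the response above; the helper name meta_pairs is made up for illustration and is not part of Solr:

```python
def meta_pairs(meta):
    """Turn a flat [name, value, name, value, ...] list into a dict."""
    if len(meta) % 2 != 0:
        raise ValueError("expected an even number of entries")
    # Even positions hold names, odd positions hold the matching values.
    return dict(zip(meta[0::2], meta[1::2]))


# Values copied from the attr_meta array in the query response.
attr_meta = [
    "stream_source_info", "myfile",
    "stream_content_type", "application/octet-stream",
    "stream_size", "10096",
    "stream_name", "HelloWorld.docx",
    "Content-Type",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
]

pairs = meta_pairs(attr_meta)
print(pairs["stream_name"])
```

Note that a dict keeps only the last value if the same name appears twice, so this is only suitable when each metadata name occurs once.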

If you just add a schema field named "stream_name", SolrCell should use it.
Or, with MCF, add an fmap.stream_name=your-stream-path-field-name parameter
to the SolrCell parameters.
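
To make the fmap option concrete, here is a rough sketch in Python that only builds the extract URL; it does not contact Solr. The field names "doc_path" and "folder_path" and the sample folder are hypothetical and would have to exist in your schema. The literal.<field> form is the same mechanism as the literal.id parameter in the curl example. Whether ManifoldCF lets you pass these parameters is something to confirm with that project.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, path_field,
                      folder_path=None, folder_field="folder_path"):
    """Build a SolrCell extract URL that maps stream_name to path_field."""
    params = [
        ("literal.id", doc_id),
        ("commit", "true"),
        ("uprefix", "attr_"),
        # Route the stream name into a schema field of our choosing.
        ("fmap.stream_name", path_field),
    ]
    if folder_path is not None:
        # Assumption: the caller knows the folder and supplies it as a literal.
        params.append(("literal." + folder_field, folder_path))
    # urlencode percent-encodes slashes and turns spaces into "+".
    return base_url + "?" + urlencode(params)


url = build_extract_url(
    "http://localhost:8983/solr/update/extract",
    "doc-1",
    "doc_path",                          # hypothetical schema field
    folder_path="/shared/reports 2013",  # hypothetical folder
)
print(url)
```

Building the query string with urlencode avoids hand-escaping paths that contain spaces or other reserved characters.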

-- Jack Krupansky
