Posted to solr-user@lucene.apache.org by Cord Thomas <co...@gmail.com> on 2013/05/20 22:18:55 UTC

Question on implementation for schema design - parsing path information into stored field

Hello,

I am submitting rich documents to a Solr index via Solr Cell. This is all
working well.

The documents are organized in meaningful folders.  I would like to capture
the folder names in my index so that I can use the folder names to provide
facets.

I can pass the path data into the indexing process, and I would like to
turn the first two levels of the path into indexed and stored field data
(or copyField data).

Say I have files in these folders:

Financial
Financial/Annual
Financial/Audit
Organizational
Organizational/Offices
Organizational/Staff

I would like to then provide facets using these names.
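To make the goal concrete, here is a minimal Python sketch of what "two levels deep" could mean, assuming each file comes with a relative path such as Financial/Annual/2012/summary.docx. The helper name folder_prefix and the default depth of 2 are illustrative only; the returned string is the value one would send to Solr for faceting.

```python
from pathlib import PurePosixPath


def folder_prefix(file_path: str, depth: int = 2) -> str:
    """Return the first `depth` folder names of a relative file path.

    Hypothetical helper: the file name itself is dropped, and folders
    deeper than `depth` are ignored, so only the top levels are kept.
    """
    folders = PurePosixPath(file_path).parent.parts
    return "/".join(folders[:depth])


for path in [
    "Financial/report.pdf",
    "Financial/Annual/2012/summary.docx",
    "Organizational/Staff/handbook.pdf",
]:
    print(path, "->", folder_prefix(path))
```

A file sitting at the top level (no folder) yields an empty string, which a caller would probably skip rather than index.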

Can someone please guide me in the right direction on how I might
accomplish this?

Thank you

Cord

Re: Question on implementation for schema design - parsing path information into stored field

Posted by Cord Thomas <co...@gmail.com>.
Thank you Brendan,

I had started to read about the tokenizers and couldn't quite piece
together how it would work.  I will read about this and post my
implementation if successful.

Cord


On Mon, May 20, 2013 at 4:13 PM, Brendan Grainger <brendan.grainger@gmail.com> wrote:

Re: Question on implementation for schema design - parsing path information into stored field

Posted by Brendan Grainger <br...@gmail.com>.
Hi Cord,

I think you'd do it like this:

1. Add this to schema.xml

    <!--
      Example of using PathHierarchyTokenizerFactory at index time, so
      queries for paths match documents at that path, or in descendent paths
    -->
    <fieldType name="descendent_path" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory" />
      </analyzer>
    </fieldType>

    <field name="folders_facet" type="descendent_path" indexed="true"
           stored="true" multiValued="true" />

2. When you index, add the folder path (for example Financial/Annual) to the
folders_facet field (or whatever you want to call it).
3. Your query would look something like:

http://localhost:8982/solr/<core_name>/select?facet=on&facet.field=folders_facet&facet.mincount=1&....
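To see why this gives useful facets, here is a rough Python approximation of the index-time analysis in step 1 (a sketch, not Solr's code): PathHierarchyTokenizerFactory with delimiter="/" turns Financial/Annual into the terms Financial and Financial/Annual, and facet.field in step 3 counts documents per indexed term. The sample folder values are made up.

```python
from collections import Counter


def path_hierarchy_tokens(path: str, delimiter: str = "/") -> list[str]:
    """Approximate the terms PathHierarchyTokenizerFactory emits.

    Each token is the path up to and including one more segment:
    "Financial/Annual" -> ["Financial", "Financial/Annual"].
    Simplified: empty segments are dropped, and the tokenizer's
    replace/skip/reverse options are not modeled.
    """
    parts = [p for p in path.split(delimiter) if p]
    return [delimiter.join(parts[: i + 1]) for i in range(len(parts))]


# Hypothetical folders_facet values, one folder per document.
docs = ["Financial/Annual", "Financial/Audit", "Financial/Annual", "Organizational/Staff"]

# Field faceting counts each document once per indexed term, so a parent
# folder's count includes every document filed beneath it.
facet_counts = Counter()
for folder in docs:
    facet_counts.update(set(path_hierarchy_tokens(folder)))

for term, count in sorted(facet_counts.items()):
    print(term, count)
```

Because the query analyzer uses KeywordTokenizerFactory, a filter value such as Financial is kept whole and matches documents filed at Financial or anywhere beneath it, which is what the schema comment describes.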

There is a good explanation here:
http://wiki.apache.org/solr/HierarchicalFaceting#PathHierarchyTokenizerFactory
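For step 2 with Solr Cell, the ExtractingRequestHandler accepts literal.<fieldname>=value parameters that add a field value to the extracted document, so the folder path can travel with each upload. A minimal Python sketch that only builds the request URLs, assuming the handler is mapped at /update/extract, a hypothetical core named collection1, and id as the uniqueKey field:

```python
from urllib.parse import urlencode

# Assumptions: port 8982 is taken from the example URL above; the core
# name "collection1" and the /update/extract mapping are hypothetical.
SOLR_CORE_URL = "http://localhost:8982/solr/collection1"


def extract_url(doc_id: str, folder: str) -> str:
    """URL for posting one rich document to Solr Cell.

    literal.<field> sets a field value on the extracted document, so the
    folder path lands in folders_facet and is tokenized at index time.
    """
    params = {"literal.id": doc_id, "literal.folders_facet": folder, "commit": "true"}
    return f"{SOLR_CORE_URL}/update/extract?{urlencode(params)}"


def facet_url() -> str:
    """URL for the facet query from step 3; rows=0 returns counts only."""
    params = {
        "q": "*:*",
        "rows": 0,
        "facet": "on",
        "facet.field": "folders_facet",
        "facet.mincount": 1,
        "wt": "json",
    }
    return f"{SOLR_CORE_URL}/select?{urlencode(params)}"


print(extract_url("annual-2012", "Financial/Annual"))
print(facet_url())
```

The file itself would be POSTed to the extract URL as the request body (not shown here); urlencode percent-encodes the "/" in the folder value, which Solr decodes back on receipt.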


Hope that helps.
Brendan

On Mon, May 20, 2013 at 4:18 PM, Cord Thomas <co...@gmail.com> wrote:

-- 
Brendan Grainger
www.kuripai.com