Posted to solr-user@lucene.apache.org by Cord Thomas <co...@gmail.com> on 2013/05/20 22:18:55 UTC
Question on implementation for schema design - parsing path
information into stored field
Hello,
I am submitting rich documents to a Solr index via Solr Cell. This is all
working well.
The documents are organized in meaningful folders. I would like to capture
the folder names in my index so that I can use the folder names to provide
facets.
I can pass the path data into the indexing process and would like to
convert the first two levels of each path into indexed and stored data, or
copy field data.
Say I have files in these folders:
Financial
Financial/Annual
Financial/Audit
Organizational
Organizational/Offices
Organizational/Staff
I would like to then provide facets using these names.
Can someone please guide me in the right direction on how I might
accomplish this?
Thank you
Cord
Re: Question on implementation for schema design - parsing path
information into stored field
Posted by Cord Thomas <co...@gmail.com>.
Thank you Brendan,
I had started to read about the tokenizers and couldn't quite piece
together how it would work. I will read about this and post my
implementation if successful.
Cord
Re: Question on implementation for schema design - parsing path
information into stored field
Posted by Brendan Grainger <br...@gmail.com>.
Hi Cord,
I think you'd do it like this:
1. Add this to schema.xml
<!--
  Example of using PathHierarchyTokenizerFactory at index time, so
  queries for paths match documents at that path, or in descendent paths
-->
<fieldType name="descendent_path" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
</fieldType>
<field name="folders_facet" type="descendent_path" indexed="true"
       stored="true" multiValued="true" />
2. When you index, add the folder path of each document to the folders_facet
field (or whatever you want to call it).
3. Your query would look something like:
http://localhost:8982/solr/<core_name>/select?facet=on&facet.field=folders_facet&facet.mincount=1&....
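Steps 2 and 3 could be sketched roughly as follows. This only builds the request URLs and does not contact Solr. Several things here are assumptions and not part of the advice above: the core name `core_name` stands in for `<core_name>`, the Solr Cell handler is assumed to be mapped at `/update/extract`, the unique key field is assumed to be `id`, and the helper `top_two_levels` is a hypothetical client-side way to keep only two path levels. The folder path is passed with Solr Cell's `literal.<fieldname>` request parameter.

```python
from urllib.parse import urlencode

# Assumed base URL; "core_name" is a placeholder for the real core.
SOLR_BASE = "http://localhost:8982/solr/core_name"


def top_two_levels(path, delimiter="/"):
    """Hypothetical helper: keep at most the first two levels of a folder path."""
    parts = path.strip(delimiter).split(delimiter)
    return delimiter.join(parts[:2])


def extract_url(doc_id, folder_path):
    """Build a Solr Cell URL that sets folders_facet through a literal.* parameter."""
    params = {
        "literal.id": doc_id,  # assumes the schema's unique key is "id"
        "literal.folders_facet": top_two_levels(folder_path),
        "commit": "true",
    }
    # Assumes the extracting handler is registered at /update/extract.
    return f"{SOLR_BASE}/update/extract?{urlencode(params)}"


def facet_url():
    """Build a select URL that returns only facet counts for folders_facet."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "on"),
        ("facet.field", "folders_facet"),
        ("facet.mincount", "1"),
    ]
    return f"{SOLR_BASE}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(extract_url("doc1", "Financial/Annual/2012"))
    print(facet_url())
```

The file itself would then be sent as the body of a POST to the first URL, for example with curl or an HTTP client of your choice.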
There is a good explanation here:
http://wiki.apache.org/solr/HierarchicalFaceting#PathHierarchyTokenizerFactory
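To see why this field type yields facets for both parent and child folders, here is a small Python imitation of the tokens the index-time analyzer would produce. It is a simplified sketch and not Solr's implementation: it assumes relative paths with no leading or trailing delimiter, and the function names are made up for illustration.

```python
from collections import Counter


def path_hierarchy_tokens(path, delimiter="/"):
    """Return each ancestor prefix of a path, shortest first (simplified model)."""
    parts = path.strip(delimiter).split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


def facet_counts(paths):
    """Count documents per token, roughly what faceting on the field reports."""
    counts = Counter()
    for path in paths:
        counts.update(path_hierarchy_tokens(path))
    return counts


if __name__ == "__main__":
    # One folder path per document, taken from the folders in the question.
    docs = ["Financial/Annual", "Financial/Audit", "Organizational/Staff"]
    for token, count in sorted(facet_counts(docs).items()):
        print(token, count)
```

A document stored under Financial/Annual is counted under both Financial and Financial/Annual, so the top-level facet totals include everything beneath them.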
Hope that helps.
Brendan
--
Brendan Grainger
www.kuripai.com