Posted to solr-user@lucene.apache.org by Wayne W <wa...@gmail.com> on 2012/01/23 22:26:08 UTC

ExtractionHandler/Cell ignore just 2 fields defined in schema 3.5.0

Hi,

I've been trying to figure this out for a few days now and I'm just not
getting anywhere, so any pointers would be MOST welcome. I'm in the
process of upgrading from 1.3 to the latest and greatest version of
Solr and I'm getting there slowly. However, I have this (final) problem:
when I send a document for extraction, 2 of the fields defined in
my schema are ignored. When I don't use extraction, the fields
are populated fine (I can see them via Luke).

My schema has:
<field name="uid" type="string" stored="true"/>
<field name="type" type="string" stored="true"/>
<field name="id" indexed="false" type="long" stored="true"/>
<field name="project-id" type="long" stored="true"/>
<field name="company-id" type="long" stored="true"/>
<field name="importTimestamp" type="long" stored="true"/>
<field name="label" type="text_ws" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
<field name="text" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="date" type="date" indexed="true" stored="true" multiValued="true"/>


My request:
INFO: [] webapp=/solr path=/update/extract
params={literal.company-id=8&literal.uid=hub.app.model.Document#203657&literal.date=2012-01-23T21:10:42Z&literal.id=203657&literal.type=hub.app.model.Document&idx.attr=true&literal.label=&literal.title=hotel+surfers.pdf&def.fl=text&literal.project-id=36}
status=0 QTime=3579
Jan 24, 2012 8:10:58 AM org.apache.solr.update.DirectUpdateHandler2 commit


For unknown reasons the fields 'company-id' and 'project-id' are ignored.

Any ideas?
Many thanks,
Wayne

Re: Hierarchical faceting in UI

Posted by Johannes Goll <jo...@gmail.com>.
Another way is to store the original hierarchy in a SQL database (in
the form: id, parent_id, name, level) and, in the Lucene index, store
the complete hierarchy (from root to leaf node) for each document in one
field, using the ids from the SQL database. That way you can get documents
at any level of the hierarchy. You can use the SQL database to dynamically
expand the tree by building facet queries that fetch the document
collections of child nodes.

Johannes


For example, the path from the root level down to the leaf node goes in one field: "1 13 32 42 23 12"
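
As a rough illustration of the layout described above, the sketch below rebuilds the root-to-leaf id path for a node from (id, parent_id) pairs and joins it into one whitespace-separated field value. The PARENT_OF table and the function names are made up for the example; only the sample path "1 13 32 42 23 12" comes from this message.

```python
# Hypothetical parent lookup, standing in for the SQL table
# (id, parent_id, name, level). The root has no parent.
PARENT_OF = {1: None, 13: 1, 32: 13, 42: 32, 23: 42, 12: 23}


def path_from_root(node_id, parent_of):
    """Return the list of ids from the root down to node_id."""
    path = []
    current = node_id
    while current is not None:
        path.append(current)
        current = parent_of[current]
    path.reverse()
    return path


def path_field_value(node_id, parent_of):
    """Join the root-to-leaf ids into one value for a single index field."""
    return " ".join(str(i) for i in path_from_root(node_id, parent_of))


print(path_field_value(12, PARENT_OF))
```

If the field is tokenized on whitespace, a query for any single id in the path should match every document at or below that node, which is what makes retrieval at any level possible.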


Re: Hierarchical faceting in UI

Posted by Chris Hostetter <ho...@fucit.org>.
I'm not really following your specific example, but a worked-through 
example of the "index full breadcrumb" type approach Darren was suggesting 
for doing drill down in a hierarchy is described in slides 32-35 of 
this presentation (which was recorded as a webcast)...

http://people.apache.org/%7Ehossman/apachecon2010/facets/
http://www.lucidimagination.com/why-lucid/webinars/mastering-power-faceted-search

in general, i *strongly* recommend that you use unique ids for each 
"node" of your taxonomy as a way to avoid confusion when multiple nodes 
have the same label/name.  (in the presentation i talk about doing that, 
but the slides show simple strings to help viewers follow what's going on)
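
One common way to index a full breadcrumb per document is to emit one token per ancestor, prefixed with its depth, and then use facet.prefix to list only the children of the selected node. The sketch below is an assumption about the general shape of that approach, not a transcription of the slides; the node ids and function names are hypothetical, and ids are used instead of labels for the reason given above.

```python
def breadcrumb_tokens(path_ids):
    """Turn a root-to-leaf list of node ids into depth-prefixed tokens.

    ["12", "47", "93"] becomes ["0/12", "1/12/47", "2/12/47/93"].
    """
    tokens = []
    for depth in range(len(path_ids)):
        joined = "/".join(path_ids[: depth + 1])
        tokens.append(f"{depth}/{joined}")
    return tokens


def child_prefix(selected_path_ids):
    """Value for facet.prefix that matches only direct children of the selection."""
    depth = len(selected_path_ids)
    if depth == 0:
        # Nothing selected yet: list the root-level nodes.
        return "0/"
    joined = "/".join(selected_path_ids)
    return f"{depth}/{joined}/"


print(breadcrumb_tokens(["12", "47", "93"]))
print(child_prefix(["12"]))
```

The depth prefix is what keeps grandchildren out of the facet list: a token for a grandchild starts with a different depth number, so it cannot match the prefix built for the direct children.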

-Hoss

Re: Hierarchical faceting in UI

Posted by Yuhao <nf...@yahoo.com>.
Hi Darren.  You said: 


"Your UI will associate the correct parent id to build the facet query"

This is the part I'm having trouble figuring out how to accomplish and some guidance would help. How would I get the value of the parent to build the facet query in the UI, if the value is in another document field?  I was imagining that I would add the additional filter of "parent:<parent path>" to the "fq" URL parameter.  But I don't have a way to do it yet.

Perhaps seeing some data would help.  Here is a record in old (flattened) and new (parent-enabled) versions, both in JSON format:

OLD:
    {
        "ID" : "3816",
        "Gene Symbol" : "KLK1",
        "Alternate Names" : "hCG_22931;Klk6;hK1;KLKR",
        "Description" : "Kallikrein 1, a peptidase that cleaves kininogen, functions in glucose homeostasis, heart contraction, semen liquefaction, and vasoconstriction, aberrantly expressed in pancreatitis and endometrial cancer; gene polymorphism correlates with kidney failure (BKL)",
        "GAD_Positive_Disease_Associations" : ["Mental Disorders(MESH:D001523) >> Dementia, Vascular(MESH:D015140)", "Cardiovascular Diseases(MESH:D002318) >> Coronary Artery Disease(MESH:D003324)"],
        "HuGENet_GeneProspector_Associations" : ["atherosclerosis", "HDL"]
    }



NEW:
    {
        "ID" : "3816",
        "Gene Symbol" : "KLK1",
        "Alternate Names" : "hCG_22931;Klk6;hK1;KLKR",
        "Description" : "Kallikrein 1, a peptidase that cleaves kininogen, functions in glucose homeostasis, heart contraction, semen liquefaction, and vasoconstriction, aberrantly expressed in pancreatitis and endometrial cancer; gene polymorphism correlates with kidney failure (BKL)",
        "GAD_Positive_Disease_Associations" : ["Dementia, Vascular(MESH:D015140)", "Coronary Artery Disease(MESH:D003324)"],
        "GAD_Positive_Disease_Associations_parent" : ["Mental Disorders(MESH:D001523)", "Cardiovascular Diseases(MESH:D002318)"],
        "HuGENet_GeneProspector_Associations" : ["atherosclerosis", "HDL"]
    }

In the old version, the field "GAD_Positive_Disease_Associations" had 2 levels of hierarchy that were flattened.  It had the full path of the hierarchy leading to the current term.  In the new version, the field only has the current term.  A separate field called "GAD_Positive_Disease_Associations_parent" has the full path preceding the current term.

So, let's say in the UI, I click on the term "Dementia, Vascular(MESH:D015140)" to get its child terms and data.  My filters in the URL querystring would be exactly: 

fq=GAD_Positive_Disease_Associations:"Dementia, Vascular(MESH:D015140)"&fq=GAD_Positive_Disease_Associations_parent:"Mental Disorders(MESH:D001523)"
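
For reference, here is a small sketch of how the two filters above could be assembled and URL-encoded. The helper names are hypothetical; the field names and values are the ones from the record above. Wrapping each value in double quotes makes it a phrase, so only embedded backslashes and double quotes need escaping inside it.

```python
from urllib.parse import urlencode


def phrase(value):
    """Quote a value as a phrase, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def filter_query_string(filters):
    """Build 'fq=...&fq=...' from a list of (field, value) pairs."""
    pairs = [("fq", f"{field}:{phrase(value)}") for field, value in filters]
    return urlencode(pairs)


qs = filter_query_string([
    ("GAD_Positive_Disease_Associations", "Dementia, Vascular(MESH:D015140)"),
    ("GAD_Positive_Disease_Associations_parent", "Mental Disorders(MESH:D001523)"),
])
print(qs)
```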

My question is, how to get the parent value of "Mental Disorders(MESH:D001523)" to build that querystring?

Thanks!

Yuhao




________________________________
 From: Darren Govoni <da...@ontrenet.com>
To: solr-user@lucene.apache.org 
Sent: Tuesday, January 24, 2012 1:23 PM
Subject: Re: Hierarchical faceting in UI
 
Yuhao,
     OK, let me think about this. A term can have multiple parents. Each of those parents would be 'different', yes?
In this case, use a multivalued field for the parent and add all the parent names or ids to it. The relations should be unique.

Your UI will associate the correct parent id to build the facet query from and return the correct children, because the user
is descending a specific path in the UI and the parent nodes' unique ids are returned along the way.

Now, if you have parent names/ids that can themselves appear in multiple locations (vs. just terms, 'the leaves'),
then perhaps your hierarchy needs refactoring for redundancy?

Happy to help with more details.

Darren



Re: Hierarchical faceting in UI

Posted by Darren Govoni <da...@ontrenet.com>.
Yuhao,
     OK, let me think about this. A term can have multiple parents. Each of those parents would be 'different', yes?
In this case, use a multivalued field for the parent and add all the parent names or ids to it. The relations should be unique.

Your UI will associate the correct parent id to build the facet query from and return the correct children, because the user
is descending a specific path in the UI and the parent nodes' unique ids are returned along the way.

Now, if you have parent names/ids that can themselves appear in multiple locations (vs. just terms, 'the leaves'),
then perhaps your hierarchy needs refactoring for redundancy?

Happy to help with more details.

Darren


On 01/24/2012 11:22 AM, Yuhao wrote:
> Darren,
>
> One challenge for me is that a term can appear in multiple places in the hierarchy.  So it's not safe to simply use the term as it appears to get its children; I probably need to include the entire tree path up to this term.  For example, if the hierarchy is "Cardiovascular Diseases > Arteriosclerosis > Coronary Artery Disease", and I'm getting the children of the middle term Arteriosclerosis, I need to filter on something like "parent:Cardiovascular Diseases/Arteriosclerosis".
>
> I'm having trouble figuring out how I can get the complete path per above to add to the URL of each facet term.  I know "velocity/facet_field.vm" is where I build the URL.  I know how to simply add a "parent:<term>" filter to the URL.  But I don't know how to access a document field, like the complete parent path, in "facet_field.vm".  Any help would be great.
>
> Yuhao


Re: Hierarchical faceting in UI

Posted by Yuhao <nf...@yahoo.com>.
Darren,

One challenge for me is that a term can appear in multiple places in the hierarchy.  So it's not safe to simply use the term as it appears to get its children; I probably need to include the entire tree path up to this term.  For example, if the hierarchy is "Cardiovascular Diseases > Arteriosclerosis > Coronary Artery Disease", and I'm getting the children of the middle term Arteriosclerosis, I need to filter on something like "parent:Cardiovascular Diseases/Arteriosclerosis".

I'm having trouble figuring out how I can get the complete path per above to add to the URL of each facet term.  I know "velocity/facet_field.vm" is where I build the URL.  I know how to simply add a "parent:<term>" filter to the URL.  But I don't know how to access a document field, like the complete parent path, in "facet_field.vm".  Any help would be great.

Yuhao




________________________________
 From: "darren@ontrenet.com" <da...@ontrenet.com>
To: Yuhao <nf...@yahoo.com> 
Cc: solr-user@lucene.apache.org 
Sent: Monday, January 23, 2012 7:16 PM
Subject: Re: Hierarchical faceting in UI
 

On Mon, 23 Jan 2012 14:33:00 -0800 (PST), Yuhao <nf...@yahoo.com>
wrote:
> Programmatically, something like this might work: for each facet field,
> add another hidden field that identifies its parent.  Then, program
> additional logic in the UI to show only the facet terms at the currently
> selected level.  For example, if one filters on "cat:electronics", the new
> UI logic would apply the additional filter "cat_parent:electronics".  Can
> this be done?

Yes. This is how I do it.

> Would it be a lot of work?
No. It's not a lot of work: simply represent your hierarchy as parent/child
relations in the document fields, and in your UI drill down by issuing new
faceted searches. Use the current facet (tree level) as the parent:<level>
in the next query. It's much easier than other suggestions for this.

> Is there a better way?
Not in my opinion, there isn't. This is the simplest to implement and
understand.

> By the way, Flamenco (another faceted browser) has built-in support for
> hierarchies, and it has worked well for my data in this aspect (but less
> well than Solr in others).  I'm looking for the same kind of hierarchical
> UI feature in Solr.

Re: Hierarchical faceting in UI

Posted by da...@ontrenet.com.
On Mon, 23 Jan 2012 14:33:00 -0800 (PST), Yuhao <nf...@yahoo.com>
wrote:
> Programmatically, something like this might work: for each facet field,
> add another hidden field that identifies its parent.  Then, program
> additional logic in the UI to show only the facet terms at the currently
> selected level.  For example, if one filters on "cat:electronics", the new
> UI logic would apply the additional filter "cat_parent:electronics".  Can
> this be done?

Yes. This is how I do it.

> Would it be a lot of work?  
No. It's not a lot of work: simply represent your hierarchy as parent/child
relations in the document fields, and in your UI drill down by issuing new
faceted searches. Use the current facet (tree level) as the parent:<level>
in the next query. It's much easier than other suggestions for this.
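
A minimal sketch of that drill-down step, assuming the "cat" and "cat_parent" field names from the question (the function names and the sample response are hypothetical). It builds the parameters for the next faceted search from the currently selected facet value and reads the child terms out of the flat term/count list that Solr returns for facet fields in its default JSON output.

```python
def next_level_params(selected, facet_field="cat", parent_field="cat_parent"):
    """Parameters for the faceted search that lists children of `selected`."""
    return [
        ("q", "*:*"),
        ("rows", "0"),                            # only the facet counts are needed
        ("fq", f'{parent_field}:"{selected}"'),   # current facet becomes the parent filter
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.mincount", "1"),
        ("wt", "json"),
    ]


def child_terms(response, facet_field="cat"):
    """Pair up the alternating [term, count, term, count, ...] list."""
    flat = response["facet_counts"]["facet_fields"][facet_field]
    return list(zip(flat[0::2], flat[1::2]))


# Made-up response fragment, only to show the shape being parsed.
sample = {"facet_counts": {"facet_fields": {"cat": ["cameras", 3, "phones", 5]}}}
print(next_level_params("electronics"))
print(child_terms(sample))
```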

> Is there a better way?
Not in my opinion, there isn't. This is the simplest to implement and
understand.

> By the way, Flamenco (another faceted browser) has built-in support for
> hierarchies, and it has worked well for my data in this aspect (but less
> well than Solr in others).  I'm looking for the same kind of hierarchical
> UI feature in Solr.

Re: Hierarchical faceting in UI

Posted by Chris Hostetter <ho...@fucit.org>.
: References:
:     <CA...@mail.gmail.com>
: Message-ID: <13...@web160302.mail.bf1.yahoo.com>
: Subject: Hierarchical faceting in UI

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss

Hierarchical faceting in UI

Posted by Yuhao <nf...@yahoo.com>.
I have some hierarchical data that I want to represent in the Solr UI (/browse).  I've read through many discussions on this topic, including http://wiki.apache.org/solr/HierarchicalFaceting and http://packtlib.packtpub.com/library/9781849516068/ch06lvl1sec09 .  However, I didn't see a solution that solves my case.

For each facet field in my data, the depth varies depending on the facet term.  For example, at the root level, facet term 1 may have 3 levels down, but facet term 2 will have 8 levels down.  In the UI, I want to show only the facet terms at the currently selected level.  Using the example that comes with Solr, the facet "cat" has the field "electronics", which then has several children.  When my user initially enters the UI, he should only see "electronics"; he should not see any of its children until he clicks on "electronics".

Programmatically, something like this might work: for each facet field, add another hidden field that identifies its parent.  Then, program additional logic in the UI to show only the facet terms at the currently selected level.  For example, if one filters on "cat:electronics", the new UI logic would apply the additional filter "cat_parent:electronics".  Can this be done?  Would it be a lot of work?  Is there a better way?

By the way, Flamenco (another faceted browser) has built-in support for hierarchies, and it has worked well for my data in this aspect (but less well than Solr in others).  I'm looking for the same kind of hierarchical UI feature in Solr.

Re: ExtractionHandler/Cell ignore just 2 fields defined in schema 3.5.0

Posted by Wayne W <wa...@gmail.com>.
Ah perfect - thank you Jan so much. :-)


On Tue, Jan 24, 2012 at 11:14 AM, Jan Høydahl <ja...@cominvent.com> wrote:
> Hi,
>
> It's because lowernames=true by default in solrconfig.xml, and it will convert any "-" into "_" in field names. So try adding a request parameter &lowernames=false or change the default in solrconfig.xml. Alternatively, leave as is but name your fields project_id and company_id :)
>
> http://wiki.apache.org/solr/ExtractingRequestHandler#Input_Parameters
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com

Re: ExtractionHandler/Cell ignore just 2 fields defined in schema 3.5.0

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,

It's because lowernames=true by default in solrconfig.xml, and it will convert any "-" into "_" in field names. So try adding a request parameter &lowernames=false or change the default in solrconfig.xml. Alternatively, leave as is but name your fields project_id and company_id :)

http://wiki.apache.org/solr/ExtractingRequestHandler#Input_Parameters
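
To make the renaming concrete, here is a rough Python approximation of what lowernames=true does to a field name, together with request parameters that turn it off. It mirrors the behaviour described above (lowercase, with characters other than letters and digits turned into underscores); it is a sketch and not Solr's actual code, and the function name is made up.

```python
from urllib.parse import urlencode


def lowernames(field_name):
    """Approximate the mapping: lowercase, non-alphanumerics become '_'."""
    return "".join(ch.lower() if ch.isalnum() else "_" for ch in field_name)


# The two schema fields from the question no longer match after mapping.
for name in ("company-id", "project-id"):
    print(name, "->", lowernames(name))

# Request parameters that keep the original names by disabling the mapping.
params = urlencode({
    "literal.company-id": "8",
    "literal.project-id": "36",
    "lowernames": "false",
})
print("/update/extract?" + params)
```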

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
