Posted to solr-user@lucene.apache.org by kenneth hansen <ke...@hotmail.co.uk> on 2011/12/14 14:38:15 UTC

Faceting with null dates

Hello,
I have the following faceting parameters, which gives me some unwanted non-null dates in the result set. Is there a way to query the index to not give me non-null dates in return? I.e. I would like to get a result set which contains only non-nulls on the validToDate, but as I am faceting on non-null values on the validToDate, I would like to get the non-null values in the faceting result.
This response example below gives me 10 results, with 7 non-null validToDates. What I would like to get is 3 results and 7 non-null validToDate facets. And as I write this, I start to wonder if this is possible at all as the facets are dependent on the result set and that this might be better to handle in the application layer by just extracting 10-7=3...
Any help would be appreciated!
br, ken
<code>
<str name="facet">true</str>
<str name="f.validToDate.facet.range.start">NOW/DAYS-4MONTHS</str>
<str name="facet.mincount">1</str>
<str name="q">(*:*)</str>
<arr name="facet.range"><str>validToDate</str></arr>
<str name="facet.range.end">NOW/DAY+1DAY</str>
<str name="facet.range.gap">+1MONTH</str>
</code>

<result name="response" numFound="10" start="0"><lst name="facet_counts"><lst name="facet_ranges">  <lst name="validToDate">  <lst name="counts">  <int name="2011-11-14T00:00:00Z">7</int>


Re: Faceting with null dates

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, I'm not sure I'm following this.
"Is there a way to query the index to not give me non-null dates in return"
So you want null dates?
and:
"which gives me some unwanted non-null dates in the result set"
which seems to indicate you do NOT want null dates.

I honestly don't know what your desired outcome is. Could you
clarify?

Best
Erick



Re: Faceting with null dates

Posted by Erick Erickson <er...@gmail.com>.
> 1) the number of documents for a given date range R1 that do not have
> a value for the validToDate, i.e. the 99% of the documents

This makes no sense either: "for a given date range R1 that do not have a value".
You can't specify a range for a document that doesn't have a value!
I think you're asking for "the documents that *satisfy my query* that don't
have a date value". In that case, Chris' suggestion to use a pure-negative
query will give you what you want. You can specify arbitrary "facet.query"
clauses along with your facet.range parameters; they're just treated as
separate facets. Just tack one onto your request and it'll come back in a
separate section of the response.
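
As a rough sketch of what such a single request could look like (an illustration, not something posted in the thread): the parameters below are the ones from the original post, plus the facet.query suggested by Chris. The host, port and handler path are placeholders I made up, and Python is used only to show the URL encoding.

```python
from urllib.parse import urlencode

# Parameters copied from the request in the original post, plus one
# facet.query that counts matching documents with no validToDate value.
params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.mincount", "1"),
    ("facet.range", "validToDate"),
    ("f.validToDate.facet.range.start", "NOW/DAYS-4MONTHS"),
    ("facet.range.end", "NOW/DAY+1DAY"),
    ("facet.range.gap", "+1MONTH"),
    ("facet.query", "-validToDate:[* TO *]"),
]

# Assumption: placeholder endpoint, adjust to your own installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# urlencode percent-escapes ':', '[', ']', '*' and '+' so the date math
# and the range syntax survive the trip through the URL.
query_string = urlencode(params)
request_url = SOLR_SELECT_URL + "?" + query_string
print(request_url)
```

The range facet and the facet.query are computed over the same result set, so one round trip should be enough.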

Best
Erick

On Thu, Dec 22, 2011 at 3:45 AM, kenneth hansen <ke...@hotmail.co.uk> wrote:
>
> yes,
> I see that my question was a bit confusing. But thanks for your answers.
> I will try to clarify a bit.
>
> I query on a date field, validToDate. The value for this field is not present for 99% of the documents.
> What I would like to get is
> 1) the number of documents for a given date range R1 that do not have a value for the validToDate, i.e. the 99% of the documents
> 2) the number of documents for a given date range R2 that do have a value for the validToDate
>
> My question is really: is it possible to have just one query, or do I need to have two queries; one for 1) and one for 2). Will the "facet.range.other=all" help me in any way here?
>
> /k
>
>
>

RE: Faceting with null dates

Posted by kenneth hansen <ke...@hotmail.co.uk>.
yes, 
I see that my question was a bit confusing. But thanks for your answers. 
I will try to clarify a bit.
 
I query on a date field, validToDate. The value for this field is not present for 99% of the documents.
What I would like to get is
1) the number of documents for a given date range R1 that do not have a value for the validToDate, i.e. the 99% of the documents
2) the number of documents for a given date range R2 that do have a value for the validToDate
 
My question is really: is it possible to have just one query, or do I need to have two queries; one for 1) and one for 2). Will the "facet.range.other=all" help me in any way here?
 
/k

 

> Date: Thu, 15 Dec 2011 12:25:12 -0800
> From: hossman_lucene@fucit.org
> To: solr-user@lucene.apache.org
> Subject: Re: Faceting with null dates
> 
> 
> First of all, we need to clarify some terminology here: there is no such
> thing as a "null date" in solr -- or for that matter, there is no such
> thing as a "null value" in any field. Documents either have some value(s)
> for a field, or they do not have any values.
>
> If you want to constrain your query to only documents that have a value in
> a field, you can use something like fq=field_name:[* TO *] ... if you want
> to constrain your query to only documents that do *NOT* have a value in a
> field, you can use fq=-field_name:[* TO *]
>
> Now, having said that, like Erick, I'm a little confused by your question
> -- it's not clear if what you really want to do is:
> 
> a) change the set of documents returned in the main result list
> b) change the set of documents considered when generating facet counts 
> (w/o changing the main result list)
> c) return an additional count of documents that are in the main result 
> list, but are not in the facet counts because they do not have the field 
> being faceted on.
> 
> My best guess is that you are asking about "c" based on your last 
> sentence...
> 
> : get is 3 results and 7 non-null validToDate facets. And as I write this, 
> : I start to wonder if this is possible at all as the facets are dependent 
> : on the result set and that this might be better to handle in the 
> : application layer by just extracting 10-7=3...
> 
> ...subtracting the sum of all constraint counts from your range facet from
> the total number of documents found won't necessarily tell you the number
> of documents that have no value in the field you are faceting on --
> because documents may have values outside the range of your start/end.
> 
> Depending on what exactly it is you are looking for, you might find the 
> "facet.range.other=all" param useful, as it will return things like the 
> "between" counts (summing up all the docs between start->end) as well as 
> the "before" and "after" counts.
> 
> But if you really just want to know "how many docs have no value for my 
> validToDate field?" you can get that very explicitly and easily using 
> facet.query=-validToDate:[* TO *]
> 
> : <code><str name="facet">true</str><str 
> : name="f.validToDate.facet.range.start">NOW/DAYS-4MONTHS</str><str 
> : name="facet.mincount">1</str><str name="q">(*:*)</str><arr 
> : name="facet.range"><str>validToDate</str></arr><str 
> : name="facet.range.end">NOW/DAY+1DAY</str><str 
> : name="facet.range.gap">+1MONTH</str></code>
> : 
> : <result name="response" numFound="10" start="0"><lst 
> : name="facet_counts"><lst name="facet_ranges"> <lst name="validToDate"> 
> : <lst name="counts"> <int name="2011-11-14T00:00:00Z">7</int>
> 
> 
> -Hoss

Re: Faceting with null dates

Posted by Chris Hostetter <ho...@fucit.org>.
First of all, we need to clarify some terminology here: there is no such
thing as a "null date" in solr -- or for that matter, there is no such
thing as a "null value" in any field.  Documents either have some value(s)
for a field, or they do not have any values.

If you want to constrain your query to only documents that have a value in
a field, you can use something like fq=field_name:[* TO *] ... if you want
to constrain your query to only documents that do *NOT* have a value in a
field, you can use fq=-field_name:[* TO *]
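
For illustration only, here is a small Python sketch of those two filter strings. The helper names are made up for this example and are not part of any Solr client library.

```python
def has_value_filter(field: str) -> str:
    """Filter string matching documents with at least one value in `field`."""
    return f"{field}:[* TO *]"


def missing_value_filter(field: str) -> str:
    """Filter string matching documents with no value at all in `field`."""
    return f"-{field}:[* TO *]"


# Hypothetical usage with the field from this thread.
print("fq=" + has_value_filter("validToDate"))
print("fq=" + missing_value_filter("validToDate"))
```

The strings still need URL encoding before they go into a request.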

Now, having said that, like Erick, I'm a little confused by your question
-- it's not clear if what you really want to do is:

a) change the set of documents returned in the main result list
b) change the set of documents considered when generating facet counts 
(w/o changing the main result list)
c) return an additional count of documents that are in the main result 
list, but are not in the facet counts because they do not have the field 
being faceted on.

My best guess is that you are asking about "c" based on your last 
sentence...

: get is 3 results and 7 non-null validToDate facets. And as I write this, 
: I start to wonder if this is possible at all as the facets are dependent 
: on the result set and that this might be better to handle in the 
: application layer by just extracting 10-7=3...

...subtracting the sum of all constraint counts from your range facet from
the total number of documents found won't necessarily tell you the number
of documents that have no value in the field you are faceting on --
because documents may have values outside the range of your start/end.

Depending on what exactly it is you are looking for, you might find the 
"facet.range.other=all" param useful, as it will return things like the  
"between" counts (summing up all the docs between start->end) as well as 
the "before" and "after" counts.
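
A small worked example of the arithmetic, as a sketch only. numFound=10 and the single bucket count of 7 come from the response in this thread; the "before" and "after" values are invented to show the effect, and the calculation assumes a single-valued field (a multi-valued field could put one document in several buckets).

```python
def docs_without_value(num_found, range_counts, before, after):
    """Documents in the result set with no value in the faceted field.

    Assumes each document has at most one value in the field, so every
    document with a value is counted exactly once in before/range/after.
    """
    with_value = before + sum(range_counts) + after
    return num_found - with_value


# From the thread: 10 documents found, one range bucket holding 7.
num_found = 10
range_counts = [7]

# Naive subtraction ignores values outside start/end.
naive = num_found - sum(range_counts)

# Hypothetical counts as facet.range.other=all might report them:
# 2 documents dated before the range start, none after the end.
corrected = docs_without_value(num_found, range_counts, before=2, after=0)

print(naive, corrected)
```

With these made-up numbers the naive answer is 3, but only 1 document actually lacks a value.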

But if you really just want to know "how many docs have no value for my 
validToDate field?" you can get that very explicitly and easily using 
facet.query=-validToDate:[* TO *]
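
A sketch of reading that count back, assuming the request was made with wt=json. The response fragment below is hand-written for illustration and the count of 3 is hypothetical, not output from a real index; the one thing relied on is that facet.query results are keyed by the query string itself.

```python
import json

# Hypothetical response fragment, trimmed to the parts used here.
sample_response = """
{
  "response": {"numFound": 10, "start": 0, "docs": []},
  "facet_counts": {
    "facet_queries": {"-validToDate:[* TO *]": 3}
  }
}
"""

MISSING_QUERY = "-validToDate:[* TO *]"

data = json.loads(sample_response)
# Each facet.query comes back under facet_queries, keyed by its own text.
missing_count = data["facet_counts"]["facet_queries"][MISSING_QUERY]
total = data["response"]["numFound"]

print(f"{missing_count} of {total} documents have no validToDate")
```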

: <code><str name="facet">true</str><str 
: name="f.validToDate.facet.range.start">NOW/DAYS-4MONTHS</str><str 
: name="facet.mincount">1</str><str name="q">(*:*)</str><arr 
: name="facet.range"><str>validToDate</str></arr><str 
: name="facet.range.end">NOW/DAY+1DAY</str><str 
: name="facet.range.gap">+1MONTH</str></code>
: 
: <result name="response" numFound="10" start="0"><lst 
: name="facet_counts"><lst name="facet_ranges"> <lst name="validToDate"> 
: <lst name="counts"> <int name="2011-11-14T00:00:00Z">7</int>


-Hoss