Posted to solr-user@lucene.apache.org by Rohit <ro...@in-rev.com> on 2011/05/09 15:57:58 UTC

Solr Range Facets

Hi Chris,

I did try what you suggested, but I am not getting the expected results. The
code is given below,

 

 

            SolrQuery query = new SolrQuery();
            query.set("q","apple");
            query.set("facet","true");

            query.set("facet.range", "createdOnGMTDate");
            query.set("facet.range.start", "2010-01-01T00:00:00Z") ;
            query.set("facet.range.gap", "+1DAY");

            QueryResponse qr = server.query(query);
            SolrDocumentList sdl = qr.getResults();

            System.out.println("Found: " + sdl.getNumFound());
            System.out.println("Start: " + sdl.getStart());
            System.out.println("-----------");

            List<FacetField> facets = qr.getFacetFields();

            for(FacetField facet : facets)
            {
                List<FacetField.Count> facetEntries = facet.getValues();

                for(FacetField.Count fcount : facetEntries)
                {
                    System.out.println(fcount.getName() + ": " + fcount.getCount());
                }
            }

 

Regards,

Rohit

 

-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org] 
Sent: 07 May 2011 04:36
To: solr-user@lucene.apache.org
Subject: RE: Solr: org.apache.solr.common.SolrException: Invalid Date String:

 

 

: Thanks for the response, actually what we need to achieve is to see group-by
: results based on dates like,
: 
: 2011-01-01  23
: 2011-01-02  14
: 2011-01-03  40
: 2011-01-04  10
: 
: Now the records in my table run into millions, grouping the result based on
: UTC date would not produce the right result since the result should be
: grouped on the user's timezone.  Is there any way we can achieve this in Solr?

 

Date faceting is entirely driven by query params, so if you index your
events using the "true" time that they happened at (formatted as a string
in UTC) you can then select your date ranges using whatever timezone
offset is specified by your user at query time as a UTC offset.

      facet.range = dateField
      facet.range.start = 2011-01-01T00:00:00Z+${useroffset}MINUTES
      facet.range.gap = +1DAY
      etc...

 

 

-Hoss
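
One detail the template above leaves open is the sign of ${useroffset}. A minimal sketch, assuming the goal is for each facet bucket to start at the user's local midnight: the value appended to the UTC-midnight timestamp is then the negative of the zone's UTC offset, so a user in Asia/Kolkata (UTC+05:30) would need -330MINUTES rather than +330MINUTES. The class and method names below are hypothetical (they are not part of Solr or SolrJ) and only java.time is used; the resulting string is what would be passed as facet.range.start.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Hypothetical helper for illustration; not a Solr or SolrJ class. */
final class FacetRangeStart {

    /**
     * Minutes to add to "UTC midnight of day" so that the result is the
     * start of that calendar day in the given zone.
     */
    static int shiftMinutes(LocalDate day, ZoneId zone) {
        ZoneOffset offset = day.atStartOfDay(zone).getOffset();
        return -offset.getTotalSeconds() / 60;
    }

    /** Builds a date-math expression such as 2011-05-02T00:00:00Z-330MINUTES. */
    static String dateMath(LocalDate day, ZoneId zone) {
        int shift = shiftMinutes(day, zone);
        String sign = shift < 0 ? "-" : "+";
        return day + "T00:00:00Z" + sign + Math.abs(shift) + "MINUTES";
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2011, 5, 2);
        // East of UTC: local midnight is earlier than UTC midnight.
        System.out.println(dateMath(day, ZoneId.of("Asia/Kolkata")));
        // West of UTC: local midnight is later than UTC midnight.
        System.out.println(dateMath(day, ZoneId.of("America/New_York")));
    }
}
```

If the application stores the offset the other way round (minutes to add to UTC to get local time), the sign has to be flipped before it is substituted into the template.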


Re: Solr Range Facets

Posted by Rohit Gupta <ro...@in-rev.com>.
Hi Chris,

I made a mistake in explaining the second part of my question.

If you look at the faceted result, you will see that there are 4 results for
the 2nd of May 2011, but when I query for the 2nd of May I should get only 1
result, since after applying the offset all the remaining results should be
shifted to the 3rd of May.

But I think I have found the reason for this: I guess the offset is applied
only to the edges and not to the actual results. I mean, when we facet with an
offset of +330MINUTES, what Solr actually does is just move the facet
boundaries by +330MINUTES, not each and every document.

Regards,
Rohit
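
That reading can be checked against the five createdOnGMTDate values quoted in this thread. The following is a rough sketch in plain Java: the class and method names are hypothetical, it only imitates fixed one-day half-open buckets, and it is not Solr's implementation. The stored instants never move; only the bucket edges do. With edges shifted by +330 minutes the first bucket holds 4 of the five documents, which matches the facet count, while grouping the same instants by calendar day in Asia/Kolkata puts only 1 of them on the 2nd of May.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical demo class; it imitates day buckets and does not call Solr. */
final class BucketCheck {

    // createdOnGMTDate values of the five documents quoted in the thread
    static final List<Instant> DOCS = List.of(
        Instant.parse("2011-05-02T16:27:05Z"),
        Instant.parse("2011-05-02T19:00:44Z"),
        Instant.parse("2011-05-03T01:53:05Z"),
        Instant.parse("2011-05-03T01:53:05Z"),
        Instant.parse("2011-05-03T08:52:37Z"));

    /** Counts documents per bucket [edge, edge + 1 day), with edges starting at 'start'. */
    static Map<Instant, Integer> countByEdges(Instant start, List<Instant> docs) {
        Map<Instant, Integer> counts = new TreeMap<>();
        for (Instant doc : docs) {
            if (doc.isBefore(start)) {
                continue; // before the first bucket
            }
            long days = Duration.between(start, doc).toDays();
            counts.merge(start.plus(Duration.ofDays(days)), 1, Integer::sum);
        }
        return counts;
    }

    /** Counts documents per calendar day as seen by a user in the given zone. */
    static Map<LocalDate, Integer> countByLocalDay(ZoneId zone, List<Instant> docs) {
        Map<LocalDate, Integer> counts = new TreeMap<>();
        for (Instant doc : docs) {
            counts.merge(doc.atZone(zone).toLocalDate(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Edges as produced by 2011-05-02T00:00:00Z+330MINUTES
        Instant shiftedEdge = Instant.parse("2011-05-02T00:00:00Z").plus(Duration.ofMinutes(330));
        System.out.println(countByEdges(shiftedEdge, DOCS));
        System.out.println(countByLocalDay(ZoneId.of("Asia/Kolkata"), DOCS));
    }
}
```

Under these assumptions, buckets that line up with calendar days in IST would need edges at 18:30Z of the previous UTC day, that is, a shift of -330 minutes rather than +330.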




RE: Solr Range Facets

Posted by Chris Hostetter <ho...@fucit.org>.
: Thanks for explaining the point system, please find below the complete

Sorry .. that part was meant to be a joke, I think I was really tired when 
I wrote that.  The key takeaway: details matter.


: 					<int name="2011-05-02T05:30:00Z">4</int>
: 					<int name="2011-05-03T05:30:00Z">63</int>
: 					<int name="2011-05-04T05:30:00Z">0</int>
: 					<int name="2011-05-05T05:30:00Z">0</int>
	...
: Now if you notice that the response shows 4 records for the 2nd of May 2011
: which will fall in the IST timezone (+330MINUTES), but when I try to get the

right.

: results I see that there is only 1 result for the 5th. Why is this happening?

Why do you say that?

According to those facet results, there are 0 docs between 
2011-05-05T05:30:00Z and 2011-05-05T05:30:00Z+1DAY (which is what I 
assume you mean by "the 5th" ... i.e. "May 5th, in that timezone offset")

Not only that, but the query you posted isn't attempting to filter on "the 
5th" by any possible definition of the concept...

: 			<str name="fq">createdOnGMTDate:[2011-05-01T00:00:00Z+330MINUTES TO *]</str>

...that's saying you want all docs with a date on or after "the 1st".

: If I don't apply the offset the results match with the facet count, is there
: something wrong in my query?

It looks like your query is just plain wrong.  If your goal was to 
drill down and show only documents from "the 5th" it should have been 
something like...

fq = createdOnGMTDate:[2011-05-05T00:00:00Z+330MINUTES TO 2011-05-05T00:00:00Z+330MINUTES+1DAY]

...but note also that there is the question of "edge inclusion" and when 
you want to use [A TO B] vs [A TO B}.  The facet.range.include option is 
how you control whether the edges are used in the facet counts...

http://wiki.apache.org/solr/SimpleFacetParameters#facet.date.include


-Hoss
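
The drill-down filter and the edge question can be sketched together. The helper below is hypothetical (it is not a SolrJ API) and simply assembles the fq string; the includeUpper flag switches between [A TO B] and [A TO B}. The mixed-bracket form is assumed to be accepted by the query parser in use, which is worth confirming for the Solr version at hand. The second method is a plain-Java stand-in for range membership, included only to show why the closing bracket matters: a document stamped exactly at B matches two adjacent days under [A TO B] but only one under [A TO B}.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for illustration; it builds strings and does not call Solr. */
final class DayFilter {

    /** Builds e.g. field:[2011-05-05T00:00:00Z+330MINUTES TO 2011-05-05T00:00:00Z+330MINUTES+1DAY] */
    static String fq(String field, String utcMidnight, int shiftMinutes, boolean includeUpper) {
        String sign = shiftMinutes < 0 ? "-" : "+";
        String lower = utcMidnight + sign + Math.abs(shiftMinutes) + "MINUTES";
        String upper = lower + "+1DAY";
        return field + ":[" + lower + " TO " + upper + (includeUpper ? "]" : "}");
    }

    /** Range membership with an inclusive lower edge and a configurable upper edge. */
    static boolean inRange(Instant doc, Instant lower, Instant upper, boolean includeUpper) {
        boolean atOrAfterLower = !doc.isBefore(lower);
        boolean withinUpper = includeUpper ? !doc.isAfter(upper) : doc.isBefore(upper);
        return atOrAfterLower && withinUpper;
    }

    public static void main(String[] args) {
        System.out.println(fq("createdOnGMTDate", "2011-05-05T00:00:00Z", 330, true));
        System.out.println(fq("createdOnGMTDate", "2011-05-05T00:00:00Z", 330, false));

        Instant a = Instant.parse("2011-05-05T05:30:00Z");
        Instant b = a.plus(Duration.ofDays(1));
        // A document stamped exactly on the upper edge:
        System.out.println(inRange(b, a, b, true));   // counted in this day with [A TO B]
        System.out.println(inRange(b, a, b, false));  // left for the next day with [A TO B}
    }
}
```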

RE: Solr Range Facets

Posted by Rohit <ro...@in-rev.com>.
Hi Chris, 

Thanks for explaining the point system, please find below the complete
problem. Hopefully I am not doing something stupid.

I am trying to facet on a date field and apply the user's timezone offset so
that the faceted results are in the user's timezone. My faceted result is given
below,

<?xml version="1.0" encoding="UTF-8"?>
<response>
	<lst name="responseHeader">
		<int name="status">0</int>
		<int name="QTime">6</int>
		<lst name="params">
			<str name="facet">true</str>
			<str name="q">icici</str>
			<str name="facet.range.start">2011-05-02T00:00:00Z+330MINUTES</str>
			<str name="facet.range">createdOnGMTDate</str>
			<str name="facet.range.end">2011-05-18T00:00:00Z</str>
			<str name="facet.range.gap">+1DAY</str>
		</lst>
	</lst>
	<lst name="facet_counts">
		<lst name="facet_ranges">
			<lst name="createdOnGMTDate">
				<lst name="counts">
					<int name="2011-05-02T05:30:00Z">4</int>
					<int name="2011-05-03T05:30:00Z">63</int>
					<int name="2011-05-04T05:30:00Z">0</int>
					<int name="2011-05-05T05:30:00Z">0</int>
					......
				</lst>
				<str name="gap">+1DAY</str>
				<date name="start">2011-05-02T05:30:00Z</date>
				<date name="end">2011-05-18T05:30:00Z</date>
			</lst>
		</lst>
	</lst>
</response>

Now if you notice, the response shows 4 records for the 2nd of May 2011,
which will fall in the IST timezone (+330MINUTES), but when I try to get the
results I see that there is only 1 result for the 5th. Why is this happening?


<?xml version="1.0" encoding="UTF-8"?>
<response>
	<lst name="responseHeader">
		<int name="status">0</int>
		<int name="QTime">5</int>
		<lst name="params">
			<str name="sort">createdOnGMTDate asc</str>
			<str name="fl">createdOnGMT,createdOnGMTDate,twtText</str>
			<str name="fq">createdOnGMTDate:[2011-05-01T00:00:00Z+330MINUTES TO *]</str>
			<str name="q">icici</str>
		</lst>
	</lst>
	<result name="response" numFound="67" start="0">
		<doc>
			<str name="createdOnGMT">Mon, 02 May 2011 16:27:05 +0000</str>
			<date name="createdOnGMTDate">2011-05-02T16:27:05Z</date>
			<str name="twtText">#TechStrat615. Infosys (business soln &amp; IT outsourcer) manages damages with new chairman K.Kamath (ex ICICI Bank chairman) to begin Aug 21.</str>
		</doc>
		<doc>
			<str name="createdOnGMT">Mon, 02 May 2011 19:00:44 +0000</str>
			<date name="createdOnGMTDate">2011-05-02T19:00:44Z</date>
			<str name="twtText">how to get icici mobile banking</str>
		</doc>
		<doc>
			<str name="createdOnGMT">Tue, 03 May 2011 01:53:05 +0000</str>
			<date name="createdOnGMTDate">2011-05-03T01:53:05Z</date>
			<str name="twtText">ICICI BANK LTD, L. M. MIRAJ branch in SANGLI, MAHARASHTRA. IFSC Code: ICIC0006537, MICR Code: ... http://bit.ly/fJCuWl #ifsc #micr #bank</str>
		</doc>
		<doc>
			<str name="createdOnGMT">Tue, 03 May 2011 01:53:05 +0000</str>
			<date name="createdOnGMTDate">2011-05-03T01:53:05Z</date>
			<str name="twtText">ICICI BANK LTD, L. M. MIRAJ branch in SANGLI, MAHARASHTRA. IFSC Code: ICIC0006537, MICR Code: ... http://bit.ly/fJCuWl #ifsc #micr #bank</str>
		</doc>
		<doc>
			<str name="createdOnGMT">Tue, 03 May 2011 08:52:37 +0000</str>
			<date name="createdOnGMTDate">2011-05-03T08:52:37Z</date>
			<str name="twtText">RT @nice4ufan: ICICI BANK PERSONAL LOAN http://ee4you.blogspot.com/2011/04/icici-bank-personal-loan.html</str>
		</doc>

If I don't apply the offset the results match with the facet count, is there
something wrong in my query?

Regards,
Rohit
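
One detail in the facet response above: the request asked for facet.range.end=2011-05-18T00:00:00Z, yet the response reports an end of 2011-05-18T05:30:00Z. This is consistent with the documented default of facet.range.hardend=false, where the last bucket keeps the full gap width and so extends past the requested end. A small sketch in plain Java reproduces those edges; the names are hypothetical, and a fixed 24-hour gap is assumed in place of Solr's date math.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo class; it imitates range-facet edges and does not call Solr. */
final class RangeEdges {

    /** Lower edges of all buckets; a bucket is opened for every edge that lies before 'end'. */
    static List<Instant> lowerEdges(Instant start, Instant end, Duration gap) {
        List<Instant> edges = new ArrayList<>();
        for (Instant low = start; low.isBefore(end); low = low.plus(gap)) {
            edges.add(low);
        }
        return edges;
    }

    /** Upper bound of the last bucket; assumes 'start' is before 'end'. */
    static Instant effectiveEnd(Instant start, Instant end, Duration gap, boolean hardEnd) {
        List<Instant> edges = lowerEdges(start, end, gap);
        Instant lastUpper = edges.get(edges.size() - 1).plus(gap);
        // hardEnd=true clips the last bucket at 'end'; false keeps it a full gap wide.
        return (hardEnd && lastUpper.isAfter(end)) ? end : lastUpper;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2011-05-02T05:30:00Z"); // 2011-05-02T00:00:00Z+330MINUTES
        Instant end = Instant.parse("2011-05-18T00:00:00Z");
        Duration gap = Duration.ofDays(1);

        System.out.println(lowerEdges(start, end, gap).size());
        System.out.println(effectiveEnd(start, end, gap, false));
        System.out.println(effectiveEnd(start, end, gap, true));
    }
}
```

Shifting facet.range.end by the same offset as facet.range.start would make the gap divide the range evenly, so the requested and reported ends agree.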



Re: Solr Range Facets

Posted by Chris Hostetter <ho...@fucit.org>.
: I did try what you suggested, but I am not getting the expected results. The
: code is given below,

+5 points for posting the code you tried, but -10 points for not 
explaining how the results you get are different from the results you 
expect, and -5 more points for not even giving an example of the results 
you did get.

In the absence of any other info about how this doesn't match your 
expectations, my hunch is it's because you left out the crucial part of my 
suggestion...

:             query.set("facet.range.start", "2010-01-01T00:00:00Z") ;

You said you wanted the facet results to be based on the user's local 
timezone, but you aren't including the "timezone offset" info that I 
mentioned you should add (unless this example is supposed to show the 
results for a user whose local timezone is UTC).

See below...

: -----Original Message-----
: From: Chris Hostetter
	...
: Date faceting is entirely driven by query params, so if you index your 
: events using the "true" time that they happened at (formatted as a string 
: in UTC) you can then select your date ranges using whatever timezone 
: offset is specified by your user at query time as a UTC offset.

:       facet.range.start = 2011-01-01T00:00:00Z+${useroffset}MINUTES
:       facet.range.gap = +1DAY
:       etc...

-Hoss