Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2014/08/22 07:23:11 UTC

[jira] [Comment Edited] (SOLR-6187) facet.mincount ignored in range faceting using distributed search

    [ https://issues.apache.org/jira/browse/SOLR-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106479#comment-14106479 ] 

Erick Erickson edited comment on SOLR-6187 at 8/22/14 5:22 AM:
---------------------------------------------------------------

OK, a patch with tests. It's not ready to commit, but I'd like some comments.

The approach just doesn't feel very good. Unfortunately I don't see any good way to apply mincounts to distributed facets except to post-process the fully-counted list. I hope I did it in the right place. Suggestions for other approaches welcome!
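
To make the post-processing idea concrete, here is a minimal Python sketch. It is not the code in the patch: the function names and the flat label/count list layout are assumptions for illustration, modelled on the "counts" array in the JSON response quoted in the issue description. It sums each bucket across shards first and only then applies the mincount.

```python
from collections import OrderedDict


def merge_range_counts(shard_counts):
    """Sum per-bucket counts across shards, keeping first-seen bucket order."""
    merged = OrderedDict()
    for counts in shard_counts:
        # Each shard is assumed to return a flat list: [label, count, label, count, ...]
        for label, count in zip(counts[0::2], counts[1::2]):
            merged[label] = merged.get(label, 0) + count
    return merged


def apply_mincount(merged, mincount):
    """Drop buckets whose fully merged count is below mincount."""
    flat = []
    for label, count in merged.items():
        if count >= mincount:
            flat.extend([label, count])
    return flat


# Hypothetical per-shard results, shortened to three buckets each.
shard1 = ["2012-01-01T00:00:00Z", 0, "2013-01-01T00:00:00Z", 0, "2014-01-01T00:00:00Z", 1]
shard2 = ["2012-01-01T00:00:00Z", 0, "2013-01-01T00:00:00Z", 0, "2014-01-01T00:00:00Z", 0]
result = apply_mincount(merge_range_counts([shard1, shard2]), 1)
```

The filter has to run on the merged totals: with mincount=2, a bucket holding one document on each of two shards would be wrongly dropped if each shard filtered its own counts first.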

I've got some code in there that processes (deprecated) date facets that is really questionable. Problem is that there's no good way to tell where the counts are for an old-style date facet. You get pairs like
"2001-01-01T00:00:00Z"
34 (as an integer)

which is fine, but pretty soon you get things like
"gap"
"+1YEAR" (as a String)

The current code says "if the value is an integer, it must be a count so test it" which is bogus at best. I propose to rip it out and just not support date facets for this issue, but wanted people to see it. I'd rather have people who want mincounts for dates use the modern range faceting rather than introduce this kind of fragility and, perhaps, unintended consequences.
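
A small Python sketch of why that heuristic is fragile. Again, this is not the patch code: the function name is made up, and the "before" entry is a hypothetical integer-valued metadata entry added for illustration, not something taken from this comment.

```python
def filter_by_integer_heuristic(pairs, mincount):
    """Treat every integer value as a bucket count and drop it if below mincount."""
    kept = []
    for key, value in pairs:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if is_int and value < mincount:
            continue
        kept.append((key, value))
    return kept


date_facet = [
    ("2001-01-01T00:00:00Z", 34),
    ("2002-01-01T00:00:00Z", 0),
    ("gap", "+1YEAR"),   # a String, so the heuristic leaves it alone
    ("before", 0),       # hypothetical integer metadata, mistaken for a bucket count
]
kept = filter_by_integer_heuristic(date_facet, 1)
```

The empty 2002 bucket is removed as intended, but so is the hypothetical "before" entry, simply because its value happens to be an integer. Nothing in the pair itself says whether it is a bucket or metadata.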

This patch doesn't handle Pivot Facets, Interval Facets or Query Facets. The first one scares me, I haven't figured out how to test the second one yet, and the third doesn't strike me as very important since there aren't likely to be very many individual Query Facets. I may tackle Query and Interval if this approach isn't shot down.

I'm not very willing to try to put this in 4.10, even if the approach is OK. I'd like to get this more in the 4.11 time-frame if at all.

Probably handles SOLR-6154 and SOLR-6300 although I haven't tested those yet.

All tests pass, although I haven't run precommit yet.


> facet.mincount ignored in range faceting using distributed search
> -----------------------------------------------------------------
>
>                 Key: SOLR-6187
>                 URL: https://issues.apache.org/jira/browse/SOLR-6187
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 4.8, 4.8.1
>            Reporter: Zaccheo Bagnati
>            Assignee: Erick Erickson
>         Attachments: SOLR-6187.patch
>
>
> While trying to do range faceting with a gap of +1YEAR across shards, I noticed that the facet.mincount parameter seems to be ignored.
> The issue can be reproduced as follows:
> Create 2 cores "testshard1" and "testshard2" with:
> solrconfig.xml
> <?xml version="1.0" encoding="UTF-8" ?>
> <config>
>   <luceneMatchVersion>LUCENE_41</luceneMatchVersion>
>   <lib dir="/opt/solr/dist" regex="solr-cell-.*\.jar"/>
>   <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
>   <updateHandler class="solr.DirectUpdateHandler2" />
>   <requestHandler name="/select" class="solr.SearchHandler">
>      <lst name="defaults">
>        <str name="echoParams">explicit</str>
>        <int name="rows">10</int>
>        <str name="df">id</str>
>      </lst>
>   </requestHandler>
>   <requestHandler name="/update" class="solr.UpdateRequestHandler"  />
>   <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
>   <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
>     <lst name="invariants">
>       <str name="q">solrpingquery</str>
>     </lst>
>     <lst name="defaults">
>       <str name="echoParams">all</str>
>     </lst>
>   </requestHandler>
> </config>
> schema.xml
> <?xml version="1.0" ?>
> <schema name="${solr.core.name}" version="1.5" xmlns:xi="http://www.w3.org/2001/XInclude">
>   <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
>   <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
>   <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
>   <field name="_version_" type="long"     indexed="true"  stored="true"/>
>   <field name="id" type="int" indexed="true" stored="true" multiValued="false" />
>   <field name="date" type="date" indexed="true" stored="true" multiValued="false" />
>   <uniqueKey>id</uniqueKey>
>   <defaultSearchField>id</defaultSearchField>
> </schema>
> Insert in testshard1:
> <add>
>  <doc>
>   <field name="id">1</field>
>   <field name="date">2014-06-20T12:51:00Z</field>
>  </doc>
> </add>
> Insert into testshard2:
> <add>
>  <doc>
>   <field name="id">2</field>
>   <field name="date">2013-06-20T12:51:00Z</field>
>  </doc>
> </add>
> Now if I execute:
> curl "http://localhost:8983/solr/testshard1/select?q=id:1&facet=true&facet.mincount=1&facet.range=date&f.date.facet.range.start=1900-01-01T00:00:00Z&f.date.facet.range.end=NOW&f.date.facet.range.gap=%2B1YEAR&shards=localhost%3A8983%2Fsolr%2Ftestshard1%2Clocalhost%3A8983%2Fsolr%2Ftestshard2&shards.info=true&wt=json"
> I obtain:
> {"responseHeader":{"status":0,"QTime":88,"params":{"f.date.facet.range.gap":"+1YEAR","f.date.facet.range.start":"1900-01-01T00:00:00Z","facet":"true","shards":"localhost:8983/solr/testshard1,localhost:8983/solr/testshard2","facet.mincount":"1","q":"id:1","shards.info":"true","facet.range":"date","wt":"json","f.date.facet.range.end":"NOW"}},"shards.info":{"localhost:8983/solr/testshard2":{"numFound":0,"maxScore":0.0,"shardAddress":"http://localhost:8983/solr/testshard2","time":76},"localhost:8983/solr/testshard1":{"numFound":1,"maxScore":0.30685282,"shardAddress":"http://localhost:8983/solr/testshard1","time":79}},"response":{"numFound":1,"start":0,"maxScore":0.30685282,"docs":[{"id":1,"date":"2014-06-20T12:51:00Z"}]},"facet_counts":{"facet_queries":{},"facet_fields":{},"facet_dates":{},"facet_ranges":{"date":{"counts":["1900-01-01T00:00:00Z",0,"1901-01-01T00:00:00Z",0,"1902-01-01T00:00:00Z",0,"1903-01-01T00:00:00Z",0,"1904-01-01T00:00:00Z",0,"1905-01-01T00:00:00Z",0,"1906-01-01T00:00:00Z",0,"1907-01-01T00:00:00Z",0,"1908-01-01T00:00:00Z",0,"1909-01-01T00:00:00Z",0,"1910-01-01T00:00:00Z",0,"1911-01-01T00:00:00Z",0,"1912-01-01T00:00:00Z",0,"1913-01-01T00:00:00Z",0,"1914-01-01T00:00:00Z",0,"1915-01-01T00:00:00Z",0,"1916-01-01T00:00:00Z",0,"1917-01-01T00:00:00Z",0,"1918-01-01T00:00:00Z",0,"1919-01-01T00:00:00Z",0,"1920-01-01T00:00:00Z",0,"1921-01-01T00:00:00Z",0,"1922-01-01T00:00:00Z",0,"1923-01-01T00:00:00Z",0,"1924-01-01T00:00:00Z",0,"1925-01-01T00:00:00Z",0,"1926-01-01T00:00:00Z",0,"1927-01-01T00:00:00Z",0,"1928-01-01T00:00:00Z",0,"1929-01-01T00:00:00Z",0,"1930-01-01T00:00:00Z",0,"1931-01-01T00:00:00Z",0,"1932-01-01T00:00:00Z",0,"1933-01-01T00:00:00Z",0,"1934-01-01T00:00:00Z",0,"1935-01-01T00:00:00Z",0,"1936-01-01T00:00:00Z",0,"1937-01-01T00:00:00Z",0,"1938-01-01T00:00:00Z",0,"1939-01-01T00:00:00Z",0,"1940-01-01T00:00:00Z",0,"1941-01-01T00:00:00Z",0,"1942-01-01T00:00:00Z",0,"1943-01-01T00:00:00Z",0,"1944-01-01T00:00:00Z",0,"1945-01-01T00:00:00Z",0,"1946-01-01T00:00:00Z",0,"1947-01-01T00:00:00Z",0,"1948-01-01T00:00:00Z",0,"1949-01-01T00:00:00Z",0,"1950-01-01T00:00:00Z",0,"1951-01-01T00:00:00Z",0,"1952-01-01T00:00:00Z",0,"1953-01-01T00:00:00Z",0,"1954-01-01T00:00:00Z",0,"1955-01-01T00:00:00Z",0,"1956-01-01T00:00:00Z",0,"1957-01-01T00:00:00Z",0,"1958-01-01T00:00:00Z",0,"1959-01-01T00:00:00Z",0,"1960-01-01T00:00:00Z",0,"1961-01-01T00:00:00Z",0,"1962-01-01T00:00:00Z",0,"1963-01-01T00:00:00Z",0,"1964-01-01T00:00:00Z",0,"1965-01-01T00:00:00Z",0,"1966-01-01T00:00:00Z",0,"1967-01-01T00:00:00Z",0,"1968-01-01T00:00:00Z",0,"1969-01-01T00:00:00Z",0,"1970-01-01T00:00:00Z",0,"1971-01-01T00:00:00Z",0,"1972-01-01T00:00:00Z",0,"1973-01-01T00:00:00Z",0,"1974-01-01T00:00:00Z",0,"1975-01-01T00:00:00Z",0,"1976-01-01T00:00:00Z",0,"1977-01-01T00:00:00Z",0,"1978-01-01T00:00:00Z",0,"1979-01-01T00:00:00Z",0,"1980-01-01T00:00:00Z",0,"1981-01-01T00:00:00Z",0,"1982-01-01T00:00:00Z",0,"1983-01-01T00:00:00Z",0,"1984-01-01T00:00:00Z",0,"1985-01-01T00:00:00Z",0,"1986-01-01T00:00:00Z",0,"1987-01-01T00:00:00Z",0,"1988-01-01T00:00:00Z",0,"1989-01-01T00:00:00Z",0,"1990-01-01T00:00:00Z",0,"1991-01-01T00:00:00Z",0,"1992-01-01T00:00:00Z",0,"1993-01-01T00:00:00Z",0,"1994-01-01T00:00:00Z",0,"1995-01-01T00:00:00Z",0,"1996-01-01T00:00:00Z",0,"1997-01-01T00:00:00Z",0,"1998-01-01T00:00:00Z",0,"1999-01-01T00:00:00Z",0,"2000-01-01T00:00:00Z",0,"2001-01-01T00:00:00Z",0,"2002-01-01T00:00:00Z",0,"2003-01-01T00:00:00Z",0,"2004-01-01T00:00:00Z",0,"2005-01-01T00:00:00Z",0,"2006-01-01T00:00:00Z",0,"2007-01-01T00:00:00Z",0,"2008-01-01T00:00:00Z",0,"2009-01-01T00:00:00Z",0,"2010-01-01T00:00:00Z",0,"2011-01-01T00:00:00Z",0,"2012-01-01T00:00:00Z",0,"2013-01-01T00:00:00Z",0,"2014-01-01T00:00:00Z",1],"gap":"+1YEAR","start":"1900-01-01T00:00:00Z","end":"2015-01-01T00:00:00Z"}}}}
> though facet.mincount is set to 1.
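
A response like the one above can be checked mechanically. The Python sketch below uses a hypothetical helper name and a trimmed two-bucket sample response, and assumes the request parameters are echoed in responseHeader as they are above. It lists every range bucket whose count is below the requested facet.mincount.

```python
import json


def buckets_below_mincount(response_text, field):
    """Return (label, count) pairs in facet_ranges[field] that violate facet.mincount."""
    response = json.loads(response_text)
    # Relies on the params being echoed back in responseHeader.
    mincount = int(response["responseHeader"]["params"]["facet.mincount"])
    counts = response["facet_counts"]["facet_ranges"][field]["counts"]
    # "counts" is a flat list: [label, count, label, count, ...]
    return [
        (label, count)
        for label, count in zip(counts[0::2], counts[1::2])
        if count < mincount
    ]


# Trimmed sample with the same shape as the response above (two buckets only).
sample = json.dumps({
    "responseHeader": {"params": {"facet.mincount": "1"}},
    "facet_counts": {"facet_ranges": {"date": {
        "counts": ["2013-01-01T00:00:00Z", 0, "2014-01-01T00:00:00Z", 1],
        "gap": "+1YEAR",
    }}},
})
violations = buckets_below_mincount(sample, "date")
```

If facet.mincount were honoured, the returned list would be empty. For the trimmed sample it contains the 2013 bucket.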


