Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2014/09/06 00:53:28 UTC

[jira] [Commented] (SOLR-6187) facet.mincount ignored in range faceting using distributed search

    [ https://issues.apache.org/jira/browse/SOLR-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123743#comment-14123743 ] 

Erick Erickson commented on SOLR-6187:
--------------------------------------

About date ranges: they're deprecated, the same functionality is available with range facets, and the fix would be fragile, so I'm not going to fix mincount for date ranges in distributed processing.

The problem is this: the response packet jumbles counts and metadata together, so you might have something like:
2001-01-01T00:00:00Z 78
2002-01-01T00:00:00Z 33
gap: +1YEAR
start: "2000-01-01T00:00:00Z"

So recognizing which parts of the response are numbers associated with dates is fragile. They are pairs, but even recognizing the date format with a regex will fail if the labels are changed with the "key" trick.

The objects in the response also have different types. In the above, 78 and 33 are integers, but "+1YEAR", of course, is not. I did try a hack that said, essentially, "if it's a date facet and the second member of the pair is an integer, it must be a count, so check it against mincount and remove the pair if it falls short". Yuuuuck.
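
A minimal sketch of the kind of heuristic described above, to show why it is fragile. It is not Solr code: the facet is modeled as a plain ordered map, and the class name `DateFacetMincountSketch` and method `applyMincount` are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Solr code: the date facet is modeled as an ordered
// map that mixes bucket counts with metadata, as in the example response above.
class DateFacetMincountSketch {

    // Treat every Integer value as a bucket count and drop it when it is below
    // mincount. Non-integer values ("gap", "start", ...) are always kept.
    static Map<String, Object> applyMincount(Map<String, Object> facet, int mincount) {
        Map<String, Object> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : facet.entrySet()) {
            Object value = e.getValue();
            boolean looksLikeCount = value instanceof Integer; // the fragile guess
            if (looksLikeCount && (Integer) value < mincount) {
                continue; // assumed to be a bucket count below the threshold
            }
            kept.put(e.getKey(), value);
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Object> facet = new LinkedHashMap<>();
        facet.put("2001-01-01T00:00:00Z", 78);
        facet.put("2002-01-01T00:00:00Z", 0);   // made-up zero bucket
        facet.put("gap", "+1YEAR");
        facet.put("start", "2000-01-01T00:00:00Z");
        // Keeps the 2001 bucket, "gap" and "start"; drops the 2002 bucket.
        System.out.println(applyMincount(facet, 1).keySet());
    }
}
```

The weakness is the type check itself: any integer-valued entry that is not a bucket count would be filtered as if it were one, and nothing in the flattened structure distinguishes the two.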

None of this applies to non-distributed mode, since the values checked against mincount are still numerics _before_ they're put into a response packet. In distributed mode, mincount has to be applied after the counts are put in a response packet and collated.
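
The ordering constraint for distributed mode can be sketched as follows: per-shard counts are summed first, and mincount is applied only to the merged totals. This is an illustration under assumptions, not the code in the attached patch; the class and method names are hypothetical and the per-shard numbers are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Solr code: each shard reports bucket label -> count.
class DistributedRangeMincountSketch {

    // Sum the counts of identical buckets across all shards, preserving order.
    static Map<String, Integer> mergeShardCounts(List<Map<String, Integer>> shardCounts) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        for (Map<String, Integer> shard : shardCounts) {
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    // Apply mincount to the merged totals, never to a single shard's counts.
    static Map<String, Integer> applyMincount(Map<String, Integer> merged, int mincount) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : merged.entrySet()) {
            if (e.getValue() >= mincount) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> shard1 = new LinkedHashMap<>();
        shard1.put("2013-01-01T00:00:00Z", 1);
        shard1.put("2014-01-01T00:00:00Z", 1);
        Map<String, Integer> shard2 = new LinkedHashMap<>();
        shard2.put("2013-01-01T00:00:00Z", 1);
        shard2.put("2014-01-01T00:00:00Z", 0);

        Map<String, Integer> merged = mergeShardCounts(Arrays.asList(shard1, shard2));
        // With mincount=2 only the 2013 bucket survives (1 + 1 = 2).
        System.out.println(applyMincount(merged, 2));
    }
}
```

Filtering each shard's counts with mincount=2 before merging would have dropped the 2013 bucket on both shards, which is why the check has to wait until the counts are collated.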

> facet.mincount ignored in range faceting using distributed search
> -----------------------------------------------------------------
>
>                 Key: SOLR-6187
>                 URL: https://issues.apache.org/jira/browse/SOLR-6187
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 4.8, 4.8.1
>            Reporter: Zaccheo Bagnati
>            Assignee: Erick Erickson
>         Attachments: SOLR-6187.patch, SOLR-6187.patch, SOLR-6187.patch, SOLR-6187.patch
>
>
> While trying to do range faceting with a gap of +1YEAR using shards, I noticed that the facet.mincount parameter seems to be ignored.
> The issue can be reproduced as follows:
> Create 2 cores "testshard1" and "testshard2" with:
> solrconfig.xml
> <?xml version="1.0" encoding="UTF-8" ?>
> <config>
>   <luceneMatchVersion>LUCENE_41</luceneMatchVersion>
>   <lib dir="/opt/solr/dist" regex="solr-cell-.*\.jar"/>
>   <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
>   <updateHandler class="solr.DirectUpdateHandler2" />
>   <requestHandler name="/select" class="solr.SearchHandler">
>      <lst name="defaults">
>        <str name="echoParams">explicit</str>
>        <int name="rows">10</int>
>        <str name="df">id</str>
>      </lst>
>   </requestHandler>
>   <requestHandler name="/update" class="solr.UpdateRequestHandler"  />
>   <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
>   <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
>     <lst name="invariants">
>       <str name="q">solrpingquery</str>
>     </lst>
>     <lst name="defaults">
>       <str name="echoParams">all</str>
>     </lst>
>   </requestHandler>
> </config>
> schema.xml
> <?xml version="1.0" ?>
> <schema name="${solr.core.name}" version="1.5" xmlns:xi="http://www.w3.org/2001/XInclude">
>   <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
>   <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
>   <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
>   <field name="_version_" type="long"     indexed="true"  stored="true"/>
>   <field name="id" type="int" indexed="true" stored="true" multiValued="false" />
>   <field name="date" type="date" indexed="true" stored="true" multiValued="false" />
>   <uniqueKey>id</uniqueKey>
>   <defaultSearchField>id</defaultSearchField>
> </schema>
> Insert in testshard1:
> <add>
>  <doc>
>   <field name="id">1</field>
>   <field name="date">2014-06-20T12:51:00Z</field>
>  </doc>
> </add>
> Insert into testshard2:
> <add>
>  <doc>
>   <field name="id">2</field>
>   <field name="date">2013-06-20T12:51:00Z</field>
>  </doc>
> </add>
> Now if I execute:
> curl "http://localhost:8983/solr/testshard1/select?q=id:1&facet=true&facet.mincount=1&facet.range=date&f.date.facet.range.start=1900-01-01T00:00:00Z&f.date.facet.range.end=NOW&f.date.facet.range.gap=%2B1YEAR&shards=localhost%3A8983%2Fsolr%2Ftestshard1%2Clocalhost%3A8983%2Fsolr%2Ftestshard2&shards.info=true&wt=json"
> I obtain:
> {"responseHeader":{"status":0,"QTime":88,"params":{"f.date.facet.range.gap":"+1YEAR","f.date.facet.range.start":"1900-01-01T00:00:00Z","facet":"true","shards":"localhost:8983/solr/testshard1,localhost:8983/solr/testshard2","facet.mincount":"1","q":"id:1","shards.info":"true","facet.range":"date","wt":"json","f.date.facet.range.end":"NOW"}},"shards.info":{"localhost:8983/solr/testshard2":{"numFound":0,"maxScore":0.0,"shardAddress":"http://localhost:8983/solr/testshard2","time":76},"localhost:8983/solr/testshard1":{"numFound":1,"maxScore":0.30685282,"shardAddress":"http://localhost:8983/solr/testshard1","time":79}},"response":{"numFound":1,"start":0,"maxScore":0.30685282,"docs":[{"id":1,"date":"2014-06-20T12:51:00Z"}]},"facet_counts":{"facet_queries":{},"facet_fields":{},"facet_dates":{},"facet_ranges":{"date":{"counts":["1900-01-01T00:00:00Z",0,"1901-01-01T00:00:00Z",0,"1902-01-01T00:00:00Z",0,"1903-01-01T00:00:00Z",0,"1904-01-01T00:00:00Z",0,"1905-01-01T00:00:00Z",0,"1906-01-01T00:00:00Z",0,"1907-01-01T00:00:00Z",0,"1908-01-01T00:00:00Z",0,"1909-01-01T00:00:00Z",0,"1910-01-01T00:00:00Z",0,"1911-01-01T00:00:00Z",0,"1912-01-01T00:00:00Z",0,"1913-01-01T00:00:00Z",0,"1914-01-01T00:00:00Z",0,"1915-01-01T00:00:00Z",0,"1916-01-01T00:00:00Z",0,"1917-01-01T00:00:00Z",0,"1918-01-01T00:00:00Z",0,"1919-01-01T00:00:00Z",0,"1920-01-01T00:00:00Z",0,"1921-01-01T00:00:00Z",0,"1922-01-01T00:00:00Z",0,"1923-01-01T00:00:00Z",0,"1924-01-01T00:00:00Z",0,"1925-01-01T00:00:00Z",0,"1926-01-01T00:00:00Z",0,"1927-01-01T00:00:00Z",0,"1928-01-01T00:00:00Z",0,"1929-01-01T00:00:00Z",0,"1930-01-01T00:00:00Z",0,"1931-01-01T00:00:00Z",0,"1932-01-01T00:00:00Z",0,"1933-01-01T00:00:00Z",0,"1934-01-01T00:00:00Z",0,"1935-01-01T00:00:00Z",0,"1936-01-01T00:00:00Z",0,"1937-01-01T00:00:00Z",0,"1938-01-01T00:00:00Z",0,"1939-01-01T00:00:00Z",0,"1940-01-01T00:00:00Z",0,"1941-01-01T00:00:00Z",0,"1942-01-01T00:00:00Z",0,"1943-01-01T00:00:00Z",0,"1944-01-01T00:00:00Z",0,"1945-01-01T00:00:00Z",0,"1946-01-01T00:00:00Z",0,"1947-01-01T00:00:00Z",0,"1948-01-01T00:00:00Z",0,"1949-01-01T00:00:00Z",0,"1950-01-01T00:00:00Z",0,"1951-01-01T00:00:00Z",0,"1952-01-01T00:00:00Z",0,"1953-01-01T00:00:00Z",0,"1954-01-01T00:00:00Z",0,"1955-01-01T00:00:00Z",0,"1956-01-01T00:00:00Z",0,"1957-01-01T00:00:00Z",0,"1958-01-01T00:00:00Z",0,"1959-01-01T00:00:00Z",0,"1960-01-01T00:00:00Z",0,"1961-01-01T00:00:00Z",0,"1962-01-01T00:00:00Z",0,"1963-01-01T00:00:00Z",0,"1964-01-01T00:00:00Z",0,"1965-01-01T00:00:00Z",0,"1966-01-01T00:00:00Z",0,"1967-01-01T00:00:00Z",0,"1968-01-01T00:00:00Z",0,"1969-01-01T00:00:00Z",0,"1970-01-01T00:00:00Z",0,"1971-01-01T00:00:00Z",0,"1972-01-01T00:00:00Z",0,"1973-01-01T00:00:00Z",0,"1974-01-01T00:00:00Z",0,"1975-01-01T00:00:00Z",0,"1976-01-01T00:00:00Z",0,"1977-01-01T00:00:00Z",0,"1978-01-01T00:00:00Z",0,"1979-01-01T00:00:00Z",0,"1980-01-01T00:00:00Z",0,"1981-01-01T00:00:00Z",0,"1982-01-01T00:00:00Z",0,"1983-01-01T00:00:00Z",0,"1984-01-01T00:00:00Z",0,"1985-01-01T00:00:00Z",0,"1986-01-01T00:00:00Z",0,"1987-01-01T00:00:00Z",0,"1988-01-01T00:00:00Z",0,"1989-01-01T00:00:00Z",0,"1990-01-01T00:00:00Z",0,"1991-01-01T00:00:00Z",0,"1992-01-01T00:00:00Z",0,"1993-01-01T00:00:00Z",0,"1994-01-01T00:00:00Z",0,"1995-01-01T00:00:00Z",0,"1996-01-01T00:00:00Z",0,"1997-01-01T00:00:00Z",0,"1998-01-01T00:00:00Z",0,"1999-01-01T00:00:00Z",0,"2000-01-01T00:00:00Z",0,"2001-01-01T00:00:00Z",0,"2002-01-01T00:00:00Z",0,"2003-01-01T00:00:00Z",0,"2004-01-01T00:00:00Z",0,"2005-01-01T00:00:00Z",0,"2006-01-01T00:00:00Z",0,"2007-01-01T00:00:00Z",0,"2008-01-01T00:00:00Z",0,"2009-01-01T00:00:00Z",0,"2010-01-01T00:00:00Z",0,"2011-01-01T00:00:00Z",0,"2012-01-01T00:00:00Z",0,"2013-01-01T00:00:00Z",0,"2014-01-01T00:00:00Z",1],"gap":"+1YEAR","start":"1900-01-01T00:00:00Z","end":"2015-01-01T00:00:00Z"}}}}
> though facet.mincount is set to 1.


