Posted to solr-dev@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2010/02/09 15:59:11 UTC

Re: [jira] Created: (SOLR-1713) Carrot2 Clustering should have an option to cluster on a different number of rows than the DocList size

On Feb 9, 2010, at 9:21 AM, Zacarias wrote:

> Hi,
> 
> I want to work on the
> https://issues.apache.org/jira/browse/SOLR-1713 improvement, but I have
> some questions. If somebody could give me a little
> guidance, that would be great.
> 
> Does the issue mean "query rows=10 but cluster on more"?
> If this is what it says, the idea is to solve it using the results or collection
> part of the ClusteringComponent. (Because the collection part uses
> DocumentEngine, which is in an experimental state.)
> If the user wants to cluster on more rows, should I query twice, or just
> query for the larger number of rows and then reduce the number at the end?

I think we want to avoid querying twice. I would query by the max of rows and a new parameter (cluster_rows? internal_rows? Other?) and then reduce the number at the end. It's a little tricky, because we likely don't want to couple the QueryComponent to the ClusteringComponent, so we may want to make this just a wee bit more generic.

-Grant
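
The "query once by the max, then reduce at the end" idea could be sketched roughly as below. This is only an illustration in plain Java, not the Solr implementation: the class ClusterRowsSketch, its methods, and the clusterRows name (standing in for the still-undecided cluster_rows / internal_rows parameter) are all hypothetical, and no Solr API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from Solr.
final class ClusterRowsSketch {

    /** Rows to request in the single query: enough for both display and clustering. */
    static int rowsToFetch(int rows, int clusterRows) {
        return Math.max(rows, clusterRows);
    }

    /** First n items of the fetched list, or all of them if fewer were found. */
    static <T> List<T> firstN(List<T> fetched, int n) {
        return new ArrayList<>(fetched.subList(0, Math.min(n, fetched.size())));
    }

    public static void main(String[] args) {
        int rows = 3;         // what the user asked to see
        int clusterRows = 5;  // assumed new parameter: how many rows to cluster on

        // Pretend these are the document ids returned by one query
        // that asked for rowsToFetch(rows, clusterRows) rows.
        List<String> fetched = Arrays.asList("d1", "d2", "d3", "d4", "d5");

        List<String> toCluster = firstN(fetched, clusterRows); // input to clustering
        List<String> toReturn = firstN(fetched, rows);         // trimmed at the end

        System.out.println("fetch=" + rowsToFetch(rows, clusterRows)
                + " cluster on=" + toCluster + " return=" + toReturn);
    }
}
```

Keeping the two helpers independent of any particular component is one way to stay close to the "a wee bit more generic" goal: whichever component ends up owning the new parameter only needs the larger fetch size up front and the trim at the end.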

Re: [jira] Created: (SOLR-1713) Carrot2 Clustering should have an option to cluster on a different number of rows than the DocList size

Posted by Zacarias <zm...@gmail.com>.
Hi,

I figured out a solution that puts all the logic in the ClusteringComponent. This
component manages the logic between the rows parameter and a new rows parameter.
The second one defines the number of rows to be clustered.

I've attached a proposal in Jira. At the end, it describes an alternative
solution in case the first fails to meet the initial expectations.

Regards,
Zacarias