Posted to solr-dev@lucene.apache.org by "Grant Ingersoll (JIRA)" <ji...@apache.org> on 2010/01/09 22:25:54 UTC

[jira] Created: (SOLR-1713) Carrot2 Clustering should have an option to cluster on a different number of rows than the DocList size

Carrot2 Clustering should have an option to cluster on a different number of rows than the DocList size
-------------------------------------------------------------------------------------------------------

                 Key: SOLR-1713
                 URL: https://issues.apache.org/jira/browse/SOLR-1713
             Project: Solr
          Issue Type: Improvement
          Components: contrib - Clustering
            Reporter: Grant Ingersoll


It would be nice if, in the Carrot2 clustering, we could only return 10 rows as part of the query, but cluster on more.  Alternatively, it may even make sense to be able to cluster on the DocSet, too.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Re: [jira] Created: (SOLR-1713) Carrot2 Clustering should have an option to cluster on a different number of rows than the DocList size

Posted by Zacarias <zm...@gmail.com>.

Hi,

I figured out a solution that puts all the logic in ClusterComponent. This
component manages the interaction between the rows parameter and a new rows
parameter. The new parameter defines the number of rows to be clustered.

I've attached a proposal in Jira. At the end, it describes an alternative
solution in case the first one fails to meet the initial expectations.

Regards,
Zacarias




On Tue, Feb 9, 2010 at 11:59 AM, Grant Ingersoll <gs...@apache.org> wrote:

>
> On Feb 9, 2010, at 9:21 AM, Zacarias wrote:
>
> > Hi,
> >
> > I want to work on the
> > https://issues.apache.org/jira/browse/SOLR-1713 improvement, but I have
> > some questions. If somebody could give me a little
> > guidance, that would be great.
> >
> > Does the issue mean "query with rows=10 but cluster on more"?
> > If so, the idea is to solve it using the results or collection
> > part of the ClusteringComponent. (The collection part uses
> > DocumentEngine, which is in an experimental state.)
> > If the user wants to cluster on more rows, should I query twice, or just
> > query for the larger number of rows and then reduce the number at the
> > end?
>
> I think we want to avoid querying twice.  I would query by the max of rows
> and a new parameter (cluster_rows? internal_rows?  Other?) and then reduce
> the number at the end.  It's a little tricky, b/c we likely don't want to
> couple the QueryComponent to the ClusterComponent, so we may want to make
> this just a wee bit more generic.
>
> -Grant

Re: [jira] Created: (SOLR-1713) Carrot2 Clustering should have an option to cluster on a different number of rows than the DocList size

Posted by Grant Ingersoll <gs...@apache.org>.

On Feb 9, 2010, at 9:21 AM, Zacarias wrote:

> Hi,
> 
> I want to work on the
> https://issues.apache.org/jira/browse/SOLR-1713 improvement, but I have
> some questions. If somebody could give me a little
> guidance, that would be great.
>
> Does the issue mean "query with rows=10 but cluster on more"?
> If so, the idea is to solve it using the results or collection
> part of the ClusteringComponent. (The collection part uses
> DocumentEngine, which is in an experimental state.)
> If the user wants to cluster on more rows, should I query twice, or just
> query for the larger number of rows and then reduce the number at the end?

I think we want to avoid querying twice.  I would query by the max of rows and a new parameter (cluster_rows? internal_rows?  Other?) and then reduce the number at the end.  It's a little tricky, b/c we likely don't want to couple the QueryComponent to the ClusterComponent, so we may want to make this just a wee bit more generic.

-Grant
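
The "query once with the max, then reduce at the end" approach suggested above can be sketched in plain Java as follows. This is only a minimal illustration under stated assumptions, not the ClusteringComponent implementation: the class ClusterRowsSketch, the helpers rowsToFetch and firstN, and the variable clusterRows (standing in for the not-yet-named cluster_rows/internal_rows parameter) are all hypothetical, and a list of integer ids stands in for the documents a real query would return.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from Solr itself.
public class ClusterRowsSketch {

    // Rows to request from the single underlying query: the larger of the
    // two values, so both the response and the clustering step are covered.
    static int rowsToFetch(int rows, int clusterRows) {
        return Math.max(rows, clusterRows);
    }

    // First n items of a list (or the whole list if it is shorter).
    static <T> List<T> firstN(List<T> items, int n) {
        int end = Math.min(Math.max(n, 0), items.size());
        return new ArrayList<>(items.subList(0, end));
    }

    public static void main(String[] args) {
        int rows = 10;         // what the client asked to have returned
        int clusterRows = 30;  // assumed new parameter: rows to cluster on

        // Stand-in for the ranked matches of a query (50 document ids).
        List<Integer> matches = new ArrayList<>();
        for (int id = 1; id <= 50; id++) {
            matches.add(id);
        }

        // Query once, asking for max(rows, clusterRows) documents.
        List<Integer> fetched = firstN(matches, rowsToFetch(rows, clusterRows));

        // Cluster on clusterRows documents, then trim the response to rows.
        List<Integer> toCluster = firstN(fetched, clusterRows);
        List<Integer> toReturn = firstN(fetched, rows);

        System.out.println("fetched=" + fetched.size()
                + " clustered=" + toCluster.size()
                + " returned=" + toReturn.size());
    }
}
```

Keeping the max/trim logic in small helpers that know nothing about clustering is one way to avoid tying the query step to the clustering step, which is the coupling concern raised above.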

Re: [jira] Created: (SOLR-1713) Carrot2 Clustering should have an option to cluster on a different number of rows than the DocList size

Posted by Zacarias <zm...@gmail.com>.

Hi,

I want to work on the
https://issues.apache.org/jira/browse/SOLR-1713 improvement, but I have
some questions. If somebody could give me a little
guidance, that would be great.

Does the issue mean "query with rows=10 but cluster on more"?
If so, the idea is to solve it using the results or collection
part of the ClusteringComponent. (The collection part uses
DocumentEngine, which is in an experimental state.)
If the user wants to cluster on more rows, should I query twice, or just
query for the larger number of rows and then reduce the number at the end?

Regards,
Zacarias.

On Sat, Jan 9, 2010 at 6:25 PM, Grant Ingersoll (JIRA) <ji...@apache.org> wrote:

> Carrot2 Clustering should have an option to cluster on a different number
> of rows than the DocList size
>
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1713
>                 URL: https://issues.apache.org/jira/browse/SOLR-1713
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Clustering
>            Reporter: Grant Ingersoll
>
>
> It would be nice if, in the Carrot2 clustering, we could only return 10
> rows as part of the query, but cluster on more.  Alternatively, it may even
> make sense to be able to cluster on the DocSet, too.

[jira] Updated: (SOLR-1713) Carrot2 Clustering should have an option to cluster on a different number of rows than the DocList size

Posted by "Zacarias (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zacarias updated SOLR-1713:
---------------------------

    Attachment: Solr-1713 Improvement Proposal.pdf

Proposal for the improvement.
It explains a possible solution, its advantages, and some warnings. At the end, it describes an alternative solution to give a different view of the difficulties and the impact on the source.

> Carrot2 Clustering should have an option to cluster on a different number of rows than the DocList size
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1713
>                 URL: https://issues.apache.org/jira/browse/SOLR-1713
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Clustering
>            Reporter: Grant Ingersoll
>         Attachments: Solr-1713 Improvement Proposal.pdf
>
>
> It would be nice if, in the Carrot2 clustering, we could only return 10 rows as part of the query, but cluster on more.  Alternatively, it may even make sense to be able to cluster on the DocSet, too.
