Posted to solr-user@lucene.apache.org by Zheng Lin Edwin Yeo <ed...@gmail.com> on 2015/05/29 04:29:45 UTC

Number of clustering labels to show

Hi,

I'm trying to increase the number of cluster results shown during the
search. I tried to set carrot.fragSize=20, but only 15 cluster labels are
shown. Even when I set carrot.fragSize=5, 15 labels are still shown.

Is this the correct way to do this? I understand that setting it to 20
might not necessarily mean 20 labels will be shown, as the setting is for
the maximum number. But when I set this to 5, shouldn't it reduce the
number of labels to 5?

I'm using Solr 5.1.


Regards,
Edwin

Re: Number of clustering labels to show

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Thank you so much for your explanation.

On 2 June 2015 at 17:31, Alessandro Benedetti <be...@gmail.com>
wrote:

> The purpose there is to make clustering lighter and more related to the
> query.
> The summary produced is a fragment surrounding the query terms in the
> document content.
> This is arguably a way to improve the quality of the clusters, but it
> certainly makes the clustering operation lighter, as the content used to
> produce the clusters is much smaller than the full content.
>
> Whether the window of text surrounding the query matches really helps to
> cluster the documents more precisely is open to discussion.
> That is not an easy research topic, and it depends strictly on the use
> case.
> For this reason, the user should decide whether to go with the lighter
> summary approach or the more comprehensive full-content approach.
>
> Cheers

Re: Number of clustering labels to show

Posted by Alessandro Benedetti <be...@gmail.com>.
The purpose there is to make clustering lighter and more related to the
query.
The summary produced is a fragment surrounding the query terms in the
document content.
This is arguably a way to improve the quality of the clusters, but it
certainly makes the clustering operation lighter, as the content used to
produce the clusters is much smaller than the full content.

Whether the window of text surrounding the query matches really helps to
cluster the documents more precisely is open to discussion.
That is not an easy research topic, and it depends strictly on the use
case.
For this reason, the user should decide whether to go with the lighter
summary approach or the more comprehensive full-content approach.
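
To make the idea of a "window of text surrounding the query terms" concrete,
here is a toy Python sketch. It is not how Solr's highlighter works
internally; the function name fragment_around and the character-based
window are assumptions made purely for illustration.

```python
def fragment_around(text, term, frag_size):
    """Return up to frag_size characters of text around the first match of term.

    Toy illustration only: the real highlighter is considerably more involved.
    """
    pos = text.lower().find(term.lower())
    if pos < 0:
        return ""  # no match, so there is nothing to build a summary from
    # Try to leave roughly equal context on both sides of the match.
    half = max(0, (frag_size - len(term)) // 2)
    start = max(0, pos - half)
    return text[start:start + frag_size]


# Hypothetical document: a long body with a single mention of the query term.
document = "intro " * 80 + "result clustering in Solr " + "outro " * 80
summary = fragment_around(document, "clustering", 100)
print(len(document), len(summary))
```

The clustering input shrinks from the full document to a short fragment,
which is the "lighter" trade-off described above.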

Cheers

2015-06-02 3:21 GMT+01:00 Zheng Lin Edwin Yeo <ed...@gmail.com>:

> Thank you so much Alessandro.
>
> But I do not find any difference in the quality of the clustering results
> when I change the hl.fragsize value, even though I've set
> carrot.produceSummary to true.
>
>
> Regards,
> Edwin

-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: Number of clustering labels to show

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Thank you so much Alessandro.

But I do not find any difference in the quality of the clustering results
when I change the hl.fragsize value, even though I've set
carrot.produceSummary to true.


Regards,
Edwin


On 1 June 2015 at 17:31, Alessandro Benedetti <be...@gmail.com>
wrote:

> Just to clarify the initial mail: carrot.fragSize has nothing to do with
> the number of clusters produced.
>
> When you choose to work with the field summary (that is, only with snippets
> of the original content, produced by highlighting the query in that
> content), fragSize specifies the size of these fragments.
>
> From the Carrot documentation:
>
> carrot.produceSummary
>
> When true, the carrot.snippet
> <https://wiki.apache.org/solr/ClusteringComponent#carrot.snippet> field (if
> no snippet field, then the carrot.title
> <https://wiki.apache.org/solr/ClusteringComponent#carrot.title> field) will
> be highlighted and the highlighted text will be used for clustering.
> Highlighting is recommended when the snippet field contains a lot of
> content. Highlighting can also increase the quality of clustering because
> the clustered content will get an additional query-specific context.
>
> carrot.fragSize
>
> The frag size to use for highlighting. Meaningful only when
> carrot.produceSummary
> <https://wiki.apache.org/solr/ClusteringComponent#carrot.produceSummary> is
> true. If not specified, the default highlighting fragsize (hl.fragsize)
> will be used. If that isn't specified, then 100.
>
>
> Cheers

Re: Number of clustering labels to show

Posted by Alessandro Benedetti <be...@gmail.com>.
Just to clarify the initial mail: carrot.fragSize has nothing to do with
the number of clusters produced.

When you choose to work with the field summary (that is, only with snippets
of the original content, produced by highlighting the query in that
content), fragSize specifies the size of these fragments.

From the Carrot documentation:

carrot.produceSummary

When true, the carrot.snippet
<https://wiki.apache.org/solr/ClusteringComponent#carrot.snippet> field (if
no snippet field, then the carrot.title
<https://wiki.apache.org/solr/ClusteringComponent#carrot.title> field) will
be highlighted and the highlighted text will be used for clustering.
Highlighting is recommended when the snippet field contains a lot of
content. Highlighting can also increase the quality of clustering because
the clustered content will get an additional query-specific context.

carrot.fragSize

The frag size to use for highlighting. Meaningful only when
carrot.produceSummary
<https://wiki.apache.org/solr/ClusteringComponent#carrot.produceSummary> is
true. If not specified, the default highlighting fragsize (hl.fragsize)
will be used. If that isn't specified, then 100.
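
The fallback order in the documentation excerpt above can be sketched as
follows. This only illustrates the documented behaviour in Python and is not
Solr's actual implementation; the function name effective_frag_size and the
use of a plain dict for the request parameters are hypothetical.

```python
def effective_frag_size(params):
    """Resolve the fragment size following the documented fallback order.

    params is a hypothetical dict of request parameters (string values).
    Returns None when carrot.produceSummary is not true, because the
    fragment size is only meaningful when summaries are produced.
    """
    if params.get("carrot.produceSummary", "false").lower() != "true":
        return None
    # carrot.fragSize wins; otherwise fall back to hl.fragsize.
    for key in ("carrot.fragSize", "hl.fragsize"):
        if key in params:
            return int(params[key])
    return 100  # documented default when neither parameter is given


print(effective_frag_size({"carrot.produceSummary": "true",
                           "carrot.fragSize": "20"}))
```

Note that nothing in this chain affects how many clusters are produced; it
only controls how much text each document contributes to clustering.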


Cheers

2015-06-01 2:00 GMT+01:00 Zheng Lin Edwin Yeo <ed...@gmail.com>:

> Thank you, Stanislaw, for the links. I will read up on them to better
> understand how the algorithm works.
>
> Regards,
> Edwin




Re: Number of clustering labels to show

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Thank you, Stanislaw, for the links. I will read up on them to better
understand how the algorithm works.

Regards,
Edwin

On 29 May 2015 at 17:22, Stanislaw Osinski <
stanislaw.osinski@carrotsearch.com> wrote:

> Hi,
>
> The number of clusters primarily depends on the parameters of the specific
> clustering algorithm. If you're using the default Lingo algorithm, the
> number of clusters is governed by
> the LingoClusteringAlgorithm.desiredClusterCountBase parameter. Take a look
> at the documentation
> (https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-TweakingAlgorithmSettings)
> for some more details (the "Tweaking at Query-Time" section shows how to
> pass the specific parameters at request time). A complete overview of the
> Lingo clustering algorithm parameters is here:
> http://doc.carrot2.org/#section.component.lingo.
>
> Stanislaw
>
> --
> Stanislaw Osinski, stanislaw.osinski@carrotsearch.com
> http://carrotsearch.com

Re: Number of clustering labels to show

Posted by Stanislaw Osinski <st...@carrotsearch.com>.
Hi,

The number of clusters primarily depends on the parameters of the specific
clustering algorithm. If you're using the default Lingo algorithm, the
number of clusters is governed by
the LingoClusteringAlgorithm.desiredClusterCountBase parameter. Take a look
at the documentation (
https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-TweakingAlgorithmSettings)
for some more details (the "Tweaking at Query-Time" section shows how to
pass the specific parameters at request time). A complete overview of the
Lingo clustering algorithm parameters is here:
http://doc.carrot2.org/#section.component.lingo.
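
As a rough illustration of passing this parameter at request time, the
sketch below builds a request URL in Python. The host and port, the
techproducts collection, the /clustering handler path (assumed to have
clustering enabled), and the value 20 are all assumptions for illustration;
substitute whatever your own setup uses.

```python
from urllib.parse import urlencode


def clustering_url(base_url, query, rows, desired_cluster_count_base):
    """Build a request URL that overrides the Lingo cluster count base."""
    params = {
        "q": query,
        "rows": rows,
        # Request-time override of the Lingo attribute named in this mail.
        "LingoClusteringAlgorithm.desiredClusterCountBase":
            desired_cluster_count_base,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint; adjust host, collection and handler to your install.
url = clustering_url(
    "http://localhost:8983/solr/techproducts/clustering", "*:*", 100, 20
)
print(url)
```

The value is a base for the desired number of clusters rather than a hard
limit, so the number of labels actually returned may differ from it.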

Stanislaw

--
Stanislaw Osinski, stanislaw.osinski@carrotsearch.com
http://carrotsearch.com

On Fri, May 29, 2015 at 4:29 AM, Zheng Lin Edwin Yeo <ed...@gmail.com>
wrote:

> Hi,
>
> I'm trying to increase the number of cluster results shown during the
> search. I tried to set carrot.fragSize=20, but only 15 cluster labels are
> shown. Even when I set carrot.fragSize=5, 15 labels are still shown.
>
> Is this the correct way to do this? I understand that setting it to 20
> might not necessarily mean 20 labels will be shown, as the setting is for
> the maximum number. But when I set this to 5, shouldn't it reduce the
> number of labels to 5?
>
> I'm using Solr 5.1.
>
>
> Regards,
> Edwin
>