Posted to dev@lucene.apache.org by "integer [daniel prawdzik]" <in...@trist.de> on 2005/01/26 18:17:24 UTC

-> Grouping Search Results by Clustering Snippets:

Search engines typically present their results as long, unsorted lists.
Finding the page you are looking for is often time-consuming and
unsatisfying.
Showing the results in groups of similar topics is a much more
suitable way to give a user a quick overview of the results.
This can be done with a technique called cluster analysis. I am
currently working on my diploma (master's) thesis on this topic. In my
opinion, it is too nice to end up buried in an archive, so I want to
implement this feature in open-source software. The coding of this
program has already come pretty far; I have run some tests, and the
results are impressive and might still get better [you can see some
results at http://www.trist.de/CV/Text-Mining/ -> sorry, only in German].
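
To make the idea of grouping snippets concrete, here is a minimal
sketch in Python. It is not the thesis implementation, which is not
shown in this message; it assumes a simple bag-of-words model, cosine
similarity, and a greedy single-pass ("leader") clustering. The
stop-word list, the threshold of 0.3, and the sample snippets are all
made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on", "with"}


def vectorize(snippet):
    """Turn a snippet into a bag-of-words term-frequency vector."""
    words = re.findall(r"[a-z0-9]+", snippet.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_snippets(snippets, threshold=0.3):
    """Greedy single-pass clustering: each snippet joins the first cluster
    whose leader (its first member) is similar enough, else starts a new one."""
    clusters = []  # list of (leader_vector, [snippet indices])
    for index, snippet in enumerate(snippets):
        vector = vectorize(snippet)
        for leader, members in clusters:
            if cosine(vector, leader) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((vector, [index]))
    return [members for _, members in clusters]


# Made-up snippets for an ambiguous query such as "jaguar".
snippets = [
    "Jaguar cars: luxury sports car dealer",
    "Jaguar sports car review and prices",
    "The jaguar is a big cat of the Americas",
    "Big cat conservation: saving the jaguar",
]
groups = cluster_snippets(snippets)
print(groups)
```

Each inner list holds the positions of the snippets that ended up in
one group. A single pass like this depends on the order of the results
and on the threshold, so it only illustrates the principle.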

To make a long story short:
I am wondering whether this would be an attractive feature for the
Lucene community?

regards,
integer


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


RE: -> Grouping Search Results by Clustering Snippets:

Posted by Otis Gospodnetic <ot...@yahoo.com>.
This is very much of interest to me.  Although it's not in the UI, I
did integrate Lucene and Carrot2 in Simpy ( http://www.simpy.com ). 
Clustering is currently triggered only by a search.  Although you may
not be able to tell (again, sucky UI) Simpy is designed in a way that
will let me hook in a recommender system, much like you describe it. 
Users store links into their Simpy accounts, they tag them, perform
searches, find other users, add them to their Topics (Simpy-specific
thing), and so on, so there is a lot of knowledge about a user that can
be derived from all that.  Currently, the only quasi-smart thing that
goes beyond a simple search is 'More users like this', and even that
has a small bug that I need to fix for the next release, but what you
are describing sounds very much like one of the directions in which I
want to take Simpy and its users. :)

Otis


RE: -> Grouping Search Results by Clustering Snippets:

Posted by Adam Saltiel <ad...@btinternet.com>.
This has been implemented in open source, but not with Lucene?
http://www.cs.put.poznan.pl/dweiss/carrot/
and
http://carrot2.sourceforge.net/
Dawid Weiss is a Polish academic at Poznan University of Technology,
Poland. He and others have implemented a servlet-based web app that
uses pipelined components that communicate over HTTP and implement a
couple of clustering algorithms.
Clustering, of course, can go way beyond search result presentation, and
there are some very suggestive examples at
http://www.sics.se/humle/socialcomputing/
where the encore project (Martin Svennson) is based on orthogonal
transformations of a large sparse matrix (a possible method for matrix
dimension reduction). I think it would be interesting to hook a
recommender system into Lucene, so that clustering would take place on
the basis of a user profile, which may be built up automatically by
accumulating clicks and comparing them to those of other visitors, with
some intelligent weighting of node inputs.
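
As a rough illustration of a profile built from accumulated clicks and
compared with those of other visitors, here is a minimal sketch in
Python. It is an assumption-laden example, not code from Lucene,
Carrot2, or the encore project: the click log, the user and document
names, and the cosine-weighted scoring are all hypothetical, and no
matrix dimension reduction is attempted.

```python
import math
from collections import Counter, defaultdict


def build_profiles(click_log):
    """Accumulate (user, document) click events into per-user click counts."""
    profiles = defaultdict(Counter)
    for user, document in click_log:
        profiles[user][document] += 1
    return profiles


def similarity(a, b):
    """Cosine similarity between two click-count profiles."""
    dot = sum(count * b[doc] for doc, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(profiles, user, limit=3):
    """Score documents the user has not clicked by the similarity-weighted
    clicks of other users, and return the best-scoring ones."""
    own = profiles[user]
    scores = Counter()
    for other, clicks in profiles.items():
        if other == user:
            continue
        weight = similarity(own, clicks)
        if weight == 0.0:
            continue  # visitors with nothing in common contribute nothing
        for doc, count in clicks.items():
            if doc not in own:
                scores[doc] += weight * count
    return [doc for doc, _ in scores.most_common(limit)]


# Made-up click log: (user, document clicked).
click_log = [
    ("alice", "lucene-intro"), ("alice", "carrot2-demo"),
    ("bob", "lucene-intro"), ("bob", "carrot2-demo"), ("bob", "clustering-faq"),
    ("carol", "gardening-tips"),
]
profiles = build_profiles(click_log)
suggestions = recommend(profiles, "alice")
print(suggestions)
```

Here the similarity between two visitors serves as the weight on what
the other visitor clicked, which is one simple reading of "weighting";
other weighting schemes are equally possible.
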
This calls into question what a search really is: does it have to be
instigated by the user, or might their context and history suggest enough
to pull in additional material? This would sit on top of snippets and
also influence which snippets are returned as well as their presentation.
Cooler still would be the ability to recognise the user without a login.
This might be implemented with cookies, but the aim would be to deal with
the user in terms of types of interests, a series of faceted profiles, so
that portals could become fluidly dynamic. It sounds far-fetched, but I
actually think it is just round the corner.
Let me know if this is of interest.

Adam


Re: -> Grouping Search Results by Clustering Snippets:

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Herr Integer ;)

Yes it is - very interesting!
We are working on establishing lucene.apache.org - Lucene as a
top-level Apache project, which could serve as a good home for projects
like yours.

If you could remind us after we make the lucene.apache.org move, we
could try to get your project in there.

Otis