Posted to dev@jackrabbit.apache.org by Alexander Klimetschek <ak...@day.com> on 2009/01/20 18:00:46 UTC

Should rep:similar search return node itself?

Hi all,

Currently, if you do a rep:similar() search [1], it also returns
the node that you specify as the base for finding similar
nodes. Typically it is the first search result, since a node is
100% similar to itself. Here is an example:

Search:
//*[rep:similar(., '/content/foobar')]

Result (ordered by similarity by default):
- /content/foobar
- /content/other/similar/node
- /also/similar
- /not/so/similar
- /very/different
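
For anyone who builds this search from a node path at runtime, here is
a minimal sketch in Java. The class and method names (SimilarQuery,
forBase) are made up for illustration and are not part of Jackrabbit.
It assumes that an apostrophe inside an XPath string literal is escaped
by doubling it, which is how I read the JCR 1.0 query syntax:

```java
final class SimilarQuery {
    private SimilarQuery() {}

    /**
     * Builds the XPath statement for a rep:similar() search.
     * basePath is the path of the node to compare against.
     */
    static String forBase(String basePath) {
        // Assumption: apostrophes in a literal are escaped by doubling them.
        String escaped = basePath.replace("'", "''");
        return "//*[rep:similar(., '" + escaped + "')]";
    }
}
```

The resulting string would then be handed to the query manager as an
XPath query in the usual way.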

IMHO we should exclude the base node (/content/foobar) from the result,
as I think most people would not expect it there. But since it has
worked this way in several released versions (rep:similar has been
available since 1.4), we have to think about use cases where code
might actually rely on this behaviour.
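
As long as the behaviour stays as it is, a caller who does not want the
base node can drop it on the client side. A rough sketch in Java,
assuming the paths of the hits have already been collected into a list
in result order. The names SimilarResults and withoutBase are
hypothetical, not an existing API:

```java
import java.util.ArrayList;
import java.util.List;

final class SimilarResults {
    private SimilarResults() {}

    /**
     * Returns the result paths in their original order,
     * leaving out the path of the base node.
     */
    static List<String> withoutBase(List<String> resultPaths, String basePath) {
        List<String> filtered = new ArrayList<>(resultPaths.size());
        for (String path : resultPaths) {
            // Skip only the base node itself; all other hits are kept.
            if (!path.equals(basePath)) {
                filtered.add(path);
            }
        }
        return filtered;
    }
}
```

This compares by path rather than by position, so it does not depend on
the base node actually being the first hit.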

WDYT?

[1] http://wiki.apache.org/jackrabbit/SimilaritySearch

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Should rep:similar search return node itself?

Posted by Alexander Klimetschek <ak...@day.com>.

On Thu, Jan 22, 2009 at 4:25 PM, Marcel Reutegger
<ma...@gmx.net> wrote:
> but I guess we cannot easily change this
> behavior, because of backward compatibility

I wonder how applications could rely on this. rep:similar is a kind of
"fuzzy" query, so not something you would base code on. I guess it is
only useful (and used) for full-text search pages, i.e. its result is
passed directly on to the human user.

> maybe we can add a third optional
> parameter that tells the function to exclude the base node?

Might be a solution, but I would prefer if we could live without that.

Maybe we should have a poll on users@ to get some insights on how
people use rep:similar.

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Should rep:similar search return node itself?

Posted by Marcel Reutegger <ma...@gmx.net>.

Hi,

You are probably right that in most cases one is not interested in the node
that was given to rep:similar. But I guess we cannot easily change this
behavior because of backward compatibility. Maybe we can add a third optional
parameter that tells the function to exclude the base node?

regards
 marcel

Alexander Klimetschek wrote:
> Hi all,
> 
> currently, if you do a rep:similar() search [1], it will also return
> the node that you specify as the base for searching other similar
> nodes. Typically it will be the first search result, since the node to
> itself is 100% similar. Here is an example:
> 
> Search:
> //*[rep:similar(., '/content/foobar')]
> 
> Result (ordered by similarity by default):
> - /content/foobar
> - /content/other/similar/node
> - /also/similar
> - /not/so/similar
> - /very/different
> 
> IMHO we should exclude the node (/content/foobar) from the result, as
> I think most people would not expect it there. But since this works
> that way already in several released versions (rep:similar is
> available since 1.4), we have to think about use cases where code
> might actually rely on this behaviour.
> 
> WDYT?
> 
> [1] http://wiki.apache.org/jackrabbit/SimilaritySearch
> 
> Regards,
> Alex
>