Posted to dev@flink.apache.org by Yi ZHOU <zh...@hotmail.com> on 2015/03/24 11:37:39 UTC

Affinity propagation for Gelly

Hello everyone,

I am working on an affinity propagation implementation for Gelly (FLINK-1707
<https://issues.apache.org/jira/browse/FLINK-1707>). The algorithm passes
messages between every pair of vertices (NOT just every pair of connected
vertices) in each iteration, so its computational complexity is
O(N² * #iterations) and its memory complexity is O(N²). I therefore believe
the algorithm will suffer from heavy communication cost no matter how we
distribute the graph across machines: put simply, it passes two kinds of
messages over a complete graph. There is some similar discussion for Spark
in SPARK-5832 <https://issues.apache.org/jira/browse/SPARK-5832>.
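
For reference, the two kinds of messages in Frey and Dueck's formulation are
the responsibility r(i,k) and the availability a(i,k), both defined for every
ordered pair of points, which is where the N² terms come from:

  r(i,k) \leftarrow s(i,k) - \max_{k' \neq k} \{ a(i,k') + s(i,k') \}
  a(i,k) \leftarrow \min\left( 0, \; r(k,k) + \sum_{i' \notin \{i,k\}} \max(0, r(i',k)) \right) \quad (i \neq k)
  a(k,k) \leftarrow \sum_{i' \neq k} \max(0, r(i',k))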

I found an adaptive implementation on Hadoop: it runs affinity propagation
on subgraphs, then merges the resulting clusters into larger ones
(http://www.ijeee.net/uploadfile/2014/0807/20140807114023665.pdf). It is not
equivalent to the original algorithm, though. So, does anyone know of another
distributed version, or have any suggestions?


ZHOU Yi

Re: Affinity propagation for Gelly

Posted by Vasiliki Kalavri <va...@gmail.com>.
Hello Yi Zhou,

you are right that if every "data point" vertex sends messages to every other
vertex, this algorithm is too expensive and won't scale.
However, I don't think we need to consider every data point as a potential
exemplar for every vertex.

Actually, you can see the input graph as capturing the similarity relationship
between data points, i.e. if 2 vertices are connected, then they serve as
potential exemplars to each other. This way, we only need to send messages
across the existing graph edges, instead of creating a fully connected
graph. This would be equivalent to setting the similarity between vertices
that are not neighbors to a very small value.
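
To make this concrete, here is a minimal plain-Java sketch of the
responsibility half of such a sparse update. Note this is deliberately not
written against the Gelly API; the Sim type, the key packing, and
updateResponsibilities are made up for illustration, and it assumes the
self-preferences s(k,k) also appear in the similarity list:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of one sparse responsibility round of affinity propagation:
 *   r(i,k) = s(i,k) - max_{k' != k} ( a(i,k') + s(i,k') )
 * computed only for the (i,k) pairs present in the similarity list.
 */
public class SparseResponsibilitySketch {

    /** One similarity entry: point i sees k as a candidate exemplar. */
    static class Sim {
        final int i, k;
        final double s;
        Sim(int i, int k, double s) { this.i = i; this.k = k; this.s = s; }
    }

    /** Packs an (i,k) pair into a single map key. */
    static long key(int i, int k) { return ((long) i << 32) | (k & 0xffffffffL); }

    static Map<Long, Double> updateResponsibilities(List<Sim> sims,
                                                    Map<Long, Double> availability) {
        // Group each point's candidate exemplars (its outgoing edges,
        // including the self-edge carrying the preference s(i,i)).
        Map<Integer, List<Sim>> byPoint = new HashMap<>();
        for (Sim e : sims) {
            byPoint.computeIfAbsent(e.i, x -> new ArrayList<>()).add(e);
        }

        Map<Long, Double> responsibility = new HashMap<>();
        for (List<Sim> candidates : byPoint.values()) {
            // Find the largest and second-largest a(i,k') + s(i,k') over i's
            // candidates, so every k can exclude itself from the max in
            // O(degree) total rather than O(degree²).
            double best = Double.NEGATIVE_INFINITY;
            double second = Double.NEGATIVE_INFINITY;
            for (Sim e : candidates) {
                double v = availability.getOrDefault(key(e.i, e.k), 0.0) + e.s;
                if (v > best) { second = best; best = v; }
                else if (v > second) { second = v; }
            }
            for (Sim e : candidates) {
                double v = availability.getOrDefault(key(e.i, e.k), 0.0) + e.s;
                // If e itself attains the max, fall back to the runner-up.
                responsibility.put(key(e.i, e.k), e.s - (v == best ? second : best));
            }
        }
        return responsibility;
    }
}

The availability half is symmetric: it aggregates max(0, r(i',k)) per
candidate exemplar k. Both halves only ever touch pairs that exist in the
input, i.e. messages flow only along edges.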

You can also take a look at the Affinity Propagation FAQ [1], where the
following is explained:
*"...if you know beforehand that each data point has the potential to be
assigned to only a small fraction of the other data points, then the answer
is "yes". Running affinity propagation on such data is equivalent to
setting the similarity between two data points that shouldn't be linked
together to negative infinity. However, there is a version of affinity
propagation that only exchanges messages between pairs of points when the
similarity is not negative infinity. This algorithm's time and memory
requirements scale linearly with the number of similarities, which would be
NxN if a full set of pairwise similarities is input, but much, much less if
the set of similarities is sparse."*
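
To put rough (hypothetical) numbers on that: with N = 1,000,000 data points,
the dense formulation exchanges on the order of 10^12 messages per iteration,
while a sparse input with, say, 100 candidate exemplars per point exchanges
on the order of 10^8, which is the "much, much less" the FAQ refers to.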

Of course, we will have to clearly document this behavior and warn the users
about the complexity, in case they want to consider all data points as
exemplars.

Let me know what your thoughts are on this!

Cheers,
-Vasia.

[1]: http://www.psi.toronto.edu/affinitypropagation/faq.html
