Posted to user@mahout.apache.org by Bin Cai <ca...@gmail.com> on 2009/08/03 11:33:51 UTC

FYI X-RIME: Hadoop based large scale social network analysis released

*X-RIME (http://xrime.sourceforge.net/): Hadoop based large scale social
network analysis*

*Motivation*
Today's telecom service providers and Internet-based social network sites
have huge user communities. They hold large amounts of data about their
users and want to build core competencies from that data. A key enabler for
this is a cost-efficient solution for social data management and social
network analysis (SNA).

Such a solution faces a few challenges. The most important one is that it
must handle massive and heterogeneous data sets. Traditional data warehouse
based solutions are usually not cost-efficient enough for this, while
existing SNA tools mostly run on a single workstation and do not scale well.
Low-cost, highly scalable data management and processing technologies from
the cloud computing community should therefore be brought in to help.

However, most existing cloud based data analysis solutions try to provide
SQL-like general purpose query languages and do not directly support social
network analysis. This makes them hard to optimize and hard to use for SNA
users. We created X-RIME to fill this gap.

Briefly, X-RIME aims to provide a few value-added layers on top of existing
cloud infrastructure, to support smart decision loops based on massive data
sets and SNA. To end users, X-RIME is a library of Map-Reduce programs for
raw data pre-processing, transformation, calculation of SNA metrics and
structures, and graph / network visualization. The library can be integrated
with other Hadoop based data warehouses (e.g., Hive) to build more
comprehensive solutions.

*Currently Supported SNA Metrics and Structures*
vertex degree statistics
weakly connected components (WCC)
strongly connected components (SCC)
bi-connected components (BCC)
ego-centric density
breadth-first search / single source shortest path (BFS/SSSP)
K-core
maximal cliques
PageRank
hyperlink-induced topic search (HITS)
minimum spanning tree (MST)
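
To give a feel for how one of the metrics above fits the Map-Reduce model,
here is a minimal sketch of vertex degree statistics written as a map phase,
a shuffle, and a reduce phase. It is plain Python that simulates the three
phases in memory; it is not X-RIME code and does not use the Hadoop API. The
function names (map_edge, shuffle, reduce_degree, vertex_degrees) and the
edge-list input format are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import chain


def map_edge(edge):
    """Map phase: for a directed edge (src, dst), emit one record per endpoint.

    Each value is an (out_count, in_count) pair for that endpoint.
    """
    src, dst = edge
    yield src, (1, 0)  # src gains one outgoing edge
    yield dst, (0, 1)  # dst gains one incoming edge


def shuffle(pairs):
    """Shuffle phase: group all emitted values by vertex key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_degree(vertex, values):
    """Reduce phase: sum the partial counts collected for one vertex."""
    out_deg = sum(v[0] for v in values)
    in_deg = sum(v[1] for v in values)
    return vertex, {"out": out_deg, "in": in_deg, "total": out_deg + in_deg}


def vertex_degrees(edges):
    """Run the simulated map, shuffle, and reduce phases over an edge list."""
    pairs = chain.from_iterable(map_edge(e) for e in edges)
    return dict(reduce_degree(k, vs) for k, vs in shuffle(pairs).items())


# Hypothetical toy graph: four directed edges over three vertices.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
degrees = vertex_degrees(edges)
for vertex in sorted(degrees):
    print(vertex, degrees[vertex])
```

On a real cluster the map and reduce functions would run on different
machines and the grouping would be done by the framework; the sketch only
shows why a per-vertex metric such as degree needs a single pass over the
edges.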

Re: FYI X-RIME: Hadoop based large scale social network analysis released

Posted by "Edward J. Yoon" <ed...@apache.org>.
That's really cool. BTW, have you tried these algorithms in a
distributed environment?

On Mon, Aug 3, 2009 at 6:33 PM, Bin Cai<ca...@gmail.com> wrote:
> *X-RIM**E**(http://xrime.sourceforge.net/): Hadoop based large scale social
> network analysis*
> *
> Motivation*
> Today's telecom service providers and Internet-based social network sites
> possess huge user communities. They hold large amount of data about their
> users and want to generate core competency from the data. A key enabler for
> this is a cost efficient solution for social data management and social
> network analysis (SNA).
>
> Such a solution faces a few challenges. The most important one is that the
> solution should be able to handle massive and heterogeneous data sets.
> Facing this challenge, the traditional data warehouse based solutions are
> usually not cost efficient enough. On the other hand, existing SNA tools are
> mostly used in single workstation mode, and not scalable enough. To this
> end, low cost and highly scalable data management and processing
> technologies from cloud computing society should be brought in to help.
>
> However, most of existing cloud based data analysis solutions are trying to
> provide SQL-like general purpose query languages, and do not directly
> support social network analysis. This makes them hard to optimize and hard
> to use for SNA users. So, we came up with X-RIME to fix this gap.
>
> So, briefly speaking, X-RIME wants to provide a few value-added layers on
> top of existing cloud infrastructure, to support smart decision loops based
> on massive data sets and SNA. To end users, X-RIME is a library consists of
> Map-Reduce programs, which are used to do raw data pre-processing,
> transformation, SNA metrics and structures calculation, and graph / network
> visualization. The library could be integrated with other Hadoop based data
> warehouses (e.g., HIVE) to build more comprehensive solutions.
>
> *Currently Supported SNA Metrics and Structures*
> vertex degree statistics
> weakly connected components (WCC)
> strongly connected components (SCC)
> bi-connected components (BCC)
> ego-centric density
> bread first search / single source shortest path (BFS/SSSP)
> K-core
> maximal cliques
> pagerank
> hyperlink-induced topic search (HITS)
> minimal spanning tree (MST)
>



-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardyoon@apache.org
http://blog.udanax.org

Re: Re: FYI X-RIME: Hadoop based large scale social network analysis released

Posted by Bin Cai <ca...@gmail.com>.
Hi, Edward J. Yoon
    Sorry, I found that I was not on the hama-user mailing list. I have just
joined.
    X-RIME is based on Map/Reduce and uses HDFS to store graph information, so
it is distributed and parallel. We will release some documents and examples
soon.
    I also noticed the Hamburg project. It is interesting. Its approach of
storing the graph in HBase could help solve some issues we found in our
solution.
    Thank you for your attention.

Best Regards
Cai Bin

> That's really cool. BTW, Have you tried these algorithms on the
> distributed environment?
>
> On Mon, Aug 3, 2009 at 6:33 PM, Bin Cai<ca...@gmail.com> wrote:
> > *X-RIM**E**(http://xrime.sourceforge.net/): Hadoop based large scale social
> > network analysis*
> > *
> > Motivation*
> > Today's telecom service providers and Internet-based social network sites
> > possess huge user communities. They hold large amount of data about their
> > users and want to generate core competency from the data. A key enabler for
> > this is a cost efficient solution for social data management and social
> > network analysis (SNA).
> >
> > Such a solution faces a few challenges. The most important one is that the
> > solution should be able to handle massive and heterogeneous data sets.
> > Facing this challenge, the traditional data warehouse based solutions are
> > usually not cost efficient enough. On the other hand, existing SNA tools are
> > mostly used in single workstation mode, and not scalable enough. To this
> > end, low cost and highly scalable data management and processing
> > technologies from cloud computing society should be brought in to help.
> >
> > However, most of existing cloud based data analysis solutions are trying to
> > provide SQL-like general purpose query languages, and do not directly
> > support social network analysis. This makes them hard to optimize and hard
> > to use for SNA users. So, we came up with X-RIME to fix this gap.
> >
> > So, briefly speaking, X-RIME wants to provide a few value-added layers on
> > top of existing cloud infrastructure, to support smart decision loops based
> > on massive data sets and SNA. To end users, X-RIME is a library consists of
> > Map-Reduce programs, which are used to do raw data pre-processing,
> > transformation, SNA metrics and structures calculation, and graph / network
> > visualization. The library could be integrated with other Hadoop based data
> > warehouses (e.g., HIVE) to build more comprehensive solutions.
> >
> > *Currently Supported SNA Metrics and Structures*
> > vertex degree statistics
> > weakly connected components (WCC)
> > strongly connected components (SCC)
> > bi-connected components (BCC)
> > ego-centric density
> > bread first search / single source shortest path (BFS/SSSP)
> > K-core
> > maximal cliques
> > pagerank
> > hyperlink-induced topic search (HITS)
> > minimal spanning tree (MST)
> >
