Posted to users@s2graph.apache.org by 유석종 <se...@gmail.com> on 2017/05/14 01:49:22 UTC

Design issue for biological network data

Hi, everyone!


It's very nice to e-meet you all. I'm a researcher and developer in
bioinformatics, an interdisciplinary field that develops methods and
software tools for understanding biological data.
As you might know, biological research now has to handle huge amounts of
data from high-throughput experiments, including next-generation
sequencing (NGS) and mass spectrometry.

So my team is developing a new analysis system built on a biological
database that collects all kinds of biological information and the
relations between genes, proteins, metabolites, and other biological
entities (e.g., drugs and diseases). We also want to integrate unstructured
data, for example experimental research papers, by extracting biological
entities and interactions with text-mining tools. The final goal of our
project is a system in which users search for biological relations with
queries they are interested in and then analyze the resulting biological
network with the bioinformatics tools in our service. Because of the
complexity and diversity of biological information, we realized that we
have to use a graph database. Only a few projects have been launched to
construct a biological interaction database on a distributed graph
database. One of them is Bio4J (http://bio4j.com), but it is not the
solution we were hoping for. So we surveyed other databases and found the
s2graph project. We are now converting our database, which was developed
with HBase, to s2graph.

Actually, I'm not an expert in the database field, so I'm struggling with
how to design the biological network database well with s2graph.
As far as I know, s2graph supports indexes on edges. How can I index the
property values of each edge? In our case, edges have several types and a
lot of property information. We want to store that property information and
search on the property values, so we are trying to move some of the
property values we want to search on into the label. Is it OK to store long
property information in the label? Can we search edges using only the
serviceName information in srcVertices? If we have several services or
databases, how can we search all of them at the same time? We also want to
search vertices by several of their properties, so can we search on
property values in a vertex? Do you have a plan to support
TinkerPop and Gremlin?
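
To make the questions above more concrete, here is a minimal sketch in
plain Python of the kind of lookup we have in mind: finding edges by a
property value, across one or several services. This is not s2graph code
and does not use any s2graph API. The class names, service names, labels,
and property keys are all hypothetical and made up for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A typed, directed edge with searchable properties (illustrative only)."""
    service: str       # hypothetical service name, e.g. one per source database
    label: str         # edge type
    src: str           # source vertex id
    dst: str           # target vertex id
    props: tuple = ()  # (key, value) pairs; a tuple keeps the edge hashable


class EdgePropertyIndex:
    """Toy in-memory index from (property key, value) to edges."""

    def __init__(self):
        self._by_prop = defaultdict(set)

    def add(self, edge):
        # Register the edge under every property it carries.
        for key, value in edge.props:
            self._by_prop[(key, value)].add(edge)

    def find(self, key, value, services=None):
        # Look up edges by property value, optionally restricted to a
        # set of services; services=None means "search every service".
        hits = self._by_prop.get((key, value), set())
        if services is None:
            return set(hits)
        return {e for e in hits if e.service in services}


# Made-up example data, not real biological facts.
index = EdgePropertyIndex()
e1 = Edge("service_a", "interacts_with", "geneA", "proteinB",
          (("evidence", "text_mining"),))
e2 = Edge("service_b", "targets", "drugC", "proteinB",
          (("evidence", "text_mining"),))
index.add(e1)
index.add(e2)

all_services = index.find("evidence", "text_mining")
only_a = index.find("evidence", "text_mining", services={"service_a"})
```

Here all_services contains both edges, while only_a contains just the edge
from service_a. What we would like to know is how to get the same kind of
lookup from s2graph itself.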

Thanks

Seok Jong Yu.




Seok Jong Yu, Ph.D.
Korea Institute of Science and Technology Information
phone : +82-10-5357-9547
E-mail : codegen@kisti.re.kr / seokjongyu@gmail.com

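Korea Institute of Science and Technology Information
Convergence Technology Research Division
Biomedical HPC Research Center
Biomedical Convergence Technology Research Lab
Seok Jong Yu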