Posted to dev@marmotta.apache.org by Dileepa Jayakody <di...@gmail.com> on 2014/03/10 21:35:36 UTC

[GSoC 2014] MARMOTTA-202 : OpenRefine import engine

Hi All,

I'm Dileepa, a research student from the University of Moratuwa, Sri Lanka,
with a keen interest in the linked-data and semantic-web domains. I have
worked on linked-data related projects such as Apache Stanbol, and I'm
experienced with related technologies like RDF, SPARQL, and FOAF. I'm very
much interested in applying for GSoC this year with Apache Marmotta.

I would like to open a discussion on the OpenRefine integration project
idea [1]. AFAIU, the goal of this project is to import data into the
Marmotta triple store (the KiWi triple store by default) from OpenRefine
after the data has been refined and exported.

I did some background reading on the Marmotta data import process [2],
which explains the different ways to import RDF data into the back-end
triple store. Currently OpenRefine exports data in several formats: CSV,
TSV, XLS, and HTML tables [3]. So I think the main task of this project
will be to convert this exported data into RDF and make it compatible with
the Marmotta data import process. I did some quick research on how to do
so, and there are several options for converting such data to RDF; a
sketch of the final import step follows the list below.

They are:
1. The RDF extension for OpenRefine: https://github.com/sparkica/rdf-extension
2. RDF Refine: http://refine.deri.ie/
3. D2R Server: http://d2rq.org/d2r-server (if the OpenRefine data was
imported from a SQL database)
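
To make that final import step concrete, here is a minimal Python sketch,
assuming a local Marmotta instance at http://localhost:8080/marmotta and
that the import webservice described in [2] accepts raw RDF posted with
its serialization declared in the Content-Type header. The endpoint path,
query parameter, and file names are assumptions for illustration, not a
confirmed API.

import requests

# Assumed base URL of a local Marmotta instance; verify against your setup.
MARMOTTA = "http://localhost:8080/marmotta"

def import_rdf(path, mime_type="text/turtle", context=None):
    # Upload an RDF file (e.g. one converted from an OpenRefine export)
    # to the assumed import webservice; the optional 'context' query
    # parameter names the target named graph.
    params = {"context": context} if context else {}
    with open(path, "rb") as f:
        response = requests.post(
            MARMOTTA + "/import/upload",
            params=params,
            data=f,
            headers={"Content-Type": mime_type},
        )
    response.raise_for_status()
    return response.status_code

import_rdf("refined-dataset.ttl", context="http://example.org/context/refine")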

Apart from the data conversion process from OpenRefine to RDF, what are
the other tasks to be done in this project?
I'd appreciate your thoughts and suggestions.

Thanks,
Dileepa

[1] https://issues.apache.org/jira/browse/MARMOTTA-202
[2] http://wiki.apache.org/marmotta/ImportData
[3] https://github.com/OpenRefine/OpenRefine/wiki/Exporters#exporting-projects

Re: [GSoC 2014] MARMOTTA-202 : OpenRefine import engine

Posted by Dileepa Jayakody <di...@gmail.com>.
Hi Raffaele and all,

I did some background reading on the DCAT vocabulary and the RDF Refine
tool [1], and I asked the RDF Refine developers whether they support DCAT
in the RDF export feature (I'm waiting for their response).

I also see that there is a feature in RDF Refine's RDF export: "Use your
own vocabulary or import existing ones." With that, I feel the DCAT
vocabulary can be mapped to the exported data and exported as a separate
RDF file from the RDF Refine tool itself. So exporting a dataset will be
two-fold: 1. exporting the dataset in RDF format, and 2. exporting the
DCAT metadata relevant to that dataset as an RDF file.

So when importing the dataset into Marmotta, we will have to process both
the dataset RDF and the associated DCAT metadata RDF. A rough sketch of
generating such a metadata file follows.
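
As an illustration of step 2, a small Python/rdflib sketch that builds such
a standalone DCAT metadata file; all URIs and file names are made up for
the example. Note that the standard property is dcat:downloadURL, attached
to a dcat:Distribution.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

# Hypothetical URIs for the refined dataset and its RDF distribution.
dataset = URIRef("http://example.org/dataset/refined-project")
dist = URIRef("http://example.org/dataset/refined-project/rdf")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCTERMS)

g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Refined OpenRefine project")))
g.add((dataset, DCAT.distribution, dist))
g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCAT.downloadURL,
       URIRef("http://example.org/files/refined-dataset.ttl")))
g.add((dist, DCTERMS.format, Literal("text/turtle")))

# The DCAT description travels as its own RDF file, next to the dataset RDF.
g.serialize(destination="refined-dataset-dcat.ttl", format="turtle")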

WDYT?

Thanks,
Dileepa


[1] http://refine.deri.ie/


On Sat, Mar 15, 2014 at 5:44 PM, Raffaele Palmieri <
raffaele.palmieri@gmail.com> wrote:

> Hi to all, any feedback on the Open Refine integration? I am asking
> Sergio in particular, who opened the Jira issue.
> Cheers,
> Raffaele.

Fwd: [GSoC 2014] MARMOTTA-202 : OpenRefine import engine

Posted by Raffaele Palmieri <ra...@gmail.com>.
Hi to all, any feedback on the Open Refine integration? I am asking
Sergio in particular, who opened the Jira issue.
Cheers,
Raffaele.

---------- Forwarded message ----------
From: *Raffaele Palmieri* <ra...@gmail.com>
Date: Thursday, 13 March 2014
Subject: [GSoC 2014] MARMOTTA-202 : OpenRefine import engine
To: dev@marmotta.apache.org


Re: [GSoC 2014] MARMOTTA-202 : OpenRefine import engine

Posted by Raffaele Palmieri <ra...@gmail.com>.
Hi Dileepa,


On 13 March 2014 20:28, Dileepa Jayakody <di...@gmail.com> wrote:

> Hi Raffaele,
>
> Thanks again for your suggestions.
> I think it will be a great addition to this project to make the data
> imported from OpenRefine interoperable with other datasets in Marmotta. I
> will follow up with the OpenRefine community to check whether they support
> the DCAT vocab in their latest release. If they don't support DCAT, do you
> think implementing DCAT support in OpenRefine is a task within this
> project's scope?
>
>
Basically yes, it could be a task within this project's scope. I think
that a preliminary check within RDF Refine is needed.


> On Thu, Mar 13, 2014 at 4:21 PM, Raffaele Palmieri <
> raffaele.palmieri@gmail.com> wrote:
>
> > Hi Dileepa,
> > some thoughts that I have also shared with other Marmotta team members
> > regarding the integration with Open Refine.
> > For the second level of integration, which fundamentally exports towards
> > Marmotta both CSV and other data to produce RDF, it would be interesting
> > to try to add functionality to Open Refine to supply additional data for
> > a dataset, using for example the DCAT vocabulary [1].
> > I don't remember whether this feature is covered by the GRefine RDF
> > Extension, of which a new release (ALPHA 0.9.0) is available [2].
> > If a dataset is supplied with DCAT metadata, Marmotta could expose it to
> > facilitate its interoperability with other datasets.
> > To do that, Marmotta needs to also store structured datasets, not
> > necessarily instantiated as RDF triples.
> >
>
> I think Marmotta's KiWi triple store can be connected to RDBMS back ends
> (MySQL, Postgres, H2); therefore the above requirement of storing
> structured data in Marmotta's backend is fulfilled. Please correct me if
> I'm wrong.
>
>
No, the KiWi triple store doesn't manage simple structured files (e.g.
CSV), but only instances of triples.
The storage I have in mind is quite simple; the file system could also be
used, so that the file can be retrieved from Marmotta at a later time
using, for example, dcat:downloadURL. Clearly this dataset is a copy of
the one tooled with Refine, and it could be overwritten at any time. A
rough sketch of this idea follows.
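
For illustration only, a minimal Python sketch of that storage idea; the
storage directory, public URL prefix, and dataset URI are hypothetical,
and strictly speaking DCAT attaches dcat:downloadURL to a
dcat:Distribution rather than to the dataset itself.

import shutil
from pathlib import Path
from rdflib import Graph, Namespace, URIRef

DCAT = Namespace("http://www.w3.org/ns/dcat#")

# Hypothetical: a directory served over HTTP and the URL prefix it maps to.
STORAGE_DIR = Path("/var/marmotta/datasets")
PUBLIC_BASE = "http://localhost:8080/marmotta/files"

def store_raw_copy(csv_path, dataset_uri, metadata_graph):
    # Keep an overwritable copy of the Refine-tooled file on the file
    # system and record where it can be fetched again later.
    target = STORAGE_DIR / Path(csv_path).name
    shutil.copyfile(csv_path, target)  # overwrites any earlier copy
    download_url = PUBLIC_BASE + "/" + target.name
    metadata_graph.add(
        (URIRef(dataset_uri), DCAT.downloadURL, URIRef(download_url)))
    return download_url

g = Graph()
store_raw_copy("refined-project.csv",
               "http://example.org/dataset/refined-project", g)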



> In summary I think we are looking at 2 main tasks now.
> 1. Ability to import data from the OpenRefine process
>

Yes; in addition to linked datasets (4 and 5 stars), also structured
datasets in simpler formats (CSV, etc.), furnished for example with DCAT
metadata.


> 2. Ability to configure the imported OpenRefine data to be interoperable
> with other datasets in Marmotta (potentially using the DCAT vocab)
>

Yes, with the possibility to retrieve them from Marmotta, so also 3 stars
datasets.


>
> More ideas and suggestions are most welcome.
>
>
Before you prepare the proposal, we should seek advice from the Marmotta
team.


>
> Thanks,
> Dileepa
>
>
Regards,
Raffaele.



>
> > What do you think?
> > Regards,
> > Raffaele.
> >
> >
> > [1] http://www.w3.org/TR/vocab-dcat/
> > [2] https://github.com/fadmaa/grefine-rdf-extension/releases/tag/v0.9.0

Re: [GSoC 2014] MARMOTTA-202 : OpenRefine import engine

Posted by Dileepa Jayakody <di...@gmail.com>.
Hi Raffaele,

Thanks again for your suggestions.
I think it will be a great addition to this project to make the data
imported from OpenRefine interoperable with other datasets in Marmotta. I
will follow up with the OpenRefine community to check whether they support
the DCAT vocab in their latest release. If they don't support DCAT, do you
think implementing DCAT support in OpenRefine is a task within this
project's scope?

On Thu, Mar 13, 2014 at 4:21 PM, Raffaele Palmieri <
raffaele.palmieri@gmail.com> wrote:

> Hi Dileepa,
> some thoughts that I have also shared with other Marmotta team members
> regarding the integration with Open Refine.
> For the second level of integration, which fundamentally exports towards
> Marmotta both CSV and other data to produce RDF, it would be interesting
> to try to add functionality to Open Refine to supply additional data for
> a dataset, using for example the DCAT vocabulary [1].
> I don't remember whether this feature is covered by the GRefine RDF
> Extension, of which a new release (ALPHA 0.9.0) is available [2].
> If a dataset is supplied with DCAT metadata, Marmotta could expose it to
> facilitate its interoperability with other datasets.
> To do that, Marmotta needs to also store structured datasets, not
> necessarily instantiated as RDF triples.
>

I think Marmotta's KiWi triple store can be connected to RDBMS back ends
(MySQL, Postgres, H2); therefore the above requirement of storing
structured data in Marmotta's backend is fulfilled. Please correct me if
I'm wrong.

In summary, I think we are looking at 2 main tasks now.
1. Ability to import data from the OpenRefine process
2. Ability to configure the imported OpenRefine data to be interoperable
with other datasets in Marmotta (potentially using the DCAT vocab)

More ideas and suggestions are most welcome.


Thanks,
Dileepa


> What do you think?
> Regards,
> Raffaele.
>
>
> [1] http://www.w3.org/TR/vocab-dcat/
> [2] https://github.com/fadmaa/grefine-rdf-extension/releases/tag/v0.9.0

Re: [GSoC 2014] MARMOTTA-202 : OpenRefine import engine

Posted by Raffaele Palmieri <ra...@gmail.com>.
Hi Dileepa,
some thoughts that I have also shared with other Marmotta team members
regarding the integration with Open Refine.
For the second level of integration, which fundamentally exports towards
Marmotta both CSV and other data to produce RDF, it would be interesting
to try to add functionality to Open Refine to supply additional data for a
dataset, using for example the DCAT vocabulary [1].
I don't remember whether this feature is covered by the GRefine RDF
Extension, of which a new release (ALPHA 0.9.0) is available [2].
If a dataset is supplied with DCAT metadata, Marmotta could expose it to
facilitate its interoperability with other datasets; a sketch of how a
client might then discover such datasets follows below.
To do that, Marmotta needs to also store structured datasets, not
necessarily instantiated as RDF triples.
What do you think?
Regards,
Raffaele.


[1] http://www.w3.org/TR/vocab-dcat/
[2] https://github.com/fadmaa/grefine-rdf-extension/releases/tag/v0.9.0
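
For illustration, a Python sketch of how a client could discover such
datasets once Marmotta exposes their DCAT descriptions, assuming the
SPARQL select webservice of a local instance lives at /sparql/select
(verify the path against your installation):

import requests

SPARQL_ENDPOINT = "http://localhost:8080/marmotta/sparql/select"  # assumed

QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title ?download WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
  OPTIONAL { ?dataset dcat:distribution/dcat:downloadURL ?download . }
}
"""

response = requests.get(
    SPARQL_ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
)
response.raise_for_status()
for row in response.json()["results"]["bindings"]:
    print(row["dataset"]["value"], row.get("download", {}).get("value"))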



Re: [GSoC 2014] MARMOTTA-202 : OpenRefine import engine

Posted by Dileepa Jayakody <di...@gmail.com>.
Thank you very much, Raffaele, for the detailed explanation.

I will do some more background research on Marmotta data import and
OpenRefine and come back with the questions and ideas I get.

Any new suggestions or directions to evolve this project idea are also
welcome.

Thanks,
Dileepa



Re: [GSoC 2014] MARMOTTA-202 : OpenRefine import engine

Posted by Raffaele Palmieri <ra...@gmail.com>.
Hi Dileepa,
pleased to meet you and to learn of your interest in contributing to
Marmotta.
As discussed on the Marmotta mailing list, this integration could be
reached at various levels.
A first level is reached by refining your messy data with the Refine
tools, using the RDF extension, which already offers a graphical UI to
model RDF data by producing an RDF skeleton, and then importing the new
data into Marmotta compliant with the created skeleton.
This integration mode was implemented in the past using [1], but it needs
to be updated because:
1) Google Refine became Open Refine
2) LMF became Marmotta in its linked-data core functionalities
This update also requires work on project configuration, because Open
Refine has a different configuration than Apache Marmotta.
Whatever kind of integration is achieved, I think that work on project
configuration is required.
A second level of integration is reached if you break up the RDF into CSV
and a set of RDF mappings (aka an RDF skeleton).
So, starting from an exported project that contains the CSV and the
related actions to produce the RDF skeleton, the integration is expected
to produce the final RDF in Marmotta's world, probably performing similar
steps as the GRefine RDF Extension; a toy sketch of this mapping step
follows below.
For that second level of integration, the export functionality and the RDF
skeleton should be explored to verify what is easily exportable.
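
As a toy illustration of that second level, a Python sketch that applies a
drastically simplified "skeleton" (just a column-to-predicate map; the
real GRefine skeleton also covers resource URIs, types, languages, and
nesting) to the rows of an exported CSV; the vocabulary and file names are
made up:

import csv
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/vocab/")  # hypothetical vocabulary

# Simplified skeleton: each mapped column becomes a literal-valued property.
SKELETON = {
    "name": EX.name,
    "city": EX.city,
}

def rows_to_rdf(csv_path, base="http://example.org/resource/"):
    g = Graph()
    with open(csv_path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f)):
            subject = URIRef(base + str(i))  # one resource per row
            for column, predicate in SKELETON.items():
                if row.get(column):
                    g.add((subject, predicate, Literal(row[column])))
    return g

rows_to_rdf("refined-project.csv").serialize(
    "refined-project.ttl", format="turtle")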
At the moment, these are the hypotheses for the integration; clearly the
second appears to be more complex, but the first also brings non-trivial
work.
Since you have experience with other Semantic Web projects, such as Apache
Stanbol, feel free to propose other integration approaches.
Regards,
Raffaele.

[1] https://code.google.com/p/lmf/wiki/GoogleRefineExtension



