Posted to dev@marmotta.apache.org by Sergio Fernández <wi...@apache.org> on 2014/05/08 18:38:57 UTC

Re: Possible contribution

Hi Fabian,

first of all thanks for the interest in the project.

On 02/05/14 08:54, Fabian Cretton wrote:
> So I did prepare a document that can be found here:
> https://dl.dropboxusercontent.com/u/852552/Marmotta_OverLOD%20Surfer%20presentation_0.2.pdf
>
> I hope it will help you to see if, as you say, our project's features
> fit in your roadmap.

Well, I'm not saying it does not fit, but take into account that it 
does not need to: you can build your project using Marmotta as a 
platform, following your own agenda. But let's see.

For the document, I'd like to quote a sentence:

   "Before initiating this discussion with the Marmotta team, our
    intention was to develop our own tool based on node.js/Sesame-OWLIM
    for the back-end, and HTML 5 for the front-end."

Such a plan would still be possible: just replace OWLIM with Marmotta 
and, in case you would like to reuse the same runtime, node.js with Java EE. 
The remaining pieces can stay the same.

Actually, looking at the idea, technologically speaking it does not look 
so different from what the Fusepool P3 FP7 project tries to do. So, 
although it is still at a very early stage, you could take that platform [1] 
as an example of using and extending Marmotta.

Going deeper into the components of the project:

* Many components are UI ones. Since the UI is one of the weaknesses of the 
project, I'd say that those results would certainly be relevant for the 
project.

* For me, the "OverLOD Referencer" has great potential for reusing the 
infrastructure provided by LDClient [2] and LDCache [3].

* You may certainly benefit from some of the other infrastructure (LDP, 
SPARQL and so on), which it does not make sense to implement from scratch.

* The analytics part will be interesting to see.

* And of course, all Linked Data apps built on top will always be well 
received.

So, from my personal point of view, I'd welcome OverLOD being built on 
top of Marmotta, and you joining the community with relevant 
contributions. I can't promise much effort beyond my support when 
possible, but we'll manage somehow.

Cheers,


[1] https://github.com/fusepoolP3/platform
[2] http://marmotta.apache.org/ldclient
[3] http://marmotta.apache.org/ldcache

-- 
Sergio Fernández
Senior Researcher
Knowledge and Media Technologies
Salzburg Research Forschungsgesellschaft mbH
Jakob-Haringer-Straße 5/3 | 5020 Salzburg, Austria
T: +43 662 2288 318 | M: +43 660 2747 925
sergio.fernandez@salzburgresearch.at
http://www.salzburgresearch.at

Rép. : Re: Possible contribution

Posted by Fabian Cretton <Fa...@hevs.ch>.
Thank you for fixing the pom problems for version 3.3.0
 
If we develop a module without touching the existing Marmotta project,
would you recommend working with the latest stable version (3.2.1), or
is it better to work with 3.3.0 anyway ?
 
Even though I can build the project with Maven (both outside of Eclipse and
in Eclipse with m2e), I still have some strange errors in Eclipse.
For instance, in the "marmotta-commons" module, the file "ModelCommons" has
this import:
import javolution.util.function.Predicate;
and it can't be resolved, even after the Maven install you recommended.
 
In the pom.xml, I find:
        <!-- TODO: for now we use the source code in ext/ because it
contains some bug fixes -->
        <!--
        <dependency>
            <groupId>org.javolution</groupId>
            <artifactId>javolution-core-java</artifactId>
            <version>6.0.1-SNAPSHOT</version>
        </dependency>
        -->
 
Is this the cause of the problem ?
If yes, how can I fix it ?
 
Thank you
Fabian


Rép. : Marmotta 3.2.1 - Having problem to trigger the LDCache

Posted by Fabian Cretton <Fa...@hevs.ch>.
One more piece of information.
Today I tried from scratch, reinstalling Marmotta and creating the endpoint from the load sample.
Then I queried "http://dbpedia.org/resource/Europe" with LDPath.
 
LDPath result was: resource http://dbpedia.org/resource/Europe does not exist
 
But I had one error message in the console:

07:38:52.962 ERROR - the HTTP request failed (status: HTTP/1.1 406 Unacceptable)
07:38:52.963 ERROR - HTTP client error while trying to retrieve resource http://dbpedia.org/resource/Europe: the HTTP request failed (status: HTTP/1.1 406 Unacceptable)
 
After that I tried again, but this message didn't come up again.
 
I don't think it comes from our internal firewall settings, but I'll check.
 
Thank you for any help
Fabian
 
 

>>> "Fabian Cretton" <Fa...@hevs.ch> 08.09.2014 15:50 >>>
Hi,
 
Today I tried without success to setup the LDCache from the Marmotta interface.
 
A first thing is that it is not clearly stated, in my opinion, when the LDCache is triggered.
 
The only further information I found was here:
http://markmail.org/message/m3ft65du5kbuzy5z#query:+page:1+mid:j347sjyegwshvhc7+state:results
and thus I understood that the SPARQL interface don't interact with LDCache, but the LDPath functionality does.
Is there anything else then running a LPath that would trigger the cache ?
 
First of all, I see this error when launching Marmotta 3.2.1, could it be related to my problems ?
INFO: Server startup in 11332 ms
15:25:45.495 WARN  - pattern (C:\marmotta_inst\marmotta-home/resources|http://wh55650:8080/marmotta/resource/|urn:).* is not a valid regular expression; disabling reader/writer filesystem (message was Illegal/unsupported escape sequence near index 4
(C:\marmotta_inst\marmotta-home/resources|http://wh55650:8080/marmotta/resource/|urn:).*    ^)
 
My "ldcache.enabled" parameter is checked.
 
I guess my problem is with setting up the end-point correctly.
 
I first tried the provided sample "load sample -> dbpedia", which gives in the interface:
kind: ld cache
Prefix: http://dbpedia\.org.*
Endpoint: http://dbpedia.org/sparql
 
But a few things are already not clear for me:
* kind: ld cache
When referring to a sparql end-point, shouldn't the kind be "sparkl" ? (I did try changing this, but with no success)
By the way, what is the meaning of the "cache" mode, I can't understand from reading the explanation. Is it a server that servers "linked data" ? if so, what is the difference with "linked data".
* prefix: 
What are the "rules" to define those prefixes ? for instance I see in the example that "." are escaped ?
Defining a prefix as "http://dbpedia\.org ( http://dbpedia/.org ).*" will thus mean that a URI as http://dbpedia.org/resource/Europe is bound to that end-point, right ?
*endpoint:
The explanation says that for a SPARQL end-point, some parameters have to be specified:
"http://sparql.sindice.com/sparql?default-graph-uri=&query={query}&format={contenttype}"
Is this mandatory ? (as the provided sample don't specify them but just specifies "http://dbpedia.org/sparql"), are there default values ?
*mimetype:
Not filled by the sample, are their default values depending on the "mode of operation" ?
 
If it is possible to get a few full examples of those settings, for the different operation modes, it would be very helpful I guess.
 
Than, after trying different configurations, I tried from the LDPath interface to add the URI "http://dbpedia.org/resource/Europe" and the LDPath "rdfs:label :: xsd:string".
When hitting "Test", a message appears next to the button: resource http://dbpedia.org/resource/Europe does not exist.
 
It seems I don't see any corresponding message neither in the console nor in the marmotta-main.log.
 
I tried so many different configurations, on different SPARQL end-point, also trying a "linked data" as DBPedia does provide the content negociation mecanism,  but without success.
 
Finally I tried to run a:

curl http://wh55650:8080/marmotta/cache/live?uri=http://dbpedia.org/resource/Europe
and it did return the rdf/xml file. But what I discovered then is that even without configuring any 'end-point' in Marmotta, this command was succeeding every time, so I think this command is not a sign that my end-point is well configured ? Except maybe if I was dealing with data that are not provided as derefencable "linked data" ?
I also tried on my local version 3.3.0, but with the same results. And also with the same WARNING when starting Marmotta.
Thank you for any help.
Fabian
 
 
 

Re: Marmotta 3.2.1 - Having problem to trigger the LDCache

Posted by Jakob Frank <ja...@apache.org>.
Hi Fabian,

On 2014-09-08 15:50, Fabian Cretton wrote:
> A first thing is that it is not clearly stated, in my opinion, when the
> LDCache is triggered.
>  
> The only further information I found was here:
> http://markmail.org/message/m3ft65du5kbuzy5z#query:+page:1+mid:j347sjyegwshvhc7+state:results
> and thus I understood that the SPARQL interface doesn't interact with
> LDCache, but the LDPath functionality does.
> Is there anything else than running an LDPath query that would trigger the cache ?
LDCache is triggered every time a triple pattern like "<uri> * *" is
requested from the triple store. LDCache then tries to load <uri> from
the remote source.

LDPath uses such queries quite often, SPARQL only in a few cases,
because many requests are issued directly to the database for performance.
Even viewing the resource in the resource browser
(http://wh55650:8080/marmotta/resource/*) should trigger LDCaching.
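For instance, issuing that pattern programmatically looks roughly like the
following (a minimal sketch against the plain Sesame 2.x API; how you obtain
the Repository depends on your setup, so treat the wiring as an assumption):

import org.openrdf.model.Statement;
import org.openrdf.model.URI;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.RepositoryResult;

public class CacheTrigger {

    // Requests the "<uri> * *" pattern; on an LDCache-enabled repository
    // this is exactly the kind of call that makes LDCache fetch the
    // resource from remote if it is not cached yet.
    public static void listTriples(Repository repository, String uri) throws Exception {
        RepositoryConnection con = repository.getConnection();
        try {
            URI subject = con.getValueFactory().createURI(uri);
            // subject bound, predicate and object left as wildcards
            RepositoryResult<Statement> result = con.getStatements(subject, null, null, true);
            try {
                while (result.hasNext()) {
                    System.out.println(result.next());
                }
            } finally {
                result.close();
            }
        } finally {
            con.close();
        }
    }
}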


> First of all, I see this error when launching Marmotta 3.2.1, could it
> be related to my problems ?
> INFO: Server startup in 11332 ms
> 15:25:45.495 WARN  - pattern
> (C:\marmotta_inst\marmotta-home/resources|http://wh55650:8080/marmotta/resource/|urn:).*
> is not a valid regular expression; disabling reader/writer filesystem
> (message was Illegal/unsupported escape sequence near index 4
> (C:\marmotta_inst\marmotta-home/resources|http://wh55650:8080/marmotta/resource/|urn:).*   
> ^)
As the log says, it's only a warning and should not affect you unless
you are storing (binary) content in Marmotta.
The background and a workaround for this issue were discussed on
users@marmotta.a.o [1].

> My "ldcache.enabled" parameter is checked.
>  
> I guess my problem is with setting up the end-point correctly.
>  
> I first tried the provided sample "load sample -> dbpedia", which gives
> in the interface:
> kind: ld cache
> Prefix: http://dbpedia\.org.*
> Endpoint: http://dbpedia.org/sparql

Looks like there is an error in the sample: the provider (kind) "ld cache"
does not work with a SPARQL endpoint - for that you need to select
"sparql" as the kind.

But as you noticed: since DBpedia provides valid Linked Data, no extra
configuration is required - just request the resource and it will be fetched:
<http://wh55650:8080/marmotta/resource/?uri=http://dbpedia.org/resource/Europe>
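For reference, a corrected configuration following the endpoint template
you quote below would be along these lines (the defaults are
version-dependent, so take this as a sketch):

kind: sparql
Prefix: http://dbpedia\.org.*
Endpoint: http://dbpedia.org/sparql?query={query}&format={contenttype}
mimetype: (left empty to fall back to the provider's default)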


> But a few things are already not clear for me:
> * kind: ld cache
> When referring to a SPARQL endpoint, shouldn't the kind be "sparql" ?
> (I did try changing this, but with no success.)
> By the way, what is the meaning of the "cache" mode ? I can't understand it
> from reading the explanation. Is it a server that serves "linked data" ?
> If so, what is the difference from "linked data" ?
> * prefix:
> What are the "rules" to define those prefixes ? For instance I see in
> the example that "." is escaped ?
> Defining a prefix as "http://dbpedia\.org.*" will
> thus mean that a URI such as http://dbpedia.org/resource/Europe is bound to
> that endpoint, right ?
> *endpoint:
> The explanation says that for a SPARQL end-point, some parameters have
> to be specified:
> "http://sparql.sindice.com/sparql?default-graph-uri=&query={query}&format={contenttype}"
> Is this mandatory ? (The provided sample doesn't specify them but just
> gives "http://dbpedia.org/sparql".) Are there default values ?
> *mimetype:
> Not filled by the sample, are there default values depending on the
> "mode of operation" ?
>  
> If it is possible to get a few full examples of those settings, for the
> different operation modes, it would be very helpful I guess.
Sergio just now updated the documentation for the LDCache module [2],
please have a look and check if anything is missing or unclear.


Best,
Jakob

[1]
http://mail-archives.apache.org/mod_mbox/marmotta-users/201409.mbox/%3C730127A8D040CB4A976BE07F089997068BCBB097@EXC-MBX03.tsn.tno.nl%3E
[2] http://marmotta.apache.org/platform/ldcache-module.html


Marmotta 3.2.1 - Having problem to trigger the LDCache

Posted by Fabian Cretton <Fa...@hevs.ch>.
Hi,
 
Today I tried, without success, to set up the LDCache from the Marmotta interface.
 
A first thing is that it is not clearly stated, in my opinion, when the LDCache is triggered.
 
The only further information I found was here:
http://markmail.org/message/m3ft65du5kbuzy5z#query:+page:1+mid:j347sjyegwshvhc7+state:results
and from it I understood that the SPARQL interface doesn't interact with LDCache, but the LDPath functionality does.
Is there anything else than running an LDPath query that would trigger the cache ?
 
First of all, I see this error when launching Marmotta 3.2.1, could it be related to my problems ?
INFO: Server startup in 11332 ms
15:25:45.495 WARN  - pattern (C:\marmotta_inst\marmotta-home/resources|http://wh55650:8080/marmotta/resource/|urn:).* is not a valid regular expression; disabling reader/writer filesystem (message was Illegal/unsupported escape sequence near index 4
(C:\marmotta_inst\marmotta-home/resources|http://wh55650:8080/marmotta/resource/|urn:).*    ^)
 
My "ldcache.enabled" parameter is checked.
 
I guess my problem is with setting up the end-point correctly.
 
I first tried the provided sample "load sample -> dbpedia", which gives in the interface:
kind: ld cache
Prefix: http://dbpedia\.org.*
Endpoint: http://dbpedia.org/sparql
 
But a few things are still not clear to me:
* kind: ld cache
When referring to a SPARQL endpoint, shouldn't the kind be "sparql" ? (I did try changing this, but with no success.)
By the way, what is the meaning of the "cache" mode ? I can't understand it from reading the explanation. Is it a server that serves "linked data" ? If so, what is the difference from "linked data" ?
* prefix: 
What are the "rules" to define those prefixes ? For instance I see in the example that "." is escaped ?
Defining a prefix as "http://dbpedia\.org.*" will thus mean that a URI such as http://dbpedia.org/resource/Europe is bound to that endpoint, right ?
* endpoint:
The explanation says that for a SPARQL endpoint, some parameters have to be specified:
"http://sparql.sindice.com/sparql?default-graph-uri=&query={query}&format={contenttype}"
Is this mandatory ? (The provided sample doesn't specify them but just gives "http://dbpedia.org/sparql".) Are there default values ?
* mimetype:
Not filled by the sample; are there default values depending on the "mode of operation" ?
 
If it is possible to get a few full examples of those settings, for the different operation modes, it would be very helpful, I guess.
 
Then, after trying different configurations, I went to the LDPath interface and entered the URI "http://dbpedia.org/resource/Europe" with the LDPath program "rdfs:label :: xsd:string".
When hitting "Test", a message appears next to the button: resource http://dbpedia.org/resource/Europe does not exist.
 
I don't see any corresponding message, either in the console or in marmotta-main.log.
 
I tried many different configurations, on different SPARQL endpoints, also trying the "linked data" kind since DBpedia provides the content negotiation mechanism, but without success.
 
Finally I tried to run:

curl http://wh55650:8080/marmotta/cache/live?uri=http://dbpedia.org/resource/Europe
and it did return the RDF/XML file. But what I discovered then is that even without configuring any endpoint in Marmotta, this command succeeded every time, so I think this command is not a sign that my endpoint is well configured ? Except maybe if I was dealing with data that is not provided as dereferenceable "linked data" ?
I also tried on my local version 3.3.0, but with the same results, and also with the same WARNING when starting Marmotta.
Thank you for any help.
Fabian
 
 
 

Rép. : Re: New module similar to LDCache

Posted by Fabian Cretton <Fa...@hevs.ch>.
Thank you Sergio,
 
I know you are busy, but I hope we can clarify this these days, so that
we can come up with a clear idea of what we will develop, and how.
 
Here is what I understood and what I want to talk about: "All that
infrastructure is provided by the current LDCache module", and, if I get
it right, we would just have to implement new LDClients for new sources.

So there is something I don't understand here.
 
I'll try to clarify in more detail the differences between
LDClient/LDCache and the functionality we want in OverLOD, according to
my shallow understanding so far. I haven't played fully with
LDClient/LDCache yet; that is what I will do first thing next
Monday.
 
* LDClients are different clients for different data structures (aka
RDFizers): for instance one LDClient for RDFa, one for XML, etc., and
then also for specific data sources and their structures: one for
YouTube data, one for Facebook. The LDClient knows how to access the
data and how to import it into Marmotta, applying some transformation
if needed (for instance translating XML to RDF, etc.). This is where we
would, in our project, implement an LDClient for microdata based on
schema.org.
But the LDClient is not specific to one source (except maybe for the
Facebook/YouTube clients and the like).
In our case, we want to define an LDClient for a specific 'kind' of
source, let's say some RDF files, but then we want to define pointers to
very specific RDF files, for instance 20 of them. To me that would be
done at the higher level, maybe in LDCache.
Or do you mean that we create a generic LDClient that is able to import
one kind of data, and then instantiate that LDClient 20 times, one for
each specific source ?
Here are examples of situations we want to handle (and certainly
many coming apps based on linked data will face them):
- a way is defined to publish a catalog on the web (ontologies)
-> here, if 30 providers publish according to that convention, we want to keep
a reference to 20 of them, and retrieve their data locally (cache)
- or: we could want to import different sets of data from an endpoint:
the French cities from DBpedia, the elevations from GeoNames; and not
only data where the resources are subjects, they might be objects too.
 
* Then the LDCache is an automatic and transparent functionality which
will, during queries on the triple store, notice that information about a
resource can't be found in the triple store but should be found on the
web in order to answer the query. That's what I understand from my
local page here: http://localhost:8080/marmotta/cache/admin/about.html,
and from the LDCache descriptions I found here:
http://marmotta.apache.org/ldcache/index.html.
In order for the LDCache to work, the administrator has to define
LD-Cache endpoints. The LD-Cache will rely on the resources' "prefix" in
order to know which resource to find on which endpoint.
Another point that differs from what we want: "SPARQL
(access to a resource via a SPARQL endpoint): retrieves the triples of a
resource from a defined SPARQL endpoint by issuing a query for all
triples with the resource as subject"
-> we are not solely interested in triples where the resource is
subject.
Finally, the LD-Cache will save the triples concerning the resource in
the LD-Cache context. I guess all the triples retrieved by
LD-Cache/LD-Client are saved in one and the same context ?
If so, I have a question here: how does LDCache identify which
triples in that single cache context need to be updated ? -> maybe it
takes all the triples which have a specific resource as subject ? Or is the
timeout specific to an endpoint ?
 
And so, the functionalities I described in my former email, which I
will develop further here, seem not to be implemented yet, and of
great importance for the use of linked data outside of "research"
projects:
Firstly, define precisely the data we want to cache (RDF, or any data
handled by an LDClient) in the server.
It will not be the users' SPARQL queries that influence the cached data
(which seems to be the case with LDCache); instead an administrator can very
specifically choose which data to cache (and have control over them) -> then
only those validated data are available to the end-user apps based on
that instance of Marmotta.
 
Then we want to have, for instance, the whole content of a file, or the
result of a SPARQL CONSTRUCT
-> and not just triples where a resource is subject.
And, to manage the update of those triples, I think we need one context
per source (otherwise how would we know which triples were removed from the
source, etc.) ?
This seems to me pretty different from the current management of
LD-Cache endpoints.
 
Another example: the LD-Cache endpoints allow saying that resources
with the prefix http://dbpedia.org/resource/
should be found on a specific server. When such a resource is encountered (in a
SPARQL query, or maybe in triples uploaded to the store ?), the LDCache
will query the DBpedia endpoint and retrieve all triples where this
resource is subject.
In OverLOD, we might want to configure the referencer so that a SPARQL
CONSTRUCT is run on DBpedia to retrieve all the cities from a specific
country: only the city and its label in French and English (not all
labels!) and its population, but also information where the city is the
object of a triple (which might be needed when no inverse properties are
defined; a sketch of such a query follows below).
-> those triples will be saved in a specific context
-> they need to be validated
-> we need to know when that information has changed on the server,
and update the cache.
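A sketch of such a query (property names taken from the DBpedia ontology,
purely as an illustration of the shape):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>

CONSTRUCT {
  ?city rdfs:label ?label ;
        dbo:populationTotal ?population .
  ?s ?p ?city .
}
WHERE {
  ?city dbo:country <http://dbpedia.org/resource/France> ;
        rdfs:label ?label ;
        dbo:populationTotal ?population .
  FILTER ( lang(?label) IN ("fr", "en") )
  # keep triples where the city is the object, too
  OPTIONAL { ?s ?p ?city }
}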
So we can't just set a 'timeout': we may cache some files that are
never updated (so there is no need to reload the data), but also some other
files which are regularly updated.
This 'update' mechanism is a functionality I would like to discuss
with the Marmotta team; it seems to me more efficient than the current
LD-Cache and its timeout, but I am not sure yet.
 
Aren't there some big differences here, even though the background is
similar ?
Are those functionalities part of the LDClient (as you suggested) ?
It seems to me some could be implemented in LDClient, but some others in
LDCache (or a new LDCache created for OverLOD, which I called last time
'External Data Sources').

Thank you for helping me move forward
Fabian


Re: New module similar to LDCache

Posted by Sergio Fernández <wi...@apache.org>.
Hi Fabian,

On 02/09/14 14:09, Fabian Cretton wrote:
> So that would be the goal of the "External data sources" module, which
> was originally called "OverLOD Referencer" in the document [1]:
> - define precisely the RDF data to be cached in the server: that could be an
> RDF file, a SPARQL CONSTRUCT on an endpoint, etc.
> - find a way to validate the content of that data -> here we might not
> want to reason under an open-world assumption; if a property is defined
> with a certain range, we would want to check that the objects in the
> file ARE effectively instances of that defined class (for instance
> using SPARQL queries to validate the content, instead of a reasoner).
> - find a way to manage the updates automatically: it could be a 'pull'
> from Marmotta depending on some VoID data provided by the source, or the
> source could put in place a "ping" to Marmotta, RSS-like features, like
> it was done by Ping-The-Semantic-Web or Sindice

All that infrastructure is provided by the current LDCache module. If I 
got it right, where you actually need to plug into this 
infrastructure is at the LDClient level:

   * you can define new LDClient Data Providers for your specific sources
   * which can wrap all the validation logic you need
   * then LDCache will transparently make use of your LDClient provider
   * to avoid conflicts with the default providers, they can be disabled
If that setup fits your ideas and needs, I'd recommend taking a 
look at the current providers:

   https://github.com/apache/marmotta/tree/master/libraries/ldclient

Some of them just do data lifting from other formats (e.g., XML), some 
wrap APIs to get RDF out of them (e.g., Facebook), and some do other 
kinds of validation and fixes (e.g., the Freebase provider does RDF 
syntax fixing before parsing).
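To make it concrete, a new provider is essentially one class. A rough
sketch against the DataProvider interface as it appears in the sources
linked above (the exact signatures and the ClientResponse construction may
differ between versions, and the package/class names here are just
placeholders):

import org.apache.marmotta.ldclient.api.endpoint.Endpoint;
import org.apache.marmotta.ldclient.api.ldclient.LDClientService;
import org.apache.marmotta.ldclient.api.provider.DataProvider;
import org.apache.marmotta.ldclient.exception.DataRetrievalException;
import org.apache.marmotta.ldclient.model.ClientResponse;

public class ExternalRdfFileProvider implements DataProvider {

    @Override
    public String getName() {
        return "OverLOD External RDF File";  // label used in the endpoint configuration
    }

    @Override
    public String[] listMimeTypes() {
        // content types this provider accepts from the remote source
        return new String[] { "application/rdf+xml", "text/turtle" };
    }

    @Override
    public ClientResponse retrieveResource(String resource, LDClientService client, Endpoint endpoint)
            throws DataRetrievalException {
        // 1. fetch the remote document for 'resource' over HTTP
        // 2. parse it into triples, applying whatever validation logic
        //    (range checks, SPARQL-based tests, ...) you need
        // 3. wrap the accepted triples in a ClientResponse so that LDCache
        //    can transparently store them in its caching context
        throw new DataRetrievalException("left out in this sketch");
    }
}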

Hope that helps. I guess we have to provide better documentation and 
diagrams to understand the infrastructure LDClient+LDCache provide.

Cheers,

-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 660 2747 925
e: sergio.fernandez@redlink.co
w: http://redlink.co

Rép. : Re: New module similar to LDCache

Posted by Fabian Cretton <Fa...@hevs.ch>.
Jakob,
 
Thank you for your quick answer.
This discussion is important for me, I hope we can clarify things
together. If you confirm that this feature is a plus for Marmotta, then
I could work on it, and we could take decisions together about the best
ways to implement the different functionalities.
 
The goal of the functionality, as described in the document I
prepared for the discussion with the Marmotta team [1], seems to me different
from the LDCache, even though pretty similar.
 
That goal would be to set up a triple store for a specific purpose, and
build apps over those "controlled and validated" data. But the data
would mainly come from external, distributed sources. For
instance, when creating an app for engineers in the building field, we
could want to base that app on data coming from different building
material providers, those providers publishing their catalogs in RDF
(publishing is not our concern in this project). They can publish it as
an .rdf file, as RDFa directly on their website, or even as a SPARQL
endpoint. We then define the data sources we want to include (those
catalogs, for instance), and the system helps an administrator
validate those data (they should not contain unwanted or unexpected
data) and keep them up to date (as soon as the original data is updated,
the system must know it and do something automatically or
semi-automatically).
 
As I understand the LDCache, it is a functionality to transparently
cache data from the LOD when a triple contains a reference to a URI that
can be reached by one of the defined "LD Cache Endpoints". There is not
much control over which information is precisely retrieved, how to
validate the content, or how that information is automatically updated
(I don't know yet how the expiry time is handled).
 
This functionality, in our opinion, is mandatory to bring
the LOD to its full potential for real-world applications (and
not just for research purposes) -> here you need to know which data you
work on, know they are reliable, etc.
 
So that would be the goal of the "External data sources" module, which
was originally called "OverLOD Referencer" in the document [1]:
- define precisely the RDF data to be cached in the server: that could be an
RDF file, a SPARQL CONSTRUCT on an endpoint, etc.
- find a way to validate the content of that data -> here we might not
want to reason under an open-world assumption; if a property is defined
with a certain range, we would want to check that the objects in the
file ARE effectively instances of that defined class (for instance
using SPARQL queries, like the sketch after this list, to validate the
content instead of using a reasoner).
- find a way to manage the updates automatically: it could be a 'pull'
from Marmotta depending on some VoID data provided by the source, or the
source could put in place a "ping" to Marmotta, RSS-like features, like
it was done by Ping-The-Semantic-Web or Sindice
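As an illustration, such a range check could be a plain SPARQL query run on
the imported data (ex: is a made-up namespace; the query returns the
objects of ex:material that are not explicitly typed as ex:Material, i.e. a
closed-world check):

PREFIX ex: <http://example.org/ns#>

SELECT ?s ?o
WHERE {
  ?s ex:material ?o .
  FILTER NOT EXISTS { ?o a ex:Material }
}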
 
Please refer to [1] for more detailed information, and let me know if
the purpose of this is really not clear.
I hope you will be able to tell me whether I misunderstood LDCache and
it can in fact play that exact role.
If LDCache cannot do that right now, do you think I should work on a
new module, or just add some functionalities to LDCache ?
 
Hope we can have an interesting discussion
Thank you for your help
Fabian
 
[1]
https://dl.dropboxusercontent.com/u/852552/Marmotta_OverLOD%20Surfer%20presentation_0.2.pdf


Re: New module similar to LDCache

Posted by Jakob Frank <ja...@apache.org>.
Hi Fabian,

looks like you chose a big one for starting ;-)

LDCache plugs into the Sesame Sail stack to automatically retrieve
remote resources and make them available in the local triple store.

Sesame does not use CDI but the built-in Java ServiceLoader [1], so
plugging in there is not as easy.
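For reference, plugging a provider into that ServiceLoader mechanism means
shipping a provider-configuration file in your jar, roughly like this (the
interface's fully qualified name is taken from the 3.x sources and may
differ in your version; the implementation class is of course hypothetical):

src/main/resources/META-INF/services/org.apache.marmotta.ldclient.api.provider.DataProvider:

    ch.hevs.overlod.ldclient.ExternalRdfFileProvider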

On the other hand: why do you want to implement a module similar to
LDCache? What feature do you need that can't be solved using LDCache?
For me, your "external datasource" module sounds exactly like LDCache
in action...

Best,
Jakob

[1] http://docs.oracle.com/javase/6/docs/api/java/util/ServiceLoader.html


New module similar to LDCache

Posted by Fabian Cretton <Fa...@hevs.ch>.
Hi,
 
I would like to implement a module that is similar to the LDCache
(following the previous discussions with Sergio about the OverLOD
project).
I am currently reading about the LDCache functionalities here
http://marmotta.apache.org/ldcache/
and having a look at the code.
 
(It is a pretty steep curve for me to apprehend the Marmotta project,
but I think it is worth it rather than starting from scratch.)
 
As I am new to this kind of project infrastructure, is there anything I
should read to better understand the whole framework ? Maybe Java EE
tutorials, as the project description says "The Apache Marmotta Platform
is implemented as a light-weight Service-Oriented Architecture (SOA)
using the CDI/Weld service framework (i.e. the core components of Java
EE 6)."
 
Then, to create the new module, would it be a good idea to duplicate
the LDCache files (libraries and platform, I guess) and modify them, or
would it be better to start from a new empty module as described here:
http://wiki.apache.org/marmotta/Customizing#Modules
 
Thank you for any help
Fabian



Re: Possible contribution

Posted by Sergio Fernández <wi...@apache.org>.
Hi Fabian,

On 27/08/14 14:49, Fabian Cretton wrote:
> My first goal was: to build the whole project locally, run my locally
> built Marmotta, and then start adding components.
> But my first concern, now that I am digging deeper, is that Marmotta is
> a pretty big project (about 80 projects), and so you might recommend me
> not to import the main "pom.xml" into my Eclipse environment, but to start
> smaller ?

Then start from the platform modules.

> If there is already documentation about how to proceed, please
> point me to it; I didn't find any myself.

Well, the overall build process is entirely managed by Maven, check 
http://marmotta.apache.org/installation#source
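In short it boils down to something like this (the -DskipTests flag is just
to speed up a first build):

git clone https://github.com/apache/marmotta.git
cd marmotta
mvn clean install -DskipTests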

> Nevertheless, I do have problems and errors in Eclipse, and hope you
> can help me with that.

Eclipse should be able to manage this number of modules with Maven.

> The first problems I do have, are with many "Plugin execution not
> covered by lifecycle configuration" errors.

Some plugin lifecycles might not be supported inside Eclipse. Just 
ignore them; you should not need them.

> Then I have 6-7 of these: "Project build error: Non-resolvable parent
> Could not find artifact
> org.apache.marmotta:marmotta-parent:pom:3.2.1-SNAPSHOT and
> 'parent.relativePath' points at wrong local POM pom.xml
> /marmotta-backend-sparql line 23 Maven pom Loading Problem"
> and here I am pretty confused: it seems that some POM files are not
> up to date in this current 3.3.0 version, as they still point to a
> "3.2.1" parent POM file, while the parent is already at its "3.3.0"
> version ?

Sorry for the error. Those modules are out of the default profile, so 
the release plugin did not update the versions accordingly. It's already 
fixed in the develop branch; please update your fork.

> Then, apart from those Maven errors, I have a few Java errors with
> many "imports" or "types" which can't be resolved, and this seems very
> strange to me. But maybe solving the main Maven problems above
> would correct that ?

All dependencies are available from Maven Central. Try running a "mvn 
install" from the root.

> A first goal for me would be to update Marmotta's main menu so that
> under "Others", next to "Linked Data Caching", I could have an "External
> Data Sources" menu, and then work on that new module as discussed earlier
> with you.

Then you need to create a custom module and add it to your custom webapp 
launcher. The whole process is supported by Maven artifacts, as described at:

http://wiki.apache.org/marmotta/Customizing#Modules
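As a starting point, generating a module skeleton looks roughly like this
(the archetype coordinates are from memory, so double-check them against the
wiki page above):

mvn archetype:generate \
    -DarchetypeGroupId=org.apache.marmotta \
    -DarchetypeArtifactId=marmotta-archetype-module \
    -DarchetypeVersion=3.3.0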

Hope that helps.

Cheers,

-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 660 2747 925
e: sergio.fernandez@redlink.co
w: http://redlink.co

Re: Rép. : Re: Possible contribution

Posted by Fabian Cretton <Fa...@hevs.ch>.
Hi Sergio,
 
Thank you for your answers.
 
I am currently trying to set up my Eclipse environment so that I can
start working with Marmotta's code.
As I said, Git, Maven and the "Apache way" are all new things for me, so
please excuse my "newbie" questions.
 
My first goal was: to build the whole project locally, run my locally
built Marmotta, and then start adding components.
But my first concern, now that I am digging deeper, is that Marmotta is
a pretty big project (about 80 projects), and so you might recommend me
not to import the main "pom.xml" into my Eclipse environment, but to start
smaller ? 
If there is already documentation about how to proceed, please
point me to it; I didn't find any myself.
 
Nevertheless, I do have problems and errors in Eclipse, and hope you
can help me with that.
 
First, I forked Marmotta and created a new branch "dev-overLOD" from
the "develop" branch.
You said in the previous mail that this might be an unnecessary step as
we are not touching the core of Marmotta, but actually we might
contribute some changes or improvements if needed. I did "fork" and
"branch" just following the instructions you gave to QiHong, so if that is
not "wrong", I'll go on from there.
 
I installed Eclipse Luna, which comes with m2e 1.5 (Maven 3.2.1), and
Java 7.
I imported an "existing Maven project" by pointing to the main folder
of my "dev-overLOD" branch.
 
The first problems I have are with many "Plugin execution not
covered by lifecycle configuration" errors.
Is there a way to correct that ? I see that QiHong and other people on
the Marmotta forum had the same problems, and you just told them to
ignore that as they didn't need a full build. But my goal was to be able
to produce a full build.
So far, I changed the Eclipse setting for Maven "Errors/Warnings"
and set "Plugin execution not covered by lifecycle configuration" to
"warnings" instead of "errors".
 
Then I have 6-7 of these: "Project build error: Non-resolvable parent POM:
Could not find artifact
org.apache.marmotta:marmotta-parent:pom:3.2.1-SNAPSHOT and
'parent.relativePath' points at wrong local POM pom.xml
/marmotta-backend-sparql line 23 Maven pom Loading Problem"
 
and here I am pretty confused: it seems that some POM files are not
up to date in this current 3.3.0 version, as they still point to a
"3.2.1" parent POM file, while the parent is already at its "3.3.0"
version ?
 
Then, apart from those Maven errors, I have a few Java errors with
many "imports" or "types" which can't be resolved, and this seems very
strange to me. But maybe solving the main Maven problems above
would correct that ?
Here is an example "Category cannot be resolved to a type
AtomParser.java
/marmotta-rio-rss/src/main/java/org/apache/marmotta/commons/sesame/rio/rss
line 190 Java Problem"
A first goal for me would be to update Marmotta's main menu so that
under "Others", next to "Linked Data Caching", I could have an "External
Data Sources" menu, and then work on that new module as discussed earlier
with you. By the way, any ideas or discussion about how that module
should work are welcome, as it seems to me it will be a main feature for
future real-life applications based on RDF (and other structured data).
 
Thank you for any help/explanation about how I should go on.

Fabian

 

Rép. : Re: Possible contribution

Posted by Fabian Cretton <Fa...@hevs.ch>.
Sergio,
 
I hope you are well. I am coming back to you as I will start these
days to understand Marmotta better and to see (hopefully with your
help) how the features we are thinking of for OverLOD could be
implemented.
 
There are quite a few questions in this email; I hope you can answer
them so that I can move on more efficiently. Thank you in advance.
 
In a former email you were pointing me towards Fusepool, saying:
"Actually looking to the idea, technologically talking it does not look
so different to what the Fusepool P3 FP7 project tries to do."
I thus had a look at Fusepool, but from what I saw, Fusepool is about
"creating" RDF from non-RDF resources, whereas OverLOD is mainly about
consuming existing RDF data and having it at the disposal of a specific
platform and use case.
So one goal of OverLOD is more about the "next steps" of the semantic
web: how to consume RDF efficiently, and then make it easier for
non-RDF developers to use the data.
 
For instance, one use case could be to develop a software application
for engineers, based on construction material data from different
construction material providers. The providers publish their catalogs
in RDF, and this instance of OverLOD keeps a read-only copy of all the
catalogs, managing updates efficiently and automatically, possibly
handling data validation and so on. The application might need other
information such as regulations, maybe weather data, etc. And all those
data are maintained in the OverLOD triple store for the applications
based on that instance of OverLOD.
 
This feature is called "OverLOD Referencer" in the document "OverLOD
Surfer – Marmotta discussion"
https://dl.dropboxusercontent.com/u/852552/Marmotta_OverLOD%20Surfer%20presentation_0.2.pdf
 
The OverLOD triple store (i.e. Marmotta) should be able to handle its
own RDF data BUT ALSO include RDF data from other sources on the web
(RDF files, RDF from SPARQL CONSTRUCT queries on different endpoints,
possibly RDFa, and Microformats/Microdata with RDFization). There is
quite some work to do here, and we haven't decided yet how far we will
get, as there are other features we need to implement.
 
If I am not mistaken, neither Marmotta, Fusepool nor LDP has this
feature, right? (I am also reading the LDP specification right now.)
At the end of the document "OverLOD Surfer – Marmotta discussion" I
came up with some questions which, if I am not mistaken, haven't been
answered yet; I copy them here:
Both Marmotta and OverLOD handle LOD data with a local copy of the
data. How does Marmotta plan to put in place automatic updates once the
original data is modified?
Data validation: does Marmotta plan to validate the local data (in a
CWA manner, à la SPIN maybe)?
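
To make the idea concrete, here is a rough sketch of the kind of
closed-world check I mean: a SPIN-style constraint expressed as a plain
SPARQL ASK query with Sesame (the repository setup and the vocabulary
are made up for the example):

    import org.openrdf.query.QueryLanguage;
    import org.openrdf.repository.Repository;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.sail.SailRepository;
    import org.openrdf.sail.memory.MemoryStore;

    public class ValidationExample {
        public static void main(String[] args) throws Exception {
            // hypothetical local repository holding the imported data
            Repository repo = new SailRepository(new MemoryStore());
            repo.initialize();
            RepositoryConnection con = repo.getConnection();
            try {
                // SPIN-style constraint: every material must declare a
                // price (made-up vocabulary, for illustration only)
                String constraint =
                    "PREFIX ex: <http://example.org/>\n" +
                    "ASK { ?m a ex:Material . " +
                    "      FILTER NOT EXISTS { ?m ex:price ?p } }";
                boolean violated = con.prepareBooleanQuery(
                        QueryLanguage.SPARQL, constraint).evaluate();
                System.out.println(violated ? "constraint violated"
                                            : "data valid");
            } finally {
                con.close();
                repo.shutDown();
            }
        }
    }
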
Data "chunks": does Marmotta provide ways to import only a part of a
data source, for instance running SPARQL Construct queries on a
end-point, or on the content of an imported RDF file ?
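
Again just to illustrate what I mean by a "chunk", a sketch with plain
Sesame: run a SPARQL CONSTRUCT against a remote endpoint and keep only
the returned subset (the endpoint URL and the query are placeholders):

    import org.openrdf.model.Statement;
    import org.openrdf.query.GraphQueryResult;
    import org.openrdf.query.QueryLanguage;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.sparql.SPARQLRepository;

    public class ChunkImportExample {
        public static void main(String[] args) throws Exception {
            // remote SPARQL endpoint (placeholder URL)
            SPARQLRepository endpoint =
                new SPARQLRepository("http://example.org/sparql");
            endpoint.initialize();
            RepositoryConnection con = endpoint.getConnection();
            try {
                // import only the "chunk" we care about: the labels of
                // the materials (made-up vocabulary)
                String query =
                    "PREFIX ex: <http://example.org/>\n" +
                    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n" +
                    "CONSTRUCT { ?m rdfs:label ?l }\n" +
                    "WHERE { ?m a ex:Material ; rdfs:label ?l }";
                GraphQueryResult chunk = con.prepareGraphQuery(
                        QueryLanguage.SPARQL, query).evaluate();
                while (chunk.hasNext()) {
                    Statement st = chunk.next();
                    // here the statements would be stored in the local
                    // triple store
                    System.out.println(st);
                }
                chunk.close();
            } finally {
                con.close();
                endpoint.shutDown();
            }
        }
    }
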
It is to be noted that OverLOD was written before the W3C advancements
on LDP and JSON-LD, but we now want to make good use of those
specifications.
 
About development
Apache development is a new world for me, but some colleagues here
might help me.
Also I will follow the instructions you gave to QiHong earlier this
year, and so I will start to fork Marmotta if I am not mistaken (Git
and GitHub are also new to me).
You told him:

·        Fork our mirror there [1] and give me (wikier) admin
permissions.
·        Create, at least, a branch from 'develop' for your project;
according to our development guidelines [2], I'd recommend you to use
the issue [3] as the name for the branch: MARMOTTA-444.
·        I'll closely follow your development there, using comments on
the code committed to provide you early feedback.
·        Create issues there for internal issues of the project.
 
So I will do the same and thus, in my fork, create a branch from
'develop' -> is there a name you would recommend? Do I need to create
'Issues' about the OverLOD features?
 
As you pointed out: "For me the "OverLOD Referencer" has a big
potential of reusing the infrastructure provided by LDClient [2]
(http://marmotta.apache.org/ldclient/) and LDCache [3]
(http://marmotta.apache.org/ldcache/)."
LDClient seems to be the way to import external data into Marmotta. I
don't see LDClient on the "Platform Architecture Overview".
This external data could be RDF, or data that needs to be RDFized,
right?
The first question is the one already expressed above: does LDClient
already handle the automatic update of the data once the data source
has been modified? (I only read about some time-out features but don't
know yet what they mean.)
The second question is about the RDFizers: there is a nice list of
RDFizers, but nothing about Microdata/Microformats; would that be
something to implement if needed?
 
Then, LDCache handles where to store the incoming data from LDClient.
But is this storage different from the main Marmotta storage? Is the
imported data part of the default graph and queryable transparently
with the other data from the LDP? I guess so, but from what I read I
had some doubts.
 
That's all for now, thank you again
Fabian

Re: Possible contribution

Posted by Sergio Fernández <wi...@apache.org>.
Hi Fabian.

On 13/07/14 07:00, Fabian Cretton wrote:
> As I said, we are not really working on that until mid-August.
> However, what document would you recommend me to read until then, in
> order to really understand Marmotta?

The platform description is a good starting point:

   http://marmotta.apache.org/platform

Just let us know whatever we can help with.

Cheers,

-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 660 2747 925
e: sergio.fernandez@redlink.co
w: http://redlink.co

Rép. : Re: Rép. : Re: Possible contribution

Posted by Fabian Cretton <Fa...@hevs.ch>.
Sergio,

Thank you very much for the precisions.
And very nice to hear that Marmotta is based on Sesame; that is good
news for us.

As I said, we are not really working on that until mid-August. However,
what document would you recommend me to read until then, in order to
really understand Marmotta?

Thank you for any pointers
Fabian


Re: Rép. : Re: Possible contribution

Posted by Sergio Fernández <wi...@apache.org>.
Hi Fabian,

On 19/05/14 10:57, Fabian Cretton wrote:
> Thank you very much for your answer. As we will finally start the
> development only at the end of the summer, we still have time to
> discuss the options.

Great.

> In a few weeks, I might come back with more precise questions about
> how the OverLOD features could be implemented effectively.
> Do I understand correctly that, following our first thoughts about
> "HTML5/node.js + Sesame-OWLIM", we could implement
> "HTML5/node.js + Marmotta-KiWi"? With that configuration, would the
> goal of node.js just be to "hide" Marmotta from the clients, and thus
> also bring an abstraction layer? With node.js interacting with an LDP
> platform over HTTP on the server, the architecture would be pretty
> flexible if there is ever a need to move from one LDP platform to
> another; that could be interesting.

Well, first you have to decide what backend you need to use for your
frontend. If JavaEE is enough, you can reuse the same one as Marmotta,
which will be more efficient than having it run on a different runtime
(node.js or whatever).

The Marmotta admin interface is extensible, so you can easily plug in
new modules. But you can have any other HTML UI on top. Check the early
implementation of the Fusepool P3 platform for a simple example of how:

   https://github.com/fusepoolP3/platform

> But about efficiency and response time, would it be better to simply
> have "HTML5/Marmotta"? I think that so far Marmotta is implemented
> like this with its client interfaces, no?

Exactly, the Marmotta admin interface works in that way.

> Is Marmotta/KiWi comparable to the Sesame/OWLIM combination?

Well, not really, but I'll try to explain some things:

* Marmotta is based on Sesame
* Marmotta allows using different backends (triple stores) based on the
Sesame Sail API: http://marmotta.apache.org/platform/backends
* KiWi is the default backend
* There are some other backends available out there (Sesame Native,
BigData, Virtuoso, Titan, etc.)
* More can be easily added, since it's a matter of implementing a Java
interface and a compatible Maven module (see the sketch below)
* Therefore you could easily add an OWLIM backend for Marmotta
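
Just to sketch the idea (this is not the exact Marmotta SPI; the class
and method names here are hypothetical), a backend module essentially
has to hand the platform a Sesame Sail wrapping the triple store:

    import java.io.File;

    import org.openrdf.sail.NotifyingSail;
    import org.openrdf.sail.nativerdf.NativeStore;

    // hypothetical provider: the real backend modules implement a
    // Marmotta platform interface with a factory method along these
    // lines, returning the Sail of the underlying triple store
    public class MyStoreProvider {

        // a Sesame NativeStore here; OWLIM also exposes a Sail, so it
        // could be plugged in the same way
        public NotifyingSail createStore(File dataDir) {
            return new NativeStore(dataDir, "spoc,posc,cspo");
        }

        // label shown in the platform configuration (hypothetical)
        public String getName() {
            return "My Native Store";
        }
    }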

I hope these points give you a clearer picture.

> Also, do you have any information about KiWi performance?

Not really... MARMOTTA-177 is one of our long-time unresolved tasks.
The project would benefit a lot if you have resources in OverLOD for
that kind of performance evaluation.

Cheers,

-- 
Sergio Fernández
Senior Researcher
Knowledge and Media Technologies
Salzburg Research Forschungsgesellschaft mbH
Jakob-Haringer-Straße 5/3 | 5020 Salzburg, Austria
T: +43 662 2288 318 | M: +43 660 2747 925
sergio.fernandez@salzburgresearch.at
http://www.salzburgresearch.at

Rép. : Re: Possible contribution

Posted by Fabian Cretton <Fa...@hevs.ch>.
Sergio,

Thank you very much for your answer. As we will finally start the
development only at the end of the summer, we still have time to
discuss the options.

In a few weeks, I might come back with more precise questions about how
the OverLOD features could be implemented effectively.
Do I understand correctly that, following our first thoughts about
"HTML5/node.js + Sesame-OWLIM", we could implement
"HTML5/node.js + Marmotta-KiWi"? With that configuration, would the
goal of node.js just be to "hide" Marmotta from the clients, and thus
also bring an abstraction layer? With node.js interacting with an LDP
platform over HTTP on the server, the architecture would be pretty
flexible if there is ever a need to move from one LDP platform to
another; that could be interesting.
But about efficiency and response time, would it be better to simply
have "HTML5/Marmotta"? I think that so far Marmotta is implemented like
this with its client interfaces, no?

Is Marmotta/KiWi comparable to the Sesame/OWLIM combination?

Also, do you have any information about KiWi performance?

Thank you
Fabian

