Posted to dev@stanbol.apache.org by Rene Nederhand <re...@nederhand.net> on 2012/10/09 14:44:57 UTC

Help creating a custom vocabulary

Hi,


I am trying to create a custom vocabulary using webdatacommons RDFa data [1].
To do this I am following this tutorial [2].

I've installed the indexer tool without any problems and edited the config
file, and I am now working on the mapping.txt file. However, I have no idea
what I should change in this file.

An example of the data is here [3]:

head -n 5 ../resources/rdfdata/ccrdf.html-rdfa.sample.nq
<http://turcanu.net/blog/2008/07/16/honglaowai-if-there-were-no-communist-party-then-there-would-be-no-new-china/> <http://creativecommons.org/ns#attributionURL> <http://turcanu.net> <http://turcanu.net/blog/2008/07/16/honglaowai-if-there-were-no-communist-party-then-there-would-be-no-new-china/> .
<http://turcanu.net/blog/2008/07/16/honglaowai-if-there-were-no-communist-party-then-there-would-be-no-new-china/> <http://creativecommons.org/ns#attributionName> "Sergiu Turcanu" <http://turcanu.net/blog/2008/07/16/honglaowai-if-there-were-no-communist-party-then-there-would-be-no-new-china/> .
<http://www.telemac0.net/marketing-50/> <http://purl.org/dc/elements/1.1/type> <http://purl.org/dc/dcmitype/Text> <http://www.telemac0.net/marketing-50/> .
<http://www.telemac0.net/marketing-50/> <http://purl.org/dc/elements/1.1/title> "telemac0" <http://www.telemac0.net/marketing-50/> .
<http://www.telemac0.net/marketing-50/> <http://creativecommons.org/ns#attributionURL> <http://telemac0.net> <http://www.telemac0.net/marketing-50/>

Could anyone point me in the right direction?

Cheers,

René Nederhand


[1] http://webdatacommons.org/
[2] http://stanbol.apache.org/docs/trunk/customvocabulary.html
[3] http://webdatacommons.org/samples/data/ccrdf.html-rdfa.sample.nq
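
The thread does not show the contents of mapping.txt, so the following is only a hedged aid for deciding which properties might be worth mapping: a small Python sketch that counts the predicates occurring in the sample quads quoted above. The term pattern is a deliberately simplified tokenizer, not a full N-Quads parser, and the names `TERM`, `predicate_counts`, `PAGE`, `SITE` and `SAMPLE` are made up for this illustration.

```python
import re
from collections import Counter

# Matches one RDF term: an IRI, a literal (optionally with a language
# tag or datatype), or a blank node label. Simplified on purpose.
TERM = re.compile(
    r'<[^>]*>'
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9_-]+|\^\^<[^>]*>)?'
    r'|_:[A-Za-z0-9]+'
)

def predicate_counts(lines):
    """Count how often each predicate (the second term) occurs."""
    counts = Counter()
    for line in lines:
        terms = TERM.findall(line)
        if len(terms) >= 3:  # skip blank or malformed lines
            counts[terms[1]] += 1
    return counts

# The five sample quads from the mail, with the long URLs factored out.
PAGE = ("<http://turcanu.net/blog/2008/07/16/honglaowai-if-there-were-"
        "no-communist-party-then-there-would-be-no-new-china/>")
SITE = "<http://www.telemac0.net/marketing-50/>"
SAMPLE = [
    f'{PAGE} <http://creativecommons.org/ns#attributionURL> <http://turcanu.net> {PAGE} .',
    f'{PAGE} <http://creativecommons.org/ns#attributionName> "Sergiu Turcanu" {PAGE} .',
    f'{SITE} <http://purl.org/dc/elements/1.1/type> <http://purl.org/dc/dcmitype/Text> {SITE} .',
    f'{SITE} <http://purl.org/dc/elements/1.1/title> "telemac0" {SITE} .',
    f'{SITE} <http://creativecommons.org/ns#attributionURL> <http://telemac0.net> {SITE} .',
]

if __name__ == "__main__":
    for predicate, n in predicate_counts(SAMPLE).most_common():
        print(n, predicate)
```

Run against the full data file instead of `SAMPLE`, a listing like this shows which properties actually occur and how often, which is a reasonable starting point before editing any mapping configuration.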

Re: Help creating a custom vocabulary

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi René,

By the way, I finished the work on STANBOL-765 today. See the first comment
on that issue for the documentation on how to enable indexing of BNodes.
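
The actual configuration added by STANBOL-765 is documented in that issue and is not reproduced here. Purely as an illustration of the general idea (turning blank node labels into URIs before indexing), here is a minimal Python sketch. The prefix `urn:bnode:` and the names `TERM` and `skolemize` are assumptions made for this example and are not taken from Stanbol.

```python
import re

# One RDF term per match: IRI, literal, or blank node label.
# Matching whole terms keeps "_:" inside literals or IRIs untouched.
TERM = re.compile(
    r'<[^>]*>'
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9_-]+|\^\^<[^>]*>)?'
    r'|_:[A-Za-z0-9]+'
)

def skolemize(line, prefix="urn:bnode:"):
    """Replace every blank node term in a line with an IRI."""
    def replace(match):
        term = match.group(0)
        if term.startswith("_:"):
            # Drop the "_:" marker and wrap the label in an IRI.
            return f"<{prefix}{term[2:]}>"
        return term
    return TERM.sub(replace, line)

if __name__ == "__main__":
    quad = ('_:node1 <http://purl.org/dc/elements/1.1/title> '
            '"about _:x" <http://example.org/g> .')
    print(skolemize(quad))
```

Note that labels are only unique within one file, so a real conversion would probably also need a per-file or per-graph component in the prefix.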

best
Rupert

On Thu, Oct 11, 2012 at 10:54 PM, Rene Nederhand <re...@nederhand.net> wrote:
> Hi Rupert,
>
> Thank you very much for all the work. I'd expected this would take much
> longer :)
>
> Probably this weekend, I will try to get some of the CommonCrawl data
> imported into Stanbol and see how this works out.
>
> In addition, I will try the Apache any23 tool (thx. A. Soroka).
>
> Best,
> René



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Help creating a custom vocabulary

Posted by Rene Nederhand <re...@nederhand.net>.
Hi Rupert,

Thank you very much for all the work. I'd expected this would take much
longer :)

Probably this weekend, I will try to get some of the CommonCrawl data
imported into Stanbol and see how this works out.

In addition, I will try the Apache any23 tool (thx. A. Soroka).

Best,
René

On Wed, Oct 10, 2012 at 11:39 AM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi Rene,
>
> With STANBOL-764 the indexing tool now supports importing quads.
> However, you will still have problems working with the CommonCrawl data:
>
> 1. A lot of the data uses BNodes, and those are ignored by the
> Entityhub. As indexing of BNodes has already been requested several
> times, I created STANBOL-765 to address this. While this will not
> allow the Entityhub to handle BNodes, it will allow users to specify
> if and how BNodes are converted to dereferenceable URIs.
>
> 2. I got a parse exception from Jena RIOT in the test data file
> referred to in your original mail [3]:
>
>     Caused by: org.openjena.riot.RiotException: [line: 3931, col: 124] expected "_:"
>         at org.openjena.riot.ErrorHandlerLib$ErrorHandlerStd.fatal(ErrorHandlerLib.java:97)
>
> This was caused by a literal using a country-specific language tag:
>
> <http://bearhungfactory.mysinablog.com/index.php>
> <http://creativecommons.org/ns#attributionName>
> "\u6D2A\u96C4\u718A"@zh_tw
> <http://bearhungfactory.mysinablog.com/index.php>   .
>
> Changing "@zh_tw" to "@zh" fixed the problem. This is a bug in the
> Jena version used:
>
>     com.hp.hpl.jena:jena:2.6.3
>     com.hp.hpl.jena:arq:2.8.5
>     com.hp.hpl.jena:tdb:0.8.7
>
> Maybe upgrading to a newer Jena version could solve this. However, this
> would first require Clerezza to adopt the newer version (see
> STANBOL-621).
>
> best
> Rupert

Re: Help creating a custom vocabulary

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Rene,

With STANBOL-764 the indexing tool now supports importing quads.
However, you will still have problems working with the CommonCrawl data:

1. A lot of the data uses BNodes, and those are ignored by the
Entityhub. As indexing of BNodes has already been requested several
times, I created STANBOL-765 to address this. While this will not
allow the Entityhub to handle BNodes, it will allow users to specify
if and how BNodes are converted to dereferenceable URIs.

2. I got a parse exception from Jena RIOT in the test data file
referred to in your original mail [3]:

    Caused by: org.openjena.riot.RiotException: [line: 3931, col: 124] expected "_:"
        at org.openjena.riot.ErrorHandlerLib$ErrorHandlerStd.fatal(ErrorHandlerLib.java:97)

This was caused by a literal using a country-specific language tag:

<http://bearhungfactory.mysinablog.com/index.php>
<http://creativecommons.org/ns#attributionName>
"\u6D2A\u96C4\u718A"@zh_tw
<http://bearhungfactory.mysinablog.com/index.php>   .

Changing "@zh_tw" to "@zh" fixed the problem. This is a bug in the
Jena version used:

    com.hp.hpl.jena:jena:2.6.3
    com.hp.hpl.jena:arq:2.8.5
    com.hp.hpl.jena:tdb:0.8.7

Maybe upgrading to a newer Jena version could solve this. However, this
would first require Clerezza to adopt the newer version (see
STANBOL-621).
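
Until the parser can be upgraded, the manual fix described above could be applied to a whole file as a preprocessing step. The following Python sketch is one hedged way to do that. It either strips the region subtag (as done by hand above) or rewrites the underscore to a hyphen, which is the subtag separator that the N-Triples and N-Quads grammars define for language tags. Whether the Jena version listed above accepts "@zh-tw" is not stated in this thread and is an assumption. The names `LITERAL_LANG` and `normalize_lang_tags` are made up for this example.

```python
import re

# A quoted literal followed by a language tag whose subtags may be
# separated by "_" (as in the failing data) or "-".
LITERAL_LANG = re.compile(
    r'("(?:[^"\\]|\\.)*")@([A-Za-z]+)((?:[_-][A-Za-z0-9]+)*)'
)

def normalize_lang_tags(line, keep_region=True):
    """Rewrite language tags such as @zh_tw to @zh-tw, or to @zh."""
    def replace(match):
        literal, primary, subtags = match.groups()
        if not keep_region:
            return f'{literal}@{primary}'
        return f'{literal}@{primary}{subtags.replace("_", "-")}'
    return LITERAL_LANG.sub(replace, line)

if __name__ == "__main__":
    quad = (r'<http://bearhungfactory.mysinablog.com/index.php> '
            r'<http://creativecommons.org/ns#attributionName> '
            r'"\u6D2A\u96C4\u718A"@zh_tw '
            r'<http://bearhungfactory.mysinablog.com/index.php> .')
    print(normalize_lang_tags(quad))                     # keeps the region
    print(normalize_lang_tags(quad, keep_region=False))  # same as the manual fix
```

Lines without a language-tagged literal pass through unchanged, so the function can be applied to every line of the input file.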

best
Rupert

On Tue, Oct 9, 2012 at 10:34 PM, Rene Nederhand <re...@nederhand.net> wrote:
> Hi Rupert,
>
> It would be great if we could make it possible to use CommonCrawl data even
> if we would lose some information. If I remember correctly, this was one of
> the requests that came up in the validation reports quite frequently.
> Freebase is an alternative.
>
> So, if this involves importing N-Quads, then I would appreciate adding this
> feature. There is no hurry, and I am more than happy to help. Thanks!
>
> Best,
> René




Re: Help creating a custom vocabulary

Posted by "ajs6f@virginia.edu" <aj...@virginia.edu>.
This may or may not be immediately useful to you, but the Apache Any23 tool:

https://any23.apache.org/

will parse N-quads and output N-triples. I haven't used it for that purpose (haven't had to) but I've used it for other purposes and it works well.

---
A. Soroka
Software & Systems Engineering :: Online Library Environment
the University of Virginia Library

On Oct 9, 2012, at 4:34 PM, Rene Nederhand wrote:

> Hi Rupert,
> 
> It would be great if we could make it possible to use CommonCrawl data even
> if we would lose some information. If I remember correctly, this was one of
> the requests that came up in the validation reports quite frequently.
> Freebase is an alternative.
>
> So, if this involves importing N-Quads, then I would appreciate adding this
> feature. There is no hurry, and I am more than happy to help. Thanks!
> 
> Best,
> René


Re: Help creating a custom vocabulary

Posted by Rene Nederhand <re...@nederhand.net>.
Hi Rupert,

It would be great if we could make it possible to use CommonCrawl data even
if we would lose some information. If I remember correctly, this was one of
the requests that came up in the validation reports quite frequently.
Freebase is an alternative.

So, if this involves importing N-Quads, then I would appreciate adding this
feature. There is no hurry, and I am more than happy to help. Thanks!

Best,
René





On Tue, Oct 9, 2012 at 10:02 PM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi Rene,
>
> The problem is that the files of this dataset use N-Quads and not
> N-Triples (basically SPOC (Subject, Predicate, Object, Context) instead
> of SPO).
>
> I can try to add support for importing N-Quads, but because the
> importing tool does not use named graphs you might even then lose some
> quads (multiple quads with the same SPO values).
>
> best
> Rupert

Re: Help creating a custom vocabulary

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Rene,

The problem is that the files of this dataset use N-Quads and not
N-Triples (basically SPOC (Subject, Predicate, Object, Context) instead
of SPO).

I can try to add support for importing N-Quads, but because the
importing tool does not use named graphs you might even then lose some
quads (multiple quads with the same SPO values).
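
To make the SPOC versus SPO point concrete, here is a small Python sketch, not the indexing tool's actual import code, that drops the context term from each quad and shows how two quads with the same SPO values end up as a single triple. The tokenizer is simplified, and the function names and the two `example.org` graph URIs are made up for this illustration.

```python
import re

# One RDF term per match: IRI, literal, or blank node label.
TERM = re.compile(
    r'<[^>]*>'
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9_-]+|\^\^<[^>]*>)?'
    r'|_:[A-Za-z0-9]+'
)

def quad_to_triple(line):
    """Return (subject, predicate, object), dropping any context term."""
    terms = TERM.findall(line)
    if len(terms) not in (3, 4):
        raise ValueError(f"not a triple or quad: {line!r}")
    return tuple(terms[:3])

def quads_to_ntriples(lines):
    """Yield N-Triples lines; repeated SPO values are emitted only once."""
    seen = set()
    for line in lines:
        if not line.strip():
            continue  # ignore empty lines
        triple = quad_to_triple(line)
        if triple not in seen:
            seen.add(triple)
            yield " ".join(triple) + " ."

# Same subject, predicate and object, but two different (made-up) contexts.
QUADS = [
    '<http://www.telemac0.net/marketing-50/> <http://purl.org/dc/elements/1.1/title> "telemac0" <http://example.org/graph-a> .',
    '<http://www.telemac0.net/marketing-50/> <http://purl.org/dc/elements/1.1/title> "telemac0" <http://example.org/graph-b> .',
]

if __name__ == "__main__":
    for triple_line in quads_to_ntriples(QUADS):
        print(triple_line)
```

The two input quads differ only in their context, so only one line is printed. The information about which page a statement came from is what gets lost.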

best
Rupert



