Posted to dev@stanbol.apache.org by Andrea Di Menna <ni...@gmail.com> on 2012/11/16 14:18:32 UTC
Stanbol indexing tool
Hi,
I have a question regarding the different phases of the indexing
process using the indexing tool bundled in Stanbol.
If I am not mistaken, Stanbol first creates a TDB store and then
builds a Solr index.
The first part of the process seems slower on my machine than
loading the triples into a TDB store directly with tdbloader2 (note: I
am using the latest available version of Jena, namely 2.7.4, when
running tdbloader2 standalone).
Is there any way to skip the TDB creation and jump to the second part,
so that I can create the TDB using the latest Jena available?
Cheers,
Andrea
Re: Stanbol indexing tool
Posted by Andrea Di Menna <ni...@gmail.com>.
Hi Rupert,
I have created a new index and everything seems to work ok.
I guess no changes in the binary data format have occurred between
2.6.3 and 2.7.4.
It took about 78 mins (TDB) + 35 mins (Solr) to process ~ 80M triples.
Moreover, I did not run into any memory issues while completing the
rest of the process with the EntityHub indexing tool.
Usually I had to restart the process at least twice because of
OutOfMemory exceptions.
Considering that I am using a machine with 16GB of RAM, it seems
something is wrong...
cheers
Andrea
2012/11/16 Rupert Westenthaler <ru...@gmail.com>:
> The TDB database is located under
>
> {indexing-working-dir}/indexing/resources/tdb
>
> If you have a TDB store with the required data, then you can
> provide it under that directory. Just make sure that the
>
> {indexing-working-dir}/indexing/resources/rdfdata
>
> folder is empty when you start the tool. Otherwise the RDF files in
> that folder would get imported.
>
> On Fri, Nov 16, 2012 at 2:18 PM, Andrea Di Menna <ni...@gmail.com> wrote:
>> The first part of the process seems slower on my machine than
>> loading the triples into a TDB store directly with tdbloader2 (note: I
>> am using the latest available version of Jena, namely 2.7.4, when
>> running tdbloader2 standalone).
>
> Yes the indexing tool uses
>
> com.hp.hpl.jena:jena:2.6.3
> com.hp.hpl.jena:arq:2.8.5
> com.hp.hpl.jena:tdb:0.8.7
>
> but you could still try to use your datastore. Maybe they have not
> changed the binary format of the files.
>
> If that does not work, let me know and I will try to update the Jena
> version used by the indexing tool.
>
> best
> Rupert
>
> --
> | Rupert Westenthaler rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11 ++43-699-11108907
> | A-5500 Bischofshofen
Re: Stanbol indexing tool
Posted by Rupert Westenthaler <ru...@gmail.com>.
The TDB database is located under
{indexing-working-dir}/indexing/resources/tdb
If you have a TDB store with the required data, then you can
provide it under that directory. Just make sure that the
{indexing-working-dir}/indexing/resources/rdfdata
folder is empty when you start the tool. Otherwise the RDF files in
that folder would get imported.
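
The manual steps above can be sketched as a small pre-flight script. This is only an illustration under stated assumptions, not part of the indexing tool: the function name `install_prebuilt_tdb` and both path arguments are hypothetical, and only the `indexing/resources/tdb` and `indexing/resources/rdfdata` layout comes from this message.

```python
import shutil
from pathlib import Path


def install_prebuilt_tdb(working_dir: Path, prebuilt_tdb: Path) -> Path:
    """Copy a pre-built TDB store into the indexing working directory.

    Hypothetical helper: it only mirrors the manual steps described in
    this thread and does not call the indexing tool itself.
    """
    resources = working_dir / "indexing" / "resources"
    rdfdata = resources / "rdfdata"
    tdb = resources / "tdb"

    # RDF files left in rdfdata would get imported when the tool starts,
    # so refuse to continue unless that folder is empty (or absent).
    if rdfdata.is_dir() and any(rdfdata.iterdir()):
        raise RuntimeError(f"{rdfdata} is not empty; move its files away first")

    # Do not silently mix the pre-built store with an existing one.
    if tdb.is_dir() and any(tdb.iterdir()):
        raise RuntimeError(f"{tdb} already contains files")

    tdb.mkdir(parents=True, exist_ok=True)
    # dirs_exist_ok requires Python 3.8 or later.
    shutil.copytree(prebuilt_tdb, tdb, dirs_exist_ok=True)
    return tdb
```

It would be called with the indexing working directory and the folder produced by a standalone tdbloader2 run, before starting the indexing tool as usual.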
On Fri, Nov 16, 2012 at 2:18 PM, Andrea Di Menna <ni...@gmail.com> wrote:
> The first part of the process seems slower on my machine than
> loading the triples into a TDB store directly with tdbloader2 (note: I
> am using the latest available version of Jena, namely 2.7.4, when
> running tdbloader2 standalone).
Yes, the indexing tool uses
com.hp.hpl.jena:jena:2.6.3
com.hp.hpl.jena:arq:2.8.5
com.hp.hpl.jena:tdb:0.8.7
but you could still try to use your datastore. Maybe they have not
changed the binary format of the files.
If that does not work, let me know and I will try to update the Jena
version used by the indexing tool.
best
Rupert
--
| Rupert Westenthaler rupert.westenthaler@gmail.com
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen