Posted to users@jena.apache.org by Martin Gütlein <gu...@uni-mainz.de> on 2016/06/07 15:22:34 UTC

how to store a numeric table as triples in our jena database

Hi,

we would like to store some tables with double values in our triple 
store. At the moment we are doing this by creating row and entry 
resources (see code below).
However, it is rather time- and space-consuming for medium-sized tables.
E.g. a 500x500 table takes 20 seconds and the db size is 156MB. For
our project we plan to store more and larger tables (and add some more
semantic info).

Is there a more efficient way to do that?

Kind regards,
Martin

Here is our example code:

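         import java.util.Random;
         import java.util.UUID;

         import org.apache.jena.datatypes.xsd.XSDDatatype;
         import org.apache.jena.query.Dataset;
         import org.apache.jena.rdf.model.Model;
         import org.apache.jena.rdf.model.Property;
         import org.apache.jena.rdf.model.Resource;
         import org.apache.jena.tdb.TDBFactory;

         int cols = 500;
         int rows = 500;

         // Fill the table with random doubles (one shared Random instance)
         Random random = new Random();
         double myTable[][] = new double[rows][cols];
         for (int r = 0; r < rows; r++)
             for (int c = 0; c < cols; c++)
                 myTable[r][c] = random.nextDouble();

         // Open (or create) a TDB dataset on disk
         String directory = "myDB/tdb";
         Dataset dataset = TDBFactory.createDataset(directory);

         Model model = dataset.getDefaultModel();
         String NS = "http://my-namespace/test/";

         Property hasRow = model.createProperty(NS + "hasRow");
         Property rowIndex = model.createProperty(NS + "rowIndex");
         Property hasEntry = model.createProperty(NS + "hasEntry");
         Property colIndex = model.createProperty(NS + "colIndex");
         Property value = model.createProperty(NS + "value");

         Resource table = model.createResource(NS + "table/" + UUID.randomUUID());

         for (int r = 0; r < rows; r++)
         {
             // One resource per row, linked from the table and carrying its index
             Resource row = model.createResource(NS + "row/" + UUID.randomUUID());
             table.addProperty(hasRow, row);
             row.addProperty(rowIndex, r + "", XSDDatatype.XSDint);

             for (int c = 0; c < cols; c++)
             {
                 // One resource per cell, carrying its column index and value
                 Resource entry = model.createResource(NS + "entry/" + UUID.randomUUID());
                 row.addProperty(hasEntry, entry);
                 entry.addProperty(colIndex, c + "", XSDDatatype.XSDint);
                 entry.addProperty(value, myTable[r][c] + "", XSDDatatype.XSDdouble);
             }
         }

         // No dataset.begin(...) was called above, so these writes are not
         // inside an explicit transaction
         dataset.end();
         dataset.close();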


-- 
Dr. Martin Gütlein
Phone:
+49 (0)6131 39 23336 (office)
+49 (0)177 623 9499 (mobile)
Email:
guetlein@uni-mainz.de
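Most of the cost follows from the shape of the data: the code above writes two triples per row (hasRow, rowIndex) and three per cell (hasEntry, colIndex, value), and every cell also introduces a fresh UUID-based IRI. A minimal sketch of that arithmetic, with a hypothetical class name:

```java
// Counts the triples written by the row/entry scheme in the question.
// The class and method names are hypothetical, for illustration only.
class TripleCount {

    // Per row:  table hasRow row, row rowIndex r                         -> 2 triples
    // Per cell: row hasEntry entry, entry colIndex c, entry value v      -> 3 triples
    static long rowEntryTriples(long rows, long cols) {
        return rows * 2 + rows * cols * 3;
    }

    public static void main(String[] args) {
        long n = rowEntryTriples(500, 500);
        System.out.println(n + " triples for a 500x500 table");
    }
}
```

For the 500x500 example this gives 751,000 triples, so the reported 156MB works out to something on the order of 200 bytes of database per triple.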


Re: how to store a numeric table as triples in our jena database

Posted by Martin Gütlein <gu...@posteo.de>.
Hi,

thanks for pointing me to the Data Cube vocabulary. It is basically very
similar to what we have been doing, so at least we were not on the wrong
track.
Currently, it seems to be just about fast enough for the amount of data
we have.

Regards,
Martin

On 08.06.2016 at 09:54, Dave Reynolds wrote:
> Hi,
>
> There's no fundamental problem with storing numeric data in an RDF 
> database, even though that's not the design centre. After all the Data 
> Cube vocabulary is basically just a way to do that and it gets heavy 
> use for statistical and sensor data.
>
> However, writing ~1M triples can take ~10s in a live TDB. A bulk load
> into an empty database is faster.
>
> For real use you should wrap your write block in a 
> dataset.begin/commit block, though that doesn't make much difference 
> to the timing.
>
> Dave


Re: how to store a numeric table as triples in our jena database

Posted by Dave Reynolds <da...@gmail.com>.
Hi,

There's no fundamental problem with storing numeric data in an RDF 
database, even though that's not the design centre. After all the Data 
Cube vocabulary is basically just a way to do that and it gets heavy use 
for statistical and sensor data.
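As a rough illustration of the Data Cube approach, each cell can be modelled as one qb:Observation attached to a qb:DataSet and carrying its row index, column index and value directly, with no intermediate row resources. The sketch below only builds N-Triples lines with the JDK, so it runs without Jena. qb:DataSet, qb:Observation and qb:dataSet are terms from the W3C RDF Data Cube vocabulary; the row, col and value properties, the IRI layout and the class name are hypothetical placeholders (a real cube would declare its dimensions and measure in a qb:DataStructureDefinition).

```java
import java.util.ArrayList;
import java.util.List;

// Builds N-Triples for a numeric table in a Data Cube-like shape:
// one qb:Observation per cell, linked to a single qb:DataSet.
class CubeSketch {
    static final String RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";
    static final String QB = "http://purl.org/linked-data/cube#";
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";
    // Namespace from the question; the row/col/value properties under it are hypothetical.
    static final String NS = "http://my-namespace/test/";

    static List<String> observations(String tableId, double[][] table) {
        List<String> lines = new ArrayList<>();
        String ds = "<" + NS + "dataset/" + tableId + ">";
        lines.add(ds + " " + RDF_TYPE + " <" + QB + "DataSet> .");
        for (int r = 0; r < table.length; r++) {
            for (int c = 0; c < table[r].length; c++) {
                // Hypothetical IRI layout: one observation per (row, col) cell
                String obs = "<" + NS + "obs/" + tableId + "/" + r + "/" + c + ">";
                lines.add(obs + " " + RDF_TYPE + " <" + QB + "Observation> .");
                lines.add(obs + " <" + QB + "dataSet> " + ds + " .");
                lines.add(obs + " <" + NS + "row> \"" + r + "\"^^<" + XSD + "int> .");
                lines.add(obs + " <" + NS + "col> \"" + c + "\"^^<" + XSD + "int> .");
                lines.add(obs + " <" + NS + "value> \"" + table[r][c] + "\"^^<" + XSD + "double> .");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        double[][] t = { {0.5, 1.25}, {2.0, 3.75} };
        observations("t1", t).forEach(System.out::println);
    }
}
```

This shape is not necessarily smaller than the row/entry scheme: with rdf:type and qb:dataSet on every observation it uses five triples per cell. Its benefit is a standard, widely understood structure rather than a lower storage cost.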

However, writing ~1M triples can take ~10s in a live TDB. A bulk load into an
empty database is faster.
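One way to take advantage of the faster bulk load is to write the table to an N-Triples file once and load that file into an empty database with TDB's command-line loader, e.g. tdbloader --loc=myDB/tdb table.nt, instead of adding statements one at a time through the Model API. The sketch below keeps the hasRow/rowIndex/hasEntry/colIndex/value properties from the question but, as an assumption, replaces the random UUIDs with deterministic IRIs built from a table id and the row and column indices; the class name, file name and IRI layout are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

// Streams a numeric table as N-Triples using the same properties as the
// row/entry code in the question, so the file can be bulk loaded.
class NTriplesExport {
    static final String NS = "http://my-namespace/test/";
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // Writes all triples to 'out' and returns how many were written.
    static long write(String tableId, double[][] table, Writer out) throws IOException {
        String t = "<" + NS + "table/" + tableId + ">";
        long count = 0;
        for (int r = 0; r < table.length; r++) {
            // Hypothetical deterministic IRIs instead of UUID.randomUUID()
            String row = "<" + NS + "table/" + tableId + "/row/" + r + ">";
            out.write(t + " <" + NS + "hasRow> " + row + " .\n");
            out.write(row + " <" + NS + "rowIndex> \"" + r + "\"^^<" + XSD + "int> .\n");
            count += 2;
            for (int c = 0; c < table[r].length; c++) {
                String entry = "<" + NS + "table/" + tableId + "/row/" + r + "/col/" + c + ">";
                out.write(row + " <" + NS + "hasEntry> " + entry + " .\n");
                out.write(entry + " <" + NS + "colIndex> \"" + c + "\"^^<" + XSD + "int> .\n");
                out.write(entry + " <" + NS + "value> \"" + table[r][c] + "\"^^<" + XSD + "double> .\n");
                count += 3;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        int rows = 500, cols = 500;
        Random random = new Random();
        double[][] table = new double[rows][cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                table[r][c] = random.nextDouble();

        Path file = Paths.get("table.nt");
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            long n = write("t1", table, out);
            System.out.println("Wrote " + n + " triples to " + file);
        }
    }
}
```

Deterministic IRIs also make a single cell addressable directly (e.g. .../table/t1/row/3/col/7) without first following hasRow and hasEntry links.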

For real use you should wrap your write block in a dataset.begin/commit 
block, though that doesn't make much difference to the timing.

Dave

On 08/06/16 08:00, Martin Gütlein wrote:
> Thanks for your response.
> We are developing a system that includes prediction models in the
> cheminformatics domain. So there is a lot of (cheminformatics) data that
> is stored in our database and fits very well into it. We are not sure
> about the numeric data below, though. But we want to have it all in a
> single database (and avoid storing anything on the file system or in an
> additional SQL database, if possible).
>
> Kind regards,
> Martin

Re: how to store a numeric table as triples in our jena database

Posted by Martin Gütlein <gu...@uni-mainz.de>.
Thanks for your response.
We are developing a system that includes prediction models in the 
cheminformatics domain. So there is a lot of (cheminformatics) data that 
is stored in our database and fits very well into it. We are not sure 
about the numeric data below, though. But we want to have it all in a 
single database (and avoid storing anything on the file system or in an
additional SQL database, if possible).

Kind regards,
Martin

On 07.06.2016 at 21:54, A. Soroka wrote:
> This seems like a bit of an odd usage for an RDF store. Could you explain a bit more why you want to put this data into Jena? Are you already using Jena for other data in your work?
>
> ---
> A. Soroka
> The University of Virginia Library


Re: how to store a numeric table as triples in our jena database

Posted by "A. Soroka" <aj...@virginia.edu>.
This seems like a bit of an odd usage for an RDF store. Could you explain a bit more why you want to put this data into Jena? Are you already using Jena for other data in your work?

---
A. Soroka
The University of Virginia Library

> On Jun 7, 2016, at 11:22 AM, Martin Gütlein <gu...@uni-mainz.de> wrote:
> 
> Hi,
> 
> we would like to store some tables with double values in our triple store. At the moment we are doing this by creating row and entry resources (see code below).
> However, it is rather time- and space-consuming for medium-sized tables. E.g. a 500x500 table takes 20 seconds and the db size is 156MB. For our project we plan to store more and larger tables (and add some more semantic info).
> 
> Is there a more efficient way to do that?