Posted to users@jena.apache.org by Justin <th...@gmail.com> on 2022/04/14 20:42:37 UTC

querying lots of quad files in block storage

Hello,

I am looking to see if Jena is a good fit for querying many billion quads
(in thousands of .nq files) sitting in block storage (like AWS S3). The .nq
files don't change. New .nq files do get added to S3, however. Also update
queries are not needed -- just selects, constructs, asks, etc.

It would be easy to iterate over all the files and produce TDB2s in a
filesystem (on AWS EBS or EFS)...
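
A minimal sketch of that iteration, assuming the .nq files have already been copied from S3 to a local path and that Jena's tdb2.tdbloader script is on the PATH. The function name, the batch size, and the db-NNNNN directory naming are hypothetical choices for illustration; the code only plans the loader invocations and does not run them.

```python
from pathlib import Path


def plan_tdb2_loads(nq_files, db_root, files_per_db=100):
    """Return one tdb2.tdbloader argv list per batch of .nq files.

    Assumption: each batch is loaded into its own TDB2 directory under
    db_root; the batch size of 100 is arbitrary.
    """
    files = sorted(str(f) for f in nq_files)
    commands = []
    for start in range(0, len(files), files_per_db):
        batch = files[start:start + files_per_db]
        # Hypothetical naming scheme: db-00000, db-00001, ...
        db_dir = Path(db_root) / f"db-{start // files_per_db:05d}"
        commands.append(["tdb2.tdbloader", "--loc", str(db_dir), *batch])
    return commands
```

Each returned list could then be passed to subprocess.run(cmd, check=True). Because new .nq files only ever get added, later batches can go into new database directories without touching the ones already built.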

Has anyone gone down this path, and do you have some wisdom to share?
I understand queries won't be as snappy as querying a single TDB2.

Thanks,
Justin

Re: querying lots of quad files in block storage

Posted by aj...@apache.org.

Some potentially interesting projects:

https://sansa-stack.net/

https://github.com/rdfhdt/hdt-java

RDF HDT has a Jena integration component.

Adam

On Thu, Apr 14, 2022, 4:47 PM Martynas Jusevičius <ma...@atomgraph.com>
wrote:

> There was a related thread
> https://www.mail-archive.com/users@jena.apache.org/msg18577.html

Re: querying lots of quad files in block storage

Posted by Martynas Jusevičius <ma...@atomgraph.com>.

There was a related thread
https://www.mail-archive.com/users@jena.apache.org/msg18577.html


Re: querying lots of quad files in block storage

Posted by Andy Seaborne <an...@apache.org>.

Justin,

Are the query patterns spanning across the files?
If not, then

Another thought: filter the data in some way. Keep the NQ files as the
primary copy, but if there are subsets of the data that make sense, run a
process to extract the relevant parts and build the database on that data.
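
One way such an extraction step might look, as a rough sketch only: stream the N-Quads lines and keep those whose predicate is in a chosen set. Selecting by predicate is just one possible definition of a "subset"; the function name and the example IRIs are made up, and the code assumes the usual serialization where terms are separated by whitespace.

```python
def filter_quads(lines, wanted_predicates):
    """Yield N-Quads lines whose predicate IRI is in wanted_predicates.

    Assumption: subject and predicate are whitespace-separated tokens,
    which holds for typical N-Quads output (neither an IRI nor a blank
    node label can contain whitespace).
    """
    wanted = {f"<{iri}>" for iri in wanted_predicates}
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        # Split off subject and predicate; the rest (object, optional
        # graph label, final dot) is left untouched.
        parts = stripped.split(None, 2)
        if len(parts) == 3 and parts[1] in wanted:
            yield line
```

The kept lines can be written to a smaller .nq file and loaded into a database, while the full files in S3 stay as the primary copy.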

     Andy
