You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jena.apache.org by "Gavin Carothers (JIRA)" <ji...@apache.org> on 2011/01/12 23:30:45 UTC

[jira] Created: (JENA-28) Drop Temporary Tables in Postgres (PATCH)

Drop Temporary Tables in Postgres (PATCH)
-----------------------------------------

                 Key: JENA-28
                 URL: https://issues.apache.org/jira/browse/JENA-28
             Project: Jena
          Issue Type: Improvement
          Components: SDB
         Environment: Effects SDB PostgreSQL
            Reporter: Gavin Carothers


Current behaviour to truncate the temporary tables leaves the table objects, and thanks to MVCC in Postgres the data in those tables around for far longer then expected. This causes HUGE write loads on the database when loading a large number of quads or triples transactionally.

Fix is VERY simple:

Index: src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java
===================================================================
--- src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java      (revision 8420)
+++ src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java      (working copy)
@@ -26,7 +26,7 @@
        }
 
        public String[] getCreateTempTable() {
-               return new String[] { "CREATE TEMPORARY TABLE" , "ON COMMIT DELETE ROWS" };
+               return new String[] { "CREATE TEMPORARY TABLE" , "ON COMMIT DROP" };
        }


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JENA-28) Drop Temporary Tables in Postgres (PATCH)

Posted by "Damian Steer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989820#comment-12989820 ] 

Damian Steer commented on JENA-28:
----------------------------------

Sorry for not following up sooner.

Yes, a full DROP on commit is a problem. The loaders are set up so that the temporary table exists for the duration of the loader. I can think of a few solutions, however it's complicated by the fact that it is possible to manually control transactions. IIRC it's also quite hard to detect the existence of temporary tables cross-database (ah, standards...), but maybe postgres is ok?

That speed up is well worth having, however.

(Note that for large loads the loader will flush and commit at (iirc) 5000 triples -- chunkSize -- hence the problem)

> Drop Temporary Tables in Postgres (PATCH)
> -----------------------------------------
>
>                 Key: JENA-28
>                 URL: https://issues.apache.org/jira/browse/JENA-28
>             Project: Jena
>          Issue Type: Improvement
>          Components: SDB
>         Environment: Effects SDB PostgreSQL
>            Reporter: Gavin Carothers
>            Assignee: Damian Steer
>
> Current behaviour to truncate the temporary tables leaves the table objects, and thanks to MVCC in Postgres the data in those tables around for far longer then expected. This causes HUGE write loads on the database when loading a large number of quads or triples transactionally.
> Fix is VERY simple:
> Index: src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java
> ===================================================================
> --- src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java      (revision 8420)
> +++ src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java      (working copy)
> @@ -26,7 +26,7 @@
>         }
>  
>         public String[] getCreateTempTable() {
> -               return new String[] { "CREATE TEMPORARY TABLE" , "ON COMMIT DELETE ROWS" };
> +               return new String[] { "CREATE TEMPORARY TABLE" , "ON COMMIT DROP" };
>         }

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (JENA-28) Drop Temporary Tables in Postgres (PATCH)

Posted by "Damian Steer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981027#action_12981027 ] 

Damian Steer commented on JENA-28:
----------------------------------

Stupid question, but won't this drop the entire table? Do subsequent add operations work for you?

> Drop Temporary Tables in Postgres (PATCH)
> -----------------------------------------
>
>                 Key: JENA-28
>                 URL: https://issues.apache.org/jira/browse/JENA-28
>             Project: Jena
>          Issue Type: Improvement
>          Components: SDB
>         Environment: Effects SDB PostgreSQL
>            Reporter: Gavin Carothers
>            Assignee: Damian Steer
>
> Current behaviour to truncate the temporary tables leaves the table objects, and thanks to MVCC in Postgres the data in those tables around for far longer then expected. This causes HUGE write loads on the database when loading a large number of quads or triples transactionally.
> Fix is VERY simple:
> Index: src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java
> ===================================================================
> --- src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java      (revision 8420)
> +++ src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java      (working copy)
> @@ -26,7 +26,7 @@
>         }
>  
>         public String[] getCreateTempTable() {
> -               return new String[] { "CREATE TEMPORARY TABLE" , "ON COMMIT DELETE ROWS" };
> +               return new String[] { "CREATE TEMPORARY TABLE" , "ON COMMIT DROP" };
>         }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JENA-28) Drop Temporary Tables in Postgres (PATCH)

Posted by "Gavin Carothers (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989824#comment-12989824 ] 

Gavin Carothers commented on JENA-28:
-------------------------------------

The speed up is HUGE. With even moderate (transactional, not bulk) insert loads it's at least one order of magnitude. The issue is really that SDB never cleans up the temporary tables at all. They just sit around forever and with MVCC in Postgres their data sits around as well being written to disk every checkpoint, forever.

 One cross database method for detecting the table would be to do a test select from the temporary table and create the table if it fails? It adds an extra select statement per chunk but that doesn't seem horrible. I don't think that would need any new SQL code in the per database bits of SDB?

> Drop Temporary Tables in Postgres (PATCH)
> -----------------------------------------
>
>                 Key: JENA-28
>                 URL: https://issues.apache.org/jira/browse/JENA-28
>             Project: Jena
>          Issue Type: Improvement
>          Components: SDB
>         Environment: Effects SDB PostgreSQL
>            Reporter: Gavin Carothers
>            Assignee: Damian Steer
>
> Current behaviour to truncate the temporary tables leaves the table objects, and thanks to MVCC in Postgres the data in those tables around for far longer then expected. This causes HUGE write loads on the database when loading a large number of quads or triples transactionally.
> Fix is VERY simple:
> Index: src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java
> ===================================================================
> --- src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java      (revision 8420)
> +++ src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java      (working copy)
> @@ -26,7 +26,7 @@
>         }
>  
>         public String[] getCreateTempTable() {
> -               return new String[] { "CREATE TEMPORARY TABLE" , "ON COMMIT DELETE ROWS" };
> +               return new String[] { "CREATE TEMPORARY TABLE" , "ON COMMIT DROP" };
>         }

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Assigned: (JENA-28) Drop Temporary Tables in Postgres (PATCH)

Posted by "Damian Steer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JENA-28?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Damian Steer reassigned JENA-28:
--------------------------------

    Assignee: Damian Steer

> Drop Temporary Tables in Postgres (PATCH)
> -----------------------------------------
>
>                 Key: JENA-28
>                 URL: https://issues.apache.org/jira/browse/JENA-28
>             Project: Jena
>          Issue Type: Improvement
>          Components: SDB
>         Environment: Effects SDB PostgreSQL
>            Reporter: Gavin Carothers
>            Assignee: Damian Steer
>
> Current behaviour to truncate the temporary tables leaves the table objects, and thanks to MVCC in Postgres the data in those tables around for far longer then expected. This causes HUGE write loads on the database when loading a large number of quads or triples transactionally.
> Fix is VERY simple:
> Index: src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java
> ===================================================================
> --- src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java      (revision 8420)
> +++ src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java      (working copy)
> @@ -26,7 +26,7 @@
>         }
>  
>         public String[] getCreateTempTable() {
> -               return new String[] { "CREATE TEMPORARY TABLE" , "ON COMMIT DELETE ROWS" };
> +               return new String[] { "CREATE TEMPORARY TABLE" , "ON COMMIT DROP" };
>         }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JENA-28) Drop Temporary Tables in Postgres (PATCH)

Posted by "Jeff Doran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989639#comment-12989639 ] 

Jeff Doran commented on JENA-28:
--------------------------------

This patch will break bulk uploads.   
Do a bulk dump containing several large graphs.
   curl 'http://goban.oreilly.com:6060/bulk/dump' > 2011_01_21.nq


Then do a restore to another identical version of SDB, but with an empty db directory.
   curl -X POST -v -H 'Content-Type: text/x-nquads' -T 2011_01_21.nq http://sparql-stage:6060/bulk 

Tenuki log reports:
Jan 24, 2011 11:58:27 AM com.sun.jersey.spi.inject.Errors processErrorMessages
WARNING: The following warnings have been detected with resource and/or provider classes:
  WARNING: A sub-resource method, public javax.ws.rs.core.Response com.oreilly.rdf.tenuki.jaxrs.BulkLoadResource.bulkLoad(javax.servlet.http.HttpServletRequest), with URI template, "", is treated as a resource method
2011-01-24 11:58:32 [ERROR] com.hp.hpl.jena.sdb.layout2.LoaderTuplesNodes - Error in thread: Exception flushing
com.hp.hpl.jena.sdb.SDBException: Exception flushing
        at com.hp.hpl.jena.sdb.layout2.TupleLoaderBase.flush(TupleLoaderBase.java:220)
        at com.hp.hpl.jena.sdb.layout2.TupleLoaderBase.load(TupleLoaderBase.java:90)
        at com.hp.hpl.jena.sdb.layout2.LoaderTuplesNodes.updateOneTuple(LoaderTuplesNodes.java:276)
        at com.hp.hpl.jena.sdb.layout2.LoaderTuplesNodes.access$200(LoaderTuplesNodes.java:31)
        at com.hp.hpl.jena.sdb.layout2.LoaderTuplesNodes$Commiter.run(LoaderTuplesNodes.java:334)
        at java.lang.Thread.run(Thread.java:636)
Caused by: java.sql.BatchUpdateException: Batch entry 0 INSERT INTO NNodeQuads1066955437 VALUES ('-9223352369622332414' , 'urn:x-domain:oreilly.com:product:9780735648425.EPUB' , '' , '' , '2' ) was aborted.  Call getNextException to see the cause.
        at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2569)
        at org.postgresql.core.v3.QueryExecutorImpl$1.handleError(QueryExecutorImpl.java:459)
        at org.postgresql.core.v3.QueryExecutorImpl$ErrorTrackingResultHandler.handleError(QueryExecutorImpl.java:348)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1796)
        at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1055)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:398)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2708)
        at org.apache.commons.dbcp.DelegatingStatement.executeBatch(DelegatingStatement.java:297)
        at org.apache.commons.dbcp.DelegatingStatement.executeBatch(DelegatingStatement.java:297)
        at com.hp.hpl.jena.sdb.layout2.TupleLoaderBase.flush(TupleLoaderBase.java:200)
        ... 5 more

postgresql log reports:
  2011-01-24 11:49:22 PST ERROR:  relation "nnodequads658128182" does not exist at character 13
  2011-01-24 11:49:22 PST STATEMENT:  INSERT INTO NNodeQuads658128182 VALUES ($1 , $2 , $3 , $4 , $5 )


 My guess is that a single graph is arbitrarily broken at say 1000 nquads/connection and the temp table it was using is gone for subsequent loads.    

> Drop Temporary Tables in Postgres (PATCH)
> -----------------------------------------
>
>                 Key: JENA-28
>                 URL: https://issues.apache.org/jira/browse/JENA-28
>             Project: Jena
>          Issue Type: Improvement
>          Components: SDB
>         Environment: Effects SDB PostgreSQL
>            Reporter: Gavin Carothers
>            Assignee: Damian Steer
>
> Current behaviour to truncate the temporary tables leaves the table objects, and thanks to MVCC in Postgres the data in those tables around for far longer then expected. This causes HUGE write loads on the database when loading a large number of quads or triples transactionally.
> Fix is VERY simple:
> Index: src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java
> ===================================================================
> --- src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java      (revision 8420)
> +++ src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java      (working copy)
> @@ -26,7 +26,7 @@
>         }
>  
>         public String[] getCreateTempTable() {
> -               return new String[] { "CREATE TEMPORARY TABLE" , "ON COMMIT DELETE ROWS" };
> +               return new String[] { "CREATE TEMPORARY TABLE" , "ON COMMIT DROP" };
>         }

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (JENA-28) Drop Temporary Tables in Postgres (PATCH)

Posted by "Gavin Carothers (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981033#action_12981033 ] 

Gavin Carothers commented on JENA-28:
-------------------------------------

This drops the entire temporary table at the end of the commit. Which I -think- is correct? A new add creates a new temporary table.

Everything seems to be working fine with this patch in place, far better then without it in place.


> Drop Temporary Tables in Postgres (PATCH)
> -----------------------------------------
>
>                 Key: JENA-28
>                 URL: https://issues.apache.org/jira/browse/JENA-28
>             Project: Jena
>          Issue Type: Improvement
>          Components: SDB
>         Environment: Effects SDB PostgreSQL
>            Reporter: Gavin Carothers
>            Assignee: Damian Steer
>
> Current behaviour to truncate the temporary tables leaves the table objects, and thanks to MVCC in Postgres the data in those tables around for far longer then expected. This causes HUGE write loads on the database when loading a large number of quads or triples transactionally.
> Fix is VERY simple:
> Index: src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java
> ===================================================================
> --- src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java      (revision 8420)
> +++ src/com/hp/hpl/jena/sdb/layout2/hash/TupleLoaderHashPGSQL.java      (working copy)
> @@ -26,7 +26,7 @@
>         }
>  
>         public String[] getCreateTempTable() {
> -               return new String[] { "CREATE TEMPORARY TABLE" , "ON COMMIT DELETE ROWS" };
> +               return new String[] { "CREATE TEMPORARY TABLE" , "ON COMMIT DROP" };
>         }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.