Posted to dev@marmotta.apache.org by Blake Regalia <bl...@gmail.com> on 2016/08/29 22:39:16 UTC

Supporting Bulk Imports of Geometries for PostgreSQL

I needed marmotta-loader-kiwi to support geometry imports for postgres on
MARMOTTA-584, so I coded up a solution - however, I apologize for not
having the time right now to make clean commits/PRs for this.

Here's some jumpstart info in case someone wants to get this feature on the
timeline:

KiWiPostgresHandler uses "COPY [...] FROM STDIN (FORMAT csv)" to bulk load
triples into postgres. So, the "gvalue" column of each inserted node needs to
be in a format that PostGIS understands[1], which happens to be a strict
EWKT (strict because it does not allow spaces after the geometry type, e.g.,
"POINT(" is acceptable but "POINT  (" will throw an error).

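A minimal sketch of the whitespace normalization described above, assuming it
is applied to the WKT string before it is written to the COPY stream. The
class name StrictWkt and the method tighten are hypothetical and not part of
Marmotta:

```java
import java.util.regex.Pattern;

final class StrictWkt {
    // Whitespace between a geometry type keyword (ending in a letter)
    // and the opening parenthesis that follows it.
    private static final Pattern SPACE_BEFORE_PAREN =
            Pattern.compile("([A-Za-z])\\s+\\(");

    // Returns the WKT with any whitespace before "(" removed.
    static String tighten(String wkt) {
        return SPACE_BEFORE_PAREN.matcher(wkt.trim()).replaceAll("$1(");
    }
}
```

Unlike a single replaceFirst, this also handles nested types such as
"GEOMETRYCOLLECTION (POINT (1 2))", where the space can occur more than once.
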
The other challenge is setting the CRS/SRID with the serialized geometry
value - somewhere along the import process, the loader needs to replace
GeoSPARQL's CRS format with its corresponding SRID code, e.g. replacing
"<http://www.opengis.net/def/crs/OGC/1.3/CRS84>" with "SRID=4326;" at the
beginning of the WKT string. I think we should discuss how to support
various CRS IRI => SRID mappings.
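
One possible shape for such a mapping, as a sketch only: a small lookup table
for known CRS IRIs, plus a fallback that reads the numeric code out of EPSG
IRIs. The class CrsToSrid, the method toEwkt, and the assumption that EPSG
IRIs look like "http://www.opengis.net/def/crs/EPSG/0/<code>" are all
illustrative; only the CRS84 => 4326 entry comes from this post.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CrsToSrid {
    // Hypothetical lookup table; extend with further CRS IRI => SRID pairs.
    private static final Map<String, Integer> SRID_BY_CRS = new HashMap<>();
    static {
        SRID_BY_CRS.put("http://www.opengis.net/def/crs/OGC/1.3/CRS84", 4326);
    }

    // Leading "<iri>" of a GeoSPARQL wktLiteral, with optional whitespace.
    private static final Pattern CRS_PREFIX =
            Pattern.compile("^\\s*<([^>]+)>\\s*");

    // Assumed form of EPSG CRS IRIs; the trailing digits are the SRID.
    private static final Pattern EPSG_IRI =
            Pattern.compile("^http://www\\.opengis\\.net/def/crs/EPSG/0/(\\d+)$");

    // Replaces a leading CRS IRI with "SRID=<code>;". Literals without a
    // CRS IRI are returned unchanged.
    static String toEwkt(String wktLiteral) {
        Matcher prefix = CRS_PREFIX.matcher(wktLiteral);
        if (!prefix.find()) {
            return wktLiteral;
        }
        String iri = prefix.group(1).trim();
        Integer srid = SRID_BY_CRS.get(iri);
        if (srid == null) {
            Matcher epsg = EPSG_IRI.matcher(iri);
            if (!epsg.matches()) {
                throw new IllegalArgumentException("No SRID mapping for CRS " + iri);
            }
            srid = Integer.valueOf(epsg.group(1));
        }
        return "SRID=" + srid + ";" + wktLiteral.substring(prefix.end());
    }
}
```

Failing loudly on an unmapped CRS is a design choice for the sketch; a loader
might prefer to log and skip the geometry instead.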

The rest of the implementation is pretty straightforward. I'm including below
the diffs of my hacked-together code. Hope this helps!


*commons/marmotta-sesame-tools/marmotta-model-vocabs/src/main/java/org/apache/marmotta/commons/vocabulary/GEOSPARQL.java*
+package org.apache.marmotta.commons.vocabulary;
+
+import org.openrdf.model.URI;
+import org.openrdf.model.ValueFactory;
+import org.openrdf.model.impl.ValueFactoryImpl;
+
+/**
+ * Created by blake on 8/26/16.
+ */
+public class GEOSPARQL {
+    public static final String NAMESPACE = "http://www.opengis.net/ont/geosparql#";
+    public static final String PREFIX = "geosparql";
+
+    /**
+     * The Well-Known-Text IRI of geometry literals
+     */
+    public static final URI wktLiteral;
+
+    static {
+        ValueFactory factory = ValueFactoryImpl.getInstance();
+        wktLiteral = factory.createURI(GEOSPARQL.NAMESPACE, "wktLiteral");
+    }
+}

*libraries/kiwi/kiwi-loader/src/main/java/org/apache/marmotta/kiwi/loader/generic/KiWiHandler.java*
+import org.apache.marmotta.commons.vocabulary.GEOSPARQL;
*...*
+                } else if(type.equals(GEOSPARQL.wktLiteral)) {
+                    result = connection.loadLiteral(sanitizeString(value), rtype);
+
+                    if(result == null) {
+                        result = new KiWiGeometryLiteral(sanitizeString(value), rtype, importDate);
+                    }
+                    else {
+                        nodesLoaded++;
+                    }


*libraries/kiwi/kiwi-loader/src/main/java/org/apache/marmotta/kiwi/loader/pgsql/PGCopyUtil.java*
-                log.warn("geometries are not yet supported on bulk imports");
+                KiWiGeometryLiteral l = (KiWiGeometryLiteral)n;
+                createNodeList(rowArray, l.getId(), l.getClass(), l.getContent(), null, null, null, null, null, l.getDatatype(), l.getLocale(), l.getCreated(), l.getContent());
*...*
+        // schema v5
         if (a.length == 12) {
-            a[11] = geom; //schema v5
+            if(geom == null) {
+                a[11] = null;
+            }
+            else {
+                a[11] = geom
+                        .replaceFirst("^<http://www\\.opengis\\.net/def/crs/OGC/1\\.3/CRS84>\\s*", "SRID=4326;")  // convert CRS => SRID
+                        .replaceFirst("\\s+\\(", "(");  // PostGIS does not allow spaces after WKT geometry type; "(" must be escaped in the regex
+            }
         }

[1] - http://stackoverflow.com/a/11137004/1641160

 - Blake

Re: Supporting Bulk Imports of Geometries for PostgreSQL

Posted by Sergio Fernández <wi...@apache.org>.
Thanks for the patch, Blake. When you have time, please send it as a PR so
that it can get a proper code review.
