Posted to commits@sedona.apache.org by ji...@apache.org on 2021/03/08 20:01:17 UTC
[incubator-sedona] branch master updated: [SEDONA-21] Add extension classes for auto registration of UDFs/UDTs (#513)

This is an automated email from the ASF dual-hosted git repository.

jiayu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-sedona.git


The following commit(s) were added to refs/heads/master by this push:
     new 34fef45  [SEDONA-21] Add extension classes for auto registration of UDFs/UDTs (#513)
34fef45 is described below

commit 34fef45c638693ca3099f5907d6214c07b656edc
Author: Alex Ott <al...@gmail.com>
AuthorDate: Mon Mar 8 21:00:54 2021 +0100

    [SEDONA-21] Add extension classes for auto registration of UDFs/UDTs (#513)
    
    With this change we can use Sedona UDFs/UDTs from Spark SQL, for example, from `spark-sql`
    or via the Thrift server. You just need to add the following to the command line:
    
    ```
    --conf spark.sql.extensions=org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions \
    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
    --conf spark.kryo.registrator=org.apache.sedona.viz.core.Serde.SedonaVizKryoRegistrator
    ```
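    
    As a rough model of how Spark consumes the `spark.sql.extensions` setting, the stdlib-only
    Scala sketch below assumes Spark 3.x semantics: the comma-separated value is split, each
    class is loaded by name as a `SparkSessionExtensions => Unit` and applied in the listed
    order, and (as we understand it) a class that cannot be loaded is skipped with a warning
    instead of failing the session. `RecordingExtensions`, `ExtensionLoading` and its
    `classpath` map are hypothetical stand-ins (the map replaces real class loading), not
    Spark APIs.
    
    ```scala
    import scala.collection.mutable.ListBuffer
    
    // Hypothetical stand-in for SparkSessionExtensions: it only records which
    // extension classes were applied, in order.
    final class RecordingExtensions {
      val applied: ListBuffer[String] = ListBuffer.empty[String]
    }
    
    object ExtensionLoading {
      // Hypothetical helper: builds an "extension class" that tags itself when applied.
      private def recording(tag: String): RecordingExtensions => Unit =
        e => { e.applied += tag; () }
    
      // Stand-in for class loading: maps a class name to a factory producing a
      // `RecordingExtensions => Unit`, the function shape expected from each class.
      val classpath: Map[String, () => (RecordingExtensions => Unit)] = Map(
        "org.apache.sedona.viz.sql.SedonaVizExtensions" -> (() => recording("viz")),
        "org.apache.sedona.sql.SedonaSqlExtensions" -> (() => recording("sql"))
      )
    
      // Splits the comma-separated value and applies each class in the listed order.
      // Unknown names are collected and skipped, loosely mirroring the assumed
      // Spark 3.x behavior of logging a warning instead of failing.
      def applyExtensions(confValue: String): (RecordingExtensions, List[String]) = {
        val extensions = new RecordingExtensions
        val skipped = ListBuffer.empty[String]
        confValue.split(",").map(_.trim).filter(_.nonEmpty).foreach { name =>
          classpath.get(name) match {
            case Some(newInstance) => newInstance()(extensions)
            case None              => skipped += name
          }
        }
        (extensions, skipped.toList)
      }
    }
    ```
    
    Under this model, a jar missing from the classpath shows up only as absent functions,
    so a warning in the driver log may be the first hint when a Sedona function is
    reported as undefined.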
---
 docs/tutorial/geospark-sql-python.md               |  2 +
 docs/tutorial/sql-sql.md                           | 66 ++++++++++++++++++++++
 docs/tutorial/sql.md                               |  2 +
 docs/tutorial/viz.md                               |  2 +
 .../apache/sedona/sql/SedonaSqlExtensions.scala    | 32 +++++++++++
 .../sedona/viz/sql/SedonaVizExtensions.scala       | 32 +++++++++++
 6 files changed, 136 insertions(+)

diff --git a/docs/tutorial/geospark-sql-python.md b/docs/tutorial/geospark-sql-python.md
index b5f88e0..1620b98 100644
--- a/docs/tutorial/geospark-sql-python.md
+++ b/docs/tutorial/geospark-sql-python.md
@@ -29,6 +29,8 @@ from sedona.register import SedonaRegistrator
 SedonaRegistrator.registerAll(spark)
 ```
 
+You can also register functions by passing `--conf spark.sql.extensions=org.apache.sedona.sql.SedonaSqlExtensions` to `spark-submit` or `spark-shell`.
+
 ## Writing Application
 
 Use KryoSerializer.getName and SedonaKryoRegistrator.getName class properties to reduce memory impact.
diff --git a/docs/tutorial/sql-sql.md b/docs/tutorial/sql-sql.md
new file mode 100644
index 0000000..d21f1ce
--- /dev/null
+++ b/docs/tutorial/sql-sql.md
@@ -0,0 +1,66 @@
+This page outlines the steps to manage spatial data using SedonaSQL. ==The example code is written in SQL==.
+
+SedonaSQL supports the SQL/MM Part 3 Spatial SQL Standard. Detailed SedonaSQL APIs are available here: [SedonaSQL API](../api/sql/GeoSparkSQL-Overview.md)
+
+
+## Initiate Session
+
+Start `spark-sql` as follows (replace `<VERSION>` with the actual version, e.g., `1.0.1-incubating`):
+
+```sh
+spark-sql --packages org.apache.sedona:sedona-python-adapter-3.0_2.12:<VERSION>,org.apache.sedona:sedona-viz-3.0_2.12:<VERSION>,org.datasyslab:geotools-wrapper:geotools-24.0 \
+  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
+  --conf spark.kryo.registrator=org.apache.sedona.viz.core.Serde.SedonaVizKryoRegistrator \
+  --conf spark.sql.extensions=org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions
+```
+
+
+This will register all User Defined Types, functions and optimizations in SedonaSQL and SedonaViz.
+
+## Load data
+
+Let's use the test data from `examples/sql`. Loading it from the CSV files takes two commands, one per file.
+
+
+Use the following statements to create raw tables backed by the CSV files:
+
+```sql
+CREATE TABLE IF NOT EXISTS pointraw (_c0 string, _c1 string) 
+USING csv 
+OPTIONS(header='false') 
+LOCATION '<some path>/incubator-sedona/examples/sql/src/test/resources/testpoint.csv';
+
+CREATE TABLE IF NOT EXISTS polygonraw (_c0 string, _c1 string, _c2 string, _c3 string) 
+USING csv 
+OPTIONS(header='false') 
+LOCATION '<some path>/incubator-sedona/examples/sql/src/test/resources/testenvelope.csv';
+
+```
+
+## Transform the data
+
+We need to transform the raw point and polygon data into their respective geometry types:
+
+```sql
+CREATE OR REPLACE TEMP VIEW pointdata AS
+  SELECT ST_Point(cast(pointraw._c0 as Decimal(24,20)), cast(pointraw._c1 as Decimal(24,20))) AS pointshape
+  FROM pointraw;
+
+CREATE OR REPLACE TEMP VIEW polygondata AS
+  SELECT ST_PolygonFromEnvelope(cast(polygonraw._c0 as Decimal(24,20)),
+        cast(polygonraw._c1 as Decimal(24,20)), cast(polygonraw._c2 as Decimal(24,20)), 
+        cast(polygonraw._c3 as Decimal(24,20))) AS polygonshape 
+  FROM polygonraw;
+```
+
+## Work with data
+
+For example, let's join the polygon and point data:
+
+```sql
+SELECT * FROM polygondata, pointdata 
+WHERE ST_Contains(polygondata.polygonshape, pointdata.pointshape) 
+      AND ST_Contains(ST_PolygonFromEnvelope(1.0,101.0,501.0,601.0), polygondata.polygonshape)
+LIMIT 5;
+```
+
diff --git a/docs/tutorial/sql.md b/docs/tutorial/sql.md
index cd2e6d4..5037dcb 100644
--- a/docs/tutorial/sql.md
+++ b/docs/tutorial/sql.md
@@ -48,6 +48,8 @@ SedonaSQLRegistrator.registerAll(sparkSession)
 
 This function will register Sedona User Defined Types, User Defined Functions and the optimized join query strategy.
 
+You can also register everything by passing `--conf spark.sql.extensions=org.apache.sedona.sql.SedonaSqlExtensions` to `spark-submit` or `spark-shell`.
+
 ## Load data from files
 
 Assume we have a WKT file, namely `usa-county.tsv`, at Path `/Download/usa-county.tsv` as follows:
diff --git a/docs/tutorial/viz.md b/docs/tutorial/viz.md
index bcd0ca7..a7a96d2 100644
--- a/docs/tutorial/viz.md
+++ b/docs/tutorial/viz.md
@@ -46,6 +46,8 @@ SedonaVizRegistrator.registerAll(sparkSession)
 
 This will register all User Defined Types, functions and optimizations in SedonaSQL and SedonaViz.
 
+You can also register everything by passing `--conf spark.sql.extensions=org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions` to `spark-submit` or `spark-shell`.
+
 ## Create Spatial DataFrame
 
 There is a DataFrame as follows:
diff --git a/sql/src/main/scala/org/apache/sedona/sql/SedonaSqlExtensions.scala b/sql/src/main/scala/org/apache/sedona/sql/SedonaSqlExtensions.scala
new file mode 100644
index 0000000..b08aca3
--- /dev/null
+++ b/sql/src/main/scala/org/apache/sedona/sql/SedonaSqlExtensions.scala
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.sedona.sql
+
+import org.apache.sedona.sql.utils.SedonaSQLRegistrator
+import org.apache.spark.sql.SparkSessionExtensions
+
+
+class SedonaSqlExtensions extends (SparkSessionExtensions => Unit) {
+  def apply(e: SparkSessionExtensions): Unit = {
+    e.injectCheckRule(spark => {
+      SedonaSQLRegistrator.registerAll(spark)
+      _ => () // no-op check rule; it exists only so registerAll runs when the session state is built
+    })
+  }
+}
\ No newline at end of file
diff --git a/viz/src/main/scala/org/apache/sedona/viz/sql/SedonaVizExtensions.scala b/viz/src/main/scala/org/apache/sedona/viz/sql/SedonaVizExtensions.scala
new file mode 100644
index 0000000..671bf3c
--- /dev/null
+++ b/viz/src/main/scala/org/apache/sedona/viz/sql/SedonaVizExtensions.scala
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.sedona.viz.sql
+
+import org.apache.sedona.viz.sql.utils.SedonaVizRegistrator
+import org.apache.spark.sql.SparkSessionExtensions
+
+
+class SedonaVizExtensions extends (SparkSessionExtensions => Unit) {
+  def apply(e: SparkSessionExtensions): Unit = {
+    e.injectCheckRule(spark => {
+      SedonaVizRegistrator.registerAll(spark)
+      _ => () // no-op check rule; it exists only so registerAll runs when the session state is built
+    })
+  }
+}
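
Both extension classes register everything from inside `injectCheckRule`. Assuming Spark calls the check-rule builder with the session when that session's analyzer is built, `registerAll(spark)` runs once per session as a side effect, and the returned rule does nothing to each plan. The stdlib-only sketch below models that pattern; `FakeSession`, `FakePlan`, `FakeExtensions`, `FakeRegistrator` and `SketchSqlExtensions` are hypothetical stand-ins, not Spark or Sedona classes.

```scala
import scala.collection.mutable

// Hypothetical stand-ins, NOT the real Spark classes:
// FakeSession ~ SparkSession, FakePlan ~ LogicalPlan,
// FakeExtensions ~ SparkSessionExtensions.
final class FakeSession {
  val functionRegistry: mutable.Set[String] = mutable.Set.empty[String]
}

final class FakePlan

final class FakeExtensions {
  private val checkRuleBuilders =
    mutable.ListBuffer.empty[FakeSession => (FakePlan => Unit)]

  // Only stores the builder; nothing is executed at injection time.
  def injectCheckRule(builder: FakeSession => (FakePlan => Unit)): Unit =
    checkRuleBuilders += builder

  // Modeled as happening when a session's analyzer is built: each builder
  // runs once against that session and returns its check rule.
  def buildCheckRules(session: FakeSession): List[FakePlan => Unit] =
    checkRuleBuilders.toList.map(builder => builder(session))
}

// Hypothetical registrar standing in for SedonaSQLRegistrator.registerAll.
object FakeRegistrator {
  def registerAll(session: FakeSession): Unit =
    session.functionRegistry ++= Seq("ST_Point", "ST_Contains", "ST_PolygonFromEnvelope")
}

// Same shape as SedonaSqlExtensions: registration is a side effect of building
// the rule, and the rule itself ignores every plan.
class SketchSqlExtensions extends (FakeExtensions => Unit) {
  def apply(e: FakeExtensions): Unit =
    e.injectCheckRule { session =>
      FakeRegistrator.registerAll(session)
      _ => ()
    }
}
```

In this model, applying `SketchSqlExtensions` registers nothing by itself; the functions appear only once `buildCheckRules` runs for a session, and invoking the resulting rule on plans leaves the registry unchanged.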