You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@sedona.apache.org by GitBox <gi...@apache.org> on 2023/01/05 09:35:47 UTC

[GitHub] [sedona] douglasdennis opened a new pull request, #745: [SEDONA-227] Python Serde Refactor

douglasdennis opened a new pull request, #745:
URL: https://github.com/apache/sedona/pull/745

   
   ## Did you read the Contributor Guide?
   
   - Yes, I have read [Contributor Rules](https://sedona.apache.org/community/rule/) and [Contributor Development Guide](https://sedona.apache.org/community/develop/)
   
   ## Is this PR related to a JIRA ticket?
   
   - Yes, the URL of the associated JIRA ticket is https://issues.apache.org/jira/browse/SEDONA-227. The PR name follows the format `[SEDONA-XXX] my subject`.
   
   ## What changes were proposed in this PR?
   
   A refactor of some of the python serialization/deserialization functions is proposed. This is due to the performance regression experienced from the change in serialization formats. Most of the functions were refactored to increase performance. Attempts were made to maintain readability of the code at the same time.
   
   A significant pain point involves interacting with shapely geometry. Most of the shapely methods are too slow and involve repeated calls to native code. Once shapely 2.0 is adopted by Sedona and its users, it may be a good idea to revisit this code and attempt to utilize some of the libgeos connections that shapely exposes.
   
   There are other issues in the current implementation of serde:
   1. Shapely does not respect an M coordinate yet, and so that is not supported in python at the moment.
   2. SRID is not supported in python for a similar reason. Part of Sedona's python library does add a "userData" field to shapely geometry classes at runtime. This could be utilized to store SRIDs and possibly M coordinates.
   3. This PR mostly avoids using the GeometryBuffer class. I don't believe it will be the fastest solution in the future. However, I think it's a nice class for specifically handling the GeometryCollection type.
   4. Should the serialization format allow for empty geometry within collections? They have no WKT representation, so I'm not sure that should be allowed.
   
   Finally, here are performance results between current master and this branch:
   ```
   ############################################################################################
   #     MASTER                                #              REFACTOR                        #
   ############################################################################################
   short line serialize trial:                 #        short line serialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 3.1197412          #                Shapely: 1.6295438
                   Sedona: 7.6139485           #                Sedona: 1.624361
                   Factor: 1.440570551172642   #                Factor: -0.00318052205776856
                                               #
   long line serialize trial:                  #        long line serialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 4.3532153          #                Shapely: 4.106511
                   Sedona: 53.1161724          #                Sedona: 20.2585903
                   Factor: 11.201595542494763  #                Factor: 3.9332852876809534
                                               #
   point serialize trial:                      #        point serialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 4.5569092          #                Shapely: 4.1321635
                   Sedona: 12.9271814          #                Sedona: 3.6287594
                   Factor: 1.8368310257312128  #                Factor: -0.12182579416327549
                                               #
   small polygon serialize trial:              #        small polygon serialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 1.9922023          #                Shapely: 1.6930909
                   Sedona: 15.1673936          #                Sedona: 2.5668217
                   Factor: 6.613380227499988   #                Factor: 0.5160566393688608
                                               #
   large polygon serialize trial:              #        large polygon serialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 2.5921638          #                Shapely: 2.3090157
                   Sedona: 27.4987078          #                Sedona: 2.8944101
                   Factor: 9.608398975404254   #                Factor: 0.25352551738821005
                                               #
   small multipoint serialize trial:           #        small multipoint serialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 0.0964883          #                Shapely: 0.0957348
                   Sedona: 0.8657994           #                Sedona: 0.5249629
                   Factor: 7.97310243832672    #                Factor: 4.483511742856307
                                               #
   large multipoint serialize trial:           #        large multipoint serialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 0.2059924          #                Shapely: 0.2002637
                   Sedona: 18.6648998          #                Sedona: 13.9681053
                   Factor: 89.60965258912465   #                Factor: 68.74856301965858
                                               #
   small multilinestring serialize trial:      #        small multilinestring serialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 0.1003702          #                Shapely: 0.0984753
                   Sedona: 1.2611056           #                Sedona: 0.5051064
                   Factor: 11.564542065274354  #                Factor: 4.129269979375539
                                               #
   large multilinestring serialize trial:      #        large multilinestring serialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 0.1382676          #                Shapely: 0.1290179
                   Sedona: 14.4627404          #                Sedona: 6.4109233
                   Factor: 103.59963433226584  #                Factor: 48.69018485031922
                                               #
   small multipolygon serialize trial:         #        small multipolygon serialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 0.1074735          #                Shapely: 0.107037
                   Sedona: 2.5492284           #                Sedona: 1.1849261
                   Factor: 22.71959971527865   #                Factor: 10.070247671365976
                                               #
   large multipolygon serialize trial:         #        large multipolygon serialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 0.218638           #                Shapely: 0.2061548
                   Sedona: 33.7390888          #                Sedona: 16.8197829
                   Factor: 153.31484371426743  #                Factor: 80.58812164451179
                                               #
   short line deserialize trial:               #        short line deserialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 2.6179912          #                Shapely: 2.3070263
                   Sedona: 1.8462964           #                Sedona: 1.8466384
                   Factor: -0.2947660022692208 #                Factor: -0.19955901673075854
                                               #
   long line deserialize trial:                #        long line deserialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 3.1905663          #                Shapely: 3.0322167
                   Sedona: 46.6516656          #                Sedona: 3.0514442
                   Factor: 13.62175087851959   #                Factor: 0.006341070544199562
                                               #
   point deserialize trial:                    #        point deserialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 5.8758644          #                Shapely: 5.7208507
                   Sedona: 5.2898097           #                Sedona: 5.2560053
                   Factor: -0.0997393166527124 #                Factor: -0.08125459383164815
                                               #
   small polygon deserialize trial:            #        small polygon deserialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 2.4844221          #                Shapely: 2.3755022
                   Sedona: 4.120936            #                Sedona: 3.0382661
                   Factor: 0.6587100879516408  #                Factor: 0.27899948903436084
                                               #
   large polygon deserialize trial:            #        large polygon deserialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 2.0467411          #                Shapely: 1.9710574
                   Sedona: 20.173081           #                Sedona: 3.0417363
                   Factor: 8.85619578362891    #                Factor: 0.5432002639801358
                                               #
   small multipoint deserialize trial:         #        small multipoint deserialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 0.1267147          #                Shapely: 0.1221836
                   Sedona: 0.4915774           #                Sedona: 0.4675109
                   Factor: 2.8794031000349603  #                Factor: 2.8262982920784787
                                               #
   large multipoint deserialize trial:         #        large multipoint deserialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 0.2152692          #                Shapely: 0.2037456
                   Sedona: 10.9262919          #                Sedona: 11.1691675
                   Factor: 49.75641057801116   #                Factor: 53.81918382531942
                                               #
   small multilinestring deserialize trial:    #        small multilinestring deserialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 0.1260666          #                Shapely: 0.1310238
                   Sedona: 0.4858702           #                Sedona: 0.5045361
                   Factor: 2.8540755441964802  #                Factor: 2.8507210140447765
                                               #
   large multilinestring deserialize trial:    #        large multilinestring deserialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 0.1432103          #                Shapely: 0.1417217
                   Sedona: 5.63192             #                Sedona: 5.8515723
                   Factor: 38.3262216474653    #                Factor: 40.28917660457079
                                               #
   small multipolygon deserialize trial:       #        small multipolygon deserialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 0.1392047          #                Shapely: 0.12827
                   Sedona: 1.3398053           #                Sedona: 1.1914238
                   Factor: 8.624713102359332   #                Factor: 8.288405706712403
                                               #
   large multipolygon deserialize trial:       #        large multipolygon deserialize trial:
           Total Time (seconds):               #            Total Time (seconds):
                   Shapely: 0.2063552          #                Shapely: 0.2101143
                   Sedona: 19.0151752          #                Sedona: 15.8966164
                   Factor: 91.14778789194554   #                Factor: 74.65699431214344
   ```
   Here is the python script I used to generate these results:
   ```
   from sedona.utils.geometry_serde import serialize, deserialize
   from shapely.geometry import LineString, Point, Polygon, MultiPoint, MultiLineString, MultiPolygon
   from shapely.wkb import dumps, loads
   
   import time
   
   def run_serialize_trial(geom, number_iterations, name):
       print(f"{name} serialize trial:")
   
       start_time = time.perf_counter_ns()
       for _ in range(number_iterations):
           dumps(geom)
       shapely_time = time.perf_counter_ns() - start_time
   
       start_time = time.perf_counter_ns()
       for _ in range(number_iterations):
           serialize(geom)
       sedona_time = time.perf_counter_ns() - start_time
   
       print(f"\tTotal Time (seconds):")
       print(f"\t\tShapely: {shapely_time / 1e9}\n\t\tSedona: {sedona_time / 1e9}\n\t\tFactor: {(sedona_time - shapely_time) / shapely_time}\n")
   
   def run_deserialize_trial(geom, number_iterations, name):
       print(f"{name} deserialize trial:")
   
       shapely_serialized_geom = dumps(geom)
       sedona_serialized_geom = serialize(geom)
   
       start_time = time.perf_counter_ns()
       for _ in range(number_iterations):
           loads(shapely_serialized_geom)
       shapely_time = time.perf_counter_ns() - start_time
   
       start_time = time.perf_counter_ns()
       for _ in range(number_iterations):
           deserialize(sedona_serialized_geom)
       sedona_time = time.perf_counter_ns() - start_time
   
       print(f"\tTotal Time (seconds):")
       print(f"\t\tShapely: {shapely_time / 1e9}\n\t\tSedona: {sedona_time / 1e9}\n\t\tFactor: {(sedona_time - shapely_time) / shapely_time}\n")
   
   short_line_iterations = 200_000
   short_line = LineString([(10.0, 10.0), (20.0, 20.0)])
   
   long_line_iterations = 100_000
   long_line = LineString([(float(n), float(n)) for n in range(1000)])
   
   point_iterations = 500_000
   point = Point(12.3, 45.6)
   
   small_polygon_iterations = 200_000
   small_polygon = Polygon([(10.0, 10.0), (20.0, 10.0), (20.0, 20.0), (10.0, 20.0), (10.0, 10.0)])
   
   large_polygon_iterations = 100_000
   large_polygon = Polygon(
       [(0.0, float(n * 10)) for n in range(100)]
       + [(float(n * 10), 990.0) for n in range(100)]
       + [(990.0, float(n * 10)) for n in reversed(range(100))]
       + [(float(n * 10), 0.0) for n in reversed(range(100))]
   )
   
   small_multipoint_iterations = 10_000
   small_multipoint = MultiPoint([(n, n) for n in range(3)])
   
   large_multipoint_iterations = 10_000
   large_multipoint = MultiPoint([(n, n) for n in range(100)])
   
   small_multilinestring_iterations = 10_000
   small_multilinestring = MultiLineString([[(10.0, 10.0), (20.0, 20.0)] for _ in range(3)])
   
   large_multilinestring_iterations = 5_000
   large_multilinestring = MultiLineString([[(10.0, 10.0), (20.0, 20.0)] for _ in range(100)])
   
   small_multipolygon_iterations = 10_000
   small_multipolygon = MultiPolygon([small_polygon for _ in range(3)])
   
   large_multipolygon_iterations = 5_000
   large_multipolygon = MultiPolygon([small_polygon for _ in range(100)])
   
   run_serialize_trial(short_line, short_line_iterations, "short line")
   run_serialize_trial(long_line, long_line_iterations, "long line")
   run_serialize_trial(point, point_iterations, "point")
   run_serialize_trial(small_polygon, small_polygon_iterations, "small polygon")
   run_serialize_trial(large_polygon, large_polygon_iterations, "large polygon")
   run_serialize_trial(small_multipoint, small_multipoint_iterations, "small multipoint")
   run_serialize_trial(large_multipoint, large_multipoint_iterations, "large multipoint")
   run_serialize_trial(small_multilinestring, small_multilinestring_iterations, "small multilinestring")
   run_serialize_trial(large_multilinestring, large_multilinestring_iterations, "large multilinestring")
   run_serialize_trial(small_multipolygon, small_multipolygon_iterations, "small multipolygon")
   run_serialize_trial(large_multipolygon, large_multipolygon_iterations, "large multipolygon")
   
   run_deserialize_trial(short_line, short_line_iterations, "short line")
   run_deserialize_trial(long_line, long_line_iterations, "long line")
   run_deserialize_trial(point, point_iterations, "point")
   run_deserialize_trial(small_polygon, small_polygon_iterations, "small polygon")
   run_deserialize_trial(large_polygon, large_polygon_iterations, "large polygon")
   run_deserialize_trial(small_multipoint, small_multipoint_iterations, "small multipoint")
   run_deserialize_trial(large_multipoint, large_multipoint_iterations, "large multipoint")
   run_deserialize_trial(small_multilinestring, small_multilinestring_iterations, "small multilinestring")
   run_deserialize_trial(large_multilinestring, large_multilinestring_iterations, "large multilinestring")
   run_deserialize_trial(small_multipolygon, small_multipolygon_iterations, "small multipolygon")
   run_deserialize_trial(large_multipolygon, large_multipolygon_iterations, "large multipolygon")
   ```
   
   ## How was this patch tested?
   
   A parameterized set of unit tests were added that test the serialization/deserialization round trip of a geometry object between python and spark.
   
   ## Did this PR include necessary documentation updates?
   
   - No, this PR does not affect any public API so no need to change the docs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [sedona] douglasdennis commented on a diff in pull request #745: [SEDONA-227] Python Serde Refactor

Posted by GitBox <gi...@apache.org>.

douglasdennis commented on code in PR #745:
URL: https://github.com/apache/sedona/pull/745#discussion_r1064067248


##########
python/sedona/utils/geometry_serde.py:
##########
@@ -217,70 +233,97 @@ def create_buffer_for_geom(geom_type: int, coord_type: int, size: int, num_coord
     struct.pack_into('i', buffer, 4, num_coords)
     return buffer
 
+def generate_header_bytes(geom_type: int, coord_type: int, num_coords: int) -> bytes:
+    preamble_byte = (geom_type << 4) | (coord_type << 1)
+    return struct.pack(
+        'BBBBi',
+        preamble_byte,
+        0,
+        0,
+        0,
+        num_coords
+    )
+
 
-def put_coordinates(buffer: bytearray, offset: int, coord_type: int, coords: CoordinateSequence):
+def put_coordinates(buffer: bytearray, offset: int, coord_type: int, coords: ListCoordType) -> int:
     for coord in coords:
-        offset = put_coordinate(buffer, offset, coord_type, coord)
+        struct.pack_into(CoordinateType.unpack_format(coord_type, buffer, offset, *coord))
+        offset += CoordinateType.bytes_per_coord(coord_type)
     return offset
 
 
-def put_coordinate(buffer: bytearray, offset: int, coord_type: int, coord: tuple):
-    x = coord[0]
-    y = coord[1]
-    z = coord[2] if len(coord) > 2 else math.nan
-    if coord_type == CoordinateType.XY:
-        struct.pack_into('d', buffer, offset, x)
-        struct.pack_into('d', buffer, offset + 8, y)
-        offset += 16
-    elif coord_type == CoordinateType.XYZ:
-        struct.pack_into('d', buffer, offset, x)
-        struct.pack_into('d', buffer, offset + 8, y)
-        struct.pack_into('d', buffer, offset + 16, z)
-        offset += 24
-    else:
-        # Shapely does not support M dimension for now
-        raise ValueError("Invalid coordinate type: {}".format(coord_type))
+def put_coordinate(buffer: bytearray, offset: int, coord_type: int, coord: CoordType) -> int:
+    struct.pack_into(CoordinateType.unpack_format(coord_type, buffer, offset, *coord))
+    offset += CoordinateType.bytes_per_coord(coord_type)
     return offset
 
 
-def get_coordinates(buffer: bytearray, offset: int, coord_type: int, num_coords: int) -> List[tuple]:
-    coords = []
-    bytes_per_coord = CoordinateType.bytes_per_coord(coord_type)
-    for i in range(num_coords):
-        coord = get_coordinate(buffer, offset, coord_type)
-        coords.append(coord)
-        offset += bytes_per_coord
+def get_coordinates(buffer: bytearray, offset: int, coord_type: int, num_coords: int) -> Union[np.ndarray, ListCoordType]:
+    if coord_type == CoordinateType.XYM or coord_type == CoordinateType.XYZM:
+        raise NotImplementedError("XYM or XYZM coordinates are not supported")
+
+    if num_coords < 50:
+        coords = [
+            struct.unpack_from(CoordinateType.unpack_format(coord_type), buffer, offset + (i * CoordinateType.bytes_per_coord(coord_type)))
+            for i in range(num_coords)
+        ]
+    else:
+        nums_per_coord = CoordinateType.components_per_coord(coord_type)
+        coords = np.frombuffer(buffer, np.float64, num_coords * nums_per_coord, offset).reshape((num_coords, nums_per_coord))
+
     return coords
 
 
-def get_coordinate(buffer: bytearray, offset: int, coord_type: int) -> tuple:
-    x = struct.unpack_from('d', buffer, offset)[0]
-    y = struct.unpack_from('d', buffer, offset + 8)[0]
-    # Shapely does not support M dimension for now, so we'll simply ignore them
-    if coord_type == CoordinateType.XY or coord_type == CoordinateType.XYM:
-        return x, y
-    elif coord_type == CoordinateType.XYZ or coord_type == CoordinateType.XYZM:
-        z = struct.unpack_from('d', buffer, offset + 16)[0]
-        return x, y, z
-    else:
-        raise NotImplementedError("XYM or XYZM coordinates were not supported")
+def get_coordinate(buffer: bytearray, offset: int, coord_type: int) -> CoordType:
+    # Shapely does not support M dimension for now, so raise if it was passed
+    return struct.unpack_from(CoordinateType.unpack_format(coord_type), buffer, offset)
 
 
-def aligned_offset(offset):
+def aligned_offset(offset: int) -> int:
     return (offset + 7) & ~7
 
 
-def serialize_point(geom: Point) -> bytearray:
-    coords = geom.coords
-    if not coords:
-        return create_buffer_for_geom(GeometryTypeID.POINT, CoordinateType.XY, 8, 0)
-    coord_type = CoordinateType.type_of(coords, geom.has_z)
-    bytes_per_coord = CoordinateType.bytes_per_coord(coord_type)
-    size = 8 + bytes_per_coord
-    buffer = create_buffer_for_geom(GeometryTypeID.POINT, coord_type, size, 1)
-    put_coordinates(buffer, 8, coord_type, coords)
-    return buffer
-
+def serialize_point(geom: Point) -> bytes:
+    coords = [tuple(c) for c in geom.coords]

Review Comment:
   You're right :) Changes made.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [sedona] douglasdennis commented on pull request #745: [SEDONA-227] Python Serde Refactor

Posted by GitBox <gi...@apache.org>.

douglasdennis commented on PR #745:
URL: https://github.com/apache/sedona/pull/745#issuecomment-1381195068

   @Imbruced Took me a little longer but here are some memory profiling results. 
   TL;DR 
   Serialization Average Memory Increment Results (MiB):
   |Serializer|Point|LineString|Polygon|
   |----------|------|----------|---------|
   |WKB|7.22|13.82|13.76|
   |Master|9.98|16.2|17.7|
   |Refactor|7.1|13.6|14.98|
   
   Deserialization Average Memory Increment Results (MiB):
   |Deserializer|Point|LineString|Polygon|
   |----------|------|----------|---------|
   |WKB|28.5|39|46.8|
   |Master|34.42|45.26|52.84|
   |Refactor|34.8|45.26|52.8|
   
   Method:
   I used the `memory_profiler` package to run trials using wkb.dumps and wkb.loads, this PR's refactored serialize and deserialize functions, and the current master branch's serialize and deserialize function. For each system of serde I ran trials operating on 100,000 `Point`s, `LineString`s with five vertices, and square `Polygon`s. For each combination of serde and geometry I ran five trials and averaged the results. The script I used to run these trials is included below. Each script was ran using a command such as:
   `mprof run python python/mem_profile_serialize.py wkb point`
   
   This would give a printout like this:
   ```
   mprof: Sampling memory every 0.1s
   running new process
   Filename: python/mem_profile_serialize.py
   
   Line #    Mem usage    Increment  Occurrences   Line Contents
   =============================================================
       21     41.8 MiB     41.8 MiB           1   @profile
       22                                         def run_serialize_profile(serializer, geom):
       23     48.8 MiB      7.0 MiB      100003       x = [serializer(geom) for _ in range(100_000)]
       24     48.8 MiB      0.0 MiB           1       return x
   ```
   
   I then averaged the "Increment" amount for line 23 across all five trials. Additionally, I plotted a representative run of the serialize operations (if you click on these you should get a bigger image):
   
   | |Point|LineString|Polygon|
   |---|----|----|----|
   |WKB|![wkb_point](https://user-images.githubusercontent.com/4318895/212215809-55e7cfcd-3449-4ef5-bd9c-4c950f3d6f78.png)|![wkb_linestring](https://user-images.githubusercontent.com/4318895/212215837-6291dd6c-a2fc-4496-bb85-f3f677a7d778.png)|![wkb_polygon](https://user-images.githubusercontent.com/4318895/212215868-86a46a16-1289-421d-9e0b-859029c82172.png)|
   |Master|![master_point](https://user-images.githubusercontent.com/4318895/212215948-32067e4c-1ec2-47ac-ae1f-3ce2d6c4e7cf.png)|![master_linestring](https://user-images.githubusercontent.com/4318895/212215964-6d2a3bbd-558c-48a3-9807-d55a364a94e6.png)|![master_polygon](https://user-images.githubusercontent.com/4318895/212215974-3988e685-cb18-42f2-9143-1b2b913290f3.png)|
   |Refactor|![refactor_point](https://user-images.githubusercontent.com/4318895/212216008-ad8cc711-32f5-4aef-9f87-576e18bdff69.png)|![refactor_linestring](https://user-images.githubusercontent.com/4318895/212216034-c7b786ff-0187-4659-b73d-cc11dfd4254f.png)|![refactor_polygon](https://user-images.githubusercontent.com/4318895/212216047-41ca2be3-595b-4f02-ae8b-77d707012d43.png)|
   
   Lastly here is the serialization trial script:
   ```python
   import sys
   from typing import List
   
   from sedona.utils.geometry_serde import serialize
   from shapely.geometry import Point, LineString, Polygon
   from shapely.wkb import dumps
   
   from memory_profiler import profile
   
   
   def make_point():
       return Point(12.3, 45.6)
   
   def make_linestring():
       return LineString([(n, n) for n in range(5)])
   
   def make_polygon():
       return Polygon([(10.0, 10.0), (20.0, 10.0), (20.0, 20.0), (10.0, 20.0), (10.0, 10.0)])
   
   
   @profile
   def run_serialize_profile(serializer, geom):
       x = [serializer(geom) for _ in range(100_000)]
       return x
   
   
   def main(args: List[str]) -> int:
       if len(args) < 2:
           print("Usage: mem_profile_run.py <sedona or wkb> <point>")
           return 1
       serializer_type = args[0]
       geom_type = args[1]
   
       if geom_type == "point":
           geom = make_point()
       elif geom_type == "linestring":
           geom = make_linestring()
       elif geom_type == "polygon":
           geom = make_polygon()
       else:
           print(f"Geometry type is not supported: {geom_type}")
           return 1
   
       if serializer_type == "sedona":
           serializer = serialize
       elif serializer_type == "wkb":
           serializer = dumps
       else:
           print(f"Serializer type is not supported: {serializer_type}")
           return 1
   
       run_serialize_profile(serializer, geom)
   
   
   if __name__ == "__main__":
       exit(main(sys.argv[1:]))
   ```
   
   Deserialize trial script:
   ```python
   import sys
   from typing import List
   
   from sedona.utils.geometry_serde import serialize, deserialize
   from shapely.geometry import Point, LineString, Polygon
   from shapely.wkb import dumps, loads
   
   from memory_profiler import profile
   
   
   def make_point():
       return Point(12.3, 45.6)
   
   def make_linestring():
       return LineString([(n, n) for n in range(5)])
   
   def make_polygon():
       return Polygon([(10.0, 10.0), (20.0, 10.0), (20.0, 20.0), (10.0, 20.0), (10.0, 10.0)])
   
   
   @profile
   def run_deserialize_profile(deserializer, geom):
       x = [deserializer(geom) for _ in range(100_000)]
       return x
   
   
   def main(args: List[str]) -> int:
       if len(args) < 2:
           print("Usage: mem_profile_run.py <sedona or wkb> <point>")
           return 1
       serializer_type = args[0]
       geom_type = args[1]
   
       if geom_type == "point":
           geom = make_point()
       elif geom_type == "linestring":
           geom = make_linestring()
       elif geom_type == "polygon":
           geom = make_polygon()
       else:
           print(f"Geometry type is not supported: {geom_type}")
           return 1
   
       if serializer_type == "sedona":
           geom = serialize(geom)
           deserializer = deserialize
       elif serializer_type == "wkb":
           geom = dumps(geom)
           deserializer = loads
       else:
           print(f"Serializer type is not supported: {serializer_type}")
           return 1
   
       run_deserialize_profile(deserializer, geom)
   
   
   if __name__ == "__main__":
       exit(main(sys.argv[1:]))
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [sedona] Imbruced commented on pull request #745: [SEDONA-227] Python Serde Refactor

Posted by GitBox <gi...@apache.org>.

Imbruced commented on PR #745:
URL: https://github.com/apache/sedona/pull/745#issuecomment-1374664244

   Do you have a chance to compare the memory footprint between wkb.loads approach and the changes after 29.12 and the current version ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [sedona] douglasdennis commented on pull request #745: [SEDONA-227] Python Serde Refactor

Posted by GitBox <gi...@apache.org>.

douglasdennis commented on PR #745:
URL: https://github.com/apache/sedona/pull/745#issuecomment-1373758835

   @Kontinuation What's your take on dropping support for empty geometry inside of multi-geometries in the new serialization format? They don't have representation in any other form so I'm not sure that supporting them provides any value. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [sedona] Imbruced commented on a diff in pull request #745: [SEDONA-227] Python Serde Refactor

Posted by GitBox <gi...@apache.org>.

Imbruced commented on code in PR #745:
URL: https://github.com/apache/sedona/pull/745#discussion_r1064062557


##########
python/sedona/utils/geometry_serde.py:
##########
@@ -217,70 +233,97 @@ def create_buffer_for_geom(geom_type: int, coord_type: int, size: int, num_coord
     struct.pack_into('i', buffer, 4, num_coords)
     return buffer
 
+def generate_header_bytes(geom_type: int, coord_type: int, num_coords: int) -> bytes:
+    preamble_byte = (geom_type << 4) | (coord_type << 1)
+    return struct.pack(
+        'BBBBi',
+        preamble_byte,
+        0,
+        0,
+        0,
+        num_coords
+    )
+
 
-def put_coordinates(buffer: bytearray, offset: int, coord_type: int, coords: CoordinateSequence):
+def put_coordinates(buffer: bytearray, offset: int, coord_type: int, coords: ListCoordType) -> int:
     for coord in coords:
-        offset = put_coordinate(buffer, offset, coord_type, coord)
+        struct.pack_into(CoordinateType.unpack_format(coord_type, buffer, offset, *coord))
+        offset += CoordinateType.bytes_per_coord(coord_type)
     return offset
 
 
-def put_coordinate(buffer: bytearray, offset: int, coord_type: int, coord: tuple):
-    x = coord[0]
-    y = coord[1]
-    z = coord[2] if len(coord) > 2 else math.nan
-    if coord_type == CoordinateType.XY:
-        struct.pack_into('d', buffer, offset, x)
-        struct.pack_into('d', buffer, offset + 8, y)
-        offset += 16
-    elif coord_type == CoordinateType.XYZ:
-        struct.pack_into('d', buffer, offset, x)
-        struct.pack_into('d', buffer, offset + 8, y)
-        struct.pack_into('d', buffer, offset + 16, z)
-        offset += 24
-    else:
-        # Shapely does not support M dimension for now
-        raise ValueError("Invalid coordinate type: {}".format(coord_type))
+def put_coordinate(buffer: bytearray, offset: int, coord_type: int, coord: CoordType) -> int:
+    struct.pack_into(CoordinateType.unpack_format(coord_type, buffer, offset, *coord))
+    offset += CoordinateType.bytes_per_coord(coord_type)
     return offset
 
 
-def get_coordinates(buffer: bytearray, offset: int, coord_type: int, num_coords: int) -> List[tuple]:
-    coords = []
-    bytes_per_coord = CoordinateType.bytes_per_coord(coord_type)
-    for i in range(num_coords):
-        coord = get_coordinate(buffer, offset, coord_type)
-        coords.append(coord)
-        offset += bytes_per_coord
+def get_coordinates(buffer: bytearray, offset: int, coord_type: int, num_coords: int) -> Union[np.ndarray, ListCoordType]:
+    if coord_type == CoordinateType.XYM or coord_type == CoordinateType.XYZM:
+        raise NotImplementedError("XYM or XYZM coordinates are not supported")
+
+    if num_coords < 50:

Review Comment:
   Is that some magical number ? :) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [sedona] Kontinuation commented on pull request #745: [SEDONA-227] Python Serde Refactor

Posted by GitBox <gi...@apache.org>.

Kontinuation commented on PR #745:
URL: https://github.com/apache/sedona/pull/745#issuecomment-1373383865

   LGTM. The refactored implementation looks much more pythonic while having better performance. It is also a great move to remove the expensive `has_z` calls, as shapely currently does not support M dimension.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [sedona] jiayuasu commented on pull request #745: [SEDONA-227] Python Serde Refactor

Posted by GitBox <gi...@apache.org>.

jiayuasu commented on PR #745:
URL: https://github.com/apache/sedona/pull/745#issuecomment-1381141376

   @douglasdennis Is this PR ready to be merged? :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [sedona] Imbruced commented on a diff in pull request #745: [SEDONA-227] Python Serde Refactor

Posted by GitBox <gi...@apache.org>.

Imbruced commented on code in PR #745:
URL: https://github.com/apache/sedona/pull/745#discussion_r1064062431


##########
python/sedona/utils/geometry_serde.py:
##########
@@ -217,70 +233,97 @@ def create_buffer_for_geom(geom_type: int, coord_type: int, size: int, num_coord
     struct.pack_into('i', buffer, 4, num_coords)
     return buffer
 
+def generate_header_bytes(geom_type: int, coord_type: int, num_coords: int) -> bytes:
+    preamble_byte = (geom_type << 4) | (coord_type << 1)
+    return struct.pack(
+        'BBBBi',
+        preamble_byte,
+        0,
+        0,
+        0,
+        num_coords
+    )
+
 
-def put_coordinates(buffer: bytearray, offset: int, coord_type: int, coords: CoordinateSequence):
+def put_coordinates(buffer: bytearray, offset: int, coord_type: int, coords: ListCoordType) -> int:
     for coord in coords:
-        offset = put_coordinate(buffer, offset, coord_type, coord)
+        struct.pack_into(CoordinateType.unpack_format(coord_type, buffer, offset, *coord))
+        offset += CoordinateType.bytes_per_coord(coord_type)
     return offset
 
 
-def put_coordinate(buffer: bytearray, offset: int, coord_type: int, coord: tuple):
-    x = coord[0]
-    y = coord[1]
-    z = coord[2] if len(coord) > 2 else math.nan
-    if coord_type == CoordinateType.XY:
-        struct.pack_into('d', buffer, offset, x)
-        struct.pack_into('d', buffer, offset + 8, y)
-        offset += 16
-    elif coord_type == CoordinateType.XYZ:
-        struct.pack_into('d', buffer, offset, x)
-        struct.pack_into('d', buffer, offset + 8, y)
-        struct.pack_into('d', buffer, offset + 16, z)
-        offset += 24
-    else:
-        # Shapely does not support M dimension for now
-        raise ValueError("Invalid coordinate type: {}".format(coord_type))
+def put_coordinate(buffer: bytearray, offset: int, coord_type: int, coord: CoordType) -> int:
+    struct.pack_into(CoordinateType.unpack_format(coord_type, buffer, offset, *coord))
+    offset += CoordinateType.bytes_per_coord(coord_type)
     return offset
 
 
-def get_coordinates(buffer: bytearray, offset: int, coord_type: int, num_coords: int) -> List[tuple]:
-    coords = []
-    bytes_per_coord = CoordinateType.bytes_per_coord(coord_type)
-    for i in range(num_coords):
-        coord = get_coordinate(buffer, offset, coord_type)
-        coords.append(coord)
-        offset += bytes_per_coord
+def get_coordinates(buffer: bytearray, offset: int, coord_type: int, num_coords: int) -> Union[np.ndarray, ListCoordType]:
+    if coord_type == CoordinateType.XYM or coord_type == CoordinateType.XYZM:
+        raise NotImplementedError("XYM or XYZM coordinates are not supported")
+
+    if num_coords < 50:
+        coords = [
+            struct.unpack_from(CoordinateType.unpack_format(coord_type), buffer, offset + (i * CoordinateType.bytes_per_coord(coord_type)))
+            for i in range(num_coords)
+        ]
+    else:
+        nums_per_coord = CoordinateType.components_per_coord(coord_type)
+        coords = np.frombuffer(buffer, np.float64, num_coords * nums_per_coord, offset).reshape((num_coords, nums_per_coord))
+
     return coords
 
 
-def get_coordinate(buffer: bytearray, offset: int, coord_type: int) -> tuple:
-    x = struct.unpack_from('d', buffer, offset)[0]
-    y = struct.unpack_from('d', buffer, offset + 8)[0]
-    # Shapely does not support M dimension for now, so we'll simply ignore them
-    if coord_type == CoordinateType.XY or coord_type == CoordinateType.XYM:
-        return x, y
-    elif coord_type == CoordinateType.XYZ or coord_type == CoordinateType.XYZM:
-        z = struct.unpack_from('d', buffer, offset + 16)[0]
-        return x, y, z
-    else:
-        raise NotImplementedError("XYM or XYZM coordinates were not supported")
+def get_coordinate(buffer: bytearray, offset: int, coord_type: int) -> CoordType:
+    # Shapely does not support M dimension for now, so raise if it was passed
+    return struct.unpack_from(CoordinateType.unpack_format(coord_type), buffer, offset)
 
 
-def aligned_offset(offset):
+def aligned_offset(offset: int) -> int:
     return (offset + 7) & ~7
 
 
-def serialize_point(geom: Point) -> bytearray:
-    coords = geom.coords
-    if not coords:
-        return create_buffer_for_geom(GeometryTypeID.POINT, CoordinateType.XY, 8, 0)
-    coord_type = CoordinateType.type_of(coords, geom.has_z)
-    bytes_per_coord = CoordinateType.bytes_per_coord(coord_type)
-    size = 8 + bytes_per_coord
-    buffer = create_buffer_for_geom(GeometryTypeID.POINT, coord_type, size, 1)
-    put_coordinates(buffer, 8, coord_type, coords)
-    return buffer
-
+def serialize_point(geom: Point) -> bytes:
+    coords = [tuple(c) for c in geom.coords]

Review Comment:
   It can be simplified to 
   ```python
   def serialize_point_2(geom: Point) -> bytes:
       coords = [tuple(c) for c in geom.coords]
       if coords:
           # FIXME this does not handle M yet, but geom.has_z is extremely slow
           pack_format = "BBBBi" + "d" * geom._ndim
           coord_type = CoordinateType.type_of(geom)
           preamble_byte = ((GeometryTypeID.POINT << 4) | (coord_type << 1))
           return struct.pack(
               pack_format,
               preamble_byte,
               0,
               0,
               0,
               1,
               *coords[0])
       return struct.pack(
           'BBBBi',
           18,
           0,
           0,
           0,
           0
       )
   ```
   Or am I wrong ? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [sedona] Imbruced commented on a diff in pull request #745: [SEDONA-227] Python Serde Refactor

Posted by GitBox <gi...@apache.org>.

Imbruced commented on code in PR #745:
URL: https://github.com/apache/sedona/pull/745#discussion_r1064062506


##########
python/sedona/utils/geometry_serde.py:
##########
@@ -15,22 +15,32 @@
 #  specific language governing permissions and limitations
 #  under the License.
 
+import array
 import struct
 import math
-from typing import List, Optional
+from typing import List, Optional, Tuple, Union
 
+import numpy as np
 from shapely.coords import CoordinateSequence

Review Comment:
   We are not using this
   ```python
   from shapely.coords import CoordinateSequence
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [sedona] douglasdennis commented on a diff in pull request #745: [SEDONA-227] Python Serde Refactor

Posted by GitBox <gi...@apache.org>.

douglasdennis commented on code in PR #745:
URL: https://github.com/apache/sedona/pull/745#discussion_r1064067552


##########
python/sedona/utils/geometry_serde.py:
##########
@@ -217,70 +233,97 @@ def create_buffer_for_geom(geom_type: int, coord_type: int, size: int, num_coord
     struct.pack_into('i', buffer, 4, num_coords)
     return buffer
 
+def generate_header_bytes(geom_type: int, coord_type: int, num_coords: int) -> bytes:
+    preamble_byte = (geom_type << 4) | (coord_type << 1)
+    return struct.pack(
+        'BBBBi',
+        preamble_byte,
+        0,
+        0,
+        0,
+        num_coords
+    )
+
 
-def put_coordinates(buffer: bytearray, offset: int, coord_type: int, coords: CoordinateSequence):
+def put_coordinates(buffer: bytearray, offset: int, coord_type: int, coords: ListCoordType) -> int:
     for coord in coords:
-        offset = put_coordinate(buffer, offset, coord_type, coord)
+        struct.pack_into(CoordinateType.unpack_format(coord_type, buffer, offset, *coord))
+        offset += CoordinateType.bytes_per_coord(coord_type)
     return offset
 
 
-def put_coordinate(buffer: bytearray, offset: int, coord_type: int, coord: tuple):
-    x = coord[0]
-    y = coord[1]
-    z = coord[2] if len(coord) > 2 else math.nan
-    if coord_type == CoordinateType.XY:
-        struct.pack_into('d', buffer, offset, x)
-        struct.pack_into('d', buffer, offset + 8, y)
-        offset += 16
-    elif coord_type == CoordinateType.XYZ:
-        struct.pack_into('d', buffer, offset, x)
-        struct.pack_into('d', buffer, offset + 8, y)
-        struct.pack_into('d', buffer, offset + 16, z)
-        offset += 24
-    else:
-        # Shapely does not support M dimension for now
-        raise ValueError("Invalid coordinate type: {}".format(coord_type))
+def put_coordinate(buffer: bytearray, offset: int, coord_type: int, coord: CoordType) -> int:
+    struct.pack_into(CoordinateType.unpack_format(coord_type, buffer, offset, *coord))
+    offset += CoordinateType.bytes_per_coord(coord_type)
     return offset
 
 
-def get_coordinates(buffer: bytearray, offset: int, coord_type: int, num_coords: int) -> List[tuple]:
-    coords = []
-    bytes_per_coord = CoordinateType.bytes_per_coord(coord_type)
-    for i in range(num_coords):
-        coord = get_coordinate(buffer, offset, coord_type)
-        coords.append(coord)
-        offset += bytes_per_coord
+def get_coordinates(buffer: bytearray, offset: int, coord_type: int, num_coords: int) -> Union[np.ndarray, ListCoordType]:
+    if coord_type == CoordinateType.XYM or coord_type == CoordinateType.XYZM:
+        raise NotImplementedError("XYM or XYZM coordinates are not supported")
+
+    if num_coords < 50:

Review Comment:
   Yup, I wrestled a leprechaun for it :)
   
   In reality, it's roughly where the performances of the two approaches crossed each other on my system so it's fairly arbitrary. In my experience geometry either has a low number of points or a lot. Unfortunately, neither approach here performs well at both scales. So, 50 seemed like a reasonable "gonna call it here" kind of threshold.
   
   I created a constant with a better name and am using that now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [sedona] douglasdennis commented on pull request #745: [SEDONA-227] Python Serde Refactor

Posted by GitBox <gi...@apache.org>.

douglasdennis commented on PR #745:
URL: https://github.com/apache/sedona/pull/745#issuecomment-1373879622

   I stand corrected :) I should have been a little more thorough. I think empty geometries in collections should be supported then, since the standards expressly provide for them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [sedona] douglasdennis commented on pull request #745: [SEDONA-227] Python Serde Refactor

Posted by GitBox <gi...@apache.org>.

douglasdennis commented on PR #745:
URL: https://github.com/apache/sedona/pull/745#issuecomment-1381198538

   @jiayuasu I feel good about this PR. If the others do as well then I think it can merge. This code will probably have to be returned to at a later date once shapely decides how it wants to handle M coordinates and once Shapely < 2.0 is no longer supported.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [sedona] jiayuasu commented on pull request #745: [SEDONA-227] Python Serde Refactor

Posted by GitBox <gi...@apache.org>.

jiayuasu commented on PR #745:
URL: https://github.com/apache/sedona/pull/745#issuecomment-1372700871

   @Kontinuation @Imbruced Please feel free to chime in with comments :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [sedona] Kontinuation commented on pull request #745: [SEDONA-227] Python Serde Refactor

Posted by GitBox <gi...@apache.org>.

Kontinuation commented on PR #745:
URL: https://github.com/apache/sedona/pull/745#issuecomment-1373869602

   > @Kontinuation What's your take on dropping support for empty geometry inside of multi-geometries in the new serialization format? They don't have representation in any other form so I'm not sure that supporting them provides any value.
   
   I think it is OK to ignore empty geometries inside geometry collections since it has no impact on the topological properties of the geometries, though it may change some structural properties of the geometry collection such as indices of geometry elements (`ST_GeometryN`) and number of geometries in the geometry collection (`ST_NumGeometries`).
   
   If I understand it correctly, both WKT and WKB support empty geometries in geometry collections according to the OGC standard [Simple Feature Access - Part 1: Common Architecture](https://www.ogc.org/standards/sfa). The standard does not explicitly forbid empty geometries inside geometry collections, and these kinds of geometries could be represented according to the BNF of WKT:
   
   ```
   <multilinestring tagged text> ::= multilinestring <multilinestring text>
   <multilinestring text> ::= <empty set> | <left parent> <linestring text> {<comma> <linestring text>}* <right parent>
   <linestring text> ::= <empty set> | <left paren> <point> {<comma> <point>}* <right paren>
   <empty set> ::= EMPTY
   ```
   
   A MultiLineString containing 2 empty LineStrings could be represented as `MULTILINESTRING (EMPTY, EMPTY)`. Both JTS and Shapely could successfully parse it, though their behavior varies in various aspects:
   
   * JTS 1.19.0 parses it as a MultiLineString object containing 2 empty LineStrings.
   * Shapely 2.0 parses it as a MultiLineString object containing 2 empty LineStrings, though Shapely does not allow creating MultiLineString with empty components (`MultiLineString([LineString(), LineString()])` raises an exception).
   * Shapely 1.8 parses it as a MultiLineString containing no geometries.
   
   WKB may also represent `MULTILINESTRING (EMPTY, EMPTY)` as `b'\x00\x00\x00\x00\x05\x00\x00\x00\x02\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00'`, though some implementation failed to parse it.
   
   * JTS 1.19.0 raised a `ParseException` when parsing it
   * Shapely 2.0 and Shapely 1.8 parsed it without a problem. Shapely 2.0 parsed it as a MultiLineString containing 2 empty LineStrings while Shapely 1.8 parsed it as an empty MultiLineString.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [sedona] Imbruced commented on pull request #745: [SEDONA-227] Python Serde Refactor

Posted by GitBox <gi...@apache.org>.

Imbruced commented on PR #745:
URL: https://github.com/apache/sedona/pull/745#issuecomment-1374663015

   Is the shapely section refering to wkb.loads and wkb.dumps methods ? What do you think about letting user to chose if the Z/M support should be turned on ? (I guess that majority of use cases are about 2D data). At the end we need sth low level written I am wondering if we can create some PR's in the shapely project ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [sedona] douglasdennis commented on pull request #745: [SEDONA-227] Python Serde Refactor

Posted by GitBox <gi...@apache.org>.

douglasdennis commented on PR #745:
URL: https://github.com/apache/sedona/pull/745#issuecomment-1374679892

   > Do you have a chance to compare the memory footprint between wkb.loads approach and the changes after 29.12 and the current version ?
   
   I'd love too. Unfortunately, it won't be till tomorrow or early this next week.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [sedona] jiayuasu merged pull request #745: [SEDONA-227] Python Serde Refactor

Posted by GitBox <gi...@apache.org>.

jiayuasu merged PR #745:
URL: https://github.com/apache/sedona/pull/745


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org