Posted to dev@superset.apache.org by GitBox <gi...@apache.org> on 2018/02/17 17:30:04 UTC

[GitHub] mistercrunch commented on a change in pull request #4444: A collection of geospatial bug fixes

URL: https://github.com/apache/incubator-superset/pull/4444#discussion_r168927614
 
 

 ##########
 File path: superset/viz.py
 ##########
 @@ -1932,9 +1935,18 @@ def process_spatial_data_obj(self, key, df):
         if spatial is None:
             raise ValueError(_('Bad spatial key'))
         if spatial.get('type') == 'latlong':
-            df[key] = list(zip(df[spatial.get('lonCol')], df[spatial.get('latCol')]))
+            df[key] = list(zip(
+                pd.to_numeric(df[spatial.get('lonCol')], errors='coerce'),
+                pd.to_numeric(df[spatial.get('latCol')], errors='coerce'),
+            ))
         elif spatial.get('type') == 'delimited':
-            df[key] = (df[spatial.get('lonlatCol')].str.split(spatial.get('delimiter')))
+
+            def tupleify(s):
+                p = Point(s)
+                return (p.latitude, p.longitude)
+
+            df[key] = df[spatial.get('lonlatCol')].apply(tupleify)
 
 Review comment:
   ```python
   In [2]: from geopy.point import Point
   
   In [3]: Point('234,239')
   Out[3]: Point(54.0, -121.0, 0.0)
   
   In [4]: %timeit Point('234,239')
   The slowest run took 4.29 times longer than the fastest. This could mean that an intermediate result is being cached.
   100000 loops, best of 3: 11.7 µs per loop
   
   In [5]: %timeit (float(v) for v in '234,239'.split(','))
   The slowest run took 6.84 times longer than the fastest. This could mean that an intermediate result is being cached.
   1000000 loops, best of 3: 593 ns per loop
   ```
   So roughly double the time. On 1M points, that means about 0.5 seconds, which to me is fine: it is almost negligible compared to the network time it takes to bring that data over. Note that there's probably a NumPy way of doing this that would be much faster.
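
The comment points to a faster vectorized alternative. A minimal sketch with pandas follows, assuming the column holds plain decimal strings such as '41.5,-81.0'. `parse_delimited_points` is a hypothetical helper, not part of Superset or geopy. It splits the whole column in one call with `Series.str.split(..., expand=True)` and coerces with `pd.to_numeric(errors='coerce')`, the same coercion the `latlong` branch of the diff uses. Unlike `geopy.point.Point`, it does not accept degree/minute notation and does not wrap out-of-range values, so '234,239' stays (234.0, 239.0) instead of becoming (54.0, -121.0).

```python
import pandas as pd


def parse_delimited_points(series, delimiter=','):
    """Hypothetical vectorized helper: '41.5,-81.0' -> (41.5, -81.0).

    Assumes a column of plain decimal strings. Unlike geopy's Point, it does
    not parse degree/minute notation or wrap out-of-range values.
    """
    # Split the whole column in one call; n=1 keeps at most two parts and
    # expand=True returns them as DataFrame columns 0 and 1.
    parts = series.str.split(delimiter, n=1, expand=True)
    # If no row contained the delimiter there is no column 1; add it as NaN.
    parts = parts.reindex(columns=[0, 1])
    # Coerce each part to float; unparseable or missing values become NaN,
    # mirroring the errors='coerce' handling in the latlong branch.
    first = pd.to_numeric(parts[0], errors='coerce').astype(float)
    second = pd.to_numeric(parts[1], errors='coerce').astype(float)
    # Keep the values in string order, as tupleify does, one tuple per row.
    return pd.Series(list(zip(first, second)), index=series.index)


points = parse_delimited_points(pd.Series(['41.5,-81.0', '234,239', 'bad']))
print(points.tolist())
```

Bad rows come back as (nan, nan) rather than raising. Whether this beats the `Point`-based `tupleify` by a meaningful margin on real data would still need to be measured with `%timeit`, as in the session above.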
