Posted to issues@carbondata.apache.org by "Venugopal Reddy K (Jira)" <ji...@apache.org> on 2019/10/15 14:21:00 UTC

[jira] [Updated] (CARBONDATA-3548) Support for Geospatial indexing

     [ https://issues.apache.org/jira/browse/CARBONDATA-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venugopal Reddy K updated CARBONDATA-3548:
------------------------------------------
    Description: 
In general, a database may contain geographical location data. For instance, telecom operators need to perform analytics based on a particular region or on cell tower IDs (within a region), and the data may include geographical locations for a particular period of time. At present, Carbon does not have native support for storing geographical locations/coordinates or for running filter queries on them. However, the longitude and latitude of coordinates can be treated as independent columns, sorted hierarchically, and stored.

         But when longitude and latitude are treated independently, the 2D space is linearized, i.e., points in the two-dimensional domain are ordered by sorting first on longitude and then on latitude. Thus, data is not ordered by geospatial proximity. Hence, range queries require a lot of IO operations and query performance is degraded.

        To alleviate this, we can use a z-order curve to store geospatial data points. This helps keep geographically nearby points in the same block/blocklet, which reduces the IO operations for range queries and improves query performance. It can also support polygon queries on geodata. The attached design document describes this in detail.
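
        As a rough illustration of the z-order idea (not the implementation from the design document), the Python sketch below quantizes longitude and latitude onto an integer grid and interleaves their bits into a single sortable key. The function names, the 16-bit grid resolution, and the coordinate ranges are assumptions made for this example only.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bits land on even positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bits land on odd positions
    return code


def quantize(value: float, lo: float, hi: float, bits: int = 16) -> int:
    """Map a value in [lo, hi] to an integer grid cell in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    ratio = (value - lo) / (hi - lo)
    return min(cells, max(0, int(ratio * cells)))


def z_order_key(longitude: float, latitude: float, bits: int = 16) -> int:
    """Hypothetical helper: one sortable key per (longitude, latitude) pair."""
    x = quantize(longitude, -180.0, 180.0, bits)
    y = quantize(latitude, -90.0, 90.0, bits)
    return interleave_bits(x, y, bits)


if __name__ == "__main__":
    # Sorting on the combined key instead of (longitude, latitude) separately.
    points = [(77.59, 12.97), (77.60, 12.98), (-74.00, 40.71), (77.59, 40.71)]
    for lon, lat in sorted(points, key=lambda p: z_order_key(*p)):
        print(lon, lat, z_order_key(lon, lat))
```

        Sorting rows on such a key, rather than on longitude and then latitude, is what lets nearby points tend to share a block/blocklet. The curve does not preserve locality perfectly, so some neighboring points can still end up far apart in key order.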

  was:
In general, a database may contain geographical location data. For instance, telecom operators need to perform analytics based on a particular region or on cell tower IDs (within a region), and the data may include geographical locations for a particular period of time. At present, Carbon does not have native support for storing geographical locations/coordinates or for running filter queries on them. However, the longitude and latitude of coordinates can be treated as independent columns, sorted hierarchically, and stored.

         But when longitude and latitude are treated independently, the 2D space is linearized, i.e., points in the two-dimensional domain are ordered by sorting first on longitude and then on latitude. Thus, data is not ordered by geospatial proximity. Hence, range queries require a lot of IO operations and query performance is degraded.

        To alleviate this, we can use a z-order curve to store geospatial data points. This helps keep geographically nearby points in the same block/blocklet, which reduces the IO operations for range queries and improves query performance.


> Support for Geospatial indexing
> -------------------------------
>
>                 Key: CARBONDATA-3548
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3548
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Venugopal Reddy K
>            Priority: Major
>         Attachments: Geospatial Index Design Doc-OpenSource.pdf
>
>
> In general, a database may contain geographical location data. For instance, telecom operators need to perform analytics based on a particular region or on cell tower IDs (within a region), and the data may include geographical locations for a particular period of time. At present, Carbon does not have native support for storing geographical locations/coordinates or for running filter queries on them. However, the longitude and latitude of coordinates can be treated as independent columns, sorted hierarchically, and stored.
>          But when longitude and latitude are treated independently, the 2D space is linearized, i.e., points in the two-dimensional domain are ordered by sorting first on longitude and then on latitude. Thus, data is not ordered by geospatial proximity. Hence, range queries require a lot of IO operations and query performance is degraded.
>         To alleviate this, we can use a z-order curve to store geospatial data points. This helps keep geographically nearby points in the same block/blocklet, which reduces the IO operations for range queries and improves query performance. It can also support polygon queries on geodata. The attached design document describes this in detail.


