Posted to commits@carbondata.apache.org by in...@apache.org on 2022/03/04 08:00:43 UTC

[carbondata] branch master updated: [CARBONDATA-4325] Update Data frame supported options in document and fix partition table creation with df spatial property

This is an automated email from the ASF dual-hosted git repository.

indhumuthumurugesh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
     new c840b5f  [CARBONDATA-4325] Update Data frame supported options in document and fix partition table creation with df spatial property
c840b5f is described below

commit c840b5f30b15df54778b2a83608c727d25553d7c
Author: ShreelekhyaG <sh...@yahoo.com>
AuthorDate: Mon Feb 28 14:57:34 2022 +0530

    [CARBONDATA-4325] Update Data frame supported options in document and fix partition table creation with df spatial property
    
    Why is this PR needed?
    1. Only specific properties are supported through dataframe options, and the documentation needs to list them.
    2. Creating a partition table fails when the spatial index property is set on a carbon table created with a dataframe in spark-shell.
    
    What changes were proposed in this PR?
    1. Added data frame supported properties in the documentation.
    2. In spark-shell, the table is created through the carbon session, and catalogTable.properties
    is empty in that case. The properties are now read from catalogTable.storage.properties instead.
    
    Does this PR introduce any user interface change?
    No
    
    Is any new testcase added?
    No, tested in cluster.
    
    This closes #4250
---
 docs/carbon-as-spark-datasource-guide.md               | 18 ++++++++++++++++++
 .../execution/command/management/CommonLoadUtils.scala |  3 ++-
 2 files changed, 20 insertions(+), 1 deletion(-)
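
As an illustration of the documentation change in this commit, the spatial index options listed in the new table can be assembled as a plain option map. This is a minimal sketch, not code from the commit: `SpatialOptionsSketch` and `spatialOptions` are hypothetical names, the index name `mygeohash` and all values are illustrative, and only the option keys are taken from the documented table.

```scala
object SpatialOptionsSketch {
  // Hypothetical helper (not part of CarbonData): assembles the option map
  // for a single spatial index named `indexName`, using the keys documented
  // in the table added by this commit. The "geohash" type is one of the two
  // types the table mentions.
  def spatialOptions(
      indexName: String,
      sourceColumns: Seq[String],
      originLatitude: String,
      gridSize: String,
      conversionRatio: String): Map[String, String] = {
    val prefix = s"SPATIAL_INDEX.$indexName"
    Map(
      "SPATIAL_INDEX" -> indexName,
      s"$prefix.type" -> "geohash",
      s"$prefix.sourcecolumns" -> sourceColumns.mkString(", "),
      s"$prefix.originLatitude" -> originLatitude,
      s"$prefix.gridSize" -> gridSize,
      s"$prefix.conversionRatio" -> conversionRatio
    )
  }

  def main(args: Array[String]): Unit = {
    // Illustrative values only.
    val opts = spatialOptions(
      "mygeohash", Seq("longitude", "latitude"), "39.832277", "50", "1000000")
    // Print the options in a stable order for inspection.
    opts.toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

With a DataFrame `df` in scope, a map like this could presumably be handed to the writer through Spark's `DataFrameWriter.options`, for example `df.write.format("carbon").options(opts)`, followed by the save call shown in the guide. Whether a given combination of options is accepted depends on the CarbonData version in use.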

diff --git a/docs/carbon-as-spark-datasource-guide.md b/docs/carbon-as-spark-datasource-guide.md
index 275d5b1..e578ed0 100644
--- a/docs/carbon-as-spark-datasource-guide.md
+++ b/docs/carbon-as-spark-datasource-guide.md
@@ -96,6 +96,24 @@ df.write.format("carbon").save("/user/person_table")
 val dfread = spark.read.format("carbon").load("/user/person_table")
 dfread.show()
 ```
+## Supported OPTIONS using dataframe
+
+In addition to the above [Supported Options](#supported-options), the following properties are supported using dataframe.
+
+| Property                          | Default Value                             | Description                                                                                                                                                                                                                                                                                                                                                                       |
+|-----------------------------------|-------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| bucket_number                     | NA                                        | Number of buckets to be created. For more details, see [Bucketing](./ddl-of-carbondata.md#bucketing).                                                                                                                                                                                                                                                                             |
+| bucket_columns                    | NA                                        | Columns which are to be placed in buckets. For more details, see [Bucketing](./ddl-of-carbondata.md#bucketing).                                                                                                                                                                                                                                                                   |
+| streaming                         | false                                     | Whether the table is a streaming table. For more details, see [Streaming](./ddl-of-carbondata.md#streaming).                                                                                                                                                                                                                                                                      |
+| timestampformat                   | yyyy-MM-dd HH:mm:ss                       | For specifying the format of TIMESTAMP data type column. For more details, see [TimestampFormat](./ddl-of-carbondata.md#dateformattimestampformat).                                                                                                                                                                                                                               |
+| dateformat                        | yyyy-MM-dd                                | For specifying the format of DATE data type column. For more details, see [DateFormat](./ddl-of-carbondata.md#dateformattimestampformat).                                                                                                                                                                                                                                         |
+| SPATIAL_INDEX                     | NA                                        | Used to configure Spatial Index name. This name is appended to `SPATIAL_INDEX` in the subsequent sub-property configurations. `xxx` in the below sub-properties refers to index name. Generated spatial index column is not allowed in any properties except in `SORT_COLUMNS` table property. For more details, see [Spatial Index](./spatial-index-guide).                      |
+| SPATIAL_INDEX.xxx.type            | NA                                        | Type of algorithm for processing spatial data. Currently, supports 'geohash' and 'geosot'.                                                                                                                                                                                                                                                                                        |
+| SPATIAL_INDEX.xxx.sourcecolumns   | NA                                        | longitude and latitude column names as in the table. These columns are used to generate index value for each row.                                                                                                                                                                                                                                                                 |
+| SPATIAL_INDEX.xxx.originLatitude  | NA                                        | Latitude of origin.                                                                                                                                                                                                                                                                                                                                                               |
+| SPATIAL_INDEX.xxx.gridSize        | NA                                        | Grid size of raster data in metres. Currently, spatial index supports raster data.                                                                                                                                                                                                                                                                                                |
+| SPATIAL_INDEX.xxx.conversionRatio | NA                                        | Conversion factor. It allows the user to translate longitude and latitude to long. For example, if the data to load is longitude = 13.123456 and latitude = 101.12356, the user can configure the conversion ratio sub-property value as 1000000 and change the data to load to longitude = 13123456 and latitude = 101123560. Operations on long values are much faster than on floating-point numbers. |
+| SPATIAL_INDEX.xxx.class           | NA                                        | Optional user custom implementation class. Value is fully qualified class name.                                                                                                                                                                                                                                                                                                   |
 
 Reference : [list of carbon properties](./configuration-parameters.md)
 
diff --git a/integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala b/integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala
index bdb3054..5cbdb3b 100644
--- a/integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala
+++ b/integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala
@@ -928,7 +928,8 @@ object CommonLoadUtils {
               .map(columnName => columnName.toLowerCase())
             attributes.filterNot(a => staticPartCols.contains(a.name.toLowerCase))
           }
-          val spatialProperty = catalogTable.properties.get(CarbonCommonConstants.SPATIAL_INDEX)
+          val spatialProperty = catalogTable.storage
+            .properties.get(CarbonCommonConstants.SPATIAL_INDEX)
           // For spatial table, dataframe attributes will not contain geoId column.
           val isSpatialTable = spatialProperty.isDefined && spatialProperty.nonEmpty &&
                                    dfAttributes.length + 1 == expectedColumns.size
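
The effect of the change above can be sketched without Spark on the classpath. In the sketch below, `Table` and `StorageFormat` are hypothetical stand-ins for Spark's `CatalogTable` and `CatalogStorageFormat`, which both expose a `properties: Map[String, String]` field. The key string `"spatial_index"` is an assumption about the value of `CarbonCommonConstants.SPATIAL_INDEX`, and the column counts are illustrative.

```scala
object SpatialPropertyLookupSketch {
  // Hypothetical stand-ins for Spark's CatalogStorageFormat and CatalogTable.
  final case class StorageFormat(properties: Map[String, String])
  final case class Table(properties: Map[String, String], storage: StorageFormat)

  // Assumed value of CarbonCommonConstants.SPATIAL_INDEX.
  val SpatialIndexKey = "spatial_index"

  // Lookup as it was before the commit: reads the table-level properties.
  def lookupBefore(table: Table): Option[String] =
    table.properties.get(SpatialIndexKey)

  // Lookup as it is after the commit: reads the storage-level properties.
  def lookupAfter(table: Table): Option[String] =
    table.storage.properties.get(SpatialIndexKey)

  // Mirrors the check in the diff: a spatial table has exactly one more
  // expected column (the generated geoId column) than the dataframe has.
  def isSpatialTable(
      spatialProperty: Option[String],
      dfAttributeCount: Int,
      expectedColumnCount: Int): Boolean =
    spatialProperty.isDefined && dfAttributeCount + 1 == expectedColumnCount

  def main(args: Array[String]): Unit = {
    // Case described in the commit message: table-level properties are
    // empty and the options live in the storage properties.
    val table = Table(
      properties = Map.empty,
      storage = StorageFormat(Map(SpatialIndexKey -> "mygeohash")))

    println(isSpatialTable(lookupBefore(table), 3, 4))
    println(isSpatialTable(lookupAfter(table), 3, 4))
  }
}
```

In this model the old lookup misses the property, so the table is not recognised as spatial and the missing geoId column is treated as a mismatch, while the new lookup finds it.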