Posted to issues@carbondata.apache.org by jatin9896 <gi...@git.apache.org> on 2018/01/15 13:01:00 UTC

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

GitHub user jatin9896 opened a pull request:

    https://github.com/apache/carbondata/pull/1805

    [CARBONDATA-1827] S3 Carbon Implementation

    1) Added support for S3 in CarbonData.
    2) Added S3Example to create a store on S3.
    3) Added S3CsvExample to load CSV data from S3.
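    
    For reference, the session setup the examples rely on looks roughly
    like the sketch below (s3a:// case only; the master, key values and
    bucket name are placeholders, not taken from this PR):
    
        // Build a CarbonSession with S3 credentials passed as Spark conf.
        // ACCESS_KEY/SECRET_KEY come from org.apache.hadoop.fs.s3a.Constants.
        import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.CarbonSession._
    
        val spark = SparkSession
          .builder()
          .master("local[2]")                     // placeholder master
          .appName("S3Example")
          .config("spark.hadoop." + ACCESS_KEY, "<access-key>")
          .config("spark.hadoop." + SECRET_KEY, "<secret-key>")
          .getOrCreateCarbonSession()
    
        // A table is then created with a LOCATION on S3, e.g. (placeholder):
        // s3a://<bucket>/carbon/carbon_table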
     
    
    Be sure to complete all of the following checklist items to help us
    incorporate your contribution quickly and easily:
    
     - [ ] Any interfaces changed? NO
     
     - [ ] Any backward compatibility impacted? NO
     
     - [ ] Document update required? NO
    
     - [ ] Testing done
            Please provide details on 
            - Whether new unit test cases have been added or why no new tests are required? 
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           Added examples to test the functionality.
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jatin9896/incubator-carbondata s3-carbon

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1805.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1805
    
----
commit bd5b90cfefcfa25c941c104630dbc9e9ed2b150b
Author: SangeetaGulia <sa...@...>
Date:   2017-09-21T09:26:26Z

    Added S3 implementation and TestCases

commit d3d374ce7a82662e1fbb6b1d0b81bfaaa3a22cc1
Author: Jatin <ja...@...>
Date:   2017-11-29T07:24:48Z

    Removed S3CarbonFile and added append functionality

commit a79245a73103b58c997e20b47b7c51d91dd2e8ad
Author: Jatin <ja...@...>
Date:   2018-01-15T07:43:29Z

    refactored examples

----


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r161675876
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,151 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  /**
     +   * This example demonstrates usage of s3 as a store.
     +   *
     +   * @param args requires four parameters "Access-key" "Secret-key"
     +   *             "s3 bucket path" "spark-master"
    +   */
    +
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
    +      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
    +      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
    +      .addProperty(CarbonCommonConstants.DEFAULT_CARBON_MAJOR_COMPACTION_SIZE, "0.02")
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length != 4) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path> <spark-master>")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(args(3))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("INFO")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql("ALTER table carbon_table compact 'MINOR'")
    +    spark.sql("show segments for table carbon_table").show()
    +
    +    spark.sql(
    --- End diff --
    
    remove this to make it simpler
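    
    Relatedly, if the repeated LOAD DATA blocks earlier in this diff exist
    only to create multiple segments for the compaction step, one
    hypothetical way to keep the example compact is a loop (a sketch, not
    code from this PR):
    
        // Run the same load three times to produce three segments,
        // instead of repeating the SQL block verbatim.
        (1 to 3).foreach { _ =>
          spark.sql(
            s"""
               | LOAD DATA LOCAL INPATH '$path'
               | INTO TABLE carbon_table
               | OPTIONS('HEADER'='true')
             """.stripMargin)
        }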


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162088305
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,157 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
     +   * This example demonstrates usage of
     +   * 1. create carbon table with storage location on object based storage
     +   *    like AWS S3, Huawei OBS, etc
     +   * 2. load data into carbon table, the generated file will be stored on object based storage
     +   *    query the table.
     +   * 3. With the indexing feature of carbondata, the data read from object based storage is minimal,
     +   *    thus providing both high performance analytics and low cost storage
     +   *
     +   * @param args requires four parameters "Access-key" "Secret-key"
     +   *             "s3 bucket path" "spark-master", plus optional "s3-endpoint"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 4 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path> <spark-master> <s3-endpoint>")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey, endpoint) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(args(3))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .config(endpoint, getS3EndPoint(args))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("INFO")
    --- End diff --
    
    change to WARN, it is printing too many log messages


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    SDV Build Failed, please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2978/



---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162305106
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.{Row, SparkSession}
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
     +   * This example demonstrates usage of
     +   * 1. create carbon table with storage location on object based storage
     +   * like AWS S3, Huawei OBS, etc
     +   * 2. load data into carbon table, the generated file will be stored on object based storage
     +   * query the table.
     +   *
     +   * @param args requires three parameters "Access-key" "Secret-key"
     +   *             "table-path on s3", plus optional "s3-endpoint" "spark-master"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 3 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path-on-s3> [s3-endpoint] [spark-master]")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey, endpoint) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(getSparkMaster(args))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .config(endpoint, getS3EndPoint(args))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("WARN")
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | SELECT *
    +         | FROM carbon_table
    +      """.stripMargin).show()
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    val countSegment: Array[Row] =
    +      spark.sql(
    +        s"""
    +           | SHOW SEGMENTS FOR TABLE carbon_table
    +       """.stripMargin).collect()
    +
    +    while (countSegment.length != 3) {
    +      this.wait(2000)
    +    }
    +
    +    // Use compaction command to merge segments or small files in object based storage,
    +    // this can be done periodically.
    +    spark.sql("ALTER table carbon_table compact 'MAJOR'")
    +    spark.sql("show segments for table carbon_table").show()
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.stop()
    +  }
    +
    +  def getKeyOnPrefix(path: String): (String, String, String) = {
    +    val endPoint = "spark.hadoop." + ENDPOINT
    +    if (path.startsWith(CarbonCommonConstants.S3A_PREFIX)) {
    +      ("spark.hadoop." + ACCESS_KEY, "spark.hadoop." + SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3N_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3N_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3N_SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3_SECRET_KEY, endPoint)
    +    } else {
    +      throw new Exception("Incorrect Store Path")
    +    }
    +  }
    +
    +  def getS3EndPoint(args: Array[String]): String = {
    +    if (args.length >= 4 && args(3).contains(".com")) {
    +      args(3)
    +    }
    +    else {
    +      ""
    +    }
    +  }
    +
    +  def getSparkMaster(args: Array[String]): String = {
    +    if (args.length >= 4) {
    +      if (args.length == 5) {
    +        args(4)
    +      }
    +      else if (args(3).contains("spark:") || args(3).contains("mesos:")) {
    --- End diff --
    
    `else if` should be on the same line as `}`
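    
    That is, the quoted branching would read as below (a sketch of the
    style fix only; the final fallback value is assumed for illustration,
    since the quoted diff is cut off before the else branch):
    
        // Scala brace style: keep `} else if (...) {` on one line.
        def getSparkMaster(args: Array[String]): String = {
          if (args.length == 5) {
            args(4)
          } else if (args.length == 4 &&
                     (args(3).contains("spark:") || args(3).contains("mesos:"))) {
            args(3)
          } else {
            "local"  // assumed fallback, not visible in the quoted diff
          }
        }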


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1574/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    It is merged, please close it manually


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1714/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1598/



---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r161667809
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,151 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  /**
     +   * This example demonstrates usage of s3 as a store.
     +   *
     +   * @param args requires four parameters "Access-key" "Secret-key"
     +   *             "s3 bucket path" "spark-master"
    +   */
    +
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
    +      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
    +      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
    +      .addProperty(CarbonCommonConstants.DEFAULT_CARBON_MAJOR_COMPACTION_SIZE, "0.02")
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length != 4) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<carbon store location> <spark-master>")
    --- End diff --
    
    arg(2) is table path, not store path, right?
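    
    For context, with this signature args(2) would be the location of a
    single table on S3, e.g. (bucket and prefix are placeholders):
    
        // args(2): the table path used in the CREATE TABLE ... LOCATION clause
        val tablePath = "s3a://my-bucket/carbon/default/carbon_table"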


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1661/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2801/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1615/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by anubhav100 <gi...@git.apache.org>.
Github user anubhav100 commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    retest this please


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jatin9896 <gi...@git.apache.org>.
Github user jatin9896 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r161697634
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -168,6 +168,14 @@
     
       public static final String S3A_PREFIX = "s3a://";
     
    +  public static final String S3N_ACCESS_KEY = "fs.s3n.awsAccessKeyId";
    --- End diff --
    
    Added comment.


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jatin9896 <gi...@git.apache.org>.
Github user jatin9896 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162306737
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.{Row, SparkSession}
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
     +   * This example demonstrates usage of
     +   * 1. create carbon table with storage location on object based storage
     +   * like AWS S3, Huawei OBS, etc
     +   * 2. load data into carbon table, the generated file will be stored on object based storage
     +   * query the table.
     +   *
     +   * @param args requires three parameters "Access-key" "Secret-key"
     +   *             "table-path on s3", plus optional "s3-endpoint" "spark-master"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 3 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path-on-s3> [s3-endpoint] [spark-master]")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey, endpoint) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(getSparkMaster(args))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .config(endpoint, getS3EndPoint(args))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("WARN")
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | SELECT *
    +         | FROM carbon_table
    +      """.stripMargin).show()
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    val countSegment: Array[Row] =
    +      spark.sql(
    +        s"""
    +           | SHOW SEGMENTS FOR TABLE carbon_table
    +       """.stripMargin).collect()
    +
    +    while (countSegment.length != 3) {
    +      this.wait(2000)
    +    }
    +
    +    // Use compaction command to merge segments or small files in object based storage,
    +    // this can be done periodically.
    +    spark.sql("ALTER table carbon_table compact 'MAJOR'")
    +    spark.sql("show segments for table carbon_table").show()
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.stop()
    +  }
    +
    +  def getKeyOnPrefix(path: String): (String, String, String) = {
    +    val endPoint = "spark.hadoop." + ENDPOINT
    +    if (path.startsWith(CarbonCommonConstants.S3A_PREFIX)) {
    +      ("spark.hadoop." + ACCESS_KEY, "spark.hadoop." + SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3N_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3N_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3N_SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3_SECRET_KEY, endPoint)
    +    } else {
    +      throw new Exception("Incorrect Store Path")
    +    }
    +  }
    +
    +  def getS3EndPoint(args: Array[String]): String = {
    +    if (args.length >= 4 && args(3).contains(".com")) {
    +      args(3)
    +    }
    +    else {
    +      ""
    +    }
    +  }
    +
    +  def getSparkMaster(args: Array[String]): String = {
    +    if (args.length >= 4) {
    +      if (args.length == 5) {
    --- End diff --
    
    sure
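    
    One more note on the quoted diff: the segment-count loop collects the
    SHOW SEGMENTS result once and then calls wait(2000) without owning the
    monitor, so it would throw IllegalMonitorStateException and could never
    observe a new segment count. A hypothetical safer polling sketch:
    
        // Re-run the query on every iteration and sleep between attempts.
        var segments = spark.sql("SHOW SEGMENTS FOR TABLE carbon_table").collect()
        while (segments.length != 3) {
          Thread.sleep(2000)
          segments = spark.sql("SHOW SEGMENTS FOR TABLE carbon_table").collect()
        }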


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162090325
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,157 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
     +   * This example demonstrates usage of
    +   * 1. create carbon table with storage location on object based storage
    +   *    like AWS S3, Huawei OBS, etc
    +   * 2. load data into carbon table, the generated file will be stored on object based storage
    +   *    query the table.
    --- End diff --
    
    remove this line


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Can you show an example of how to give <table_path> in S3Example using AWS S3? I tried with Huawei OBS, and hit two problems:
    1. I need to set the endpoint conf in the main function manually via `spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "obs.cn-north-1.myhwclouds.com")`
    2. After I set the conf, an exception is thrown when running it:
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/http/pool/ConnPoolControl
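    
    As a sketch, the manual workaround described in point 1, using the s3a
    endpoint key from org.apache.hadoop.fs.s3a.Constants (the OBS endpoint
    value is the one quoted above):
    
        // Set the object-store endpoint on the Hadoop configuration
        // before creating tables or loading data.
        import org.apache.hadoop.fs.s3a.Constants.ENDPOINT  // "fs.s3a.endpoint"
    
        spark.sparkContext.hadoopConfiguration
          .set(ENDPOINT, "obs.cn-north-1.myhwclouds.com")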


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2892/



---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jatin9896 <gi...@git.apache.org>.
Github user jatin9896 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r161698430
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3CsvExample.scala ---
    @@ -0,0 +1,111 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3CsvExample {
    +
    +  /**
     +   * This example demonstrates creating a local store and loading data from CSV files on S3
     +   *
     +   * @param args requires four parameters "Access-key" "Secret-key"
     +   *             "s3 path to csv" "spark-master"
    +   */
    +
    --- End diff --
    
    done


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jatin9896 <gi...@git.apache.org>.
Github user jatin9896 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r161697900
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,151 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  /**
     +   * This example demonstrates usage of s3 as a store.
     +   *
     +   * @param args requires four parameters "Access-key" "Secret-key"
     +   *             "s3 bucket path" "spark-master"
    +   */
    +
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
    +      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
    +      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
    +      .addProperty(CarbonCommonConstants.DEFAULT_CARBON_MAJOR_COMPACTION_SIZE, "0.02")
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length != 4) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path> <spark-master>")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(args(3))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("INFO")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql("ALTER table carbon_table compact 'MINOR'")
    --- End diff --
    
    Added


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jatin9896 <gi...@git.apache.org>.
Github user jatin9896 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162311496
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.{Row, SparkSession}
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
     +   * This example demonstrates usage of
     +   * 1. create carbon table with storage location on object based storage
     +   * like AWS S3, Huawei OBS, etc
     +   * 2. load data into carbon table, the generated file will be stored on object based storage
     +   * query the table.
     +   *
     +   * @param args requires three parameters "Access-key" "Secret-key"
     +   *             "table-path on s3", plus optional "s3-endpoint" "spark-master"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 3 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path-on-s3> [s3-endpoint] [spark-master]")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey, endpoint) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(getSparkMaster(args))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .config(endpoint, getS3EndPoint(args))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("WARN")
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | SELECT *
    +         | FROM carbon_table
    +      """.stripMargin).show()
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    val countSegment: Array[Row] =
    +      spark.sql(
    +        s"""
    +           | SHOW SEGMENTS FOR TABLE carbon_table
    +       """.stripMargin).collect()
    +
    +    while (countSegment.length != 3) {
    +      this.wait(2000)
    +    }
    +
    +    // Use compaction command to merge segments or small files in object based storage,
    +    // this can be done periodically.
    +    spark.sql("ALTER table carbon_table compact 'MAJOR'")
    +    spark.sql("show segments for table carbon_table").show()
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.stop()
    +  }
    +
    +  def getKeyOnPrefix(path: String): (String, String, String) = {
    +    val endPoint = "spark.hadoop." + ENDPOINT
    +    if (path.startsWith(CarbonCommonConstants.S3A_PREFIX)) {
    +      ("spark.hadoop." + ACCESS_KEY, "spark.hadoop." + SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3N_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3N_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3N_SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3_SECRET_KEY, endPoint)
    +    } else {
    +      throw new Exception("Incorrect Store Path")
    +    }
    +  }
    +
    +  def getS3EndPoint(args: Array[String]): String = {
    +    if (args.length >= 4 && args(3).contains(".com")) {
    +      args(3)
    +    }
    +    else {
    +      ""
    +    }
    +  }
    +
    +  def getSparkMaster(args: Array[String]): String = {
    +    if (args.length >= 4) {
    +      if (args.length == 5) {
    --- End diff --
    
    done


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1703/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1566/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2863/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    LGTM


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2948/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2895/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2820/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2810/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2807/



---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jatin9896 <gi...@git.apache.org>.
Github user jatin9896 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162305664
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.{Row, SparkSession}
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
     +   * This example demonstrates usage of
     +   * 1. create carbon table with storage location on object based storage
     +   * like AWS S3, Huawei OBS, etc
     +   * 2. load data into carbon table, the generated file will be stored on object based storage
     +   * query the table.
     +   *
     +   * @param args requires three parameters "Access-key" "Secret-key"
     +   *             "table-path on s3", plus optional "s3-endpoint" "spark-master"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 3 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path-on-s3> [s3-endpoint] [spark-master]")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey, endpoint) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(getSparkMaster(args))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .config(endpoint, getS3EndPoint(args))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("WARN")
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | SELECT *
    +         | FROM carbon_table
    +      """.stripMargin).show()
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    val countSegment: Array[Row] =
    +      spark.sql(
    +        s"""
    +           | SHOW SEGMENTS FOR TABLE carbon_table
    +       """.stripMargin).collect()
    +
    +    while (countSegment.length != 3) {
    +      this.wait(2000)
    +    }
    +
    +    // Use compaction command to merge segments or small files in object based storage,
    +    // this can be done periodically.
    +    spark.sql("ALTER table carbon_table compact 'MAJOR'")
    +    spark.sql("show segments for table carbon_table").show()
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.stop()
    +  }
    +
    +  def getKeyOnPrefix(path: String): (String, String, String) = {
    +    val endPoint = "spark.hadoop." + ENDPOINT
    +    if (path.startsWith(CarbonCommonConstants.S3A_PREFIX)) {
    +      ("spark.hadoop." + ACCESS_KEY, "spark.hadoop." + SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3N_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3N_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3N_SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3_SECRET_KEY, endPoint)
    +    } else {
    +      throw new Exception("Incorrect Store Path")
    +    }
    +  }
    +
    +  def getS3EndPoint(args: Array[String]): String = {
    +    if (args.length >= 4 && args(3).contains(".com")) {
    --- End diff --
    
    It might happen that args(4) is the spark master.
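    
    In other words, the fourth positional argument is ambiguous between the
    two invocations sketched below (host values are placeholders), which is
    what the ".com" check disambiguates:
    
        // S3Example <ak> <sk> s3a://bucket/path obs.example.com spark://host:7077
        // S3Example <ak> <sk> s3a://bucket/path spark://host:7077
        // With only four arguments, args(3) may be the spark master, so
        // getS3EndPoint treats it as an endpoint only when it contains ".com".
        def getS3EndPoint(args: Array[String]): String =
          if (args.length >= 4 && args(3).contains(".com")) args(3) else ""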


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r161675648
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,151 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of s3 as a store.
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 bucket path" "spark-master"
    +   */
    +
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
    +      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
    +      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
    +      .addProperty(CarbonCommonConstants.DEFAULT_CARBON_MAJOR_COMPACTION_SIZE, "0.02")
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length != 4) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path> <spark-master>")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(args(3))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("INFO")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql("ALTER table carbon_table compact 'MINOR'")
    --- End diff --
    
    Add a comment: 
    > // Use compaction command to merge segments or small files in object based storage, this can be done periodically. 


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2936/



---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r161676126
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3CsvExample.scala ---
    @@ -0,0 +1,111 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3CsvExample {
    +
    +  /**
    +   * This example demonstrate to create local store and load data from CSV files on S3
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 path to csv" "spark-master"
    +   */
    +
    --- End diff --
    
    remove empty line


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162090911
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,157 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of
    +   * 1. create carbon table with storage location on object based storage
    +   *    like AWS S3, Huawei OBS, etc
    +   * 2. load data into carbon table, the generated file will be stored on object based storage
    +   *    query the table.
    +   * 3. With the indexing feature of carbondata, the data read from object based storage is minimum,
    +   *    thus providing both high performance analytic and low cost storage
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 bucket path" "spark-master" "s3-endpoint"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 4 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path> <spark-master> <s3-endpoint>")
    --- End diff --
    
    move <spark-master> to the last position; it should be optional, with 'local' as the default value
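    
    The revised PR does add a getSparkMaster helper for this (see the later diffs in this thread).
    Purely as an illustration, a minimal sketch of such a helper, assuming the master, when present,
    is the last argument and can be recognised by a "spark://" or "local" prefix:
    
        def getSparkMaster(args: Array[String]): String = {
          if (args.length == 5) {
            args(4) // master passed explicitly as the 5th argument
          } else if (args.length == 4 &&
                     (args(3).startsWith("spark://") || args(3).startsWith("local"))) {
            args(3) // 4th argument is a master URL, not an s3 endpoint
          } else {
            "local" // optional parameter, default to local mode
          }
        }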


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jatin9896 <gi...@git.apache.org>.
Github user jatin9896 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162311525
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.{Row, SparkSession}
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of
    +   * 1. create carbon table with storage location on object based storage
    +   * like AWS S3, Huawei OBS, etc
    +   * 2. load data into carbon table, the generated file will be stored on object based storage
    +   * query the table.
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "table-path on s3" "s3-endpoint" "spark-master"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 3 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path-on-s3> [s3-endpoint] [spark-master]")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey, endpoint) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(getSparkMaster(args))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .config(endpoint, getS3EndPoint(args))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("WARN")
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | SELECT *
    +         | FROM carbon_table
    +      """.stripMargin).show()
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    val countSegment: Array[Row] =
    +      spark.sql(
    +        s"""
    +           | SHOW SEGMENTS FOR TABLE carbon_table
    +       """.stripMargin).collect()
    +
    +    while (countSegment.length != 3) {
    +      this.wait(2000)
    +    }
    +
    +    // Use compaction command to merge segments or small files in object based storage,
    +    // this can be done periodically.
    +    spark.sql("ALTER table carbon_table compact 'MAJOR'")
    +    spark.sql("show segments for table carbon_table").show()
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.stop()
    +  }
    +
    +  def getKeyOnPrefix(path: String): (String, String, String) = {
    +    val endPoint = "spark.hadoop." + ENDPOINT
    +    if (path.startsWith(CarbonCommonConstants.S3A_PREFIX)) {
    +      ("spark.hadoop." + ACCESS_KEY, "spark.hadoop." + SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3N_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3N_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3N_SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3_SECRET_KEY, endPoint)
    +    } else {
    +      throw new Exception("Incorrect Store Path")
    +    }
    +  }
    +
    +  def getS3EndPoint(args: Array[String]): String = {
    +    if (args.length >= 4 && args(3).contains(".com")) {
    +      args(3)
    +    }
    +    else {
    --- End diff --
    
    done


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162089977
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,157 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of
    +   * 1. create carbon table with storage location on object based storage
    +   *    like AWS S3, Huawei OBS, etc
    +   * 2. load data into carbon table, the generated file will be stored on object based storage
    +   *    query the table.
    +   * 3. With the indexing feature of carbondata, the data read from object based storage is minimum,
    +   *    thus providing both high performance analytic and low cost storage
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 bucket path" "spark-master" "s3-endpoint"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 4 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path> <spark-master> <s3-endpoint>")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey, endpoint) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(args(3))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .config(endpoint, getS3EndPoint(args))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("INFO")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    // Use compaction command to merge segments or small files in object based storage,
    +    // this can be done periodically.
    +    spark.sql("ALTER table carbon_table compact 'MINOR'")
    --- End diff --
    
    change this to MAJOR and remove the subsequent command, to make it simpler


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by anubhav100 <gi...@git.apache.org>.
Github user anubhav100 commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    retest this please


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1724/



---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r161675090
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,151 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of s3 as a store.
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 bucket path" "spark-master"
    +   */
    +
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
    +      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
    +      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
    +      .addProperty(CarbonCommonConstants.DEFAULT_CARBON_MAJOR_COMPACTION_SIZE, "0.02")
    --- End diff --
    
    remove the unnecessary property settings; none of these are required


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jatin9896 <gi...@git.apache.org>.
Github user jatin9896 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r161669078
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,151 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of s3 as a store.
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 bucket path" "spark-master"
    +   */
    +
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
    +      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
    +      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
    +      .addProperty(CarbonCommonConstants.DEFAULT_CARBON_MAJOR_COMPACTION_SIZE, "0.02")
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length != 4) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<carbon store location> <spark-master>")
    --- End diff --
    
    You are right. OK, I will change it to table-path.


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jatin9896 <gi...@git.apache.org>.
Github user jatin9896 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r161697822
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3CsvExample.scala ---
    @@ -0,0 +1,111 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3CsvExample {
    +
    +  /**
    +   * This example demonstrate to create local store and load data from CSV files on S3
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 path to csv" "spark-master"
    +   */
    +
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
    +      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
    +      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
    --- End diff --
    
    Removed all properties.


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2906/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2912/



---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162089803
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,157 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of
    +   * 1. create carbon table with storage location on object based storage
    +   *    like AWS S3, Huawei OBS, etc
    +   * 2. load data into carbon table, the generated file will be stored on object based storage
    +   *    query the table.
    +   * 3. With the indexing feature of carbondata, the data read from object based storage is minimum,
    +   *    thus providing both high performance analytic and low cost storage
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 bucket path" "spark-master" "s3-endpoint"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 4 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path> <spark-master> <s3-endpoint>")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey, endpoint) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(args(3))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .config(endpoint, getS3EndPoint(args))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("INFO")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    --- End diff --
    
    We no longer allow running compaction and data load concurrently; doing so raises the exception: "Cannot run data loading and compaction on same table concurrently. Please wait for load to finish".
    
    So I think you should wait for the data loads to finish before doing compaction. You can check that loading has finished by running SHOW SEGMENTS, collecting the result, and looping until there are 3 segments.
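    
    As a minimal sketch of that polling loop (assuming the table is named carbon_table and the
    session is named spark): note that SHOW SEGMENTS has to be re-run inside the loop, and that
    Thread.sleep is safer here than this.wait, which throws IllegalMonitorStateException when
    called outside a synchronized block:
    
        var segments = spark.sql("SHOW SEGMENTS FOR TABLE carbon_table").collect()
        while (segments.length != 3) {
          // sleep, then query again so the loop can actually observe new segments
          Thread.sleep(2000)
          segments = spark.sql("SHOW SEGMENTS FOR TABLE carbon_table").collect()
        }
        // all three loads are visible; compaction can now run without the concurrency error
        spark.sql("ALTER TABLE carbon_table COMPACT 'MAJOR'")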


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jatin9896 <gi...@git.apache.org>.
Github user jatin9896 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162311702
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.{Row, SparkSession}
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of
    +   * 1. create carbon table with storage location on object based storage
    +   * like AWS S3, Huawei OBS, etc
    +   * 2. load data into carbon table, the generated file will be stored on object based storage
    +   * query the table.
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "table-path on s3" "s3-endpoint" "spark-master"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 3 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path-on-s3> [s3-endpoint] [spark-master]")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey, endpoint) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(getSparkMaster(args))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .config(endpoint, getS3EndPoint(args))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("WARN")
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | SELECT *
    +         | FROM carbon_table
    +      """.stripMargin).show()
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    val countSegment: Array[Row] =
    +      spark.sql(
    +        s"""
    +           | SHOW SEGMENTS FOR TABLE carbon_table
    +       """.stripMargin).collect()
    +
    +    while (countSegment.length != 3) {
    +      this.wait(2000)
    +    }
    +
    +    // Use compaction command to merge segments or small files in object based storage,
    +    // this can be done periodically.
    +    spark.sql("ALTER table carbon_table compact 'MAJOR'")
    +    spark.sql("show segments for table carbon_table").show()
    +
    --- End diff --
    
    added one more select query.


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2953/



---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r161675737
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3CsvExample.scala ---
    @@ -0,0 +1,111 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3CsvExample {
    +
    +  /**
    +   * This example demonstrate to create local store and load data from CSV files on S3
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 path to csv" "spark-master"
    +   */
    +
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
    +      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
    +      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
    --- End diff --
    
    remove the unnecessary property settings; none of these are required


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1585/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2833/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2961/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2937/



---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jatin9896 <gi...@git.apache.org>.
Github user jatin9896 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r161698051
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,151 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of s3 as a store.
    --- End diff --
    
    done


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162304653
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.{Row, SparkSession}
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of
    +   * 1. create carbon table with storage location on object based storage
    +   * like AWS S3, Huawei OBS, etc
    +   * 2. load data into carbon table, the generated file will be stored on object based storage
    +   * query the table.
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "table-path on s3" "s3-endpoint" "spark-master"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 3 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path-on-s3> [s3-endpoint] [spark-master]")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey, endpoint) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(getSparkMaster(args))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .config(endpoint, getS3EndPoint(args))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("WARN")
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | SELECT *
    +         | FROM carbon_table
    +      """.stripMargin).show()
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    val countSegment: Array[Row] =
    +      spark.sql(
    +        s"""
    +           | SHOW SEGMENTS FOR TABLE carbon_table
    +       """.stripMargin).collect()
    +
    +    while (countSegment.length != 3) {
    +      this.wait(2000)
    +    }
    +
    +    // Use compaction command to merge segments or small files in object based storage,
    +    // this can be done periodically.
    +    spark.sql("ALTER table carbon_table compact 'MAJOR'")
    +    spark.sql("show segments for table carbon_table").show()
    +
    --- End diff --
    
    please add one more select * query
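    
    For example (a sketch, mirroring the queries already in the example), after the compaction:
    
        spark.sql(
          s"""
             | SELECT *
             | FROM carbon_table
          """.stripMargin).show()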


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2896/



---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162304832
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.{Row, SparkSession}
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of
    +   * 1. create carbon table with storage location on object based storage
    +   * like AWS S3, Huawei OBS, etc
    +   * 2. load data into carbon table, the generated file will be stored on object based storage
    +   * query the table.
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "table-path on s3" "s3-endpoint" "spark-master"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 3 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path-on-s3> [s3-endpoint] [spark-master]")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey, endpoint) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(getSparkMaster(args))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .config(endpoint, getS3EndPoint(args))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("WARN")
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | SELECT *
    +         | FROM carbon_table
    +      """.stripMargin).show()
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    val countSegment: Array[Row] =
    +      spark.sql(
    +        s"""
    +           | SHOW SEGMENTS FOR TABLE carbon_table
    +       """.stripMargin).collect()
    +
    +    while (countSegment.length != 3) {
    +      this.wait(2000)
    +    }
    +
    +    // Use compaction command to merge segments or small files in object based storage,
    +    // this can be done periodically.
    +    spark.sql("ALTER table carbon_table compact 'MAJOR'")
    +    spark.sql("show segments for table carbon_table").show()
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.stop()
    +  }
    +
    +  def getKeyOnPrefix(path: String): (String, String, String) = {
    +    val endPoint = "spark.hadoop." + ENDPOINT
    +    if (path.startsWith(CarbonCommonConstants.S3A_PREFIX)) {
    +      ("spark.hadoop." + ACCESS_KEY, "spark.hadoop." + SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3N_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3N_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3N_SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3_SECRET_KEY, endPoint)
    +    } else {
    +      throw new Exception("Incorrect Store Path")
    +    }
    +  }
    +
    +  def getS3EndPoint(args: Array[String]): String = {
    +    if (args.length >= 4 && args(3).contains(".com")) {
    +      args(3)
    +    }
    +    else {
    --- End diff --
    
    move it up one line so the branch reads `} else {`
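
    A minimal sketch of the placement being asked for, keeping the same logic as the diff above:

        def getS3EndPoint(args: Array[String]): String = {
          // keep `else` on the same line as the closing brace
          if (args.length >= 4 && args(3).contains(".com")) {
            args(3)
          } else {
            ""
          }
        }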


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162090652
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,157 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of
    +   * 1. create carbon table with storage location on object based storage
    +   *    like AWS S3, Huawei OBS, etc
    +   * 2. load data into carbon table, the generated file will be stored on object based storage
    +   *    query the table.
    +   * 3. With the indexing feature of carbondata, the data read from object based storage is minimum,
    +   *    thus providing both high performance analytic and low cost storage
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 bucket path" "spark-master" "s3-endpoint"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 4 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path> <spark-master> <s3-endpoint>")
    --- End diff --
    
    Modify to `<table-path-on-S3>`


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jatin9896 <gi...@git.apache.org>.
Github user jatin9896 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r161698128
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,151 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of s3 as a store.
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 bucket path" "spark-master"
    +   */
    +
    --- End diff --
    
    removed.


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162090105
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,157 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of
    +   * 1. create carbon table with storage location on object based storage
    +   *    like AWS S3, Huawei OBS, etc
    +   * 2. load data into carbon table, the generated file will be stored on object based storage
    +   *    query the table.
    +   * 3. With the indexing feature of carbondata, the data read from object based storage is minimum,
    +   *    thus providing both high performance analytic and low cost storage
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 bucket path" "spark-master" "s3-endpoint"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 4 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path> <spark-master> <s3-endpoint>")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey, endpoint) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(args(3))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .config(endpoint, getS3EndPoint(args))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("INFO")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    --- End diff --
    
    remove one load to make it simpler


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2899/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    I tried it, and it is successful now. But the log shows it is using LOCAL_SORT; since the table is created with 'SORT_COLUMNS'='', it should be NO_SORT, right?


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162090538
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,157 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of
    +   * 1. create carbon table with storage location on object based storage
    +   *    like AWS S3, Huawei OBS, etc
    +   * 2. load data into carbon table, the generated file will be stored on object based storage
    +   *    query the table.
    +   * 3. With the indexing feature of carbondata, the data read from object based storage is minimum,
    +   *    thus providing both high performance analytic and low cost storage
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 bucket path" "spark-master" "s3-endpoint"
    --- End diff --
    
    modify to  `table path on S3`


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jatin9896 <gi...@git.apache.org>.
Github user jatin9896 closed the pull request at:

    https://github.com/apache/carbondata/pull/1805


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by anubhav100 <gi...@git.apache.org>.
Github user anubhav100 commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    retest this please


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2916/



---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r161674491
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,151 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of s3 as a store.
    --- End diff --
    
    Change to
    > This example demonstrates usage of
    > 1. create carbon table with storage location on object based storage like AWS S3, Huawei OBS, etc
    > 2. load data into carbon table, the generated file will be stored on object based storage
    > 3. query the table.
    > With the indexing feature of carbondata, the data read from object based storage is minimal, thus providing both high-performance analytics and low-cost storage


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r161673730
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,151 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of s3 as a store.
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 bucket path" "spark-master"
    +   */
    +
    --- End diff --
    
    remove the empty line


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jatin9896 <gi...@git.apache.org>.
Github user jatin9896 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162312570
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.{Row, SparkSession}
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of
    +   * 1. create carbon table with storage location on object based storage
    +   * like AWS S3, Huawei OBS, etc
    +   * 2. load data into carbon table, the generated file will be stored on object based storage
    +   * query the table.
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "table-path on s3" "s3-endpoint" "spark-master"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 3 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path-on-s3> [s3-endpoint] [spark-master]")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey, endpoint) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(getSparkMaster(args))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .config(endpoint, getS3EndPoint(args))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("WARN")
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | SELECT *
    +         | FROM carbon_table
    +      """.stripMargin).show()
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    val countSegment: Array[Row] =
    +      spark.sql(
    +        s"""
    +           | SHOW SEGMENTS FOR TABLE carbon_table
    +       """.stripMargin).collect()
    +
    +    while (countSegment.length != 3) {
    +      this.wait(2000)
    +    }
    +
    +    // Use compaction command to merge segments or small files in object based storage,
    +    // this can be done periodically.
    +    spark.sql("ALTER table carbon_table compact 'MAJOR'")
    +    spark.sql("show segments for table carbon_table").show()
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.stop()
    +  }
    +
    +  def getKeyOnPrefix(path: String): (String, String, String) = {
    +    val endPoint = "spark.hadoop." + ENDPOINT
    +    if (path.startsWith(CarbonCommonConstants.S3A_PREFIX)) {
    +      ("spark.hadoop." + ACCESS_KEY, "spark.hadoop." + SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3N_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3N_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3N_SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3_SECRET_KEY, endPoint)
    +    } else {
    +      throw new Exception("Incorrect Store Path")
    +    }
    +  }
    +
    +  def getS3EndPoint(args: Array[String]): String = {
    +    if (args.length >= 4 && args(3).contains(".com")) {
    +      args(3)
    +    }
    +    else {
    +      ""
    +    }
    +  }
    +
    +  def getSparkMaster(args: Array[String]): String = {
    +    if (args.length >= 4) {
    +      if (args.length == 5) {
    +        args(4)
    +      }
    +      else if (args(3).contains("spark:") || args(3).contains("mesos:")) {
    --- End diff --
    
    updated; moved to the same line.


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1650/



---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162304995
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.{Row, SparkSession}
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of
    +   * 1. create carbon table with storage location on object based storage
    +   * like AWS S3, Huawei OBS, etc
    +   * 2. load data into carbon table, the generated file will be stored on object based storage
    +   * query the table.
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "table-path on s3" "s3-endpoint" "spark-master"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 3 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path-on-s3> [s3-endpoint] [spark-master]")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey, endpoint) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(getSparkMaster(args))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .config(endpoint, getS3EndPoint(args))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("WARN")
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | SELECT *
    +         | FROM carbon_table
    +      """.stripMargin).show()
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    val countSegment: Array[Row] =
    +      spark.sql(
    +        s"""
    +           | SHOW SEGMENTS FOR TABLE carbon_table
    +       """.stripMargin).collect()
    +
    +    while (countSegment.length != 3) {
    +      this.wait(2000)
    +    }
    +
    +    // Use compaction command to merge segments or small files in object based storage,
    +    // this can be done periodically.
    +    spark.sql("ALTER table carbon_table compact 'MAJOR'")
    +    spark.sql("show segments for table carbon_table").show()
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.stop()
    +  }
    +
    +  def getKeyOnPrefix(path: String): (String, String, String) = {
    +    val endPoint = "spark.hadoop." + ENDPOINT
    +    if (path.startsWith(CarbonCommonConstants.S3A_PREFIX)) {
    +      ("spark.hadoop." + ACCESS_KEY, "spark.hadoop." + SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3N_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3N_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3N_SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3_SECRET_KEY, endPoint)
    +    } else {
    +      throw new Exception("Incorrect Store Path")
    +    }
    +  }
    +
    +  def getS3EndPoint(args: Array[String]): String = {
    +    if (args.length >= 4 && args(3).contains(".com")) {
    +      args(3)
    +    }
    +    else {
    +      ""
    +    }
    +  }
    +
    +  def getSparkMaster(args: Array[String]): String = {
    +    if (args.length >= 4) {
    +      if (args.length == 5) {
    --- End diff --
    
    this logic can be simplified: flatten it into a single level of `if ... else if`
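
    For example, a sketch of the flattened version (same behavior as the diff above, not the committed code):

        def getSparkMaster(args: Array[String]): String = {
          if (args.length == 5) {
            args(4)      // master given explicitly as the 5th argument
          } else if (args.length == 4 &&
                     (args(3).contains("spark:") || args(3).contains("mesos:"))) {
            args(3)      // the 4th argument is a master URL rather than an endpoint
          } else {
            "local"      // default master
          }
        }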


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162304779
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.{Row, SparkSession}
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of
    +   * 1. create carbon table with storage location on object based storage
    +   * like AWS S3, Huawei OBS, etc
    +   * 2. load data into carbon table, the generated file will be stored on object based storage
    +   * query the table.
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "table-path on s3" "s3-endpoint" "spark-master"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 3 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path-on-s3> [s3-endpoint] [spark-master]")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey, endpoint) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(getSparkMaster(args))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .config(endpoint, getS3EndPoint(args))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("WARN")
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | SELECT *
    +         | FROM carbon_table
    +      """.stripMargin).show()
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    val countSegment: Array[Row] =
    +      spark.sql(
    +        s"""
    +           | SHOW SEGMENTS FOR TABLE carbon_table
    +       """.stripMargin).collect()
    +
    +    while (countSegment.length != 3) {
    +      this.wait(2000)
    +    }
    +
    +    // Use compaction command to merge segments or small files in object based storage,
    +    // this can be done periodically.
    +    spark.sql("ALTER table carbon_table compact 'MAJOR'")
    +    spark.sql("show segments for table carbon_table").show()
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.stop()
    +  }
    +
    +  def getKeyOnPrefix(path: String): (String, String, String) = {
    +    val endPoint = "spark.hadoop." + ENDPOINT
    +    if (path.startsWith(CarbonCommonConstants.S3A_PREFIX)) {
    +      ("spark.hadoop." + ACCESS_KEY, "spark.hadoop." + SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3N_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3N_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3N_SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3_SECRET_KEY, endPoint)
    +    } else {
    +      throw new Exception("Incorrect Store Path")
    +    }
    +  }
    +
    +  def getS3EndPoint(args: Array[String]): String = {
    +    if (args.length >= 4 && args(3).contains(".com")) {
    --- End diff --
    
    why do we need to compare with ".com"?
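
    The ".com" check appears to be a heuristic for telling an s3 endpoint from a spark master in the optional 4th argument, and it would miss endpoints that are not on a .com domain. A hypothetical alternative, sketched here, is to key off the master URL scheme instead:

        // hypothetical alternative: treat args(3) as the endpoint unless it looks like a master URL
        def getS3EndPoint(args: Array[String]): String = {
          if (args.length >= 4 && !args(3).startsWith("spark:") &&
              !args(3).startsWith("mesos:") && !args(3).startsWith("local")) {
            args(3)
          } else {
            ""
          }
        }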


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jatin9896 <gi...@git.apache.org>.
Github user jatin9896 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r161697998
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,151 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of s3 as a store.
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 bucket path" "spark-master"
    +   */
    +
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
    +      .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")
    +      .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true")
    +      .addProperty(CarbonCommonConstants.DEFAULT_CARBON_MAJOR_COMPACTION_SIZE, "0.02")
    --- End diff --
    
    done


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1572/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    please rebase and drop the first two commits, which are already merged to the carbonstore branch


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1630/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2851/



---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162305178
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.{Row, SparkSession}
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of
    +   * 1. create carbon table with storage location on object based storage
    +   * like AWS S3, Huawei OBS, etc
    +   * 2. load data into carbon table, the generated file will be stored on object based storage
    +   * query the table.
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "table-path on s3" "s3-endpoint" "spark-master"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 3 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path-on-s3> [s3-endpoint] [spark-master]")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey, endpoint) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(getSparkMaster(args))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .config(endpoint, getS3EndPoint(args))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("WARN")
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | SELECT *
    +         | FROM carbon_table
    +      """.stripMargin).show()
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    val countSegment: Array[Row] =
    +      spark.sql(
    +        s"""
    +           | SHOW SEGMENTS FOR TABLE carbon_table
    +       """.stripMargin).collect()
    +
    +    while (countSegment.length != 3) {
    +      this.wait(2000)
    +    }
    +
    +    // Use compaction command to merge segments or small files in object based storage,
    +    // this can be done periodically.
    +    spark.sql("ALTER table carbon_table compact 'MAJOR'")
    +    spark.sql("show segments for table carbon_table").show()
    +
    +    spark.sql("Drop table if exists carbon_table")
    +
    +    spark.stop()
    +  }
    +
    +  def getKeyOnPrefix(path: String): (String, String, String) = {
    +    val endPoint = "spark.hadoop." + ENDPOINT
    +    if (path.startsWith(CarbonCommonConstants.S3A_PREFIX)) {
    +      ("spark.hadoop." + ACCESS_KEY, "spark.hadoop." + SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3N_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3N_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3N_SECRET_KEY, endPoint)
    +    } else if (path.startsWith(CarbonCommonConstants.S3_PREFIX)) {
    +      ("spark.hadoop." + CarbonCommonConstants.S3_ACCESS_KEY,
    +        "spark.hadoop." + CarbonCommonConstants.S3_SECRET_KEY, endPoint)
    +    } else {
    +      throw new Exception("Incorrect Store Path")
    +    }
    +  }
    +
    +  def getS3EndPoint(args: Array[String]): String = {
    +    if (args.length >= 4 && args(3).contains(".com")) {
    +      args(3)
    +    }
    +    else {
    +      ""
    +    }
    +  }
    +
    +  def getSparkMaster(args: Array[String]): String = {
    +    if (args.length >= 4) {
    +      if (args.length == 5) {
    +        args(4)
    +      }
    +      else if (args(3).contains("spark:") || args(3).contains("mesos:")) {
    +        args(3)
    +      }
    +      else {
    +        "local"
    +      }
    +    }
    +    else {
    --- End diff --
    
    `else if` should be on the same line as `}`


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162304024
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/AlterTableLoadPartitionRDD.scala ---
    @@ -41,102 +41,95 @@ class AlterTableLoadPartitionRDD[K, V](alterPartitionModel: AlterPartitionModel,
         identifier: AbsoluteTableIdentifier,
         prev: RDD[Array[AnyRef]]) extends RDD[(K, V)](prev) {
     
    -    var storeLocation: String = null
    -    val carbonLoadModel = alterPartitionModel.carbonLoadModel
    -    val segmentId = alterPartitionModel.segmentId
    -    val oldPartitionIds = alterPartitionModel.oldPartitionIds
    -    val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
    -    val databaseName = carbonTable.getDatabaseName
    -    val factTableName = carbonTable.getTableName
    -    val partitionInfo = carbonTable.getPartitionInfo(factTableName)
    +  var storeLocation: String = null
    --- End diff --
    
    Can you put these code-style modifications into a separate PR?


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jatin9896 <gi...@git.apache.org>.
Github user jatin9896 commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    retest this please


---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r161676080
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -168,6 +168,14 @@
     
       public static final String S3A_PREFIX = "s3a://";
     
    +  public static final String S3N_ACCESS_KEY = "fs.s3n.awsAccessKeyId";
    --- End diff --
    
    Add comments for these 4 properties
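
    For reference, these look like the standard Hadoop credential properties for the s3n:// and s3:// filesystems. A sketch of how the four of them are wired into a SparkSession, assuming the three keys besides fs.s3n.awsAccessKeyId follow the usual Hadoop names and that accessKey and secretKey hold the user's credentials:

        import org.apache.spark.sql.SparkSession

        // sketch: passing s3n/s3 credentials through spark.hadoop.* configs
        val spark = SparkSession.builder()
          .config("spark.hadoop.fs.s3n.awsAccessKeyId", accessKey)      // S3N_ACCESS_KEY
          .config("spark.hadoop.fs.s3n.awsSecretAccessKey", secretKey)  // S3N_SECRET_KEY
          .config("spark.hadoop.fs.s3.awsAccessKeyId", accessKey)       // S3_ACCESS_KEY
          .config("spark.hadoop.fs.s3.awsSecretAccessKey", secretKey)   // S3_SECRET_KEY
          .getOrCreate()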


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1621/



---

[GitHub] carbondata pull request #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1805#discussion_r162088431
  
    --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala ---
    @@ -0,0 +1,157 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.carbondata.examples
    +
    +import java.io.File
    +
    +import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY}
    +import org.apache.spark.sql.SparkSession
    +import org.slf4j.{Logger, LoggerFactory}
    +
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +
    +object S3Example {
    +
    +  /**
    +   * This example demonstrate usage of
    +   * 1. create carbon table with storage location on object based storage
    +   *    like AWS S3, Huawei OBS, etc
    +   * 2. load data into carbon table, the generated file will be stored on object based storage
    +   *    query the table.
    +   * 3. With the indexing feature of carbondata, the data read from object based storage is minimum,
    +   *    thus providing both high performance analytic and low cost storage
    +   *
    +   * @param args require three parameters "Access-key" "Secret-key"
    +   *             "s3 bucket path" "spark-master" "s3-endpoint"
    +   */
    +  def main(args: Array[String]) {
    +    val rootPath = new File(this.getClass.getResource("/").getPath
    +                            + "../../../..").getCanonicalPath
    +    val path = s"$rootPath/examples/spark2/src/main/resources/data1.csv"
    +    val logger: Logger = LoggerFactory.getLogger(this.getClass)
    +
    +    import org.apache.spark.sql.CarbonSession._
    +    if (args.length < 4 || args.length > 5) {
    +      logger.error("Usage: java CarbonS3Example <access-key> <secret-key>" +
    +                   "<table-path> <spark-master> <s3-endpoint>")
    +      System.exit(0)
    +    }
    +
    +    val (accessKey, secretKey, endpoint) = getKeyOnPrefix(args(2))
    +    val spark = SparkSession
    +      .builder()
    +      .master(args(3))
    +      .appName("S3Example")
    +      .config("spark.driver.host", "localhost")
    +      .config(accessKey, args(0))
    +      .config(secretKey, args(1))
    +      .config(endpoint, getS3EndPoint(args))
    +      .getOrCreateCarbonSession()
    +
    +    spark.sparkContext.setLogLevel("INFO")
    +
    +    spark.sql(
    +      s"""
    +         | CREATE TABLE if not exists carbon_table(
    +         | shortField SHORT,
    +         | intField INT,
    +         | bigintField LONG,
    +         | doubleField DOUBLE,
    +         | stringField STRING,
    +         | timestampField TIMESTAMP,
    +         | decimalField DECIMAL(18,2),
    +         | dateField DATE,
    +         | charField CHAR(5),
    +         | floatField FLOAT
    +         | )
    +         | STORED BY 'carbondata'
    +         | LOCATION '${ args(2) }'
    +         | TBLPROPERTIES('SORT_COLUMNS'='', 'DICTIONARY_INCLUDE'='dateField, charField')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    +      s"""
    +         | LOAD DATA LOCAL INPATH '$path'
    +         | INTO TABLE carbon_table
    +         | OPTIONS('HEADER'='true')
    +       """.stripMargin)
    +
    +    spark.sql(
    --- End diff --
    
    do a `select *` after the first load
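
    i.e. add a query between the first and second load, as the later revision of the example does:

        spark.sql(
          s"""
             | SELECT *
             | FROM carbon_table
          """.stripMargin).show()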


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    LGTM


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2808/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2882/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1571/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1679/



---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by anubhav100 <gi...@git.apache.org>.
Github user anubhav100 commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    retest this please


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jatin9896 <gi...@git.apache.org>.
Github user jatin9896 commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    @jackylk The table path is the path to the bucket location; for example, I have provided s3a://<bucket-name>/<location>. Regarding endpoints, I have modified the example so that it takes the endpoint as args(4), and it is not mandatory to provide. The connection pooling exception in the example is also fixed. Please check.
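
    For clarity, a hypothetical invocation matching the usage string printed by the example (the jar path and all bracketed values are placeholders):

        spark-submit --class org.apache.carbondata.examples.S3Example \
          /path/to/carbondata-examples.jar \
          <access-key> <secret-key> s3a://<bucket-name>/<table-location> [s3-endpoint] [spark-master]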


---

[GitHub] carbondata issue #1805: [CARBONDATA-1827] S3 Carbon Implementation

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1805
  
    retest this please


---