Posted to issues@carbondata.apache.org by jackylk <gi...@git.apache.org> on 2018/01/02 17:59:50 UTC

[GitHub] carbondata pull request #1750: [CARBONDATA-1969] Support Java API for create...

GitHub user jackylk opened a pull request:

    https://github.com/apache/carbondata/pull/1750

    [CARBONDATA-1969] Support Java API to create table and write data

    It would be useful to have a Java API that creates a carbon table and writes CSV data into it.
    Applications can use this API to write data and then query it with SparkSQL.
    
    This PR adds a new module called store-sdk. No existing modules are changed.
    
    This PR is built on top of #1749.
    
     - [X] Any interfaces changed?
     New API added
     - [X] Any backward compatibility impacted?
     No
     - [X] Document update required?
     Yes
     - [X] Testing done
     Test case added
     - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
     NA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/incubator-carbondata writer_api_latest

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1750.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1750
    
----
commit 5d4376bdf7bfd99ff72da355cea0103b568f78c0
Author: Jacky Li <ja...@...>
Date:   2018-01-02T15:46:14Z

    add external table support

commit e599cdb1c15b9c47af108e2f54786b11ad1488b5
Author: Jacky Li <ja...@...>
Date:   2018-01-02T16:01:45Z

    add testcase

commit a7d634283906331ff3ddb7ec2f2dd178d44d42ff
Author: Jacky Li <ja...@...>
Date:   2018-01-02T16:03:35Z

    recover

commit c8d82b0d1bac16ede62b93bb8ca1f80a84ad5dcf
Author: Jacky Li <ja...@...>
Date:   2018-01-02T17:52:23Z

    add sdk

----


---

[GitHub] carbondata issue #1750: [CARBONDATA-1969] Support Java API to create table a...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1750
  
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3297/



---

[GitHub] carbondata issue #1750: [CARBONDATA-1969] Support Java API to create table a...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1750
  
    Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1266/



---

[GitHub] carbondata pull request #1750: [CARBONDATA-1969] Support Java API to create ...

Posted by mohammadshahidkhan <gi...@git.apache.org>.
Github user mohammadshahidkhan commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1750#discussion_r161989164
  
    --- Diff: store/sdk/src/test/scala/org/apache/carbondata/store/TestCarbonFileWriter.scala ---
    @@ -0,0 +1,83 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.store
    +
    +import java.io.File
    +
    +import org.apache.spark.sql.Row
    +import org.apache.spark.sql.test.util.QueryTest
    +import org.scalatest.BeforeAndAfterAll
    +
    +import org.apache.carbondata.core.metadata.datatype.{DataTypes, StructField}
    +import org.apache.carbondata.store.api.{CarbonStore, SchemaBuilder}
    +
    +class TestCarbonFileWriter extends QueryTest with BeforeAndAfterAll {
    +
    +  test("test write carbon table and read as external table") {
    +    sql("DROP TABLE IF EXISTS source")
    +
    +    val tablePath = "./db1/tc1"
    +    cleanTestTable(tablePath)
    +    createTestTable(tablePath)
    +
    +    sql(s"CREATE EXTERNAL TABLE source STORED BY 'carbondata' LOCATION '$tablePath'")
    +    checkAnswer(sql("SELECT count(*) from source"), Row(1000))
    +
    +    sql("DROP TABLE IF EXISTS source")
    +  }
    +
    +  test("test write carbon table and read by refresh table") {
    +    sql("DROP DATABASE IF EXISTS db1 CASCADE")
    +
    +    val tablePath = "./db1/tc1"
    +    cleanTestTable(tablePath)
    +    createTestTable(tablePath)
    +
    +    sql("CREATE DATABASE db1 LOCATION './db1'")
    +    sql("REFRESH TABLE db1.tc1")
    +    checkAnswer(sql("SELECT count(*) from db1.tc1"), Row(1000))
    +
    +    sql("DROP DATABASE IF EXISTS db1 CASCADE")
    +  }
    +
    +  private def cleanTestTable(tablePath: String) = {
    +    if (new File(tablePath).exists()) {
    +      new File(tablePath).delete()
    +    }
    +  }
    +
    +  private def createTestTable(tablePath: String): Unit = {
    +    val carbon = CarbonStore.build()
    +
    +    val schema = SchemaBuilder.newInstance
    +      .addColumn(new StructField("name", DataTypes.STRING), true)
    +      .addColumn(new StructField("age", DataTypes.INT), false)
    +      .addColumn(new StructField("height", DataTypes.DOUBLE), false)
    +      .create
    +
    +    val table = carbon.createTable("t1", schema, tablePath)
    +    val segment = table.newBatchSegment()
    +
    +    segment.open()
    +    val writer = segment.newWriter()
    +    (1 to 1000).foreach { _ => writer.writeRow(Array[String]("amy", "1", "2.3")) }
    +    writer.close()
    --- End diff --
    
    The writer is not guaranteed to be closed here: if writeRow throws, close() is never reached. The writes should be wrapped in try/finally.
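
The concern in this comment can be illustrated with a minimal Java sketch of the try/finally pattern. It does not use the PR's SDK classes: `RowWriter` and `CountingWriter` are hypothetical stand-ins introduced only for illustration, and the failure after three rows is simulated.

```java
import java.io.IOException;

public class TryFinallyCloseSketch {

  /** Hypothetical stand-in for a row writer; not the PR's actual type. */
  public interface RowWriter {
    void writeRow(String[] row) throws IOException;
    void close() throws IOException;
  }

  /** Writes the same row `count` times and closes the writer on every exit path. */
  public static void writeRows(RowWriter writer, String[] row, int count) throws IOException {
    try {
      for (int i = 0; i < count; i++) {
        writer.writeRow(row);
      }
    } finally {
      // Runs on normal completion and also when writeRow throws.
      writer.close();
    }
  }

  /** In-memory writer that fails once `failAt` rows have been written (never fails if negative). */
  public static class CountingWriter implements RowWriter {
    public final int failAt;
    public int written = 0;
    public boolean closed = false;

    public CountingWriter(int failAt) {
      this.failAt = failAt;
    }

    @Override
    public void writeRow(String[] row) throws IOException {
      if (written == failAt) {
        throw new IOException("simulated write failure");
      }
      written++;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  public static void main(String[] args) {
    CountingWriter writer = new CountingWriter(3);
    try {
      writeRows(writer, new String[] {"amy", "1", "2.3"}, 1000);
    } catch (IOException e) {
      System.out.println("write failed: " + e.getMessage());
    }
    System.out.println("rows written: " + writer.written + ", closed: " + writer.closed);
  }
}
```

One caveat of plain try/finally: if close() itself throws while an earlier exception is propagating, the exception from close() replaces the original one.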


---

[GitHub] carbondata issue #1750: [CARBONDATA-1969] Support Java API to create table a...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1750
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2060/



---

[GitHub] carbondata issue #1750: [CARBONDATA-1969] Support Java API to create table a...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1750
  
    Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1280/



---

[GitHub] carbondata issue #1750: [CARBONDATA-1969] Support Java API to create table a...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1750
  
    merged with #1798


---

[GitHub] carbondata issue #1750: [CARBONDATA-1969] Support Java API to create table a...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1750
  
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2504/



---

[GitHub] carbondata issue #1750: [CARBONDATA-1969] Support Java API to create table a...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1750
  
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2491/



---

[GitHub] carbondata pull request #1750: [CARBONDATA-1969] Support Java API to create ...

Posted by mohammadshahidkhan <gi...@git.apache.org>.
Github user mohammadshahidkhan commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1750#discussion_r161988359
  
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/store/TableBuilder.java ---
    @@ -0,0 +1,134 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.store;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.HashSet;
    +import java.util.List;
    +import java.util.Set;
    +
    +import org.apache.carbondata.core.datastore.impl.FileFactory;
    +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
    +import org.apache.carbondata.core.metadata.CarbonMetadata;
    +import org.apache.carbondata.core.metadata.converter.SchemaConverter;
    +import org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl;
    +import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
    +import org.apache.carbondata.core.metadata.schema.table.DataMapSchema;
    +import org.apache.carbondata.core.metadata.schema.table.TableInfo;
    +import org.apache.carbondata.core.metadata.schema.table.TableSchema;
    +import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
    +import org.apache.carbondata.core.util.path.CarbonStorePath;
    +import org.apache.carbondata.core.util.path.CarbonTablePath;
    +import org.apache.carbondata.core.writer.ThriftWriter;
    +import org.apache.carbondata.format.SchemaEvolutionEntry;
    +import org.apache.carbondata.store.api.Table;
    +
    +public class TableBuilder {
    +
    +  private String databaseName;
    +  private String tableName;
    +  private String tablePath;
    +  private TableSchema tableSchema;
    +
    +  private TableBuilder() { }
    +
    +  public static TableBuilder newInstance() {
    +    return new TableBuilder();
    +  }
    +
    +  public Table create() throws IOException {
    +    if (tableName == null || tablePath == null || tableSchema == null) {
    +      throw new IllegalArgumentException("must provide table name and table path");
    +    }
    +
    +    if (databaseName == null) {
    +      databaseName = "default";
    +    }
    +
    +    TableInfo tableInfo = new TableInfo();
    +    tableInfo.setDatabaseName(databaseName);
    +    tableInfo.setTableUniqueName(databaseName + "_" + tableName);
    +    tableInfo.setFactTable(tableSchema);
    +    tableInfo.setTablePath(tablePath);
    +    tableInfo.setLastUpdatedTime(System.currentTimeMillis());
    +    tableInfo.setDataMapSchemaList(new ArrayList<DataMapSchema>(0));
    +    AbsoluteTableIdentifier identifier = tableInfo.getOrCreateAbsoluteTableIdentifier();
    +
    +    CarbonTablePath carbonTablePath = CarbonStorePath.getCarbonTablePath(
    +        identifier.getTablePath(),
    +        identifier.getCarbonTableIdentifier());
    +    String schemaFilePath = carbonTablePath.getSchemaFilePath();
    +    String schemaMetadataPath = CarbonTablePath.getFolderContainingFile(schemaFilePath);
    +    CarbonMetadata.getInstance().loadTableMetadata(tableInfo);
    +    SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
    +    org.apache.carbondata.format.TableInfo thriftTableInfo =
    +        schemaConverter.fromWrapperToExternalTableInfo(
    +            tableInfo,
    +            tableInfo.getDatabaseName(),
    +            tableInfo.getFactTable().getTableName());
    +    org.apache.carbondata.format.SchemaEvolutionEntry schemaEvolutionEntry =
    +        new SchemaEvolutionEntry(
    +            tableInfo.getLastUpdatedTime());
    +    thriftTableInfo.getFact_table().getSchema_evolution().getSchema_evolution_history()
    +        .add(schemaEvolutionEntry);
    +    FileFactory.FileType fileType = FileFactory.getFileType(schemaMetadataPath);
    +    if (!FileFactory.isFileExist(schemaMetadataPath, fileType)) {
    +      FileFactory.mkdirs(schemaMetadataPath, fileType);
    +    }
    +    ThriftWriter thriftWriter = new ThriftWriter(schemaFilePath, false);
    +    thriftWriter.open();
    +    thriftWriter.write(thriftTableInfo);
    +    thriftWriter.close();
    --- End diff --
    
    It's not safe to call close() here without a finally block.
    The writer will not be closed if an IOException occurs at thriftWriter.write(thriftTableInfo).
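
As an illustration of this point, here is a minimal Java sketch that uses try-with-resources so that close() runs even when write() fails. `FakeSchemaWriter` is a hypothetical stand-in that only mimics the open/write/close shape seen in the diff, and the sketch assumes the writer implements `java.io.Closeable`. If the real writer does not, an explicit try/finally would be needed instead.

```java
import java.io.Closeable;
import java.io.IOException;

public class TryWithResourcesSketch {

  /** Hypothetical stand-in with an open/write/close shape; not the real ThriftWriter. */
  public static class FakeSchemaWriter implements Closeable {
    public boolean opened = false;
    public boolean closed = false;
    private final boolean failOnWrite;

    public FakeSchemaWriter(boolean failOnWrite) {
      this.failOnWrite = failOnWrite;
    }

    public void open() {
      opened = true;
    }

    public void write(String payload) throws IOException {
      if (failOnWrite) {
        throw new IOException("simulated failure writing " + payload);
      }
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Opens the writer, writes the payload, and closes the writer on every exit path. */
  public static void writeSchema(FakeSchemaWriter writer, String payload) throws IOException {
    // Binding the writer as a resource makes close() run when the block exits,
    // whether normally or because open() or write() threw.
    try (FakeSchemaWriter w = writer) {
      w.open();
      w.write(payload);
    }
  }

  public static void main(String[] args) {
    FakeSchemaWriter writer = new FakeSchemaWriter(true);
    try {
      writeSchema(writer, "schema");
    } catch (IOException e) {
      System.out.println("write failed: " + e.getMessage());
    }
    System.out.println("closed: " + writer.closed);
  }
}
```

With try-with-resources, an exception thrown by close() while another exception is already propagating is attached to the original as a suppressed exception rather than replacing it.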


---

[GitHub] carbondata issue #1750: [CARBONDATA-1969] Support Java API to create table a...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1750
  
    SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2659/



---

[GitHub] carbondata issue #1750: [CARBONDATA-1969] Support Java API to create table a...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1750
  
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3292/



---

[GitHub] carbondata issue #1750: [CARBONDATA-1969] Support Java API to create table a...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1750
  
    SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2668/



---

[GitHub] carbondata pull request #1750: [CARBONDATA-1969] Support Java API to create ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk closed the pull request at:

    https://github.com/apache/carbondata/pull/1750


---