You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@carbondata.apache.org by jackylk <gi...@git.apache.org> on 2016/10/04 19:49:01 UTC

[GitHub] incubator-carbondata pull request #212: Use path parameter in Spark datasour...

GitHub user jackylk opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/212

    Use path parameter in Spark datasource API

    Currently, when using carbon with spark datasource API, it need to give database name and table name as parameter, it is not the normal way of datasource API usage. In this PR, database name and table name is not required to give, user need to specify the `path` parameter only when using datasource API
    
    Jira issue will be created later as Apache Jira is not available in China temporarily.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/incubator-carbondata use-path-only

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/212.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #212
    
----
commit 83509c9a0b52d4393786fa545d424c179b4e8e61
Author: jackylk <ja...@huawei.com>
Date:   2016-10-04T18:01:57Z

    use path only

commit 5831a990f0a2f49181cdca0156c86dfd0109b0be
Author: jackylk <ja...@huawei.com>
Date:   2016-10-04T19:32:00Z

    modify testcase

commit 03aed7dbf7ad097ec13588a4ce350ff95a10a401
Author: jackylk <ja...@huawei.com>
Date:   2016-10-04T19:44:47Z

    fix style

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #212: [CARBONDATA-285] Use path parameter ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/212


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #212: [CARBONDATA-285] Use path parameter ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/212#discussion_r83336930
  
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceRelation.scala ---
    @@ -55,18 +55,11 @@ class CarbonSource extends RelationProvider
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    -    // if path is provided we can directly create Hadoop relation. \
    -    // Otherwise create datasource relation
    -    parameters.get("path") match {
    -      case Some(path) => CarbonDatasourceHadoopRelation(sqlContext, Array(path), parameters, None)
    -      case _ =>
    -        val options = new CarbonOption(parameters)
    -        val tableIdentifier = options.tableIdentifier.split("""\.""").toSeq
    -        val identifier = tableIdentifier match {
    -          case Seq(name) => TableIdentifier(name, None)
    -          case Seq(db, name) => TableIdentifier(name, Some(db))
    -        }
    -        CarbonDatasourceRelation(identifier, None)(sqlContext)
    +    val options = new CarbonOption(parameters)
    +    if (sqlContext.isInstanceOf[CarbonContext]) {
    --- End diff --
    
    It works, please check DataFrameAPIExample.scala


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #212: [CARBONDATA-285] Use path parameter ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/212#discussion_r83350430
  
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
    @@ -861,9 +861,11 @@ private[sql] case class CreateTable(cm: tableModel) extends RunnableCommand {
           val tablePath = catalog.createTableFromThrift(tableInfo, dbName, tbName, null)(sqlContext)
           try {
             sqlContext.sql(
    -          s"""CREATE TABLE $dbName.$tbName USING carbondata""" +
    -          s""" OPTIONS (tableName "$dbName.$tbName", tablePath "$tablePath") """)
    -              .collect
    +          s"""
    +             | CREATE TABLE $dbName.$tbName
    +             | USING carbondata
    +             | OPTIONS (path "$tablePath")
    --- End diff --
    
    There would be backward compatability issues here. Old tables cannot work because `path` was not present.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #212: [CARBONDATA-285] Use path parameter ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/212#discussion_r83351395
  
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
    @@ -861,9 +861,11 @@ private[sql] case class CreateTable(cm: tableModel) extends RunnableCommand {
           val tablePath = catalog.createTableFromThrift(tableInfo, dbName, tbName, null)(sqlContext)
           try {
             sqlContext.sql(
    -          s"""CREATE TABLE $dbName.$tbName USING carbondata""" +
    -          s""" OPTIONS (tableName "$dbName.$tbName", tablePath "$tablePath") """)
    -              .collect
    +          s"""
    +             | CREATE TABLE $dbName.$tbName
    +             | USING carbondata
    +             | OPTIONS (path "$tablePath")
    --- End diff --
    
    ok, will fix it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #212: [CARBONDATA-285] Use path parameter ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/212#discussion_r83350489
  
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceRelation.scala ---
    @@ -55,18 +55,11 @@ class CarbonSource extends RelationProvider
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    -    // if path is provided we can directly create Hadoop relation. \
    -    // Otherwise create datasource relation
    -    parameters.get("path") match {
    -      case Some(path) => CarbonDatasourceHadoopRelation(sqlContext, Array(path), parameters, None)
    -      case _ =>
    -        val options = new CarbonOption(parameters)
    -        val tableIdentifier = options.tableIdentifier.split("""\.""").toSeq
    -        val identifier = tableIdentifier match {
    -          case Seq(name) => TableIdentifier(name, None)
    -          case Seq(db, name) => TableIdentifier(name, Some(db))
    -        }
    -        CarbonDatasourceRelation(identifier, None)(sqlContext)
    +    val options = new CarbonOption(parameters)
    +    if (sqlContext.isInstanceOf[CarbonContext]) {
    --- End diff --
    
    Ok, got it. You have added `path` in `options` while creating datasource table to make it work. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #212: [CARBONDATA-285] Use path parameter in Spar...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/212
  
    This will be done in another PR for spark2 integration
    closing it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #212: [CARBONDATA-285] Use path parameter ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/212#discussion_r83146927
  
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceRelation.scala ---
    @@ -55,18 +55,11 @@ class CarbonSource extends RelationProvider
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    -    // if path is provided we can directly create Hadoop relation. \
    -    // Otherwise create datasource relation
    -    parameters.get("path") match {
    -      case Some(path) => CarbonDatasourceHadoopRelation(sqlContext, Array(path), parameters, None)
    -      case _ =>
    -        val options = new CarbonOption(parameters)
    -        val tableIdentifier = options.tableIdentifier.split("""\.""").toSeq
    -        val identifier = tableIdentifier match {
    -          case Seq(name) => TableIdentifier(name, None)
    -          case Seq(db, name) => TableIdentifier(name, Some(db))
    -        }
    -        CarbonDatasourceRelation(identifier, None)(sqlContext)
    +    val options = new CarbonOption(parameters)
    +    if (sqlContext.isInstanceOf[CarbonContext]) {
    --- End diff --
    
    sorry, yes `carboncontext.load(path)` cannot work now right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #212: [CARBONDATA-285] Use path parameter ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/212#discussion_r83123943
  
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceRelation.scala ---
    @@ -55,18 +55,11 @@ class CarbonSource extends RelationProvider
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    -    // if path is provided we can directly create Hadoop relation. \
    -    // Otherwise create datasource relation
    -    parameters.get("path") match {
    -      case Some(path) => CarbonDatasourceHadoopRelation(sqlContext, Array(path), parameters, None)
    -      case _ =>
    -        val options = new CarbonOption(parameters)
    -        val tableIdentifier = options.tableIdentifier.split("""\.""").toSeq
    -        val identifier = tableIdentifier match {
    -          case Seq(name) => TableIdentifier(name, None)
    -          case Seq(db, name) => TableIdentifier(name, Some(db))
    -        }
    -        CarbonDatasourceRelation(identifier, None)(sqlContext)
    +    val options = new CarbonOption(parameters)
    +    if (sqlContext.isInstanceOf[CarbonContext]) {
    --- End diff --
    
    There is no `load` method in dataframe, only in context class.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---