You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@carbondata.apache.org by kunal642 <gi...@git.apache.org> on 2018/11/15 04:54:22 UTC

[GitHub] carbondata pull request #2923: [WIP] added partition columns to the last whe...

GitHub user kunal642 opened a pull request:

    https://github.com/apache/carbondata/pull/2923

    [WIP] added partition columns to the last when collecting columns

    Be sure to do all of the following checklist to help us incorporate 
    your contribution quickly and easily:
    
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
    
     - [ ] Testing done
            Please provide details on 
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kunal642/carbondata partition_alter_bugfix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2923.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2923
    
----
commit c399910c682f2573827214b1f76aea0225d7ecf8
Author: kunal642 <ku...@...>
Date:   2018-11-15T04:40:00Z

    added partition columns to the last when collecting columns

----


---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1419/



---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
  
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9671/



---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1637/



---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

Posted by kunal642 <gi...@git.apache.org>.

Github user kunal642 commented on the issue:

    https://github.com/apache/carbondata/pull/2923
  
    retest this please


---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

Posted by manishgupta88 <gi...@git.apache.org>.

Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/2923
  
    LGTM


---

[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...

Posted by kunal642 <gi...@git.apache.org>.

Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2923#discussion_r235681603
  
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala ---
    @@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and
         sql("drop datamap if exists preaggTable on table partitionTable")
       }
     
    +  test("validate data in partition table after dropping and adding a column") {
    +    sql("drop table if exists par")
    +    sql("create table par(name string) partitioned by (age double) stored by " +
    +              "'carbondata'")
    +    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
    +        s"('header'='false')")
    +    sql("alter table par drop columns(name)")
    +    sql("alter table par add columns(name string)")
    +    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
    +        s"('header'='false')")
    --- End diff --
    
    @ravipesala Spark-2.1 and 2.2 both put partition column at the last even if a new column is added.


---

[GitHub] carbondata issue #2923: [WIP] Fixed dataload failure when a column is droppe...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1413/



---

[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...

Posted by manishgupta88 <gi...@git.apache.org>.

Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2923#discussion_r234499385
  
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala ---
    @@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and
         sql("drop datamap if exists preaggTable on table partitionTable")
       }
     
    +  test("validate data in partition table after dropping and adding a column") {
    +    sql("drop table if exists par")
    +    sql("create table par(name string) partitioned by (age double) stored by " +
    +              "'carbondata'")
    +    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
    +        s"('header'='false')")
    +    sql("alter table par drop columns(name)")
    +    sql("alter table par add columns(name string)")
    +    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
    +        s"('header'='false')")
    --- End diff --
    
    keeping partition column at the end is carbondata behavior which may or may not be known to user. For a normal table whenever a column is dropped and added, the added column data should either be added as the last column in csv file or it should be mapped through fileheader which is the correct behavior.
    As you are using the same csv file in your test case without changing the order of data and providing header the above explained behavior might not hold true. Please revisit the changes and take opinion from other PMC's/Committers on this behavioral change


---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1636/



---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1427/



---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2923
  
    LGTM


---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1628/



---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
  
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9676/



---

[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2923#discussion_r235270774
  
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala ---
    @@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and
         sql("drop datamap if exists preaggTable on table partitionTable")
       }
     
    +  test("validate data in partition table after dropping and adding a column") {
    +    sql("drop table if exists par")
    +    sql("create table par(name string) partitioned by (age double) stored by " +
    +              "'carbondata'")
    +    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
    +        s"('header'='false')")
    +    sql("alter table par drop columns(name)")
    +    sql("alter table par add columns(name string)")
    +    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
    +        s"('header'='false')")
    --- End diff --
    
    @kunal642 Please check the behaviour of all version spark. Are all versions of spark behave the same way?


---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
  
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9684/



---

[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2923


---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

Posted by kunal642 <gi...@git.apache.org>.

Github user kunal642 commented on the issue:

    https://github.com/apache/carbondata/pull/2923
  
    @ravipesala @manishgupta88 Please review


---

[GitHub] carbondata pull request #2923: [CARBONDATA-3101] Fixed dataload failure when...

Posted by kunal642 <gi...@git.apache.org>.

Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2923#discussion_r234966352
  
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala ---
    @@ -437,6 +437,20 @@ test("Creation of partition table should fail if the colname in table schema and
         sql("drop datamap if exists preaggTable on table partitionTable")
       }
     
    +  test("validate data in partition table after dropping and adding a column") {
    +    sql("drop table if exists par")
    +    sql("create table par(name string) partitioned by (age double) stored by " +
    +              "'carbondata'")
    +    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
    +        s"('header'='false')")
    +    sql("alter table par drop columns(name)")
    +    sql("alter table par add columns(name string)")
    +    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
    +        s"('header'='false')")
    --- End diff --
    
    Actually spark expects the partition column to be the last. So the solution that you are proposing will not work incase of datasource because spark will parse the schema and would always add the partition column to the last


---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1428/



---

[GitHub] carbondata issue #2923: [CARBONDATA-3101] Fixed dataload failure when a colu...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2923
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9685/



---