Posted to dev@carbondata.apache.org by Zhangshunyu <gi...@git.apache.org> on 2016/08/22 02:14:52 UTC

[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Parse some Spark exc...

GitHub user Zhangshunyu opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/81

    [CARBONDATA-132] Parse some Spark exception from executor side and show them directly on driver. 

    ## Why raise this pr:
    For example, when a data load fails because of a wrong CSV file header in the load DDL, the exception message appears only on the executor side, e.g. "CSV header provided in DDL is not proper. Column names in schema and CSV header are not the same". A user on beeline cannot get it from the driver side, because the driver only shows "Dataload Failure", so it is very inconvenient to find the reason without checking the executor logs.
    
    ## How to solve:
    Catch the SparkException on the driver side and inspect its cause. When the cause is a DataLoadingException, show the DataLoadingException message on the driver. DataLoadingException is shown because it is mainly about the CSV file and is already wrapped in an understandable message that can be shown to the user.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Zhangshunyu/incubator-carbondata exc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/81.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #81
    
----
commit d6c32cb6ea80ccfe9f7aee1e14236d90933fce1a
Author: Zhangshunyu <zh...@huawei.com>
Date:   2016-08-22T02:08:26Z

    Parse some Spark exception from executor side and show them directly on driver

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Fix the bug that the...

Posted by Zhangshunyu <gi...@git.apache.org>.
Github user Zhangshunyu closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/81



[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Parse some Spark exc...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/81#discussion_r75869589
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -775,6 +777,13 @@ object CarbonDataRDDFactory extends Logging {
               loadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE
               logInfo("DataLoad failure")
               logger.error(ex)
    --- End diff --
    
    @Zhangshunyu Move logInfo and logger.error below the case match block, and also log the executorMessage in logInfo. You can also move the variable message above the block, with the default value "DataLoad failure: ", so that the same variable can be used for logInfo.



[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Parse some Spark exc...

Posted by Zhangshunyu <gi...@git.apache.org>.
Github user Zhangshunyu commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/81#discussion_r75984276
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -775,6 +777,13 @@ object CarbonDataRDDFactory extends Logging {
               loadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE
               logInfo("DataLoad failure")
               logger.error(ex)
    +          ex match {
    +            case sparkException: SparkException =>
    +              if (sparkException.getCause.isInstanceOf[DataLoadingException]) {
    +                executorMessage = sparkException.getCause.getMessage
    +              }
    +            case _ =>
    --- End diff --
    
    Here we only take the DataLoadingException from the executor and show it directly to the user, so that he can see which operation was incorrect. For any other exception we still use "DataLoad Failure", because we do not show internal errors to the user.
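
The policy described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the two exception classes are local stand-ins so the snippet runs without Spark or CarbonData on the classpath, and the object `ErrorMessageSketch` and its method `driverMessage` are hypothetical names introduced only for this example.

```scala
// Local stand-ins (assumptions) for org.apache.spark.SparkException and
// CarbonData's DataLoadingException, so the sketch is self-contained.
class SparkException(message: String, cause: Throwable) extends Exception(message, cause)
class DataLoadingException(message: String) extends Exception(message)

object ErrorMessageSketch {
  // Hypothetical helper: builds the message that the driver would show.
  def driverMessage(ex: Throwable): String = {
    val default = "DataLoad failure"
    ex match {
      case sparkException: SparkException =>
        sparkException.getCause match {
          // Only a user-facing DataLoadingException is surfaced.
          case cause: DataLoadingException => default + ": " + cause.getMessage
          // Any other cause (including null) keeps the generic message.
          case _ => default
        }
      // Internal errors are not shown to the user.
      case _ => default
    }
  }
}
```

Matching on the cause with a typed pattern also covers a `SparkException` that has no cause, since a typed pattern never matches `null`.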



[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Fix the bug that the...

Posted by Zhangshunyu <gi...@git.apache.org>.
GitHub user Zhangshunyu reopened a pull request:

    https://github.com/apache/incubator-carbondata/pull/81

    [CARBONDATA-132] Fix the bug that the CSV file header exception can not be shown to user using beeline. 

    ## Why raise this pr:
    **Bug fix: the exception 'CSV File provided is not proper. Column names in schema and csv header are not same' is not shown in beeline.**
    
    For example, when a data load fails because of a wrong CSV file header in the load DDL, the exception message appears only on the executor side, e.g. "CSV header provided in DDL is not proper. Column names in schema and CSV header are not the same". However, the **user on beeline cannot get it from the driver side, because the driver only shows "Dataload Failure"**, so it is very inconvenient to find the reason without checking the executor logs.
    
    ## How to solve:
    Catch the exception on the driver side, inspect its cause, and pass the cause message to the driver. DataLoadingException is shown because it is mainly about the CSV file and is already wrapped in an understandable message that can be shown to the user.
    
    ## How to test
    Add new test cases:
    1. If neither the DDL nor the file has a file header:
    Beeline shows: "DataLoad failure: CSV File provided is not proper. Column names in schema and csv header are not same. CSVFile Name : windows.csv"
    2. If the DDL does not provide the proper file header:
    Beeline shows: "DataLoad failure: CSV header provided in DDL is not proper. Column names in schema and CSV header are not the same."

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Zhangshunyu/incubator-carbondata exc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/81.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #81
    
----
commit d6c32cb6ea80ccfe9f7aee1e14236d90933fce1a
Author: Zhangshunyu <zh...@huawei.com>
Date:   2016-08-22T02:08:26Z

    Parse some Spark exception from executor side and show them directly on driver

commit 5e1235ae317ea1915f3cba24f57fad6634754af3
Author: mohammadshahidkhan <mo...@gmail.com>
Date:   2016-08-09T05:17:02Z

    CARBONDATA-153 Record count is not matching while loading the data when one data node went down in HA setup

commit 62c0b05e62e3c2cadc03e6355bd587d14eab355c
Author: Venkata Ramana G <ra...@huawei.com>
Date:   2016-08-22T13:29:06Z

    [CARBONDATA-153] This closes #77

commit 7e0584e7a1d90724e88fffd6fcea15e5ba640da8
Author: manishgupt88 <to...@gmail.com>
Date:   2016-07-19T09:25:52Z

    Perform equal distribution of dictionary values among the sublists of a list whenever a dictionary file is loaded into memory

commit 2d4609cdface93ea3f3a7a92e088e5b98f24f7e2
Author: Venkata Ramana G <ra...@huawei.com>
Date:   2016-08-23T14:02:03Z

    [CARBONDATA-80] This closes #44

commit fe1b0f07deda03fe21b98191be7750bf61d8520c
Author: mohammadshahidkhan <mo...@gmail.com>
Date:   2016-07-20T10:32:18Z

    CARBONDATA-117 BlockLet distribution for optimum resource usage

commit 5ebf90a87999b9dd5ec484e54aceb7487ca3096f
Author: Venkata Ramana G <ra...@huawei.com>
Date:   2016-08-23T15:00:07Z

    [CARBONDATA-117] This closes #56

commit 61e40eb0033fca3ffc8d09d392b6090cde284652
Author: ravikiran <ra...@gmail.com>
Date:   2016-08-23T13:58:51Z

    Delete the lock file once the unlocking is done.

commit 64586059241589ecae6e8846ff4643ab03647041
Author: Venkata Ramana G <ra...@huawei.com>
Date:   2016-08-23T15:30:04Z

    [CARBONDATA-170] This closes #86

commit 897c12a031791f60a80f859093837cbd6989e84c
Author: Jay357089 <li...@huawei.com>
Date:   2016-08-22T12:19:06Z

    colDict_Alldict

commit c11058d7435f4176b1fee1d9fe637eb233936a6a
Author: Venkata Ramana G <ra...@huawei.com>
Date:   2016-08-23T18:59:57Z

    [CARBONDATA-169] This closes #83

commit eac5573a644118c4942715f15e629ffa9ca1141b
Author: mohammadshahidkhan <mo...@gmail.com>
Date:   2016-08-23T15:17:28Z

    [CARBONDATA-171] Block distribution not proper when the number of active executors more than the node size

commit 1a28ada21af0f0ff975c93252fdbec959974e542
Author: Venkata Ramana G <ra...@huawei.com>
Date:   2016-08-23T19:28:45Z

    [CARBONDATA-171] This closes #87

commit d981c0d06e0a9f0881533f87c405b4464f71019c
Author: Zhangshunyu <zh...@huawei.com>
Date:   2016-08-24T02:06:02Z

    fix review comments

commit 6e4b21e5372c7b0d4c47dcd0d3366148d717526c
Author: Zhangshunyu <zh...@huawei.com>
Date:   2016-08-24T03:11:22Z

    add test case

commit b59d5c77d8046e6e5e85f1e9c0758cb628cd455b
Author: Zhangshunyu <zh...@huawei.com>
Date:   2016-08-24T03:34:24Z

    add test case

commit f4aef3a5beb4590ad5a3f7a5f5226be4a3d480d1
Author: Zhangshunyu <zh...@huawei.com>
Date:   2016-08-24T09:21:19Z

    fix comments

commit 27f7c4c7a0f310764c148aa9786ee51ad4884f8f
Author: Praveen Adlakha <ad...@gmail.com>
Date:   2016-08-12T12:41:38Z

    CARBONDATA-129 Do null check before adding value to CarbonProperties

commit 24d93b15a2302a7747c5cf2bc8486496efc9e47f
Author: chenliang613 <ch...@apache.org>
Date:   2016-08-26T07:35:36Z

    CARBONDATA-129 Do null check before adding value to CarbonProperties This closes #73

commit 8e75531a447e92ef145b75b7acc86732a0228706
Author: Zhangshunyu <zh...@huawei.com>
Date:   2016-08-24T06:30:50Z

    fix the problem of hdfs lock and move the lock file inside the table folder
    
    fix the problem of hdfs lock and move the lock file inside the table folder

commit d702e826268f40e5064d26b972e69f4215976fd3
Author: chenliang613 <ch...@apache.org>
Date:   2016-08-26T15:23:29Z

    [CARBONDATA-174]Fix the problem of hdfs lock and move the lock file inside the table folder This closes #89

commit be84575e99df5f9e20a8c9578ad26d3e1558dbf4
Author: mohammadshahidkhan <mo...@gmail.com>
Date:   2016-08-26T06:22:11Z

    [CARBONDATA-183] Blocks are allocated to single node when Executors configured is based on the ip address.

commit 4bb00485586dd6c8e2e744f5be3cb8b7a78ca95b
Author: ravipesala <ra...@gmail.com>
Date:   2016-08-26T16:06:46Z

    [CARBONDATA-183] Blocks are allocated to single node when Executors configured is based on the ip address This closes #98

commit cb8abfef529f2ed594f36493fe745f3a763e3847
Author: nareshpr <pr...@gmail.com>
Date:   2016-08-26T09:10:27Z

    Fixed special char delimiter for complex data.

commit 518b1325b4479f2fc21a54426a31366232abedb8
Author: ravipesala <ra...@gmail.com>
Date:   2016-08-26T16:11:14Z

    [CARBONDATA-184] Fixed special char delimiter for complex data. This closes #99

commit 166d410e6579680a15392062dd53945468b00550
Author: foryou2030 <fo...@126.com>
Date:   2016-08-16T12:08:19Z

    fix load data with first line is null
    
    add test data

commit 5d1a177e7c04f88d0115a2aea549706e008aca39
Author: chenliang613 <ch...@apache.org>
Date:   2016-08-27T00:44:42Z

    [CARBONDATA-158] fix load data with first line is null This closes #76

commit 2f3ac7576b2ef3a93d5f233351d8c5ab89139937
Author: Jay357089 <li...@huawei.com>
Date:   2016-08-25T16:10:39Z

    describe

commit dfa768a6fb92a1af1f707f5fb2995b303f7857cf
Author: chenliang613 <ch...@apache.org>
Date:   2016-08-27T00:57:40Z

    [CARBONDATA-179] Describe table show old table schema This closes #96

commit 27705c17838e3a1a426aee185bcf21db250315f5
Author: Zhangshunyu <zh...@huawei.com>
Date:   2016-08-25T03:40:35Z

    Fix the bug that table not exist exception occured when using sparksql and beeline the same time

----



[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Fix the bug that the...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/81#discussion_r76017047
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -773,20 +776,27 @@ object CarbonDataRDDFactory extends Logging {
           } catch {
             case ex: Throwable =>
               loadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE
    -          logInfo("DataLoad failure")
    +          ex match {
    +            case sparkException: SparkException =>
    +              if (sparkException.getCause.isInstanceOf[DataLoadingException]) {
    +                executorMessage = sparkException.getCause.getMessage
    +                errorMessage = errorMessage + ": " + executorMessage
    +              }
    +            case _ =>
    --- End diff --
    
    @Zhangshunyu For case _, also get the message from the throwable and add it to the error message, the same as you have done for the SparkException case.
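
One possible reading of this suggestion is sketched below. It is an assumption about what the reviewer has in mind, not the code that was merged: the exception classes are local stand-ins for the Spark and CarbonData types, and `CatchAllMessageSketch` and `errorMessage` are hypothetical names. The catch-all case appends the throwable's own message when it has one and falls back to the bare prefix otherwise.

```scala
// Local stand-ins (assumptions) for org.apache.spark.SparkException and
// CarbonData's DataLoadingException, so the sketch is self-contained.
class SparkException(message: String, cause: Throwable) extends Exception(message, cause)
class DataLoadingException(message: String) extends Exception(message)

object CatchAllMessageSketch {
  // Hypothetical helper: the catch-all case also contributes a message.
  def errorMessage(ex: Throwable): String = {
    val prefix = "DataLoad failure"
    val detail: Option[String] = ex match {
      case sparkException: SparkException =>
        sparkException.getCause match {
          case cause: DataLoadingException => Option(cause.getMessage)
          case _ => None
        }
      // Option(...) turns a null message into None.
      case other => Option(other.getMessage)
    }
    // Append the detail only when there is something to append,
    // so the result never ends in a dangling ": ".
    detail.filter(_.nonEmpty).map(d => prefix + ": " + d).getOrElse(prefix)
  }
}
```

Wrapping the message in an `Option` avoids a trailing "DataLoad failure: " with nothing after it when the throwable carries no message.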



[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Fix the bug that the...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/81



[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Parse some Spark exc...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/81#discussion_r75869120
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -775,6 +777,13 @@ object CarbonDataRDDFactory extends Logging {
               loadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE
               logInfo("DataLoad failure")
               logger.error(ex)
    +          ex match {
    +            case sparkException: SparkException =>
    +              if (sparkException.getCause.isInstanceOf[DataLoadingException]) {
    +                executorMessage = sparkException.getCause.getMessage
    +              }
    +            case _ =>
    --- End diff --
    
    @Zhangshunyu For the underscore case, also get the message and assign it to the variable executorMessage. If control reaches case _, executorMessage will be empty and the message will remain "DataLoad failure: ", which is incorrect.



[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Parse some Spark exc...

Posted by Zhangshunyu <gi...@git.apache.org>.
Github user Zhangshunyu commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/81#discussion_r75984291
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -775,6 +777,13 @@ object CarbonDataRDDFactory extends Logging {
               loadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE
               logInfo("DataLoad failure")
               logger.error(ex)
    --- End diff --
    
    OK

