Posted to dev@carbondata.apache.org by foryou2030 <gi...@git.apache.org> on 2016/08/27 06:49:14 UTC

[GitHub] incubator-carbondata pull request #100: Handle all dictionary exception more...

GitHub user foryou2030 opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/100

    Handle all dictionary exception more properly

    # Why raise this PR?
    
    When using all dictionary, if the given dictionary file path is wrong, or a dictionary file contains bad records,
    Carbon should handle them properly.
    # How to solve?
    
    1. When a wrong dictionary file path is given, throw a file-not-found exception.
    2. When a dictionary file contains a bad record, log an error and replace the bad record with the default value.
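
The bad-record handling in step 2 can be sketched as below. This is a minimal illustration under stated assumptions, not the code in the pull request: `DictionaryRecordSketch`, `parseRecord` and `DefaultColumnName` are hypothetical names, the separator is assumed to be a comma, and logging goes to standard error instead of Spark's `Logging` trait. Unlike the diff, the sketch treats any record that does not have exactly two fields as bad.

```scala
import scala.util.Try

object DictionaryRecordSketch {
  // Hypothetical stand-in for CarbonCommonConstants.DEFAULT_COLUMN_NAME.
  val DefaultColumnName = "default_column"

  /** Parses one "columnIndex,value" line into (columnName, value).
    * A malformed line yields (DefaultColumnName, "") instead of throwing. */
  def parseRecord(
      line: String,
      csvFileColumns: Array[String],
      separator: String = ","): (String, String) = {
    val tokens = line.split(separator)
    val parsed = Try {
      // Fails on a wrong field count, a non-numeric index,
      // or an index outside csvFileColumns.
      require(tokens.length == 2, s"expected 2 fields, got ${tokens.length}")
      (csvFileColumns(tokens(0).toInt), tokens(1))
    }
    parsed.getOrElse {
      Console.err.println(s"Bad dictionary record, using default value: $line")
      (DefaultColumnName, "")
    }
  }
}
```

Returning a default pair keeps a single bad line from failing the whole load, at the cost of grouping all bad records under one placeholder column that is filtered out later.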

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/foryou2030/incubator-carbondata dict_ex

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/100.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #100
    
----
commit 00408412971177cf192cebae8092fda20bcb7d58
Author: foryou2030 <fo...@126.com>
Date:   2016-08-27T06:42:16Z

    Handle all dictionary exception more properly

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #100: Handle all dictionary exception more...

Posted by Vimal-Das <gi...@git.apache.org>.
Github user Vimal-Das commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/100#discussion_r76512895
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala ---
    @@ -629,22 +631,32 @@ object GlobalDictionaryUtil extends Logging {
         // filepath regex, look like "/path/*.dictionary"
         if (filePath.getName.startsWith("*")) {
           val dictExt = filePath.getName.substring(1)
    -      val listFiles = filePath.getParentFile.listFiles()
    -      if (listFiles.exists(file =>
    -        file.getName.endsWith(dictExt) && file.getSize > 0)) {
    -        true
    +      if (filePath.getParentFile.exists()) {
    +        val listFiles = filePath.getParentFile.listFiles()
    +        if (listFiles.exists(file =>
    +          file.getName.endsWith(dictExt) && file.getSize > 0)) {
    +          true
    +        } else {
    +          logWarning("[ALL_DICTIONARY] No dictionary files found or empty dictionary files! " +
    +            "Won't generate new dictionary.")
    +          false
    +        }
           } else {
    -        logInfo("No dictionary files found or empty dictionary files! " +
    -          "Won't generate new dictionary.")
    -        false
    +        throw new FileNotFoundException(
    +          "[ALL_DICTIONARY] The given dictionary file path not found!")
           }
         } else {
    -      if (filePath.exists() && filePath.getSize > 0) {
    -        true
    +      if (filePath.exists()) {
    +        if (filePath.getSize > 0) {
    +          true
    +        } else {
    +          logWarning("[ALL_DICTIONARY] No dictionary files found or empty dictionary files! " +
    +            "Won't generate new dictionary.")
    +          false
    +        }
           } else {
    -        logInfo("No dictionary files found or empty dictionary files! " +
    -          "Won't generate new dictionary.")
    -        false
    +        throw new FileNotFoundException(
    +          "[ALL_DICTIONARY] The given dictionary file path not found!")
    --- End diff --
    
    Correct the English grammar in the log messages.
    The given dictionary file path not found! => The given dictionary file path **is** not found!



[GitHub] incubator-carbondata pull request #100: Handle all dictionary exception more...

Posted by sujith71955 <gi...@git.apache.org>.
Github user sujith71955 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/100#discussion_r76519992
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala ---
    @@ -629,22 +631,32 @@ object GlobalDictionaryUtil extends Logging {
         // filepath regex, look like "/path/*.dictionary"
         if (filePath.getName.startsWith("*")) {
           val dictExt = filePath.getName.substring(1)
    -      val listFiles = filePath.getParentFile.listFiles()
    -      if (listFiles.exists(file =>
    -        file.getName.endsWith(dictExt) && file.getSize > 0)) {
    -        true
    +      if (filePath.getParentFile.exists()) {
    +        val listFiles = filePath.getParentFile.listFiles()
    --- End diff --
    
    What if filePath.getParentFile returns null?
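
One way to guard against a null parent is sketched below. It is an illustration only: it assumes plain `java.io.File` rather than the file abstraction used in `GlobalDictionaryUtil`, and `DictionaryPathSketch` and `hasUsableDictionary` are hypothetical names. Both `getParentFile` and `listFiles` can return null, so each is wrapped in `Option` before use.

```scala
import java.io.{File, FileNotFoundException}

object DictionaryPathSketch {
  /** Returns true when at least one non-empty dictionary file matches `path`.
    * Throws FileNotFoundException when the file or its parent directory is missing. */
  def hasUsableDictionary(path: String): Boolean = {
    val file = new File(path)
    if (file.getName.startsWith("*")) {
      // Pattern such as "/path/*.dictionary": match files by extension.
      val dictExt = file.getName.substring(1)
      val candidates = for {
        parent <- Option(file.getParentFile) if parent.isDirectory
        children <- Option(parent.listFiles())
      } yield children
      candidates match {
        case Some(children) =>
          children.exists(f => f.getName.endsWith(dictExt) && f.length() > 0)
        case None =>
          throw new FileNotFoundException(
            s"The given dictionary file path is not found: $path")
      }
    } else if (file.exists()) {
      file.length() > 0
    } else {
      throw new FileNotFoundException(
        s"The given dictionary file path is not found: $path")
    }
  }
}
```

With this shape, a bare pattern such as `*.dictionary` (which has no parent component) raises `FileNotFoundException` rather than a `NullPointerException`.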



[GitHub] incubator-carbondata pull request #100: Handle all dictionary exception more...

Posted by foryou2030 <gi...@git.apache.org>.
Github user foryou2030 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/100#discussion_r76552945
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala ---
    @@ -591,29 +591,31 @@ object GlobalDictionaryUtil extends Logging {
           val basicRdd = sqlContext.sparkContext.textFile(allDictionaryPath)
             .map(x => {
             val tokens = x.split("" + CSVWriter.DEFAULT_SEPARATOR)
    -        var index: Int = 0
    +        if (tokens.size != 2) {
    +          logError("[ALL_DICTIONARY] Read a bad dictionary record: " + x)
    +        }
    +        var columnName: String = CarbonCommonConstants.DEFAULT_COLUMN_NAME
             var value: String = ""
             try {
    -          index = tokens(0).toInt
    +          columnName = csvFileColumns(tokens(0).toInt)
               value = tokens(1)
             } catch {
               case ex: Exception =>
    -            logError("read a bad dictionary record" + x)
    +            logError("[ALL_DICTIONARY] Reset bad dictionary record as default value")
             }
    -        (index, value)
    +        (columnName, value)
           })
    +
           // group by column index, and filter required columns
           val requireColumnsList = requireColumns.toList
           allDictionaryRdd = basicRdd
             .groupByKey()
    -        .map(x => (csvFileColumns(x._1), x._2))
             .filter(x => requireColumnsList.contains(x._1))
         } catch {
           case ex: Exception =>
    -        logError("read local dictionary files failed")
    +        logError("[ALL_DICTIONARY] Read dictionary files failed. Caused by" + ex.getMessage)
    --- End diff --
    
    OK, it is not required; I have removed it.



[GitHub] incubator-carbondata pull request #100: Handle all dictionary exception more...

Posted by sujith71955 <gi...@git.apache.org>.
Github user sujith71955 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/100#discussion_r76519971
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala ---
    @@ -591,29 +591,31 @@ object GlobalDictionaryUtil extends Logging {
           val basicRdd = sqlContext.sparkContext.textFile(allDictionaryPath)
             .map(x => {
             val tokens = x.split("" + CSVWriter.DEFAULT_SEPARATOR)
    -        var index: Int = 0
    +        if (tokens.size != 2) {
    +          logError("[ALL_DICTIONARY] Read a bad dictionary record: " + x)
    +        }
    +        var columnName: String = CarbonCommonConstants.DEFAULT_COLUMN_NAME
             var value: String = ""
             try {
    -          index = tokens(0).toInt
    +          columnName = csvFileColumns(tokens(0).toInt)
               value = tokens(1)
             } catch {
               case ex: Exception =>
    -            logError("read a bad dictionary record" + x)
    +            logError("[ALL_DICTIONARY] Reset bad dictionary record as default value")
             }
    -        (index, value)
    +        (columnName, value)
           })
    +
           // group by column index, and filter required columns
           val requireColumnsList = requireColumns.toList
           allDictionaryRdd = basicRdd
             .groupByKey()
    -        .map(x => (csvFileColumns(x._1), x._2))
             .filter(x => requireColumnsList.contains(x._1))
         } catch {
           case ex: Exception =>
    -        logError("read local dictionary files failed")
    +        logError("[ALL_DICTIONARY] Read dictionary files failed. Caused by" + ex.getMessage)
    --- End diff --
    
    Is the term ALL_DICTIONARY really required in this context?



[GitHub] incubator-carbondata pull request #100: Handle all dictionary exception more...

Posted by foryou2030 <gi...@git.apache.org>.
Github user foryou2030 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/100#discussion_r76513982
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala ---
    @@ -629,22 +631,32 @@ object GlobalDictionaryUtil extends Logging {
         // filepath regex, look like "/path/*.dictionary"
         if (filePath.getName.startsWith("*")) {
           val dictExt = filePath.getName.substring(1)
    -      val listFiles = filePath.getParentFile.listFiles()
    -      if (listFiles.exists(file =>
    -        file.getName.endsWith(dictExt) && file.getSize > 0)) {
    -        true
    +      if (filePath.getParentFile.exists()) {
    +        val listFiles = filePath.getParentFile.listFiles()
    +        if (listFiles.exists(file =>
    +          file.getName.endsWith(dictExt) && file.getSize > 0)) {
    +          true
    +        } else {
    +          logWarning("[ALL_DICTIONARY] No dictionary files found or empty dictionary files! " +
    +            "Won't generate new dictionary.")
    +          false
    +        }
           } else {
    -        logInfo("No dictionary files found or empty dictionary files! " +
    -          "Won't generate new dictionary.")
    -        false
    +        throw new FileNotFoundException(
    +          "[ALL_DICTIONARY] The given dictionary file path not found!")
           }
         } else {
    -      if (filePath.exists() && filePath.getSize > 0) {
    -        true
    +      if (filePath.exists()) {
    +        if (filePath.getSize > 0) {
    +          true
    +        } else {
    +          logWarning("[ALL_DICTIONARY] No dictionary files found or empty dictionary files! " +
    +            "Won't generate new dictionary.")
    +          false
    +        }
           } else {
    -        logInfo("No dictionary files found or empty dictionary files! " +
    -          "Won't generate new dictionary.")
    -        false
    +        throw new FileNotFoundException(
    +          "[ALL_DICTIONARY] The given dictionary file path not found!")
    --- End diff --
    
    OK, fixed.



[GitHub] incubator-carbondata pull request #100: Handle all dictionary exception more...

Posted by foryou2030 <gi...@git.apache.org>.
Github user foryou2030 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/100#discussion_r76552918
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala ---
    @@ -629,22 +631,32 @@ object GlobalDictionaryUtil extends Logging {
         // filepath regex, look like "/path/*.dictionary"
         if (filePath.getName.startsWith("*")) {
           val dictExt = filePath.getName.substring(1)
    -      val listFiles = filePath.getParentFile.listFiles()
    -      if (listFiles.exists(file =>
    -        file.getName.endsWith(dictExt) && file.getSize > 0)) {
    -        true
    +      if (filePath.getParentFile.exists()) {
    +        val listFiles = filePath.getParentFile.listFiles()
    --- End diff --
    
    Handled.



[GitHub] incubator-carbondata pull request #100: Handle all dictionary exception more...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/100

