Posted to issues@carbondata.apache.org by dhatchayani <gi...@git.apache.org> on 2018/01/10 11:08:08 UTC

[GitHub] carbondata pull request #1785: [CARBONDATA-2015] Restricted maximum length o...

GitHub user dhatchayani opened a pull request:

    https://github.com/apache/carbondata/pull/1785

    [CARBONDATA-2015] Restricted maximum length of bytes per column

    Added validation for the number of bytes per column.
    
    We have limited each column to 32,000 characters.
    For example, a single Unicode character can take 3 bytes in UTF-8. So if a column has 30,000 Unicode characters, its encoded size is 30,000 * 3 = 90,000 bytes, which exceeds the short range, and the load will fail.
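
    The distinction above can be sketched in a few lines (this is illustrative code, not CarbonData's; `MaxBytesPerColumn` is a hypothetical constant mirroring `CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT`):

    ```scala
    // A minimal sketch of why a 32,000-character limit is not the same as a
    // 32,000-byte limit once multi-byte Unicode data is involved.
    object ByteLimitSketch {
      // Hypothetical constant mirroring CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT
      val MaxBytesPerColumn = 32000

      def exceedsByteLimit(s: String): Boolean =
        s.getBytes("UTF-8").length > MaxBytesPerColumn

      def main(args: Array[String]): Unit = {
        val ascii   = "a" * 30000      // 30,000 chars -> 30,000 bytes, within the limit
        val unicode = "\u4e2d" * 30000 // 30,000 CJK chars -> 90,000 bytes, over the limit
        println(exceedsByteLimit(ascii))   // false
        println(exceedsByteLimit(unicode)) // true
      }
    }
    ```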
    
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
    
     - [x] Testing done
            UT Added
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhatchayani/incubator-carbondata 32000_bytes

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1785.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1785
    
----
commit e380a1d6b2ffae8611f6045e9f63d2ca6e710652
Author: dhatchayani <dh...@...>
Date:   2018-01-10T10:59:14Z

    [CARBONDATA-2015] Restricted maximum length of bytes per column

----


---

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1785
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2677/



---

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1785
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1443/



---

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/1785
  
    LGTM


---

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1785
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1492/



---

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1785
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2725/



---

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1785
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2814/



---

[GitHub] carbondata pull request #1785: [CARBONDATA-2015] Restricted maximum length o...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1785


---

[GitHub] carbondata pull request #1785: [CARBONDATA-2015] Restricted maximum length o...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1785#discussion_r161144435
  
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala ---
    @@ -187,6 +187,75 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterAll {
         } catch {
           case _:Exception => assert(true)
         }
    +  }
    +
    +  test("test load / insert / update with data more than 32000 bytes - dictionary_exclude") {
    +    val testdata = s"$resourcesPath/unicodechar.csv"
    +    sql("drop table if exists load32000bytes")
    +    sql("create table load32000bytes(name string) stored by 'carbondata'")
    +    sql("insert into table load32000bytes select 'aaa'")
    +
    +    assert(intercept[Exception] {
    +      sql(s"load data local inpath '$testdata' into table load32000bytes OPTIONS ('FILEHEADER'='name')")
    +    }.getMessage.equals("DataLoad failure: Dataload failed, String size cannot exceed 32000 bytes"))
    +
    +    val source = scala.io.Source.fromFile(testdata)
    +    val data = source.mkString
    +
    +    try {
    +      sql(s"insert into load32000bytes values('$data')")
    --- End diff --
    
    Avoid using try/catch; use intercept[Exception] instead.
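
    For context, the pattern the reviewer asks for can be sketched standalone. ScalaTest's `intercept[T]` behaves like this hypothetical helper: it runs the body, fails if no exception is thrown, and returns the caught exception so its message can be asserted on directly, replacing the `try { ...; assert(false) } catch { case _: Exception => assert(true) }` shape in the diff:

    ```scala
    import scala.reflect.ClassTag

    // Standalone sketch of the intercept idiom (ScalaTest provides the real one).
    object InterceptSketch {
      def intercept[T <: Throwable](body: => Any)(implicit ct: ClassTag[T]): T = {
        val caught =
          try { body; None }
          catch { case t: Throwable if ct.runtimeClass.isInstance(t) => Some(t.asInstanceOf[T]) }
        caught.getOrElse(throw new AssertionError("Expected exception was not thrown"))
      }

      def main(args: Array[String]): Unit = {
        // The returned exception lets the test assert on the message directly.
        val ex = intercept[IllegalArgumentException] {
          require(false, "String size cannot exceed 32000 bytes")
        }
        assert(ex.getMessage.contains("String size cannot exceed 32000 bytes"))
      }
    }
    ```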


---

[GitHub] carbondata pull request #1785: [CARBONDATA-2015] Restricted maximum length o...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1785#discussion_r161144939
  
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala ---
    @@ -187,6 +187,75 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterAll {
         } catch {
           case _:Exception => assert(true)
         }
    +  }
    +
    +  test("test load / insert / update with data more than 32000 bytes - dictionary_exclude") {
    +    val testdata = s"$resourcesPath/unicodechar.csv"
    +    sql("drop table if exists load32000bytes")
    +    sql("create table load32000bytes(name string) stored by 'carbondata'")
    +    sql("insert into table load32000bytes select 'aaa'")
    +
    +    assert(intercept[Exception] {
    +      sql(s"load data local inpath '$testdata' into table load32000bytes OPTIONS ('FILEHEADER'='name')")
    +    }.getMessage.equals("DataLoad failure: Dataload failed, String size cannot exceed 32000 bytes"))
    +
    +    val source = scala.io.Source.fromFile(testdata)
    +    val data = source.mkString
    +
    +    try {
    +      sql(s"insert into load32000bytes values('$data')")
    +      assert(false)
    +    } catch {
    +      case _: Exception =>
    +        assert(true)
    +    }
    +
    +    try {
    +      sql(s"update load32000bytes set(name)= ('$data')").show()
    +      assert(false)
    +    } catch {
    +      case _: Exception =>
    +        assert(true)
    +    }
    +
    +    sql("drop table if exists load32000bytes")
    +
    +  }
    +
    +  test("test load / insert / update with data more than 32000 bytes - dictionary_include") {
    +    val testdata = s"$resourcesPath/unicodechar.csv"
    +    sql("drop table if exists load32000bytes")
    +    sql("create table load32000bytes(name string) stored by 'carbondata' TBLPROPERTIES('DICTIONARY_INCLUDE'='name')")
    +    sql("insert into table load32000bytes select 'aaa'")
    +
    +    val error = intercept[Exception] {
    +      sql(s"load data local inpath '$testdata' into table load32000bytes OPTIONS ('FILEHEADER'='name')")
    +    }.getMessage
    +
    +    assert(intercept[Exception] {
    +      sql(s"load data local inpath '$testdata' into table load32000bytes OPTIONS ('FILEHEADER'='name')")
    +    }.getMessage.contains("generate global dictionary failed, Dataload failed, String size cannot exceed 32000 bytes"))
    +
    +    val source = scala.io.Source.fromFile(testdata)
    +    val data = source.mkString
    +
    +    try {
    +      sql(s"insert into load32000bytes values('$data')")
    +      assert(false)
    +    } catch {
    +      case _: Exception =>
    +        assert(true)
    +    }
    +
    +    try {
    --- End diff --
    
    Avoid using try/catch at all the places in the modified code.


---

[GitHub] carbondata pull request #1785: [CARBONDATA-2015] Restricted maximum length o...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1785#discussion_r161144886
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala ---
    @@ -113,6 +113,9 @@ object CarbonScalaUtil {
             case s: String => if (s.length > CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT) {
               throw new Exception("Dataload failed, String length cannot exceed " +
                                   CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT + " characters")
    +        } else if (ByteUtil.toBytes(s).length > CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT) {
    --- End diff --
    
    This can impact the data load performance. Move this check to the CarbonDictionaryWriterImpl class, where we are already converting the string values to a byte array. That way it will not add any extra cost for converting the string to a byte array.
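
    The reviewer's suggestion can be sketched as follows. This is a hypothetical shape, not the actual CarbonDictionaryWriterImpl API: the point is that the byte-length check reuses the byte array the writer must produce anyway, instead of converting the string a second time on the load path:

    ```scala
    // Illustrative sketch: validate at the point where the string is already
    // being converted to bytes, so the limit check adds no extra conversion.
    object DictionaryWriteSketch {
      val MaxBytesPerColumn = 32000

      def toWritableBytes(value: String): Array[Byte] = {
        // This conversion is needed anyway to write the dictionary value...
        val bytes = value.getBytes("UTF-8")
        // ...so the limit check reuses it instead of converting again.
        if (bytes.length > MaxBytesPerColumn) {
          throw new RuntimeException(
            s"Dataload failed, String size cannot exceed $MaxBytesPerColumn bytes")
        }
        bytes
      }
    }
    ```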


---

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1785
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2849/



---

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1785
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2722/



---

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

Posted by dhatchayani <gi...@git.apache.org>.
Github user dhatchayani commented on the issue:

    https://github.com/apache/carbondata/pull/1785
  
    retest this please


---

[GitHub] carbondata pull request #1785: [CARBONDATA-2015] Restricted maximum length o...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1785#discussion_r161144455
  
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala ---
    @@ -187,6 +187,75 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterAll {
         } catch {
           case _:Exception => assert(true)
         }
    +  }
    +
    +  test("test load / insert / update with data more than 32000 bytes - dictionary_exclude") {
    +    val testdata = s"$resourcesPath/unicodechar.csv"
    +    sql("drop table if exists load32000bytes")
    +    sql("create table load32000bytes(name string) stored by 'carbondata'")
    +    sql("insert into table load32000bytes select 'aaa'")
    +
    +    assert(intercept[Exception] {
    +      sql(s"load data local inpath '$testdata' into table load32000bytes OPTIONS ('FILEHEADER'='name')")
    +    }.getMessage.equals("DataLoad failure: Dataload failed, String size cannot exceed 32000 bytes"))
    +
    +    val source = scala.io.Source.fromFile(testdata)
    +    val data = source.mkString
    +
    +    try {
    +      sql(s"insert into load32000bytes values('$data')")
    +      assert(false)
    +    } catch {
    +      case _: Exception =>
    +        assert(true)
    +    }
    +
    +    try {
    --- End diff --
    
    same comment as above


---

[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1785
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1489/



---