Posted to issues@carbondata.apache.org by manishnalla1994 <gi...@git.apache.org> on 2019/01/02 12:46:12 UTC

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

GitHub user manishnalla1994 opened a pull request:

    https://github.com/apache/carbondata/pull/3047

    [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize calculation for old store using Show Segments

    Problem: A table created and loaded on an older version (1.1) showed a data size and index size of 0B after being refreshed on the new version. This happened because, when the data size came back as "null", we did not compute it and instead directly assigned it a value of 0.
    
    Solution: Computed the correct data-size and index-size using CarbonTable.
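
    A minimal Scala sketch of the fix, for illustration only (the wrapper object and
    method name are not part of the PR; the body mirrors the change made to
    CarbonStore.showSegments, and the import paths assume the CarbonData core module):

        import org.apache.carbondata.core.constants.CarbonCommonConstants
        import org.apache.carbondata.core.metadata.schema.table.CarbonTable
        import org.apache.carbondata.core.statusmanager.LoadMetadataDetails
        import org.apache.carbondata.core.util.CarbonUtil

        object ShowSegmentsSizeSketch {
          // For a batch segment: if the old store recorded no sizes in the table status
          // file, recompute them from the table; otherwise use the recorded values.
          def resolveSegmentSizes(carbonTable: CarbonTable,
              load: LoadMetadataDetails): (Long, Long) = {
            if (null == load.getDataSize || null == load.getIndexSize) {
              val sizes = CarbonUtil.calculateDataIndexSize(carbonTable, true)
              (sizes.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).toLong,
                sizes.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).toLong)
            } else {
              (load.getDataSize.toLong, load.getIndexSize.toLong)
            }
          }
        }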
      
    Be sure to complete all of the following checklist items to help us incorporate
    your contribution quickly and easily:
    
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
    
     - [x] Testing done
            Please provide details on 
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishnalla1994/carbondata Datasize0Issue

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/3047.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3047
    
----
commit 6bf65d7a0b42e8d9a822fd234a510550bd8d2f17
Author: manishnalla1994 <ma...@...>
Date:   2019-01-02T12:30:36Z

    Fixed Wrong Datasize and Indexsize calculation for old store

----


---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2361/



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2124/



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10378/



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2379/



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2349/



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    LGTM...can be merged once build passes


---

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3047#discussion_r244920921
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
    @@ -46,9 +47,9 @@ object CarbonStore {
     
       def showSegments(
           limit: Option[String],
    -      tablePath: String,
    +      carbonTable: CarbonTable,
    --- End diff --
    
    Move `carbonTable` to be the first argument of the method


---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10420/



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2143/



---

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

Posted by qiuchenjian <gi...@git.apache.org>.
Github user qiuchenjian commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3047#discussion_r244895354
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
    @@ -101,14 +102,21 @@ object CarbonStore {
               val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
                 // for streaming segment, we should get the actual size from the index file
                 // since it is continuously inserting data
    -            val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
    +            val segmentDir = CarbonTablePath
    +              .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
                 val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
                 val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
                 (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
               } else {
                 // for batch segment, we can get the data size from table status file directly
    -            (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
    -              if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
    +            if (null == load.getDataSize && null == load.getIndexSize) {
    +              val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
    +              (dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).toLong,
    +                dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).toLong)
    +            } else {
    +              (load.getDataSize.toLong,
    --- End diff --
    
    If one of load.getDataSize and load.getIndexSize is null, it will throw an exception. I think this case should be considered.
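
    A tiny sketch of the point being raised (hedged; the `||` guard shown here is what
    the later revision of the diff adopts):

        // With `&&`, a segment that has only one of the two sizes recorded falls into the
        // else branch, and calling .toLong on the missing (null) value throws at runtime.
        // Guarding with `||` routes such a segment to the recomputation path instead.
        val needsRecompute = null == load.getDataSize || null == load.getIndexSize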


---

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

Posted by KanakaKumar <gi...@git.apache.org>.
Github user KanakaKumar commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3047#discussion_r244980360
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
    @@ -101,14 +102,23 @@ object CarbonStore {
               val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
                 // for streaming segment, we should get the actual size from the index file
                 // since it is continuously inserting data
    -            val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
    +            val segmentDir = CarbonTablePath
    +              .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
                 val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
                 val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
                 (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
               } else {
                 // for batch segment, we can get the data size from table status file directly
    -            (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
    -              if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
    +            if (null == load.getDataSize || null == load.getIndexSize) {
    +              // If either of datasize or indexsize comes to be null the we calculate the correct
    +              // size and assign
    +              val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, true)
    --- End diff --
    
    Show segments is a read-only query. I think we should not perform a write operation in a query.
    So, I feel it's better to calculate the sizes every time and show them, OR just display them as not available.
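
    A minimal sketch of the read-only alternative suggested here, assuming (as explained
    elsewhere in this thread) that the second parameter of CarbonUtil.calculateDataIndexSize
    controls whether the computed sizes are written back to the table status file:

        // Compute the sizes for display only, without updating the table status file.
        val sizes = CarbonUtil.calculateDataIndexSize(carbonTable, false)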


---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10401/



---

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

Posted by manishnalla1994 <gi...@git.apache.org>.
Github user manishnalla1994 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3047#discussion_r244957746
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
    @@ -101,14 +102,23 @@ object CarbonStore {
               val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
                 // for streaming segment, we should get the actual size from the index file
                 // since it is continuously inserting data
    -            val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
    +            val segmentDir = CarbonTablePath
    +              .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
                 val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
                 val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
                 (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
               } else {
                 // for batch segment, we can get the data size from table status file directly
    -            (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
    -              if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
    +            if (null == load.getDataSize || null == load.getIndexSize) {
    +              // If either of datasize or indexsize comes to be null the we calculate the correct
    +              // size and assign
    +              val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
    --- End diff --
    
    Fixed.



---

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

Posted by manishnalla1994 <gi...@git.apache.org>.
Github user manishnalla1994 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3047#discussion_r244957693
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
    @@ -46,9 +47,9 @@ object CarbonStore {
     
       def showSegments(
           limit: Option[String],
    -      tablePath: String,
    +      carbonTable: CarbonTable,
    --- End diff --
    
    Done.


---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by manishnalla1994 <gi...@git.apache.org>.
Github user manishnalla1994 commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    retest this please


---

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

Posted by manishnalla1994 <gi...@git.apache.org>.
Github user manishnalla1994 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3047#discussion_r244911752
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
    @@ -101,14 +102,21 @@ object CarbonStore {
               val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
                 // for streaming segment, we should get the actual size from the index file
                 // since it is continuously inserting data
    -            val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
    +            val segmentDir = CarbonTablePath
    +              .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
                 val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
                 val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
                 (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
               } else {
                 // for batch segment, we can get the data size from table status file directly
    -            (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
    -              if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
    +            if (null == load.getDataSize && null == load.getIndexSize) {
    +              val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
    +              (dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).toLong,
    +                dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).toLong)
    +            } else {
    +              (load.getDataSize.toLong,
    --- End diff --
    
    Yes, fixed it now.


---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2148/



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10397/



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    LGTM


---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2135/



---

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3047#discussion_r244922117
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
    @@ -101,14 +102,23 @@ object CarbonStore {
               val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
                 // for streaming segment, we should get the actual size from the index file
                 // since it is continuously inserting data
    -            val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
    +            val segmentDir = CarbonTablePath
    +              .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
                 val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
                 val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
                 (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
               } else {
                 // for batch segment, we can get the data size from table status file directly
    -            (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
    -              if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
    +            if (null == load.getDataSize || null == load.getIndexSize) {
    +              // If either of datasize or indexsize comes to be null the we calculate the correct
    +              // size and assign
    +              val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, false)
    --- End diff --
    
    The boolean flag in the method call controls whether to update the data and index size in the table status file. Pass the flag as true so that it computes the sizes and updates the table status file. This avoids recalculating them on every SHOW SEGMENTS call.
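
    A short usage sketch of this suggestion (the flag semantics are as described above;
    the constant names are the ones already used in the diff):

        // Pass true so the computed sizes are also persisted to the table status file,
        // avoiding a recomputation on every subsequent SHOW SEGMENTS call.
        val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, true)
        val dataSize = dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE).toLong
        val indexSize = dataIndexSize.get(CarbonCommonConstants.CARBON_TOTAL_INDEX_SIZE).toLong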


---

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/3047


---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2330/



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2165/



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10389/



---

[GitHub] carbondata issue #3047: [CARBONDATA-3223] Fixed Wrong Datasize and Indexsize...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/3047
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2341/



---

[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

Posted by manishnalla1994 <gi...@git.apache.org>.
Github user manishnalla1994 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/3047#discussion_r245003004
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
    @@ -101,14 +102,23 @@ object CarbonStore {
               val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) {
                 // for streaming segment, we should get the actual size from the index file
                 // since it is continuously inserting data
    -            val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName)
    +            val segmentDir = CarbonTablePath
    +              .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
                 val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
                 val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath))
                 (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize)
               } else {
                 // for batch segment, we can get the data size from table status file directly
    -            (if (load.getDataSize == null) 0L else load.getDataSize.toLong,
    -              if (load.getIndexSize == null) 0L else load.getIndexSize.toLong)
    +            if (null == load.getDataSize || null == load.getIndexSize) {
    +              // If either of datasize or indexsize comes to be null the we calculate the correct
    +              // size and assign
    +              val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, true)
    --- End diff --
    
    As it is a metadata operation, we compute the sizes only once and save them by passing TRUE to 'calculateDataIndexSize'. The computed values can then be reused afterwards as well.


---