Posted to issues@carbondata.apache.org by GitBox <gi...@apache.org> on 2020/08/26 07:34:51 UTC

[GitHub] [carbondata] ajantha-bhat opened a new pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

ajantha-bhat opened a new pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901


    ### Why is this PR needed?
    #3764 added no_sort for the CDC new-segment load. The code was wrong, but it had no functional impact because it did not actually change the new segment load to no_sort.
    #3856 then changed it so that no_sort takes effect, which has a functional impact: new segments of the target table are now loaded with no_sort.
    
    ### What changes were proposed in this PR?
    A CDC update written as a new segment should use the target table's sort_scope.
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - No (the flows were verified manually)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-680771095


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3875/
   





[GitHub] [carbondata] asfgit closed pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901


   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-680773026


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2134/
   





[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

Posted by GitBox <gi...@apache.org>.
ajantha-bhat edited a comment on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-682380195


   > Setting 'no_sort' for CDC is for better load performance during merge, but I think we should keep it the same as the target table.
   
   @Zhangshunyu : For a faster CDC merge, the target table itself can be created with no_sort. Currently, if the target table uses global sort, old segments are sorted and new CDC segments are not, so I fixed it.





[GitHub] [carbondata] Zhangshunyu commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

Posted by GitBox <gi...@apache.org>.
Zhangshunyu commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-682379495


   Setting 'no_sort' for CDC is for better load performance during merge, but I think we should keep it the same as the target table.





[GitHub] [carbondata] ajantha-bhat commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-680714175


   @QiangCai , @ravipesala @marchpure @akashrn5 : please check this





[GitHub] [carbondata] QiangCai commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

Posted by GitBox <gi...@apache.org>.
QiangCai commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-684628244


   LGTM





[GitHub] [carbondata] ajantha-bhat commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-682380195


   > Setting 'no_sort' for CDC is for better load performance during merge, but I think we should keep it the same as the target table.
   
   @Zhangshunyu : But the target table itself can be created with no_sort. Currently, some segments can be sorted while others are not, so I fixed it.





[GitHub] [carbondata] Zhangshunyu commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

Posted by GitBox <gi...@apache.org>.
Zhangshunyu commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-682381033


   @ajantha-bhat Yes, I agree with this PR's change to keep it the same as the target.





[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

Posted by GitBox <gi...@apache.org>.
ajantha-bhat edited a comment on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-682380195


   > Setting 'no_sort' for CDC is for better load performance during merge, but I think we should keep it the same as the target table.
   
   @Zhangshunyu : But the target table itself can be created with no_sort. Currently, if the target table uses global sort, old segments are sorted and new CDC segments are not, so I fixed it.





[GitHub] [carbondata] akashrn5 commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-682408178


   @ajantha-bhat If the target table is no_sort, and since we insert the new data as a separate segment during merge, could we sort this segment before writing it, which would help queries, instead of blindly going with the target table's sort?





[GitHub] [carbondata] ajantha-bhat commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-682419143


   > @ajantha-bhat If the target table is no_sort, and since we insert the new data as a separate segment during merge, could we sort this segment before writing it, which would help queries, instead of blindly going with the target table's sort?
   
   It is not blind. The user has already decided whether the table needs to be sorted, based on their requirements (no_sort for good load speed, global_sort for good query speed), so it is better to have all segments follow the user's decision.
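
The rule argued for above (a new CDC segment inherits the sort scope the user chose for the target table) can be sketched roughly as follows. This is only an illustration in Python, not the code in this PR: the function name, the property dictionary, and the set of accepted scopes are assumptions made for the sketch and are not CarbonData APIs.

```python
# Illustrative sketch only: these names are hypothetical, not CarbonData APIs.
# The set of accepted scopes is an assumption made for this example.
VALID_SORT_SCOPES = {"no_sort", "local_sort", "global_sort"}


def cdc_segment_sort_scope(target_table_properties):
    """Pick the sort scope for a new segment written by a CDC merge.

    Returns the target table's own sort_scope (lower-cased), or None when the
    table does not set one, leaving the choice to the loader's default.
    """
    scope = target_table_properties.get("sort_scope")
    if scope is None:
        return None
    scope = scope.lower()
    if scope not in VALID_SORT_SCOPES:
        raise ValueError(f"unsupported sort_scope: {scope!r}")
    return scope


# A global_sort table keeps its new CDC segments globally sorted,
# and a no_sort table keeps the faster unsorted load.
print(cdc_segment_sort_scope({"sort_scope": "GLOBAL_SORT"}))
print(cdc_segment_sort_scope({"sort_scope": "no_sort"}))
```

The point of the sketch is that the merge path never hard-codes a scope of its own, so old and new segments of one table end up with the same sort behaviour.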
   

