You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by GitBox <gi...@apache.org> on 2021/03/02 12:13:41 UTC

[GitHub] [carbondata] Karan980 opened a new pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Karan980 opened a new pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099


    ### Why is this PR needed?
   Indexes cached in Executor cache are not dropped when drop table is called for external table with SDK segments. Because, external tables with sdk segments will not have metadata like table status file. So in drop table command we send zero segments to indexServer clearIndexes job, which clears nothing from executor side.
    
    ### What changes were proposed in this PR?
   Prepared the validSegments from indexFiles present at external table location and send it to IndexServer clearIndexes job through IndexInputFormat.
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - No
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Karan980 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
Karan980 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-793527054


   > please correct the splits path of getSplits in case of External Tables in DistributedPruneRDD as well
   
   Done


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
vikramahuja1001 commented on a change in pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#discussion_r590255909



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +115,32 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {
+      ReadCommittedScope readCommittedScope =
+          new LatestFilesReadCommittedScope(carbonTable.getTablePath(),
+              FileFactory.getConfiguration());
+      LoadMetadataDetails[] loadMetadataDetails = readCommittedScope.getSegmentList();
+      List<Segment> listOfValidSegments = new ArrayList<>();

Review comment:
       give arrayList limit equals to loadMetadataDetails size, because it will never increase that size




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] vikramahuja1001 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
vikramahuja1001 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-793477512


   please correct the splits path of getSplits in case of External Tables in DistributedPruneRDD as well


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Karan980 commented on a change in pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
Karan980 commented on a change in pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#discussion_r589323912



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +116,31 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Karan980 commented on a change in pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
Karan980 commented on a change in pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#discussion_r590318643



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +115,32 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {
+      ReadCommittedScope readCommittedScope =
+          new LatestFilesReadCommittedScope(carbonTable.getTablePath(),
+              FileFactory.getConfiguration());
+      LoadMetadataDetails[] loadMetadataDetails = readCommittedScope.getSegmentList();
+      List<Segment> listOfValidSegments = new ArrayList<>();

Review comment:
       Done

##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +115,32 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {
+      ReadCommittedScope readCommittedScope =
+          new LatestFilesReadCommittedScope(carbonTable.getTablePath(),
+              FileFactory.getConfiguration());
+      LoadMetadataDetails[] loadMetadataDetails = readCommittedScope.getSegmentList();
+      List<Segment> listOfValidSegments = new ArrayList<>();
+      for (LoadMetadataDetails segment : loadMetadataDetails) {

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] kunal642 commented on a change in pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
kunal642 commented on a change in pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#discussion_r589176617



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +116,31 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {

Review comment:
       Please reuse the LatestFilesReadCommittedScope class to take a snapshot of the non-transactional table. No need to create a separate method 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Karan980 commented on a change in pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
Karan980 commented on a change in pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#discussion_r590318352



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +115,32 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {
+      ReadCommittedScope readCommittedScope =
+          new LatestFilesReadCommittedScope(carbonTable.getTablePath(),
+              FileFactory.getConfiguration());
+      LoadMetadataDetails[] loadMetadataDetails = readCommittedScope.getSegmentList();
+      List<Segment> listOfValidSegments = new ArrayList<>();
+      for (LoadMetadataDetails segment : loadMetadataDetails) {
+        Segment seg = new Segment(segment.getLoadName(), segment.getSegmentFile());
+        seg.setLoadMetadataDetails(segment);
+        listOfValidSegments.add(seg);
+      }
+      indexInputFormat =
+          new IndexInputFormat(carbonTable, listOfValidSegments, new ArrayList<>(0), true,
+              indexToClear);
+    } else {
+      SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
+          getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
+      List<String> invalidSegment = new ArrayList<>();
+      for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] kunal642 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
kunal642 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-793873234


   LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
vikramahuja1001 commented on a change in pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#discussion_r590256333



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +115,32 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {
+      ReadCommittedScope readCommittedScope =
+          new LatestFilesReadCommittedScope(carbonTable.getTablePath(),
+              FileFactory.getConfiguration());
+      LoadMetadataDetails[] loadMetadataDetails = readCommittedScope.getSegmentList();
+      List<Segment> listOfValidSegments = new ArrayList<>();
+      for (LoadMetadataDetails segment : loadMetadataDetails) {

Review comment:
       try using java stream api here instead of a for loop wherever possible




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-792733994


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5113/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] asfgit closed pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
vikramahuja1001 commented on a change in pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#discussion_r590253814



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +115,32 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {
+      ReadCommittedScope readCommittedScope =
+          new LatestFilesReadCommittedScope(carbonTable.getTablePath(),
+              FileFactory.getConfiguration());
+      LoadMetadataDetails[] loadMetadataDetails = readCommittedScope.getSegmentList();
+      List<Segment> listOfValidSegments = new ArrayList<>();
+      for (LoadMetadataDetails segment : loadMetadataDetails) {
+        Segment seg = new Segment(segment.getLoadName(), segment.getSegmentFile());
+        seg.setLoadMetadataDetails(segment);
+        listOfValidSegments.add(seg);
+      }
+      indexInputFormat =
+          new IndexInputFormat(carbonTable, listOfValidSegments, new ArrayList<>(0), true,
+              indexToClear);
+    } else {
+      SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
+          getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
+      List<String> invalidSegment = new ArrayList<>();
+      for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {

Review comment:
       Use Java stream here instead of a for loop




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-788908019


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3347/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] kunal642 commented on a change in pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
kunal642 commented on a change in pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#discussion_r589176617



##########
File path: core/src/main/java/org/apache/carbondata/core/index/IndexUtil.java
##########
@@ -112,15 +116,31 @@ public static IndexJob getIndexJob(Configuration configuration) throws IOExcepti
    */
   private static void executeClearIndexJob(IndexJob indexJob,
       CarbonTable carbonTable, String indexToClear) throws IOException {
-    SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo =
-            getValidAndInvalidSegments(carbonTable, FileFactory.getConfiguration());
-    List<String> invalidSegment = new ArrayList<>();
-    for (Segment segment : validAndInvalidSegmentsInfo.getInvalidSegments()) {
-      invalidSegment.add(segment.getSegmentNo());
+    IndexInputFormat indexInputFormat;
+    if (!carbonTable.isTransactionalTable()) {

Review comment:
       Please reuse the LatestFilesReadCommittedScope class to take a snapshot of the non-transactional table.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-794000275


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3363/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-792735786


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3354/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-794004052


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5123/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4099: [CARBONDATA-4141] Index Server is not caching indexes for external tables with sdk segments.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4099:
URL: https://github.com/apache/carbondata/pull/4099#issuecomment-788903537


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5106/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org