Posted to reviews@iotdb.apache.org by GitBox <gi...@apache.org> on 2021/03/09 09:52:47 UTC

[GitHub] [iotdb] zhanglingzhe0820 opened a new pull request #2785: [IOTDB-1187] Fix unseq compaction loss data bug after delete operation

zhanglingzhe0820 opened a new pull request #2785:
URL: https://github.com/apache/iotdb/pull/2785


   ## Behavior
   (1) Use the following code to write data into 0.11.2:
   (2) `delete from root.sg_1`
   (3) Then rerun the same code.
   (4) `select count(*) from root`
   The count is less than 1000000.
   
   ```java
   static List<Record> mockData(int loop) {
       List<Record> recordList = new ArrayList<>();
       int mockSize = 10000;
       long timestamp = (long) mockSize * loop;
       for (int r = 0; r < mockSize; r++) {
           Record record = new Record();
           String storageGroup = "root.sg_1";
           String deviceId = String.format("%s.device_1", storageGroup);
           record.setStorageGroup(storageGroup);
           record.setDeviceId(deviceId);
           record.setTimestamp(new Timestamp(timestamp + r));
           Record.Column column = new Record.Column("pressure", "Float", r % 999.0f);
           record.addColumn(column);
           column = new Record.Column("temperature", "Float", r % 37.0f);
           record.addColumn(column);
           recordList.add(record);
       }
       return recordList;
   }

   public static void main(String[] args) throws IoTDBConnectionException, StatementExecutionException {
       IOTDBReader iotdbReader = new IOTDBReader();
       Session session = iotdbReader.openSession();
       // 100 loops * 10000 records = 1,000,000 rows in total
       for (int loop = 0; loop < 100; loop++) {
           List<Record> recordList = mockData(loop);
           iotdbReader.insertTablet(session, recordList, loop);
           System.out.println(loop);
       }
       session.close();
   }
   ```
   
   ## Reason
   Unseq compaction does not update a device's end time when the end time after compaction is smaller than it was before.
   This can happen after a delete operation, because the deleted data is removed during compaction.
   
   ## Solution
   1. Create a new TsFileSequenceReader to get all ChunkMetadata.
   2. Use putEndTime instead of updateEndTime, so that a shrunken time range is written through.
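The distinction between the two calls can be sketched as follows. This is a hypothetical minimal model, not the real TsFileResource: `updateEndTime` here only grows the recorded value, while `putEndTime` overwrites it, which is why only the latter can shrink a device's range after a deletion.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the two update strategies; names mirror TsFileResource,
// but the bodies are illustrative, not the real implementation.
public class EndTimeSketch {
    private final Map<String, Long> endTimes = new HashMap<>();

    // updateEndTime only ever grows the recorded end time, so a smaller
    // end time produced by compaction after a deletion is silently dropped.
    void updateEndTime(String device, long time) {
        endTimes.merge(device, time, Math::max);
    }

    // putEndTime overwrites unconditionally, letting the range shrink.
    void putEndTime(String device, long time) {
        endTimes.put(device, time);
    }

    long getEndTime(String device) {
        return endTimes.get(device);
    }

    public static void main(String[] args) {
        EndTimeSketch res = new EndTimeSketch();
        res.updateEndTime("root.sg_1.device_1", 999_999L); // before deletion
        res.updateEndTime("root.sg_1.device_1", 499_999L); // after compaction: ignored
        System.out.println(res.getEndTime("root.sg_1.device_1")); // prints 999999
        res.putEndTime("root.sg_1.device_1", 499_999L);           // overwrites
        System.out.println(res.getEndTime("root.sg_1.device_1")); // prints 499999
    }
}
```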


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [iotdb] zhanglingzhe0820 commented on a change in pull request #2785: [IOTDB-1187] Fix unseq compaction loss data bug after delete operation

Posted by GitBox <gi...@apache.org>.
zhanglingzhe0820 commented on a change in pull request #2785:
URL: https://github.com/apache/iotdb/pull/2785#discussion_r590298443



##########
File path: server/src/main/java/org/apache/iotdb/db/engine/merge/task/MergeFileTask.java
##########
@@ -192,8 +192,8 @@ private void moveMergedToOld(TsFileResource seqFile) throws IOException {
             return;
           }
         }
+        updateStartTimeAndEndTime(seqFile, newFileReader);
       }
-      updateStartTimeAndEndTime(seqFile, oldFileWriter);

Review comment:
       I am also curious about it, as the `oldFileWriter` cannot get the metadata.







[GitHub] [iotdb] jt2594838 commented on a change in pull request #2785: [IOTDB-1187] Fix unseq compaction loss data bug after delete operation

Posted by GitBox <gi...@apache.org>.
jt2594838 commented on a change in pull request #2785:
URL: https://github.com/apache/iotdb/pull/2785#discussion_r590290837



##########
File path: server/src/main/java/org/apache/iotdb/db/engine/merge/task/MergeFileTask.java
##########
@@ -208,15 +208,27 @@ private void moveMergedToOld(TsFileResource seqFile) throws IOException {
     }
   }
 
-  private void updateStartTimeAndEndTime(TsFileResource seqFile, TsFileIOWriter fileWriter) {
+  private void updateStartTimeAndEndTime(TsFileResource seqFile, TsFileSequenceReader fileReader)
+      throws IOException {
     // TODO change to get one timeseries block each time
-    for (Entry<String, List<ChunkMetadata>> deviceChunkMetadataEntry :
-        fileWriter.getDeviceChunkMetadataMap().entrySet()) {
-      String device = deviceChunkMetadataEntry.getKey();
-      for (ChunkMetadata chunkMetadata : deviceChunkMetadataEntry.getValue()) {
-        seqFile.updateStartTime(device, chunkMetadata.getStartTime());
-        seqFile.updateEndTime(device, chunkMetadata.getEndTime());
+    List<String> devices = fileReader.getAllDevices();
+    for (String device : devices) {
+      Map<String, List<ChunkMetadata>> chunkMetadataListMap =
+          fileReader.readChunkMetadataInDevice(device);
+      long minStartTime = Long.MAX_VALUE;
+      long maxEndTime = Long.MIN_VALUE;
+      for (List<ChunkMetadata> chunkMetadataList : chunkMetadataListMap.values()) {
+        for (ChunkMetadata chunkMetadata : chunkMetadataList) {
+          if (chunkMetadata.getStartTime() < minStartTime) {
+            minStartTime = chunkMetadata.getStartTime();
+          }
+          if (chunkMetadata.getEndTime() > maxEndTime) {
+            maxEndTime = chunkMetadata.getEndTime();
+          }
+        }
       }
+      seqFile.putStartTime(device, minStartTime);
+      seqFile.putEndTime(device, maxEndTime);

Review comment:
       I do not think the range of the new file can be directly put into the resource.
   Consider a compaction with a seq file containing three chunks, [0, 7] [10, 15] [20, 29], and an unseq file with one chunk, [17, 28]. If a full merge is not required, the new (merged) file will contain only one chunk, [17, 29], and that chunk will be appended to the old seq file because merged chunks are the minority. If you just put the time range of the new file, [17, 29], into the new TsFileResource, the result will not be accurate: the unmerged chunks still make the true device range [0, 29].
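The scenario above can be checked with a small numeric sketch (a hypothetical helper, not IoTDB code): the range over all chunks physically in the file differs from the range over only the merged chunk recorded in the new file's metadata.

```java
// Numeric sketch of the scenario above (hypothetical helper, not IoTDB code).
// Old seq file chunks: [0,7] [10,15] [20,29]; merged chunk appended: [17,29].
// Only [17,29] appears in the new file's own metadata, but the file on disk
// still contains the unmerged chunks, so the true device range is [0,29].
public class RangeSketch {
    // Returns {min start, max end} over a list of {start, end} chunk ranges.
    static long[] range(long[][] chunks) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (long[] c : chunks) {
            min = Math.min(min, c[0]);
            max = Math.max(max, c[1]);
        }
        return new long[] {min, max};
    }

    public static void main(String[] args) {
        long[][] allChunks = {{0, 7}, {10, 15}, {20, 29}, {17, 29}};
        long[][] newFileOnly = {{17, 29}};
        System.out.println(java.util.Arrays.toString(range(allChunks)));   // [0, 29]
        System.out.println(java.util.Arrays.toString(range(newFileOnly))); // [17, 29]
    }
}
```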

##########
File path: server/src/main/java/org/apache/iotdb/db/engine/merge/task/MergeFileTask.java
##########
@@ -192,8 +192,8 @@ private void moveMergedToOld(TsFileResource seqFile) throws IOException {
             return;
           }
         }
+        updateStartTimeAndEndTime(seqFile, newFileReader);
       }
-      updateStartTimeAndEndTime(seqFile, oldFileWriter);

Review comment:
       I am curious why `oldFileWriter` does not contain ChunkMetadata of merged chunks.
   Technically, as merged chunks are written into `oldFileWriter`, it should have such metadata which can be used to update start times and end times. And because they are cached in memory before the writer is closed, accessing them should be faster than reading them again from `newFileReader`.







[GitHub] [iotdb] zhanglingzhe0820 commented on a change in pull request #2785: [IOTDB-1187] Fix unseq compaction loss data bug after delete operation

Posted by GitBox <gi...@apache.org>.
zhanglingzhe0820 commented on a change in pull request #2785:
URL: https://github.com/apache/iotdb/pull/2785#discussion_r590377434



##########
File path: server/src/main/java/org/apache/iotdb/db/engine/merge/task/MergeFileTask.java
##########
@@ -208,15 +208,27 @@ private void moveMergedToOld(TsFileResource seqFile) throws IOException {
     }
   }
 
-  private void updateStartTimeAndEndTime(TsFileResource seqFile, TsFileIOWriter fileWriter) {
+  private void updateStartTimeAndEndTime(TsFileResource seqFile, TsFileSequenceReader fileReader)
+      throws IOException {
     // TODO change to get one timeseries block each time
-    for (Entry<String, List<ChunkMetadata>> deviceChunkMetadataEntry :
-        fileWriter.getDeviceChunkMetadataMap().entrySet()) {
-      String device = deviceChunkMetadataEntry.getKey();
-      for (ChunkMetadata chunkMetadata : deviceChunkMetadataEntry.getValue()) {
-        seqFile.updateStartTime(device, chunkMetadata.getStartTime());
-        seqFile.updateEndTime(device, chunkMetadata.getEndTime());
+    List<String> devices = fileReader.getAllDevices();
+    for (String device : devices) {
+      Map<String, List<ChunkMetadata>> chunkMetadataListMap =
+          fileReader.readChunkMetadataInDevice(device);
+      long minStartTime = Long.MAX_VALUE;
+      long maxEndTime = Long.MIN_VALUE;
+      for (List<ChunkMetadata> chunkMetadataList : chunkMetadataListMap.values()) {
+        for (ChunkMetadata chunkMetadata : chunkMetadataList) {
+          if (chunkMetadata.getStartTime() < minStartTime) {
+            minStartTime = chunkMetadata.getStartTime();
+          }
+          if (chunkMetadata.getEndTime() > maxEndTime) {
+            maxEndTime = chunkMetadata.getEndTime();
+          }
+        }
       }
+      seqFile.putStartTime(device, minStartTime);
+      seqFile.putEndTime(device, maxEndTime);

Review comment:
       okay







[GitHub] [iotdb] qiaojialin merged pull request #2785: [IOTDB-1187] Fix unseq compaction loss data bug after delete operation

Posted by GitBox <gi...@apache.org>.
qiaojialin merged pull request #2785:
URL: https://github.com/apache/iotdb/pull/2785


   







[GitHub] [iotdb] jt2594838 commented on a change in pull request #2785: [IOTDB-1187] Fix unseq compaction loss data bug after delete operation

Posted by GitBox <gi...@apache.org>.
jt2594838 commented on a change in pull request #2785:
URL: https://github.com/apache/iotdb/pull/2785#discussion_r592853623



##########
File path: server/src/main/java/org/apache/iotdb/db/engine/merge/manage/MergeResource.java
##########
@@ -263,4 +266,28 @@ public void setMeasurementSchemaMap(Map<PartialPath, MeasurementSchema> measurem
   public void clearChunkWriterCache() {
     this.chunkWriterCache.clear();
   }
+
+  public void updateStartTime(TsFileResource tsFileResource, String device, long startTime) {
+    Map<String, Pair<Long, Long>> deviceStartEndTimePairMap =
+        startEndTimeCache.getOrDefault(tsFileResource, new HashMap<>());
+    Pair<Long, Long> startEndTimePair =
+        deviceStartEndTimePairMap.getOrDefault(device, new Pair<>(Long.MAX_VALUE, Long.MIN_VALUE));
+    long newStartTime = startEndTimePair.left > startTime ? startTime : startEndTimePair.left;
+    deviceStartEndTimePairMap.put(device, new Pair<>(newStartTime, startEndTimePair.right));
+    startEndTimeCache.put(tsFileResource, deviceStartEndTimePairMap);
+  }

Review comment:
       Use `computeIfAbsent` then you may avoid calling extra `put` and you can directly modify the time pairs.
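A rough sketch of the suggested `computeIfAbsent` refactor (a hypothetical standalone version: `TsFileResource` is replaced by a plain String key and `Pair` by a minimal mutable class, to keep the sketch self-contained):

```java
import java.util.HashMap;
import java.util.Map;

public class MergeResourceSketch {
    // Minimal mutable pair standing in for org.apache.iotdb.tsfile.utils.Pair.
    static class Pair<L, R> {
        L left;
        R right;
        Pair(L left, R right) { this.left = left; this.right = right; }
    }

    private final Map<String, Map<String, Pair<Long, Long>>> startEndTimeCache = new HashMap<>();

    // computeIfAbsent inserts the missing map/pair in place, so no trailing
    // put calls are needed and the pair can be mutated directly.
    public void updateStartTime(String tsFileResource, String device, long startTime) {
        Pair<Long, Long> pair = startEndTimeCache
            .computeIfAbsent(tsFileResource, k -> new HashMap<>())
            .computeIfAbsent(device, k -> new Pair<>(Long.MAX_VALUE, Long.MIN_VALUE));
        pair.left = Math.min(pair.left, startTime);
    }

    public long getStartTime(String tsFileResource, String device) {
        return startEndTimeCache.get(tsFileResource).get(device).left;
    }
}
```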



