Posted to notifications@iotdb.apache.org by "刘珍 (Jira)" <ji...@apache.org> on 2022/07/27 08:11:00 UTC

[jira] [Commented] (IOTDB-3247) [Persistent schema] [wal recovery] Aligned sensors, query lost data

    [ https://issues.apache.org/jira/browse/IOTDB-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571789#comment-17571789 ] 

刘珍 commented on IOTDB-3247:
---------------------------

rel_13_ae3a580: resolved.

> [Persistent schema] [wal recovery] Aligned sensors, query lost data
> -------------------------------------------------------------------
>
>                 Key: IOTDB-3247
>                 URL: https://issues.apache.org/jira/browse/IOTDB-3247
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: Core/WAL
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: yanze chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>         Attachments: config.properties, count_ts_500dev.sh, del_ts.sh, dev_name.txt, get_dev_name.sh, image-2022-05-20-16-09-01-848.png, select_count_ts_500dev.sh
>
>
> master_0519_81b9117
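> Problem description (persistent schema + WAL recovery):
> 100 storage groups (sg), 500 devices, 200,000 series per device, 100 million aligned series in total; 10 points are written to each series.
> For each device, delete 51 series and restart IoTDB. WAL recovery shows 2 problems:
> Problem 1: some series that were not deleted {color:#DE350B}*return fewer data points when queried*{color} (count less than 10)
> Problem 2: an NPE occurs during recovery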
> 2022-05-20 14:16:09,213 [pool-15-IoTDB-WAL-Recover-2] WARN  o.a.i.d.w.r.f.UnsealedTsFileRecoverPerformer:208 - meet error when redo wal of /data/liuzhen_test/master_0519_81b9117/datanode/./sbin/../data/data/sequence/root.test.g_99/0/0/1652977295224-2-0-0.tsfile
> org.apache.iotdb.db.exception.WriteProcessException: java.lang.NullPointerException
>         at org.apache.iotdb.db.engine.memtable.AbstractMemTable.insertAlignedTablet(AbstractMemTable.java:394)
>         at org.apache.iotdb.db.wal.recover.file.TsFilePlanRedoer.redoInsert(TsFilePlanRedoer.java:128)
>         at org.apache.iotdb.db.wal.recover.file.UnsealedTsFileRecoverPerformer.redoLog(UnsealedTsFileRecoverPerformer.java:191)
>         at org.apache.iotdb.db.wal.recover.WALNodeRecoverTask.recoverTsFiles(WALNodeRecoverTask.java:137)
>         at org.apache.iotdb.db.wal.recover.WALNodeRecoverTask.run(WALNodeRecoverTask.java:63)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException: null
>         at org.apache.iotdb.db.utils.datastructure.AlignedTVList.arrayCopy(AlignedTVList.java:808)
>         at org.apache.iotdb.db.utils.datastructure.AlignedTVList.putAlignedValues(AlignedTVList.java:736)
>         at org.apache.iotdb.db.engine.memtable.AlignedWritableMemChunk.putAlignedValues(AlignedWritableMemChunk.java:152)
>         at org.apache.iotdb.db.engine.memtable.AlignedWritableMemChunk.writeAlignedValues(AlignedWritableMemChunk.java:182)
>         at org.apache.iotdb.db.engine.memtable.AlignedWritableMemChunkGroup.writeValues(AlignedWritableMemChunkGroup.java:55)
>         at org.apache.iotdb.db.engine.memtable.AbstractMemTable.writeAlignedTablet(AbstractMemTable.java:545)
>         at org.apache.iotdb.db.engine.memtable.AbstractMemTable.insertAlignedTablet(AbstractMemTable.java:377)
>         ... 9 common frames omitted
> Test procedure
> 1. 192.168.10.68  72C256G
> IoTDB path: /data/liuzhen_test/master_0519_81b9117/datanode
> IoTDB configuration (everything else left unchanged):
> MAX_HEAP_SIZE="192G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> mlog_buffer_size=10485760
> schema_engine_mode=Schema_File
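> Benchmark path: /data/benchmark/weekly_shell/bm_0514_ee75a49
> See the attachment for the benchmark configuration.
> 2. Start IoTDB and run the benchmark
> This takes about 3 hours.
> 3. Verify the data before deleting series
> Correct
> count_ts_500dev.sh: 200,000 series per device
> select_count_ts_500dev.sh: each queried series has 10 data points.
> 4. Delete 51 series from each device
> Run del_ts.sh
> 5. After deleting the series and before stopping IoTDB, verify the data again
> Correct
> count_ts_500dev.sh: 199,949 series per device
> select_count_ts_500dev.sh: each queried series has 10 data points.
> 6. Stop IoTDB
> 7. Back up the data and logs
> 8. Restart IoTDB and check the logs: there is an NPE
> 9. IoTDB recovers successfully; run
> select_count_ts_500dev.sh   {color:#DE350B}*some series have fewer data points*{color} (only some are listed)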
>  !image-2022-05-20-16-09-01-848.png! 
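
The expected values used in the verification steps above can be restated as a small check. This is only an illustrative sketch, not the attached count_ts_500dev.sh or select_count_ts_500dev.sh scripts: the function names and the series paths in the usage note are hypothetical, and the per-series counts are assumed to have already been collected into a dictionary by some other means.

```python
# Figures taken from the issue description.
DEVICES = 500
SERIES_PER_DEVICE = 200_000
DELETED_PER_DEVICE = 51
POINTS_PER_SERIES = 10


def expected_series_after_delete(per_device=SERIES_PER_DEVICE,
                                 deleted=DELETED_PER_DEVICE):
    """Series count each device should report after the delete step."""
    return per_device - deleted


def find_short_series(point_counts, expected_points=POINTS_PER_SERIES):
    """Return {series_path: count} for series with fewer points than expected.

    point_counts is a hypothetical mapping of series path -> count(*) result.
    """
    return {path: n
            for path, n in sorted(point_counts.items())
            if n < expected_points}
```

For example, `find_short_series({"root.test.g_0.d_0.s_1": 10, "root.test.g_0.d_0.s_2": 7})` reports only the second (hypothetical) series, which is the kind of result Problem 1 describes after WAL recovery.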



--
This message was sent by Atlassian Jira
(v8.20.10#820010)