You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2016/03/01 08:45:18 UTC

[jira] [Resolved] (KUDU-1341) Out of order UNDO in delta file, possibly related to REINSERT

     [ https://issues.apache.org/jira/browse/KUDU-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved KUDU-1341.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 0.8.0

Fixed in e765f1c and 1ff209e

> Out of order UNDO in delta file, possibly related to REINSERT
> -------------------------------------------------------------
>
>                 Key: KUDU-1341
>                 URL: https://issues.apache.org/jira/browse/KUDU-1341
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet
>    Affects Versions: 0.6.0
>            Reporter: Mike Percy
>            Assignee: Todd Lipcon
>             Fix For: 0.8.0
>
>
> As reported by Bruce Song Zhang, there is a crasher in debug mode in tablet compaction under load.
> Here is a snippet of the log file he sent:
> {noformat}
> I0219 10:03:27.503398 31721 delta_tracker.cc:456] Flushing 8 deltas from DMS 114...
> I0219 10:03:27.504659 31721 delta_tracker.cc:410] Flushed delta block: 2109978395164279242
> I0219 10:03:27.504781 31721 delta_tracker.cc:416] Reopened delta block for read: 2109978395164279242
> I0219 10:03:28.150115 31721 maintenance_manager.cc:359] Time spent running FlushDeltaMemStoresOp(c6069618c9864c1d8422912bf69efd44): real 0.667s	user 0.647s	sys 0.020s
> I0219 10:03:28.665895 31721 log_reader.cc:415] T c6069618c9864c1d8422912bf69efd44: removed 1 log segments from log reader
> I0219 10:03:28.665935 31721 log.cc:736] Deleting log segment in path: /export/servers/kudu/tserver_wal_data/wals/c6069618c9864c1d8422912bf69efd44/wal-000000223 (GCed ops < 61476431)
> I0219 10:03:28.704988 31721 maintenance_manager.cc:359] Time spent running LogGCOp(c6069618c9864c1d8422912bf69efd44): real 0.039s	user 0.000s	sys 0.039s
> I0219 10:03:29.456583 31721 tablet.cc:653] Flush: entering stage 1 (old memrowset already frozen for inserts)
> I0219 10:03:29.456614 31721 compaction.cc:591] Selected 1 rowsets to compact:
> I0219 10:03:29.456630 31721 compaction.cc:594] memrowset(current size on disk: ~0 bytes)
> I0219 10:03:29.456686 31721 tablet.cc:655] Memstore in-memory size: 33095447 bytes
> I0219 10:03:29.456742 31721 tablet.cc:1132] Flush: entering phase 1 (flushing snapshot). Phase 1 snapshot: MvccSnapshot[committed={T|T < 5963147244092280832 or (T in {5963147244092280832})}]
> I0219 10:03:29.457515 31721 multi_column_writer.cc:82] Opened CFile writer for column country_id[int32 NOT NULL]
> I0219 10:03:29.457700 31721 multi_column_writer.cc:82] Opened CFile writer for column sku_id[int64 NOT NULL]
> I0219 10:03:29.457865 31721 multi_column_writer.cc:82] Opened CFile writer for column cur_day[int32 NOT NULL]
> I0219 10:03:29.458027 31721 multi_column_writer.cc:82] Opened CFile writer for column country_name[string NOT NULL]
> I0219 10:03:29.458173 31721 multi_column_writer.cc:82] Opened CFile writer for column sku_name[string NOT NULL]
> I0219 10:03:29.458333 31721 multi_column_writer.cc:82] Opened CFile writer for column brand_id[int32 NOT NULL]
> I0219 10:03:29.458493 31721 multi_column_writer.cc:82] Opened CFile writer for column brand_name[string NOT NULL]
> I0219 10:03:29.458636 31721 multi_column_writer.cc:82] Opened CFile writer for column product_id[int64 NOT NULL]
> I0219 10:03:29.458777 31721 multi_column_writer.cc:82] Opened CFile writer for column product_name[string NOT NULL]
> I0219 10:03:29.458927 31721 multi_column_writer.cc:82] Opened CFile writer for column cid1[int32 NOT NULL]
> I0219 10:03:29.459066 31721 multi_column_writer.cc:82] Opened CFile writer for column cid1_name[string NOT NULL]
> I0219 10:03:29.459198 31721 multi_column_writer.cc:82] Opened CFile writer for column cid2[int32 NOT NULL]
> I0219 10:03:29.459347 31721 multi_column_writer.cc:82] Opened CFile writer for column cid2_name[string NOT NULL]
> I0219 10:03:29.459491 31721 multi_column_writer.cc:82] Opened CFile writer for column cid3[int32 NOT NULL]
> I0219 10:03:29.459630 31721 multi_column_writer.cc:82] Opened CFile writer for column cid3_name[string NOT NULL]
> I0219 10:03:29.459774 31721 multi_column_writer.cc:82] Opened CFile writer for column data_type[int8 NOT NULL]
> I0219 10:03:29.459910 31721 multi_column_writer.cc:82] Opened CFile writer for column price[double NOT NULL]
> I0219 10:03:29.460060 31721 multi_column_writer.cc:82] Opened CFile writer for column sale_date[timestamp NOT NULL]
> I0219 10:03:29.460197 31721 multi_column_writer.cc:82] Opened CFile writer for column sale_number[int64 NOT NULL]
> I0219 10:03:29.460340 31721 multi_column_writer.cc:82] Opened CFile writer for column gmv[double NOT NULL]
> W0219 10:03:29.572443 31721 compaction.cc:714] Found REINSERT REDO truncating row history for Source Row: (int32 country_id=387, int64 sku_id=50804, int32 cur_day=387, string country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 brand_id=74398, string brand_name=纯贞, int64 product_id=3143432413, string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 cid1=123, string cid1_name=个护化妆, int32 cid2=123, string cid2_name=身体护肤, int32 cid3=123, string cid3_name=美胸, int8 data_type=12, double price=1234, timestamp sale_date=Sat, 23 May 1970 15:07:24 GMT, int64 sale_number=12332, double gmv=12233) Redo Mutations: [@5963145475847274496(DELETE), @5963145476833153024(REINSERT (int32 country_id=387, int64 sku_id=50804, int32 cur_day=387, string country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 brand_id=74398, string brand_name=纯贞, int64 product_id=3143432413, string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 cid1=123, string cid1_name=个护化妆, int32 cid2=123, string cid2_name=身体护肤, int32 cid3=123, string cid3_name=美胸, int8 data_type=12, double price=1234, timestamp sale_date=Sat, 23 May 1970 15:07:24 GMT, int64 sale_number=12332, double gmv=12233))] Undo Mutations: [@5963144433666650112(DELETE)]
> Dest Row: (int32 country_id=387, int64 sku_id=50804, int32 cur_day=387, string country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 brand_id=74398, string brand_name=纯贞, int64 product_id=3143432413, string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 cid1=123, string cid1_name=个护化妆, int32 cid2=123, string cid2_name=身体护肤, int32 cid3=123, string cid3_name=美胸, int8 data_type=12, double price=1234, timestamp sale_date=Sat, 23 May 1970 15:07:24 GMT, int64 sale_number=12332, double gmv=12233) Redo Mutations: [] Undo Mutations: [@5963145476833153024(DELETE)] Note: this warning will appear only for the first truncated row
> W0219 10:03:32.302188 31721 compaction.cc:834] Total 7 rows lost some history due to REINSERT after DELETE
> I0219 10:03:32.346256 31721 tablet.cc:1222] Flush: entering phase 2 (starting to duplicate updates in new rowsets)
> I0219 10:03:32.846774 31721 tablet.cc:1277] Flush Phase 2: carrying over any updates which arrived during Phase 1
> I0219 10:03:32.846839 31721 tablet.cc:1278] Phase 2 snapshot: MvccSnapshot[committed={T|T < 5963147244092280832 or (T in {5963147244092280832})}]
> I0219 10:03:34.485435 31721 tablet.cc:1311] Flush successful on 52051 rows (19462877 bytes)
> I0219 10:03:34.501569 31721 maintenance_manager.cc:359] Time spent running FlushMRSOp(c6069618c9864c1d8422912bf69efd44): real 5.541s	user 5.454s	sys 0.081s
> I0219 10:05:45.657285 31721 tablet.cc:1333] Compaction: stage 1 complete, picked 6 rowsets to compact
> I0219 10:05:45.657618 31721 compaction.cc:591] Selected 6 rowsets to compact:
> I0219 10:05:45.657727 31721 compaction.cc:594] RowSet(15418)(current size on disk: ~30348706 bytes)
> I0219 10:05:45.657793 31721 compaction.cc:594] RowSet(15544)(current size on disk: ~1123338 bytes)
> I0219 10:05:45.657855 31721 compaction.cc:594] RowSet(15551)(current size on disk: ~26151132 bytes)
> I0219 10:05:45.657886 31721 compaction.cc:594] RowSet(15610)(current size on disk: ~30315579 bytes)
> I0219 10:05:45.657913 31721 compaction.cc:594] RowSet(15611)(current size on disk: ~32233272 bytes)
> I0219 10:05:45.657934 31721 compaction.cc:594] RowSet(15617)(current size on disk: ~18863054 bytes)
> I0219 10:05:45.657984 31721 tablet.cc:1132] Compaction: entering phase 1 (flushing snapshot). Phase 1 snapshot: MvccSnapshot[committed={T|T < 5963147244092280832 or (T in {5963147244092280832})}]
> I0219 10:05:45.661224 31721 multi_column_writer.cc:82] Opened CFile writer for column country_id[int32 NOT NULL]
> I0219 10:05:45.661432 31721 multi_column_writer.cc:82] Opened CFile writer for column sku_id[int64 NOT NULL]
> I0219 10:05:45.661598 31721 multi_column_writer.cc:82] Opened CFile writer for column cur_day[int32 NOT NULL]
> I0219 10:05:45.661761 31721 multi_column_writer.cc:82] Opened CFile writer for column country_name[string NOT NULL]
> I0219 10:05:45.661933 31721 multi_column_writer.cc:82] Opened CFile writer for column sku_name[string NOT NULL]
> I0219 10:05:45.662081 31721 multi_column_writer.cc:82] Opened CFile writer for column brand_id[int32 NOT NULL]
> I0219 10:05:45.662228 31721 multi_column_writer.cc:82] Opened CFile writer for column brand_name[string NOT NULL]
> I0219 10:05:45.662384 31721 multi_column_writer.cc:82] Opened CFile writer for column product_id[int64 NOT NULL]
> I0219 10:05:45.662542 31721 multi_column_writer.cc:82] Opened CFile writer for column product_name[string NOT NULL]
> I0219 10:05:45.662695 31721 multi_column_writer.cc:82] Opened CFile writer for column cid1[int32 NOT NULL]
> I0219 10:05:45.662832 31721 multi_column_writer.cc:82] Opened CFile writer for column cid1_name[string NOT NULL]
> I0219 10:05:45.662962 31721 multi_column_writer.cc:82] Opened CFile writer for column cid2[int32 NOT NULL]
> I0219 10:05:45.663100 31721 multi_column_writer.cc:82] Opened CFile writer for column cid2_name[string NOT NULL]
> I0219 10:05:45.663239 31721 multi_column_writer.cc:82] Opened CFile writer for column cid3[int32 NOT NULL]
> I0219 10:05:45.663393 31721 multi_column_writer.cc:82] Opened CFile writer for column cid3_name[string NOT NULL]
> I0219 10:05:45.663538 31721 multi_column_writer.cc:82] Opened CFile writer for column data_type[int8 NOT NULL]
> I0219 10:05:45.663679 31721 multi_column_writer.cc:82] Opened CFile writer for column price[double NOT NULL]
> I0219 10:05:45.663830 31721 multi_column_writer.cc:82] Opened CFile writer for column sale_date[timestamp NOT NULL]
> I0219 10:05:45.663964 31721 multi_column_writer.cc:82] Opened CFile writer for column sale_number[int64 NOT NULL]
> I0219 10:05:45.664091 31721 multi_column_writer.cc:82] Opened CFile writer for column gmv[double NOT NULL]
> F0219 10:06:01.038265 31721 deltafile.cc:136] Check failed: last_key_.CompareTo<UNDO>(key) <= 0 must insert undo deltas in sorted order (ascending key, then descending ts): got key (row 278573@tx5962430420551352320) after (row 278573@tx5962430179277082624)
> {noformat}
> It appears that the timestamps are misordered (in ascending order instead of descending order), triggering a DCHECK. We don't normally see this in load tests because we run with RELEASE mode in most of our long running stress tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)