Skip to content

Conversation

@JaySon-Huang
Copy link
Contributor

@JaySon-Huang JaySon-Huang commented Jan 26, 2026

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

What is changed and how it works?


Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

  • Tests

    • Added a test verifying default-value handling when reading decoded blocks.
  • Chores

    • Expanded diagnostic logging across read/write and decoding flows to improve observability of schema snapshots, decoding epochs, default-value handling, and key/value processing.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: JaySon-Huang <tshent@qq.com>
@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. labels Jan 26, 2026
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Jan 26, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jiaqizho for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 26, 2026
@coderabbitai
Copy link

coderabbitai bot commented Jan 26, 2026

📝 Walkthrough

Walkthrough

Adds observational logging across KVStore read/write and decoding flows, a new test exercising default-value decoding in RegionBlockReader, and minor include adjustments. No functional logic or public API changes.

Changes

Cohort / File(s) Summary
KVStore read/decoding logs
dbms/src/Storages/KVStore/Decode/RegionBlockReader.cpp, dbms/src/Storages/KVStore/Decode/PartitionStreams.cpp
Inserted LOG_INFO statements to emit schema snapshot details, decoding metadata (keyspace_id, table_id, region_id, epoch, update_ts, column counts) and to log iterated data values before decoding. Observational only.
Raft write logging
dbms/src/Storages/KVStore/MultiRaft/RaftCommands.cpp
Added LOG statements to trace Put operations (keys and values) during Region::handleWriteRaftCmd, including by-index Put paths. No control-flow changes.
RowV2 decoding logs & include
dbms/src/TiDB/Decode/RowCodec.cpp
Added #include <Common/FieldVisitors.h> and LOG_INFO traces for collected not-null/null column IDs and when filling missing columns with default values during RowV2 decoding.
DeltaMerge schema log
dbms/src/Storages/StorageDeltaMerge.cpp
Added a log after refreshing/creating decoding schema snapshot reporting table_id, epoch, update_ts, version-column flag, and serialized table_info.
Tests
dbms/src/Storages/KVStore/tests/gtest_region_block_reader.cpp
Added RegionBlockReaderTest.ReadFromRegionDefaultValue to validate decoding with table defaults and injected region KV pairs (appended test block).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • JinheLin
  • hongyunyan
  • CalvinNeo

Poem

🐰 I hop through logs by lamplight,

Default carrots tucked in sight,
Rows decoded, bytes aglow,
Quiet traces help me know,
Hoppity hops—debug delight! 🥕✨

🚥 Pre-merge checks | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is a blank template with no concrete details filled in: no problem statement, no implementation explanation, no issue reference, and all test/side-effect checkboxes unchecked. Fill in the problem summary referencing the schema version mismatch issue, provide a commit message explaining the debug logging additions, and document which test cases were added or modified.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title '[DNM] Add debug log and test case' is vague and generic, using non-descriptive terms that don't convey meaningful information about the core changeset. Replace '[DNM]' prefix with a clear problem context and be more specific about which debug logs are added (e.g., 'Add debug logging for schema refresh and KV pair decoding').

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@dbms/src/Storages/KVStore/Decode/RegionBlockReader.cpp`:
- Around line 47-50: The loop in RegionBlockReader::read currently logs every
data.value at INFO using Logger::get("dddddddddd") and unconditionally calls
toDebugString(); replace the ad-hoc logger with a cached LoggerPtr (e.g., log =
Logger::get("RegionBlockReader")) and change the logging to a TRACE/DEBUG level,
guarding the toDebugString() call so it is only executed when that level is
enabled (for example, if (log->trace()) { LOG_TRACE(log,
"RegionBlockReader::read data.value={}", data.value->toDebugString()); } ) to
avoid leaking data and excessive overhead.

In `@dbms/src/Storages/KVStore/MultiRaft/RaftCommands.cpp`:
- Line 371: The current INFO log in Region::handleWriteRaftCmd logs full KV
payloads using Logger::get("dddddddddd") and always calls
tikv_key.toDebugString()/tikv_value.toDebugString(); change this to use the
region's LoggerPtr (the region's `log`) and include region_id and CF/context in
the message, downgrade the level to TRACE (or DEBUG) and wrap the
toDebugString() calls in a log-level guard (e.g. if
log->isTraceEnabled()/isDebugEnabled()) to avoid the cost when the level is off;
ensure the message no longer uses the hardcoded "dddddddddd" logger and instead
references the region log with concise context instead of printing full payloads
at INFO.

In `@dbms/src/Storages/KVStore/tests/gtest_region_block_reader.cpp`:
- Around line 696-744: The test ReadFromRegionDefaultValue currently only checks
that RegionBlockReader::read returns true and logs the decoded block—add
explicit assertions to validate the decoded result: use
createBlockSortByColumnID(decoding_schema) and inspect the block returned by
reader.read(*data_list_read, false) to assert the expected row count and that
specific columns (e.g., columns with ids for c2/c5/c4 as defined in TableInfo)
contain the correct default or origin_default values (and types) for each row;
remove the LOG_INFO call to avoid noisy output and replace it with
EXPECT_EQ/EXPECT_* checks on res_block.getColumnsWithTypeAndName() or helper
functions you already use to extract column contents (e.g.,
DB::tests::getColumnsContent) and assert exact strings/values. Ensure assertions
cover both presence of default values and their types/values per row so
regressions are detected.

In `@dbms/src/TiDB/Decode/RowCodec.cpp`:
- Around line 464-468: Replace the per-row LOG_INFO calls in RowV2 decoding (the
occurrences that use Logger::get("dddddddddd") and log
not_null_column_ids/null_column_ids) with a guarded lower‑level log (TRACE or
DEBUG) by first fetching a LoggerPtr with proper context (e.g., use the existing
class/module LoggerPtr pattern instead of Logger::get("dddddddddd")), check the
logger's level (logger->isTraceEnabled() or isDebugEnabled()) before
formatting/stringifying the vectors, and only then call LOG_TRACE/LOG_DEBUG with
the LoggerPtr and the prepared message; apply the same guarded change to the
other similar blocks referenced (the occurrences around the 510-521 and 598-607
regions) so per-row decoding avoids expensive unguarded formatting and ad‑hoc
logger names.
🧹 Nitpick comments (2)
dbms/src/Storages/KVStore/Decode/RegionBlockReader.cpp (1)

185-188: Downgrade schema snapshot dumps to TRACE/DEBUG and use a consistent logger.

Dumping column_defines at INFO on every read adds non‑trivial formatting overhead and clutters production logs. Consider TRACE/DEBUG with a cached LoggerPtr instead of "dddddddddd". As per coding guidelines, use LoggerPtr with LOG_INFO/LOG_TRACE and relevant context.

dbms/src/TiDB/Decode/RowCodec.cpp (1)

23-23: Use standard include style.

Prefer angle‑bracket includes from dbms/src for consistency (e.g., <Common/FieldVisitors.h>). As per coding guidelines, use relative include paths with angle brackets.

🔧 Suggested tweak
-#include "Common/FieldVisitors.h"
+#include <Common/FieldVisitors.h>

@JaySon-Huang
Copy link
Contributor Author

JaySon-Huang commented Jan 26, 2026

Reproduce the tiflash and tikv inconsist problem with SQLs as the issue describe
pingcap/tidb#61681

DROP DATABASE IF EXISTS db_1483421321;
CREATE DATABASE db_1483421321;
USE db_1483421321;
CREATE TABLE t0(c0 TINYINT, handle INT NOT NULL AUTO_INCREMENT, PRIMARY KEY(handle));
ALTER TABLE t0 SET TIFLASH REPLICA 1;
-- this SLEEP will make sure the case always reproducable, it not, make it longer
select sleep(5);
SELECT /*+ read_from_storage(tiflash[t0]) */ * FROM t0 order by 1;
SELECT /*+ read_from_storage(tikv[t0]) */ * FROM t0 order by 1;

ALTER TABLE t0 ADD COLUMN c1 DECIMAL NULL;
ALTER TABLE t0 ADD COLUMN c2 FLOAT NULL DEFAULT 0.0795439693286002;
ALTER TABLE t0 ADD COLUMN c5 DECIMAL NOT NULL;
ALTER TABLE t0 ADD COLUMN c4 DECIMAL DEFAULT 247262911;
ALTER TABLE t0 MODIFY COLUMN c2 FLOAT NOT NULL;
ALTER TABLE t0 MODIFY COLUMN c5 DECIMAL NULL;
ALTER TABLE t0 MODIFY COLUMN c1 BIGINT DEFAULT -56083770 NOT NULL;
-- first insert
INSERT IGNORE INTO t0 (c1, c2, c5, c4) VALUES (-2051270087, 0.44141045099028775, 0.0, 15523);
UPDATE t0 SET c5 = 870337888;

ALTER TABLE t0 MODIFY COLUMN c1 BIGINT NULL;
-- second insert
INSERT INTO t0 (c2, c5, c4) VALUES (0.004406799693866592, 0.8752311290235516, 14652);

-- in tiflash, handle=2, c1=0
SELECT /*+ read_from_storage(tiflash[t0]) */ * FROM t0 order by 1;
+--------+--------+-------------+--------------+-----------+-------+
| c0     | handle | c1          | c2           | c5        | c4    |
+--------+--------+-------------+--------------+-----------+-------+
| <null> | 1      | -2051270087 | 0.44141045   | 870337888 | 15523 |
| <null> | 2      | 0           | 0.0044067996 | 1         | 14652 |
+--------+--------+-------------+--------------+-----------+-------+

-- in tikv, handle=2, c1=null
SELECT /*+ read_from_storage(tikv[t0]) */ * FROM t0 order by 1;
+--------+--------+-------------+--------------+-----------+-------+
| c0     | handle | c1          | c2           | c5        | c4    |
+--------+--------+-------------+--------------+-----------+-------+
| <null> | 1      | -2051270087 | 0.44141045   | 870337888 | 15523 |
| <null> | 2      | <null>      | 0.0044067996 | 1         | 14652 |
+--------+--------+-------------+--------------+-----------+-------+
# First, tiflash schema the first schema and there is only columns ["c0", "handle"]
[2026/01/26 20:02:16.183 +08:00] [INFO] [SchemaBuilder.cpp:1160] ["Create table db_709.t0 (database_id=709 table_id=711) with statement: CREATE TABLE `db_709`.`t_711`(`c0` Nullable(Int8), `handle` Int32) Engine = DeltaMerge((`handle`), '{\"cols\":[{\"id\":1,\"name\":{\"L\":\"c0\",\"O\":\"c0\"},\"offset\":0,\"state\":5,\"type\":{\"Charset\":\"binary\",\"Collate\":\"binary\",\"Decimal\":0,\"Flag\":0,\"Flen\":4,\"Tp\":1}},{\"id\":2,\"name\":{\"L\":\"handle\",\"O\":\"handle\"},\"offset\":1,\"state\":5,\"type\":{\"Charset\":\"binary\",\"Collate\":\"binary\",\"Decimal\":0,\"Flag\":515,\"Flen\":11,\"Tp\":3}}],\"id\":711,\"index_info\":[],\"is_common_handle\":false,\"keyspace_id\":4294967295,\"name\":{\"L\":\"t0\",\"O\":\"t0\"},\"pk_is_handle\":true,\"state\":5,\"tiflash_replica\":{\"Available\":true,\"Count\":1},\"update_timestamp\":463845178115883014}', 0)"] [source="keyspace=4294967295"] [thread_id=2195]
[2026/01/26 20:02:16.186 +08:00] [INFO] [SchemaBuilder.cpp:1194] ["Create table db_709.t0 end, database_id=709 table_id=711 action=ApplyTable"] [source="keyspace=4294967295"] [thread_id=2195]
[2026/01/26 20:02:16.187 +08:00] [INFO] [TiDBSchemaSyncer.cpp:320] ["Sync table schema end after syncSchemas, table_id=711"] [source="keyspace=4294967295"] [thread_id=2195]
...

# TiFlash receive the first raftlog from TiKV
[2026/01/26 20:02:21.200 +08:00] [INFO] [RaftCommands.cpp:371] ["Region::handleWriteRaftCmd Put key=7480000000000002FFC75F728000000000FF0000010000000000FAF99017441A07FFF0, value=508D80E0AFBE97FAB706762C8000040000000405060708000F0016001A00BFDC4011A00000000A0080000000000A008000003CA3391ABC85"] [source=dddddddddd] [thread_id=2038]

# TiFlash try to decode the first raftlog according to its known schema with ["c0", "handle"]
# But found that there is new column_id encoded. So refresh its own schema from TiKV, then 
# retry to decode the raftlog.
# In this schema, the "c1" now is column_id=3 with flag=0, which means the column is nullable
# {"id":3,"name":{"L":"c1","O":"c1"},"offset":2,"state":5,"type":{"Flag":0,"Flen":10,"Tp":246}}

[2026/01/26 20:02:21.203 +08:00] [INFO] [StorageDeltaMerge.cpp:1264] ["Refresh decoding schema snapshot, table_id=711 epoch=1 update_ts=463845180265463812 with_version_column=true table_info={\"cols\":[{\"id\":1,\"name\":{\"L\":\"c0\",\"O\":\"c0\"},\"offset\":0,\"state\":5,\"type\":{\"Charset\":\"binary\",\"Collate\":\"binary\",\"Decimal\":0,\"Flag\":0,\"Flen\":4,\"Tp\":1}},{\"id\":2,\"name\":{\"L\":\"handle\",\"O\":\"handle\"},\"offset\":1,\"state\":5,\"type\":{\"Charset\":\"binary\",\"Collate\":\"binary\",\"Decimal\":0,\"Flag\":515,\"Flen\":11,\"Tp\":3}},{\"id\":3,\"name\":{\"L\":\"c1\",\"O\":\"c1\"},\"offset\":2,\"state\":5,\"type\":{\"Charset\":\"binary\",\"Collate\":\"binary\",\"Decimal\":0,\"Flag\":0,\"Flen\":10,\"Tp\":246}},{\"default\":\"0.07954397\",\"id\":4,\"name\":{\"L\":\"c2\",\"O\":\"c2\"},\"offset\":3,\"origin_default\":\"0.07954397\",\"state\":5,\"type\":{\"Charset\":\"binary\",\"Collate\":\"binary\",\"Decimal\":-1,\"Flag\":0,\"Flen\":12,\"Tp\":4}},{\"id\":5,\"name\":{\"L\":\"c5\",\"O\":\"c5\"},\"offset\":4,\"origin_default\":\"0\",\"state\":5,\"type\":{\"Charset\":\"binary\",\"Collate\":\"binary\",\"Decimal\":0,\"Flag\":4097,\"Flen\":10,\"Tp\":246}},{\"default\":\"247262911\",\"id\":6,\"name\":{\"L\":\"c4\",\"O\":\"c4\"},\"offset\":5,\"origin_default\":\"247262911\",\"state\":5,\"type\":{\"Charset\":\"binary\",\"Collate\":\"binary\",\"Decimal\":0,\"Flag\":0,\"Flen\":10,\"Tp\":246}}],\"id\":711,\"index_info\":[],\"is_common_handle\":false,\"keyspace_id\":4294967295,\"name\":{\"L\":\"t0\",\"O\":\"t0\"},\"pk_is_handle\":true,\"state\":5,\"tiflash_replica\":{\"Available\":true,\"Count\":1},\"update_timestamp\":463845180265463812}"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.203 +08:00] [INFO] [PartitionStreams.cpp:173] ["decode with schema snapshot, keyspace=4294967295 table_id=711 region_id=8206207 epoch=1 update_ts=463845180265463812 with_version_column=true columns=6"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.204 +08:00] [INFO] [RegionBlockReader.cpp:49] ["RegionBlockReader::read data.value=8000040000000405060708000F0016001A00BFDC4011A00000000A0080000000000A008000003CA3391ABC85"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.204 +08:00] [INFO] [RegionBlockReader.cpp:188] ["RegionBlockReader::readImpl start, schema_snapshot.column_defines=[-1/Int64, -1024/UInt64, -1025/UInt8, 2/Int32, 1/Nullable(Int8), 3/Nullable(Decimal(10,0)), 4/Nullable(Float32), 5/Decimal(10,0), 6/Nullable(Decimal(10,0))]"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.204 +08:00] [INFO] [RowCodec.cpp:468] ["not_null_column_ids=[4, 5, 6, 7] null_column_ids=[]"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.204 +08:00] [INFO] [RowCodec.cpp:521] ["appendRowV2ToBlockImpl: fill default value for missing column, next_column_id=1 next_datum_column_id=4 block_column_pos=3 column_info={name=c0 id=1 not_null=false default_value=NULL}"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.204 +08:00] [INFO] [RowCodec.cpp:521] ["appendRowV2ToBlockImpl: fill default value for missing column, next_column_id=2 next_datum_column_id=4 block_column_pos=4 column_info={name=handle id=2 not_null=true default_value=0}"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.205 +08:00] [INFO] [RowCodec.cpp:521] ["appendRowV2ToBlockImpl: fill default value for missing column, next_column_id=3 next_datum_column_id=4 block_column_pos=5 column_info={name=c1 id=3 not_null=false default_value=NULL}"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.205 +08:00] [INFO] [TiDBSchemaSyncer.cpp:296] ["Sync table schema begin, table_id=711"] [source="keyspace=4294967295"] [thread_id=2038]
[2026/01/26 20:02:21.210 +08:00] [INFO] [SchemaBuilder.cpp:1705] ["Alter table db_709.t0 begin, database_id=709 table_id=711"] [source="keyspace=4294967295"] [thread_id=2038]
[2026/01/26 20:02:21.218 +08:00] [INFO] [SchemaBuilder.cpp:1720] ["Alter table db_709.t0 end, database_id=709 table_id=711"] [source="keyspace=4294967295"] [thread_id=2038]
[2026/01/26 20:02:21.219 +08:00] [INFO] [TiDBSchemaSyncer.cpp:305] ["Sync table schema end, table_id=711"] [source="keyspace=4294967295"] [thread_id=2038]
[2026/01/26 20:02:21.219 +08:00] [INFO] [PartitionStreams.cpp:263] ["sync schema cost 13 ms, keyspace=4294967295 table_id=711"] [thread_id=2038]

[2026/01/26 20:02:21.221 +08:00] [INFO] [StorageDeltaMerge.cpp:1264] ["Refresh decoding schema snapshot, table_id=711 epoch=2 update_ts=463845180343844895 with_version_column=true table_info={\"cols\":[{\"id\":1,\"name\":{\"L\":\"c0\",\"O\":\"c0\"},\"offset\":0,\"state\":5,\"type\":{\"Charset\":\"binary\",\"Collate\":\"binary\",\"Decimal\":0,\"Flag\":0,\"Flen\":4,\"Tp\":1}},{\"id\":2,\"name\":{\"L\":\"handle\",\"O\":\"handle\"},\"offset\":1,\"state\":5,\"type\":{\"Charset\":\"binary\",\"Collate\":\"binary\",\"Decimal\":0,\"Flag\":515,\"Flen\":11,\"Tp\":3}},{\"default\":\"-56083770\",\"id\":7,\"name\":{\"L\":\"c1\",\"O\":\"c1\"},\"offset\":2,\"state\":5,\"type\":{\"Charset\":\"binary\",\"Collate\":\"binary\",\"Decimal\":0,\"Flag\":1,\"Flen\":20,\"Tp\":8}},{\"id\":4,\"name\":{\"L\":\"c2\",\"O\":\"c2\"},\"offset\":3,\"origin_default\":\"0.07954397\",\"state\":5,\"type\":{\"Charset\":\"binary\",\"Collate\":\"binary\",\"Decimal\":-1,\"Flag\":4097,\"Flen\":12,\"Tp\":4}},{\"id\":5,\"name\":{\"L\":\"c5\",\"O\":\"c5\"},\"offset\":4,\"origin_default\":\"0\",\"state\":5,\"type\":{\"Charset\":\"binary\",\"Collate\":\"binary\",\"Decimal\":0,\"Flag\":0,\"Flen\":10,\"Tp\":246}},{\"default\":\"247262911\",\"id\":6,\"name\":{\"L\":\"c4\",\"O\":\"c4\"},\"offset\":5,\"origin_default\":\"247262911\",\"state\":5,\"type\":{\"Charset\":\"binary\",\"Collate\":\"binary\",\"Decimal\":0,\"Flag\":0,\"Flen\":10,\"Tp\":246}}],\"id\":711,\"index_info\":[],\"is_common_handle\":false,\"keyspace_id\":4294967295,\"name\":{\"L\":\"t0\",\"O\":\"t0\"},\"pk_is_handle\":true,\"state\":5,\"tiflash_replica\":{\"Available\":true,\"Count\":1},\"update_timestamp\":463845180343844895}"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.222 +08:00] [INFO] [PartitionStreams.cpp:173] ["decode with schema snapshot, keyspace=4294967295 table_id=711 region_id=8206207 epoch=2 update_ts=463845180343844895 with_version_column=true columns=6"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.222 +08:00] [INFO] [RegionBlockReader.cpp:49] ["RegionBlockReader::read data.value=8000040000000405060708000F0016001A00BFDC4011A00000000A0080000000000A008000003CA3391ABC85"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.222 +08:00] [INFO] [RegionBlockReader.cpp:188] ["RegionBlockReader::readImpl start, schema_snapshot.column_defines=[-1/Int64, -1024/UInt64, -1025/UInt8, 2/Int32, 1/Nullable(Int8), 4/Float32, 5/Nullable(Decimal(10,0)), 6/Nullable(Decimal(10,0)), 7/Int64]"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.222 +08:00] [INFO] [RowCodec.cpp:468] ["not_null_column_ids=[4, 5, 6, 7] null_column_ids=[]"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.222 +08:00] [INFO] [RowCodec.cpp:521] ["appendRowV2ToBlockImpl: fill default value for missing column, next_column_id=1 next_datum_column_id=4 block_column_pos=3 column_info={name=c0 id=1 not_null=false default_value=NULL}"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.222 +08:00] [INFO] [RowCodec.cpp:521] ["appendRowV2ToBlockImpl: fill default value for missing column, next_column_id=2 next_datum_column_id=4 block_column_pos=4 column_info={name=handle id=2 not_null=true default_value=0}"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.222 +08:00] [INFO] [DeltaMergeStore.cpp:2358] ["creating the first segment with segment_id=1"] [source="keyspace=4294967295 table_id=711"] [thread_id=2038]


# TiFlash try to decode the second raftlog according to its latest known schema.
# But again, found that decode error, so refresh the tiflash own schema from TiKV, then
# retry to decode the raftlog.
# In this schema, the "c1" now is column_id=7 with flag=1, which means the column is NOT nullable
# {"default":"-56083770","id":7,"name":{"L":"c1","O":"c1"},"offset":2,"state":5,"type":{"Flag":1,"Flen":20,"Tp":8}}

[2026/01/26 20:02:21.252 +08:00] [INFO] [RaftCommands.cpp:371] ["Region::handleWriteRaftCmd Put key=7480000000000002FFC75F728000000000FF0000010000000000FAF99017441A07FFEE, value=509080E0AFBE97FAB706762C8000040000000405060708000F0016001A00BFDC4011A00000000A008033E04D600A008000003CA3391ABC85"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.252 +08:00] [INFO] [PartitionStreams.cpp:173] ["decode with schema snapshot, keyspace=4294967295 table_id=711 region_id=8206207 epoch=2 update_ts=463845180343844895 with_version_column=true columns=6"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.252 +08:00] [INFO] [RegionBlockReader.cpp:49] ["RegionBlockReader::read data.value=8000040000000405060708000F0016001A00BFDC4011A00000000A008033E04D600A008000003CA3391ABC85"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.253 +08:00] [INFO] [RegionBlockReader.cpp:188] ["RegionBlockReader::readImpl start, schema_snapshot.column_defines=[-1/Int64, -1024/UInt64, -1025/UInt8, 2/Int32, 1/Nullable(Int8), 4/Float32, 5/Nullable(Decimal(10,0)), 6/Nullable(Decimal(10,0)), 7/Int64]"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.253 +08:00] [INFO] [RowCodec.cpp:468] ["not_null_column_ids=[4, 5, 6, 7] null_column_ids=[]"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.253 +08:00] [INFO] [RowCodec.cpp:521] ["appendRowV2ToBlockImpl: fill default value for missing column, next_column_id=1 next_datum_column_id=4 block_column_pos=3 column_info={name=c0 id=1 not_null=false default_value=NULL}"] [source=dddddddddd] [thread_id=2038]
[2026/01/26 20:02:21.253 +08:00] [INFO] [RowCodec.cpp:521] ["appendRowV2ToBlockImpl: fill default value for missing column, next_column_id=2 next_datum_column_id=4 block_column_pos=4 column_info={name=handle id=2 not_null=true default_value=0}"] [source=dddddddddd] [thread_id=2038]


# TiFlash try to decode the third raftlog according to its latest known schema.
# This time, TiFlash does not meet decode error and continue to decode.


[2026/01/26 20:02:23.056 +08:00] [INFO] [RaftCommands.cpp:371] ["Region::handleWriteRaftCmd Put key=7480000000000002FFC75F728000000000FF0000020000000000FAF99017441A07FFD5, value=50A980E0AFBE97FAB706762580000300000004050608000F001600BF720CDD400000000A0080000000010A00800000393C"] [source=dddddddddd] [thread_id=2040]
[2026/01/26 20:02:23.056 +08:00] [INFO] [PartitionStreams.cpp:173] ["decode with schema snapshot, keyspace=4294967295 table_id=711 region_id=8206207 epoch=2 update_ts=463845180343844895 with_version_column=true columns=6"] [source=dddddddddd] [thread_id=2040]
[2026/01/26 20:02:23.057 +08:00] [INFO] [RegionBlockReader.cpp:49] ["RegionBlockReader::read data.value=80000300000004050608000F001600BF720CDD400000000A0080000000010A00800000393C"] [source=dddddddddd] [thread_id=2040]
[2026/01/26 20:02:23.057 +08:00] [INFO] [RegionBlockReader.cpp:188] ["RegionBlockReader::readImpl start, schema_snapshot.column_defines=[-1/Int64, -1024/UInt64, -1025/UInt8, 2/Int32, 1/Nullable(Int8), 4/Float32, 5/Nullable(Decimal(10,0)), 6/Nullable(Decimal(10,0)), 7/Int64]"] [source=dddddddddd] [thread_id=2040]
[2026/01/26 20:02:23.057 +08:00] [INFO] [RowCodec.cpp:468] ["not_null_column_ids=[4, 5, 6] null_column_ids=[]"] [source=dddddddddd] [thread_id=2040]
[2026/01/26 20:02:23.057 +08:00] [INFO] [RowCodec.cpp:521] ["appendRowV2ToBlockImpl: fill default value for missing column, next_column_id=1 next_datum_column_id=4 block_column_pos=3 column_info={name=c0 id=1 not_null=false default_value=NULL}"] [source=dddddddddd] [thread_id=2040]
[2026/01/26 20:02:23.057 +08:00] [INFO] [RowCodec.cpp:521] ["appendRowV2ToBlockImpl: fill default value for missing column, next_column_id=2 next_datum_column_id=4 block_column_pos=4 column_info={name=handle id=2 not_null=true default_value=0}"] [source=dddddddddd] [thread_id=2040]

# In this schema, "c1" is column_id=7 with flag=1, so the column value is filled with "0" according to
# the original_default in ColumnInfo

[2026/01/26 20:02:23.057 +08:00] [INFO] [RowCodec.cpp:607] ["appendRowV2ToBlockImpl: fill default value for missing column, block_column_pos=8 column_info={name=c1 id=7 not_null=true default_value=0}"] [source=dddddddddd] [thread_id=2040]

The trigger conditions are:

  • The table has TiFlash replicas.
  • A column is altered from NOT NULL to NULLABLE via a DDL statement.
  • Data insertion or updates occur during the DDL operation.

Signed-off-by: JaySon-Huang <tshent@qq.com>
@JaySon-Huang
Copy link
Contributor Author

JaySon-Huang commented Jan 26, 2026

Verify re-creating tiflash replica can fix the problem

-- Reproduce the inconsistency problem by SQL in the previous comment
-- in tiflash, handle=2, c1=0
SELECT /*+ read_from_storage(tiflash[t0]) */ * FROM t0 order by 1;
+--------+--------+-------------+--------------+-----------+-------+
| c0     | handle | c1          | c2           | c5        | c4    |
+--------+--------+-------------+--------------+-----------+-------+
| <null> | 1      | -2051270087 | 0.44141045   | 870337888 | 15523 |
| <null> | 2      | 0           | 0.0044067996 | 1         | 14652 |
+--------+--------+-------------+--------------+-----------+-------+
-- in tikv, handle=2, c1=null
SELECT /*+ read_from_storage(tikv[t0]) */ * FROM t0 order by 1;
+--------+--------+-------------+--------------+-----------+-------+
| c0     | handle | c1          | c2           | c5        | c4    |
+--------+--------+-------------+--------------+-----------+-------+
| <null> | 1      | -2051270087 | 0.44141045   | 870337888 | 15523 |
| <null> | 2      | <null>      | 0.0044067996 | 1         | 14652 |
+--------+--------+-------------+--------------+-----------+-------+

------ Fix by re-creating tiflash replica

-- remove tiflash replica
alter table db_1483421321.t0 set tiflash replica 0;

-- And check the Region peer distribution until all region of the given table are removed from TiFlash. So there is no rows with store_type = 'tiflash'.
select TABLE_ID, p.STORE_ID, ADDRESS, count(p.REGION_ID), 
  CASE WHEN JSON_CONTAINS(s.LABEL, JSON_OBJECT('key', 'engine', 'value', 'tiflash')) 
       THEN 'tiflash' ELSE 'tikv'END AS store_type, s.LABEL
from
  information_schema.tikv_region_status r,
  information_schema.tikv_region_peers p,
  information_schema.tikv_store_status s
where r.db_name = 'db_1483421321' and r.table_name = 't0'
  and r.region_id = p.region_id and p.store_id = s.store_id
group by TABLE_ID, p.STORE_ID, s.LABEL, ADDRESS;

-- re-add tifalsh replica after all region have been removed from tiflash
alter table db_1483421321.t0 set tiflash replica 1;

-- tiflash c1 is null for handle=2
SELECT /*+ read_from_storage(tiflash[t0]) */ * FROM t0 order by 1;
+--------+--------+-------------+--------------+-----------+-------+
| c0     | handle | c1          | c2           | c5        | c4    |
+--------+--------+-------------+--------------+-----------+-------+
| <null> | 1      | -2051270087 | 0.44141045   | 870337888 | 15523 |
| <null> | 2      | <null>      | 0.0044067996 | 1         | 14652 |
+--------+--------+-------------+--------------+-----------+-------+

-- tikv c1 is null for handle=2
SELECT /*+ read_from_storage(tikv[t0]) */ * FROM t0 order by 1;
+--------+--------+-------------+--------------+-----------+-------+
| c0     | handle | c1          | c2           | c5        | c4    |
+--------+--------+-------------+--------------+-----------+-------+
| <null> | 1      | -2051270087 | 0.44141045   | 870337888 | 15523 |
| <null> | 2      | <null>      | 0.0044067996 | 1         | 14652 |
+--------+--------+-------------+--------------+-----------+-------+


@JaySon-Huang JaySon-Huang changed the title Add debug log and test case [DNM] Add debug log and test case Jan 26, 2026
@JaySon-Huang
Copy link
Contributor Author

Summary:

The encoded key-value pairs stored in the Raft log do not contain any field that perfectly indicates which schema version should be used for decoding a given key-value entry. To address this, TiFlash employs the following strategies to decode Raft log entries during ingestion:

  1. TiFlash runs a background task that periodically fetches and applies SchemaDiff to maintain an up-to-date table schema internally. The default interval is 60 seconds.
  2. If, during Raft log decoding, the current local schema lacks necessary information (e.g., an unknown column_id), causing decoding to fail, TiFlash proactively triggers a SchemaSync: it fetches the latest SchemaDiff, applies it, and then retries decoding the Raft log entry.
  3. Similarly, if an unknown column_id (or similar schema mismatch) is encountered while processing a read request, TiFlash also triggers SchemaSync—fetching and applying the latest SchemaDiff—and then re-attempts the read operation.

However, under the current implementation, TiFlash cannot correctly handle a specific scenario: when a DDL alters a column from NOT NULL to NULLABLE, and concurrent Raft log entries containing NULL values for that column are written during the DDL execution window. In such cases, TiFlash erroneously fills in the original_default value instead of correctly interpreting the NULL.

Signed-off-by: JaySon-Huang <tshent@qq.com>
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Jan 26, 2026

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Jan 26, 2026

@JaySon-Huang: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-integration-next-gen c155430 link true /test pull-integration-next-gen
pull-integration-test c155430 link true /test pull-integration-test

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant