-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
Improve diff inline highlighting using per-character/word diff #16881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
623a6eb
to
8ab47b9
Compare
2426fb6
to
ee58098
Compare
Problem: Diff mode's inline highlighting is lackluster. It only performs a line-by-line comparison, and calculates a single shortest range within a line that could encompass all the changes. In lines with multiple changes, or those that span multiple lines, this approach tends to end up highlighting much more than necessary. Solution: Implement new inline highlighting modes by doing per-character or per-word diff within the diff block, and highlight only the relevant parts. This change introduces a new diffopt option "inline:". Setting to "none" will disable all inline highlighting, "simple" (the default) will use the old behavior, "char" / "word" will perform a character/word-wise diff of the texts within each diff block and only highlight the differences. The new char/word inline diff only use the internal xdiff, and will respect diff options such as algorithm choice, icase, and misc iwhite options. indent-heuristics is always on to perform better sliding. For character highlight, a post-process of the diff results is first applied before we show the highlight. This is because a naive diff will create a result with a lot of small diff chunks and gaps, due to the repetitive nature of individual characters. The post-process is a heuristic-based refinement that attempts to merge adjacent diff blocks if they are separated by a short gap (1-3 characters), and can be further tuned in the future for better results. This process results in more characters than necessary being highlighted but overall less visual noise. For word highlight, always use first buffer's iskeyword definition. Otherwise if each buffer has different iskeyword settings we would not be able to group words properly. The char/word diffing is always per-diff block, not per line, meaning that changes that span multiple lines will show up correctly. Added/removed newlines are not shown by default, but if the user has 'list' set (with "eol" listchar defined), the eol character will be be highlighted correctly for the specific newline characters. Also, add a new "DiffTextAdd" highlight group linked to "DiffText" by default. It allows color schemes to use different colors for texts that have been added within a line versus modified. This doesn't interact with linematch perfectly currently. The linematch feature splits up diff blocks into multiple smaller blocks for better visual matching, which makes inline highlight less useful especially for multi-line change (e.g. a line is broken into two lines). This could be addressed in the future. As a side change, this also removes the bounds checking introduced to diff_read() as they were added to mask existing logic bugs that were properly fixed in vim#16768.
Previously in multi-buffer diff if a buffer doesn't have text in a particular block, it would cause all the texts in this block to show up as modified.
This helps making sure it will return the correct results when inline:simple changed to inline:none, but also previously it wouldn't respect changes to add/remove icase due to the caching. In general the caching probably does more harm than good.
This is a drive-by change to fix current bug where a character with composing character will not have icase work properly since the code was ignoring it.
Vim just fills buffers as they go and don't re-order them if a buffer is removed.
This happens when we have Å and å. Note that this doesn't fix normal diffing regarding such texts and simple highlighting. That could be fixed in the future.
5bca11a
to
8c33618
Compare
4a925aa
to
a6bad8a
Compare
Properly initialize df_changes to avoid uninitialize memory crash Also, properly initialize df_lnum / df_count to clear them to zero. We could make sure to never use the invalid values if the buffer is not NULL but in practice that is quite tricky and is prone to error making it easy to do memory bugs. Just set everything to 0 to make it easy to reason about.
a6bad8a
to
b479fc6
Compare
Fix overflow when parsing change Don't leak memory when diffing multiple buffers. Need to clear the dout after every diff_read()
08576cb
to
3f7f8e2
Compare
Ok this PR should be ready. Let me know if there are any feedbacks or thoughts. Implemented tests and previously the PR was also crashing in Linux / Windows builds (apparently macOS allocator automatically zeroes allocated memory, but other OSes don't) which are also now fixed. |
Unfortunately, this introduced a few Coverity warnings: 23. Condition !(diff_flags & (98304 /* 0x8000 | 0x10000 */)), taking true branch.
3587 if (!(diff_flags & ALL_INLINE_DIFF) || diff_internal_failed())
3588 {
3589 // Use simple algorithm
3590 int change_start = MAXCOL; // first col of changed area
3591 int change_end = -1; // last col of changed area
3592 int ret;
3593
3594 ret = diff_find_change_simple(wp, lnum, dp, idx, &change_start, &change_end);
3595
3596 // convert from inclusive end to exclusive end per diffline's contract
3597 change_end += 1;
3598
3599 // Create a mock diffline struct. We always only have one so no need to
3600 // allocate memory.
24. assignment: Assigning: idx = diff_buf_idx(wp->w_buffer). The value of idx is now between 1 and 8 (inclusive).
3601 idx = diff_buf_idx(wp->w_buffer);;
3602 CLEAR_FIELD(simple_diffline_change);
3603 diffline->changes = &simple_diffline_change;
3604 diffline->num_changes = 1;
3605 diffline->bufidx = idx;
CID 1645489: (#1 of 1): Out-of-bounds read (OVERRUN)
25. overrun-local: Overrunning array dp->df_lnum of 8 8-byte elements at element index 8 (byte offset 71) using index idx (which evaluates to 8).
3606 diffline->lineoff = lnum - dp->df_lnum[idx];
3607
CID 1645487:Out-of-bounds write (OVERRUN) [ "select issue" ]
3608 simple_diffline_change.dc_start[idx] = change_start;
CID 1645490:Out-of-bounds write (OVERRUN) [ "select issue" ]
3609 simple_diffline_change.dc_end[idx] = change_end;
CID 1645488:Out-of-bounds write (OVERRUN) [ "select issue" ]
3610 simple_diffline_change.dc_start_lnum_off[idx] = off;
CID 1645486:Out-of-bounds write (OVERRUN) [ "select issue" ]
3611 simple_diffline_change.dc_end_lnum_off[idx] = off;
3612 return ret;
3613 } Can we validate, that |
This was definitely a typo I missed, but is benign. At the top of the function we already do this to make sure it's not out of bounds: int
diff_find_change(
win_T *wp,
linenr_T lnum,
diffline_T *diffline)
{
diff_T *dp;
int idx;
int off;
idx = diff_buf_idx(wp->w_buffer);
if (idx == DB_COUNT) // cannot happen
return FALSE; This code was in the original |
#16988 will fix this. Let me know if that satisfies Coverity (Is there a public dashboard?) |
Coverity can only run once (on master branch) per day :( |
Problem: Diff mode's inline highlighting is lackluster. It only performs a line-by-line comparison, and calculates a single shortest range within a line that could encompass all the changes. In lines with multiple changes, or those that span multiple lines, this approach tends to end up highlighting much more than necessary. Solution: Implement new inline highlighting modes by doing per-character or per-word diff within the diff block, and highlight only the relevant parts, add "inline:simple" to the defaults (which is the old behaviour) This change introduces a new diffopt option "inline:". Setting to "none" will disable all inline highlighting, "simple" (the default) will use the old behavior, "char" / "word" will perform a character/word-wise diff of the texts within each diff block and only highlight the differences. The new char/word inline diff only use the internal xdiff, and will respect diff options such as algorithm choice, icase, and misc iwhite options. indent-heuristics is always on to perform better sliding. For character highlight, a post-process of the diff results is first applied before we show the highlight. This is because a naive diff will create a result with a lot of small diff chunks and gaps, due to the repetitive nature of individual characters. The post-process is a heuristic-based refinement that attempts to merge adjacent diff blocks if they are separated by a short gap (1-3 characters), and can be further tuned in the future for better results. This process results in more characters than necessary being highlighted but overall less visual noise. For word highlight, always use first buffer's iskeyword definition. Otherwise if each buffer has different iskeyword settings we would not be able to group words properly. The char/word diffing is always per-diff block, not per line, meaning that changes that span multiple lines will show up correctly. Added/removed newlines are not shown by default, but if the user has 'list' set (with "eol" listchar defined), the eol character will be be highlighted correctly for the specific newline characters. Also, add a new "DiffTextAdd" highlight group linked to "DiffText" by default. It allows color schemes to use different colors for texts that have been added within a line versus modified. This doesn't interact with linematch perfectly currently. The linematch feature splits up diff blocks into multiple smaller blocks for better visual matching, which makes inline highlight less useful especially for multi-line change (e.g. a line is broken into two lines). This could be addressed in the future. As a side change, this also removes the bounds checking introduced to diff_read() as they were added to mask existing logic bugs that were properly fixed in vim/vim#16768. closes: vim/vim#16881 vim/vim@9943d47 Co-authored-by: Yee Cheng Chin
Problem: Diff mode's inline highlighting is lackluster. It only performs a line-by-line comparison, and calculates a single shortest range within a line that could encompass all the changes. In lines with multiple changes, or those that span multiple lines, this approach tends to end up highlighting much more than necessary. Solution: Implement new inline highlighting modes by doing per-character or per-word diff within the diff block, and highlight only the relevant parts, add "inline:simple" to the defaults (which is the old behaviour) This change introduces a new diffopt option "inline:". Setting to "none" will disable all inline highlighting, "simple" (the default) will use the old behavior, "char" / "word" will perform a character/word-wise diff of the texts within each diff block and only highlight the differences. The new char/word inline diff only use the internal xdiff, and will respect diff options such as algorithm choice, icase, and misc iwhite options. indent-heuristics is always on to perform better sliding. For character highlight, a post-process of the diff results is first applied before we show the highlight. This is because a naive diff will create a result with a lot of small diff chunks and gaps, due to the repetitive nature of individual characters. The post-process is a heuristic-based refinement that attempts to merge adjacent diff blocks if they are separated by a short gap (1-3 characters), and can be further tuned in the future for better results. This process results in more characters than necessary being highlighted but overall less visual noise. For word highlight, always use first buffer's iskeyword definition. Otherwise if each buffer has different iskeyword settings we would not be able to group words properly. The char/word diffing is always per-diff block, not per line, meaning that changes that span multiple lines will show up correctly. Added/removed newlines are not shown by default, but if the user has 'list' set (with "eol" listchar defined), the eol character will be be highlighted correctly for the specific newline characters. Also, add a new "DiffTextAdd" highlight group linked to "DiffText" by default. It allows color schemes to use different colors for texts that have been added within a line versus modified. This doesn't interact with linematch perfectly currently. The linematch feature splits up diff blocks into multiple smaller blocks for better visual matching, which makes inline highlight less useful especially for multi-line change (e.g. a line is broken into two lines). This could be addressed in the future. As a side change, this also removes the bounds checking introduced to diff_read() as they were added to mask existing logic bugs that were properly fixed in vim/vim#16768. closes: vim/vim#16881 vim/vim@9943d47 Co-authored-by: Yee Cheng Chin
Problem: Diff mode's inline highlighting is lackluster. It only performs a line-by-line comparison, and calculates a single shortest range within a line that could encompass all the changes. In lines with multiple changes, or those that span multiple lines, this approach tends to end up highlighting much more than necessary. Solution: Implement new inline highlighting modes by doing per-character or per-word diff within the diff block, and highlight only the relevant parts, add "inline:simple" to the defaults (which is the old behaviour) This change introduces a new diffopt option "inline:". Setting to "none" will disable all inline highlighting, "simple" (the default) will use the old behavior, "char" / "word" will perform a character/word-wise diff of the texts within each diff block and only highlight the differences. The new char/word inline diff only use the internal xdiff, and will respect diff options such as algorithm choice, icase, and misc iwhite options. indent-heuristics is always on to perform better sliding. For character highlight, a post-process of the diff results is first applied before we show the highlight. This is because a naive diff will create a result with a lot of small diff chunks and gaps, due to the repetitive nature of individual characters. The post-process is a heuristic-based refinement that attempts to merge adjacent diff blocks if they are separated by a short gap (1-3 characters), and can be further tuned in the future for better results. This process results in more characters than necessary being highlighted but overall less visual noise. For word highlight, always use first buffer's iskeyword definition. Otherwise if each buffer has different iskeyword settings we would not be able to group words properly. The char/word diffing is always per-diff block, not per line, meaning that changes that span multiple lines will show up correctly. Added/removed newlines are not shown by default, but if the user has 'list' set (with "eol" listchar defined), the eol character will be be highlighted correctly for the specific newline characters. Also, add a new "DiffTextAdd" highlight group linked to "DiffText" by default. It allows color schemes to use different colors for texts that have been added within a line versus modified. This doesn't interact with linematch perfectly currently. The linematch feature splits up diff blocks into multiple smaller blocks for better visual matching, which makes inline highlight less useful especially for multi-line change (e.g. a line is broken into two lines). This could be addressed in the future. As a side change, this also removes the bounds checking introduced to diff_read() as they were added to mask existing logic bugs that were properly fixed in vim/vim#16768. closes: vim/vim#16881 vim/vim@9943d47 Co-authored-by: Yee Cheng Chin
Problem: Diff mode's inline highlighting is lackluster. It only performs a line-by-line comparison, and calculates a single shortest range within a line that could encompass all the changes. In lines with multiple changes, or those that span multiple lines, this approach tends to end up highlighting much more than necessary. Solution: Implement new inline highlighting modes by doing per-character or per-word diff within the diff block, and highlight only the relevant parts, add "inline:simple" to the defaults (which is the old behaviour) This change introduces a new diffopt option "inline:". Setting to "none" will disable all inline highlighting, "simple" (the default) will use the old behavior, "char" / "word" will perform a character/word-wise diff of the texts within each diff block and only highlight the differences. The new char/word inline diff only use the internal xdiff, and will respect diff options such as algorithm choice, icase, and misc iwhite options. indent-heuristics is always on to perform better sliding. For character highlight, a post-process of the diff results is first applied before we show the highlight. This is because a naive diff will create a result with a lot of small diff chunks and gaps, due to the repetitive nature of individual characters. The post-process is a heuristic-based refinement that attempts to merge adjacent diff blocks if they are separated by a short gap (1-3 characters), and can be further tuned in the future for better results. This process results in more characters than necessary being highlighted but overall less visual noise. For word highlight, always use first buffer's iskeyword definition. Otherwise if each buffer has different iskeyword settings we would not be able to group words properly. The char/word diffing is always per-diff block, not per line, meaning that changes that span multiple lines will show up correctly. Added/removed newlines are not shown by default, but if the user has 'list' set (with "eol" listchar defined), the eol character will be be highlighted correctly for the specific newline characters. Also, add a new "DiffTextAdd" highlight group linked to "DiffText" by default. It allows color schemes to use different colors for texts that have been added within a line versus modified. This doesn't interact with linematch perfectly currently. The linematch feature splits up diff blocks into multiple smaller blocks for better visual matching, which makes inline highlight less useful especially for multi-line change (e.g. a line is broken into two lines). This could be addressed in the future. As a side change, this also removes the bounds checking introduced to diff_read() as they were added to mask existing logic bugs that were properly fixed in vim/vim#16768. closes: vim/vim#16881 vim/vim@9943d47 Co-authored-by: Yee Cheng Chin
Problem: Diff mode's inline highlighting is lackluster. It only performs a line-by-line comparison, and calculates a single shortest range within a line that could encompass all the changes. In lines with multiple changes, or those that span multiple lines, this approach tends to end up highlighting much more than necessary. Solution: Implement new inline highlighting modes by doing per-character or per-word diff within the diff block, and highlight only the relevant parts, add "inline:simple" to the defaults (which is the old behaviour) This change introduces a new diffopt option "inline:". Setting to "none" will disable all inline highlighting, "simple" (the default) will use the old behavior, "char" / "word" will perform a character/word-wise diff of the texts within each diff block and only highlight the differences. The new char/word inline diff only use the internal xdiff, and will respect diff options such as algorithm choice, icase, and misc iwhite options. indent-heuristics is always on to perform better sliding. For character highlight, a post-process of the diff results is first applied before we show the highlight. This is because a naive diff will create a result with a lot of small diff chunks and gaps, due to the repetitive nature of individual characters. The post-process is a heuristic-based refinement that attempts to merge adjacent diff blocks if they are separated by a short gap (1-3 characters), and can be further tuned in the future for better results. This process results in more characters than necessary being highlighted but overall less visual noise. For word highlight, always use first buffer's iskeyword definition. Otherwise if each buffer has different iskeyword settings we would not be able to group words properly. The char/word diffing is always per-diff block, not per line, meaning that changes that span multiple lines will show up correctly. Added/removed newlines are not shown by default, but if the user has 'list' set (with "eol" listchar defined), the eol character will be be highlighted correctly for the specific newline characters. Also, add a new "DiffTextAdd" highlight group linked to "DiffText" by default. It allows color schemes to use different colors for texts that have been added within a line versus modified. This doesn't interact with linematch perfectly currently. The linematch feature splits up diff blocks into multiple smaller blocks for better visual matching, which makes inline highlight less useful especially for multi-line change (e.g. a line is broken into two lines). This could be addressed in the future. As a side change, this also removes the bounds checking introduced to diff_read() as they were added to mask existing logic bugs that were properly fixed in vim/vim#16768. closes: vim/vim#16881 vim/vim@9943d47 Co-authored-by: Yee Cheng Chin
Problem: Diff mode's inline highlighting is lackluster. It only performs a line-by-line comparison, and calculates a single shortest range within a line that could encompass all the changes. In lines with multiple changes, or those that span multiple lines, this approach tends to end up highlighting much more than necessary. Solution: Implement new inline highlighting modes by doing per-character or per-word diff within the diff block, and highlight only the relevant parts, add "inline:simple" to the defaults (which is the old behaviour) This change introduces a new diffopt option "inline:". Setting to "none" will disable all inline highlighting, "simple" (the default) will use the old behavior, "char" / "word" will perform a character/word-wise diff of the texts within each diff block and only highlight the differences. The new char/word inline diff only use the internal xdiff, and will respect diff options such as algorithm choice, icase, and misc iwhite options. indent-heuristics is always on to perform better sliding. For character highlight, a post-process of the diff results is first applied before we show the highlight. This is because a naive diff will create a result with a lot of small diff chunks and gaps, due to the repetitive nature of individual characters. The post-process is a heuristic-based refinement that attempts to merge adjacent diff blocks if they are separated by a short gap (1-3 characters), and can be further tuned in the future for better results. This process results in more characters than necessary being highlighted but overall less visual noise. For word highlight, always use first buffer's iskeyword definition. Otherwise if each buffer has different iskeyword settings we would not be able to group words properly. The char/word diffing is always per-diff block, not per line, meaning that changes that span multiple lines will show up correctly. Added/removed newlines are not shown by default, but if the user has 'list' set (with "eol" listchar defined), the eol character will be be highlighted correctly for the specific newline characters. Also, add a new "DiffTextAdd" highlight group linked to "DiffText" by default. It allows color schemes to use different colors for texts that have been added within a line versus modified. This doesn't interact with linematch perfectly currently. The linematch feature splits up diff blocks into multiple smaller blocks for better visual matching, which makes inline highlight less useful especially for multi-line change (e.g. a line is broken into two lines). This could be addressed in the future. As a side change, this also removes the bounds checking introduced to diff_read() as they were added to mask existing logic bugs that were properly fixed in vim/vim#16768. closes: vim/vim#16881 vim/vim@9943d47 Co-authored-by: Yee Cheng Chin
Problem: Diff mode's inline highlighting is lackluster. It only performs a line-by-line comparison, and calculates a single shortest range within a line that could encompass all the changes. In lines with multiple changes, or those that span multiple lines, this approach tends to end up highlighting much more than necessary. Solution: Implement new inline highlighting modes by doing per-character or per-word diff within the diff block, and highlight only the relevant parts, add "inline:simple" to the defaults (which is the old behaviour) This change introduces a new diffopt option "inline:". Setting to "none" will disable all inline highlighting, "simple" (the default) will use the old behavior, "char" / "word" will perform a character/word-wise diff of the texts within each diff block and only highlight the differences. The new char/word inline diff only use the internal xdiff, and will respect diff options such as algorithm choice, icase, and misc iwhite options. indent-heuristics is always on to perform better sliding. For character highlight, a post-process of the diff results is first applied before we show the highlight. This is because a naive diff will create a result with a lot of small diff chunks and gaps, due to the repetitive nature of individual characters. The post-process is a heuristic-based refinement that attempts to merge adjacent diff blocks if they are separated by a short gap (1-3 characters), and can be further tuned in the future for better results. This process results in more characters than necessary being highlighted but overall less visual noise. For word highlight, always use first buffer's iskeyword definition. Otherwise if each buffer has different iskeyword settings we would not be able to group words properly. The char/word diffing is always per-diff block, not per line, meaning that changes that span multiple lines will show up correctly. Added/removed newlines are not shown by default, but if the user has 'list' set (with "eol" listchar defined), the eol character will be be highlighted correctly for the specific newline characters. Also, add a new "DiffTextAdd" highlight group linked to "DiffText" by default. It allows color schemes to use different colors for texts that have been added within a line versus modified. This doesn't interact with linematch perfectly currently. The linematch feature splits up diff blocks into multiple smaller blocks for better visual matching, which makes inline highlight less useful especially for multi-line change (e.g. a line is broken into two lines). This could be addressed in the future. As a side change, this also removes the bounds checking introduced to diff_read() as they were added to mask existing logic bugs that were properly fixed in vim/vim#16768. closes: vim/vim#16881 vim/vim@9943d47 Co-authored-by: Yee Cheng Chin
Problem: Diff mode's inline highlighting is lackluster. It only performs a line-by-line comparison, and calculates a single shortest range within a line that could encompass all the changes. In lines with multiple changes, or those that span multiple lines, this approach tends to end up highlighting much more than necessary. Solution: Implement new inline highlighting modes by doing per-character or per-word diff within the diff block, and highlight only the relevant parts, add "inline:simple" to the defaults (which is the old behaviour) This change introduces a new diffopt option "inline:". Setting to "none" will disable all inline highlighting, "simple" (the default) will use the old behavior, "char" / "word" will perform a character/word-wise diff of the texts within each diff block and only highlight the differences. The new char/word inline diff only use the internal xdiff, and will respect diff options such as algorithm choice, icase, and misc iwhite options. indent-heuristics is always on to perform better sliding. For character highlight, a post-process of the diff results is first applied before we show the highlight. This is because a naive diff will create a result with a lot of small diff chunks and gaps, due to the repetitive nature of individual characters. The post-process is a heuristic-based refinement that attempts to merge adjacent diff blocks if they are separated by a short gap (1-3 characters), and can be further tuned in the future for better results. This process results in more characters than necessary being highlighted but overall less visual noise. For word highlight, always use first buffer's iskeyword definition. Otherwise if each buffer has different iskeyword settings we would not be able to group words properly. The char/word diffing is always per-diff block, not per line, meaning that changes that span multiple lines will show up correctly. Added/removed newlines are not shown by default, but if the user has 'list' set (with "eol" listchar defined), the eol character will be be highlighted correctly for the specific newline characters. Also, add a new "DiffTextAdd" highlight group linked to "DiffText" by default. It allows color schemes to use different colors for texts that have been added within a line versus modified. This doesn't interact with linematch perfectly currently. The linematch feature splits up diff blocks into multiple smaller blocks for better visual matching, which makes inline highlight less useful especially for multi-line change (e.g. a line is broken into two lines). This could be addressed in the future. As a side change, this also removes the bounds checking introduced to diff_read() as they were added to mask existing logic bugs that were properly fixed in vim/vim#16768. closes: vim/vim#16881 vim/vim@9943d47 Co-authored-by: Yee Cheng Chin
Problem: Diff mode's inline highlighting is lackluster. It only performs a line-by-line comparison, and calculates a single shortest range within a line that could encompass all the changes. In lines with multiple changes, or those that span multiple lines, this approach tends to end up highlighting much more than necessary. Solution: Implement new inline highlighting modes by doing per-character or per-word diff within the diff block, and highlight only the relevant parts, add "inline:simple" to the defaults (which is the old behaviour) This change introduces a new diffopt option "inline:". Setting to "none" will disable all inline highlighting, "simple" (the default) will use the old behavior, "char" / "word" will perform a character/word-wise diff of the texts within each diff block and only highlight the differences. The new char/word inline diff only use the internal xdiff, and will respect diff options such as algorithm choice, icase, and misc iwhite options. indent-heuristics is always on to perform better sliding. For character highlight, a post-process of the diff results is first applied before we show the highlight. This is because a naive diff will create a result with a lot of small diff chunks and gaps, due to the repetitive nature of individual characters. The post-process is a heuristic-based refinement that attempts to merge adjacent diff blocks if they are separated by a short gap (1-3 characters), and can be further tuned in the future for better results. This process results in more characters than necessary being highlighted but overall less visual noise. For word highlight, always use first buffer's iskeyword definition. Otherwise if each buffer has different iskeyword settings we would not be able to group words properly. The char/word diffing is always per-diff block, not per line, meaning that changes that span multiple lines will show up correctly. Added/removed newlines are not shown by default, but if the user has 'list' set (with "eol" listchar defined), the eol character will be be highlighted correctly for the specific newline characters. Also, add a new "DiffTextAdd" highlight group linked to "DiffText" by default. It allows color schemes to use different colors for texts that have been added within a line versus modified. This doesn't interact with linematch perfectly currently. The linematch feature splits up diff blocks into multiple smaller blocks for better visual matching, which makes inline highlight less useful especially for multi-line change (e.g. a line is broken into two lines). This could be addressed in the future. As a side change, this also removes the bounds checking introduced to diff_read() as they were added to mask existing logic bugs that were properly fixed in vim/vim#16768. closes: vim/vim#16881 vim/vim@9943d47 Co-authored-by: Yee Cheng Chin
Problem: Diff mode's inline highlighting is lackluster. It only performs a line-by-line comparison, and calculates a single shortest range within a line that could encompass all the changes. In lines with multiple changes, or those that span multiple lines, this approach tends to end up highlighting much more than necessary. Solution: Implement new inline highlighting modes by doing per-character or per-word diff within the diff block, and highlight only the relevant parts, add "inline:simple" to the defaults (which is the old behaviour) This change introduces a new diffopt option "inline:". Setting to "none" will disable all inline highlighting, "simple" (the default) will use the old behavior, "char" / "word" will perform a character/word-wise diff of the texts within each diff block and only highlight the differences. The new char/word inline diff only use the internal xdiff, and will respect diff options such as algorithm choice, icase, and misc iwhite options. indent-heuristics is always on to perform better sliding. For character highlight, a post-process of the diff results is first applied before we show the highlight. This is because a naive diff will create a result with a lot of small diff chunks and gaps, due to the repetitive nature of individual characters. The post-process is a heuristic-based refinement that attempts to merge adjacent diff blocks if they are separated by a short gap (1-3 characters), and can be further tuned in the future for better results. This process results in more characters than necessary being highlighted but overall less visual noise. For word highlight, always use first buffer's iskeyword definition. Otherwise if each buffer has different iskeyword settings we would not be able to group words properly. The char/word diffing is always per-diff block, not per line, meaning that changes that span multiple lines will show up correctly. Added/removed newlines are not shown by default, but if the user has 'list' set (with "eol" listchar defined), the eol character will be be highlighted correctly for the specific newline characters. Also, add a new "DiffTextAdd" highlight group linked to "DiffText" by default. It allows color schemes to use different colors for texts that have been added within a line versus modified. This doesn't interact with linematch perfectly currently. The linematch feature splits up diff blocks into multiple smaller blocks for better visual matching, which makes inline highlight less useful especially for multi-line change (e.g. a line is broken into two lines). This could be addressed in the future. As a side change, this also removes the bounds checking introduced to diff_read() as they were added to mask existing logic bugs that were properly fixed in vim/vim#16768. closes: vim/vim#16881 vim/vim@9943d47 Co-authored-by: Yee Cheng Chin
Problem: Diff mode's inline highlighting is lackluster. It only performs a line-by-line comparison, and calculates a single shortest range within a line that could encompass all the changes. In lines with multiple changes, or those that span multiple lines, this approach tends to end up highlighting much more than necessary. Solution: Implement new inline highlighting modes by doing per-character or per-word diff within the diff block, and highlight only the relevant parts, add "inline:simple" to the defaults (which is the old behaviour) This change introduces a new diffopt option "inline:". Setting to "none" will disable all inline highlighting, "simple" (the default) will use the old behavior, "char" / "word" will perform a character/word-wise diff of the texts within each diff block and only highlight the differences. The new char/word inline diff only use the internal xdiff, and will respect diff options such as algorithm choice, icase, and misc iwhite options. indent-heuristics is always on to perform better sliding. For character highlight, a post-process of the diff results is first applied before we show the highlight. This is because a naive diff will create a result with a lot of small diff chunks and gaps, due to the repetitive nature of individual characters. The post-process is a heuristic-based refinement that attempts to merge adjacent diff blocks if they are separated by a short gap (1-3 characters), and can be further tuned in the future for better results. This process results in more characters than necessary being highlighted but overall less visual noise. For word highlight, always use first buffer's iskeyword definition. Otherwise if each buffer has different iskeyword settings we would not be able to group words properly. The char/word diffing is always per-diff block, not per line, meaning that changes that span multiple lines will show up correctly. Added/removed newlines are not shown by default, but if the user has 'list' set (with "eol" listchar defined), the eol character will be be highlighted correctly for the specific newline characters. Also, add a new "DiffTextAdd" highlight group linked to "DiffText" by default. It allows color schemes to use different colors for texts that have been added within a line versus modified. This doesn't interact with linematch perfectly currently. The linematch feature splits up diff blocks into multiple smaller blocks for better visual matching, which makes inline highlight less useful especially for multi-line change (e.g. a line is broken into two lines). This could be addressed in the future. As a side change, this also removes the bounds checking introduced to diff_read() as they were added to mask existing logic bugs that were properly fixed in vim/vim#16768. closes: vim/vim#16881 vim/vim@9943d47 Co-authored-by: Yee Cheng Chin
FYI: This might be included in your ToDo list, but word-wise inline diff does not recognize character class as a word boundary in a multibyte string. Actually 'w' and 'b' cursor move command and '<' and '>' regexp recognize the word boundary.
In mbyte.c, |
I just tested and you are right. I'll submit a fix. |
FWIW
It does mean if you use word diff right now it's impossible to customize how it groups words for say Japanese or Chinese (which is specifically a problem since all the Chinese characters in a whole sentence get lumped into a single "word" unlike Japanese which gets to split on kanji/hiragana/katakana etc). I guess that's why |
…word Previously inline word diff simply used Vim's definition of keyword to determine what is a word, which leads to multi-byte character classes such as emojis and CJK (Chinese/Japanese/Korean) characters all classifying as word characters, leading to entire sentences being grouped as a single word which does not provide meaningful information. Fix this by treating all non-alphanumeric characters (with class number above 2) as non-word characters, as there is usually no benefit in using word diff on them. These include CJK characters, emojis, and also subscript/superscript numbers. Meanwhile, multi-byte characters like Cyrillic and Greek letters will still continue to considered as words. Note that this is slightly inconsistent with how words are defined elsewhere, as Vim usually considers any character with class >=2 to be a "word". Related: vim#16881 (diff inline highlight)
…word Previously inline word diff simply used Vim's definition of keyword to determine what is a word, which leads to multi-byte character classes such as emojis and CJK (Chinese/Japanese/Korean) characters all classifying as word characters, leading to entire sentences being grouped as a single word which does not provide meaningful information in a diff highlight. Fix this by treating all non-alphanumeric characters (with class number above 2) as non-word characters, as there is usually no benefit in using word diff on them. These include CJK characters, emojis, and also subscript/superscript numbers. Meanwhile, multi-byte characters like Cyrillic and Greek letters will still continue to considered as words. Note that this is slightly inconsistent with how words are defined elsewhere, as Vim usually considers any character with class >=2 to be a "word". Related: vim#16881 (diff inline highlight)
…word Previously inline word diff simply used Vim's definition of keyword to determine what is a word, which leads to multi-byte character classes such as emojis and CJK (Chinese/Japanese/Korean) characters all classifying as word characters, leading to entire sentences being grouped as a single word which does not provide meaningful information in a diff highlight. Fix this by treating all non-alphanumeric characters (with class number above 2) as non-word characters, as there is usually no benefit in using word diff on them. These include CJK characters, emojis, and also subscript/superscript numbers. Meanwhile, multi-byte characters like Cyrillic and Greek letters will still continue to considered as words. Note that this is slightly inconsistent with how words are defined elsewhere, as Vim usually considers any character with class >=2 to be a "word". Related: vim#16881 (diff inline highlight)
I filed #17050 to fix this. The proposed solution is to simply treat all non-alphanumeric multi-byte characters as non-word in the context of word diff. It's slightly inconsistent with how Vim word motions work but I think this works best in this context. This mostly means emojis and CJK characters will treat each character as an individual word. |
Problem: inline word diff treats multibyte chars as word char (after 9.1.1243) Solution: treat all non-alphanumeric characters as non-word characters (Yee Cheng Chin) Previously inline word diff simply used Vim's definition of keyword to determine what is a word, which leads to multi-byte character classes such as emojis and CJK (Chinese/Japanese/Korean) characters all classifying as word characters, leading to entire sentences being grouped as a single word which does not provide meaningful information in a diff highlight. Fix this by treating all non-alphanumeric characters (with class number above 2) as non-word characters, as there is usually no benefit in using word diff on them. These include CJK characters, emojis, and also subscript/superscript numbers. Meanwhile, multi-byte characters like Cyrillic and Greek letters will still continue to considered as words. Note that this is slightly inconsistent with how words are defined elsewhere, as Vim usually considers any character with class >=2 to be a "word". related: #16881 (diff inline highlight) closes: #17050 Signed-off-by: Yee Cheng ChinSigned-off-by: Christian Brabandt
Problem: inline word diff treats multibyte chars as word char (after 9.1.1243) Solution: treat all non-alphanumeric characters as non-word characters (Yee Cheng Chin) Previously inline word diff simply used Vim's definition of keyword to determine what is a word, which leads to multi-byte character classes such as emojis and CJK (Chinese/Japanese/Korean) characters all classifying as word characters, leading to entire sentences being grouped as a single word which does not provide meaningful information in a diff highlight. Fix this by treating all non-alphanumeric characters (with class number above 2) as non-word characters, as there is usually no benefit in using word diff on them. These include CJK characters, emojis, and also subscript/superscript numbers. Meanwhile, multi-byte characters like Cyrillic and Greek letters will still continue to considered as words. Note that this is slightly inconsistent with how words are defined elsewhere, as Vim usually considers any character with class >=2 to be a "word". related: vim/vim#16881 (diff inline highlight) closes: vim/vim#17050 vim/vim@9aa120f Co-authored-by: Yee Cheng Chin
…har (#33323) Problem: inline word diff treats multibyte chars as word char (after 9.1.1243) Solution: treat all non-alphanumeric characters as non-word characters (Yee Cheng Chin) Previously inline word diff simply used Vim's definition of keyword to determine what is a word, which leads to multi-byte character classes such as emojis and CJK (Chinese/Japanese/Korean) characters all classifying as word characters, leading to entire sentences being grouped as a single word which does not provide meaningful information in a diff highlight. Fix this by treating all non-alphanumeric characters (with class number above 2) as non-word characters, as there is usually no benefit in using word diff on them. These include CJK characters, emojis, and also subscript/superscript numbers. Meanwhile, multi-byte characters like Cyrillic and Greek letters will still continue to considered as words. Note that this is slightly inconsistent with how words are defined elsewhere, as Vim usually considers any character with class >=2 to be a "word". related: vim/vim#16881 (diff inline highlight) closes: vim/vim#17050 vim/vim@9aa120f Co-authored-by: Yee Cheng Chin
Problem: Diff mode's inline highlighting is lackluster. It only performs a line-by-line comparison, and calculates a single shortest range within a line that could encompass all the changes. In lines with multiple changes, or those that span multiple lines, this approach tends to end up highlighting much more than necessary.
Solution: Implement new inline highlighting modes by doing per-character or per-word diff within the diff block, and highlight only the relevant parts.
This change introduces a new diffopt option "inline:". Setting to "none" will disable all inline highlighting, "simple" (the default) will use the old behavior, "char" / "word" will perform a character/word-wise diff of the texts within each diff block and only highlight the differences.
The new char/word inline diff only use the internal xdiff, and will respect diff options such as algorithm choice, icase, and misc iwhite options. indent-heuristics is always on to perform better sliding.
For character highlight, a post-process of the diff results is first applied before we show the highlight. This is because a naive diff will create a result with a lot of small diff chunks and gaps, due to the repetitive nature of individual characters. The post-process is a heuristic-based refinement that attempts to merge adjacent diff blocks if they are separated by a short gap (1-3 characters), and can be further tuned in the future for better results. This process results in more characters than necessary being highlighted but overall less visual noise.
For word highlight, always use first buffer's iskeyword definition. Otherwise if each buffer has different iskeyword settings we would not be able to group words properly.
The char/word diffing is always per-diff block, not per line, meaning that changes that span multiple lines will show up correctly. Added/removed newlines are not shown by default, but if the user has 'list' set (with "eol" listchar defined), the eol character will be be highlighted correctly for the specific newline characters.
Also, add a new "DiffTextAdd" highlight group linked to "DiffText" by default. It allows color schemes to use different colors for texts that have been added within a line versus modified.
This doesn't interact with linematch perfectly currently. The linematch feature splits up diff blocks into multiple smaller blocks for better visual matching, which makes inline highlight less useful especially for multi-line change (e.g. a line is broken into two lines). This could be addressed in the future.
As a side change, this also removes the bounds checking introduced to diff_read() as they were added to mask existing logic bugs that were properly fixed in #16768.
Notes
Marking this as draft/WIP as I am still fixing up a couple minor issues, and need to write a fair amount of tests to cover this feature. That said, the feature should be fully usable now (other than(Update: This PR is no longer draft)diff_hlID()
not working with this yet), and I would love it if people could give it a try and provide feedbacks.Some extra notes below for those interested:
Some examples of what it looks like:
inline:simple
(old behavior)inline:char
inline:char,algorithm:patience,icase
inline:word
More complicated multi-line change (the diff comes from this PR itself):
Multi-buffer highlight:
Color schemes can define
DiffTextAdd
to highlight added texts differently from changed texts:Name
Every diff program seems to call this feature something different. I mostly settled on "inline highlight" but open to other suggestions.
character diff
As mentioned above, I added a refinement post-process step to clean up the diffs. Just to show the problem and the before/after (first image is a naive implementation, and second is the one with a post-process step to merge small gaps):
From surveying other popular diff programs, the ones that use character diff for inline highlighting (e.g. Meld, VSCode) all have to do similar work, via pre/post-processing to massage the diff results a little bit. VSCode for example does quite a bit of post-processing (merging small gaps, extending small diffs to word boundaries, etc), and I tried to just keep this minimal and go for the largest bang for the buck to make the result look decent without over-complicating the implementation (implemented in
diff_refine_inline_char_highlight()
). This step is heuristics based and could be changed in the future.Sliders
Another issue is sliders, where you could slide a diff left or right due to repeating texts. To fix this, I'm always forcing on indent-heuristics to piggyback on it for free white-space sliding alignment. FWIW I think indent-heuristics should be in the default
diffopt
setting value (since Git has had it on by default for a long time) but that's another discussion. Without / with indent-heuristics:Without indent-heuristics (you can see how they don't align at whitespace boundaries):


With indent-heuristics forced on:
Note that this is not perfect. Since we are hijacking a feature designed for line diffing (where xdiff uses the white space and indentation of lines as a heuristics) rather than per-character diff, this doesn't help solve sliders at symbols (ideally the highlight should border the "+" sign):
If we want to fix this we should probably modify xdiff itself to add a new heuristics mode that's aware that it's doing character diffing and has heuristics scoring designed for intraline diff (e.g. VSCode has a scoring system that rank whitespace, symbols, keywords, etc differently to slide the diff to the best location).
Word diff
This basically just splits the diff block along word difference instead of character. For the most part it works fine. As mentioned I'm defining a word based on the first buffer's
iskeyword
setting to have a consistent definition. Some other programs that use word diff (e.g. diffchar.vim or Git's own word-diff feature) allows you to define words in different ways or via a regex pattern. For first pass I would rather just keep it simple and just provide a singleinline:word
option for now. We could always add a newinline:pattern
etc setting later as it's extensible.Newline
As noted above, newlines can be diffed as well, but it will only show up when
list
option is set (note how the first newline on the right is highlighted):I was debating back and forth how to show this information. VSCode for example highlights the entire rest of the line in the changed color, which I find to be too much. Meld essentially automatically shows the equivalent of
listchar
only where a change happens. I think this could be an interesting future extension, something along the lines ofset diffopt+=auto-list
where we automatically showlist
only when there is a diff highlight.There's also an issue here where more than half of the color schemes (including bundled ones) I tried tend to not work well when
list
is on during diffing. I am still not sure whether this is because of how Vim combines the highlight attributes or it's the color schemes' fault. This is usually particularly problematic if the color scheme usesgui=inverse
for NonText group. That's why most of the screenshots I took were using the third-party "iceberg" color scheme as it works the best in this situation.linematch
Unfortunately this PR doesn't work perfectly with linematch as mentioned above. Here's a showcase of the issue:
linematch off:


linematch on:
Note how in the linematch example, the code splits the diff block into two, and therefore inline highlighting fails to recognize that the first part of the texts ("another line of text…" corresponds to the other buffer) and it's marked as changed instead. I have some ideas how to fix it but for scope issues decided not to tackle it for now. (This feature is more complicated when 3-way diff is involved and the "correct" fix is not obvious and may require some additional improvements to multi-buffer diff to work well IMO)
Anticipated FAQs
Why not just take Git's word-diff code when we use xdiff already?
Word diff is an application layer feature in Git, not part of the xdiff library. It does something similar to this PR but is specific to Git. It isn't something we can just take.
Why not a plugin?
There's an existing plugin (diffchar.vim) that already does something similar. It does interesting things but it's limited by Vim's API. For example, it does not do live update when typing, does not handle multi-line blocks, and does not support multi-buffer diff'ing (this PR handles it fine as it uses Vim's builtin diff block merging in
diff_read
). Vim also has more internal knowledge and can control how the rendering is done instead of a plugin needing to bend over backwards to get this to work.I also think this feature is a basic requirement for a competent diff program and should be part of the overall design. We should not force users to install a plugin to get access to such features.
It does have some interesting features, such as the ability to highlight corresponding texts in the other buffer when cursor is on a highlighted chunk; and for chunks where there's an added piece of text (meaning there's no corresponding texts in the other buffer), use underline in neighbor texts to highlight where it is inserted in the other buffer. I decided not to clone all of them in this PR.
I do think later on Vim could expose more APIs both for accessing the diff highlight information, and/or provide ways to modify it. E.g. If another diff plugin like Difftastic (which does syntax diffing using treesitter) wants to hook into Vim/Neovim, it should have an API where it could define where exactly the highlight should be. We could add something like
set diffopt+=inline:expr
.Any performance issues?
I was planning to do some benchmarks but from initial testing, I simply didn't see any performance issues at all and therefore decided not to do it. Vim will only calculate inline difference for the on-screen diff blocks, and the result is cached inside the diff block, and therefore in vast majority of cases this change will not cause much performance regression when it's on. It will only be an issue if we have some serious degenerate cases where a single line is really long, in which case from experiences this could be slightly slow, but usually still within reason. Xdiff is also quite fast compared to other diff programs so I don't think this should cause much issues.
Relevant links
neovim/neovim#29549
neovim/neovim#3433
https://github.com/rickhowe/diffchar.vim
TODOs
Current TODOs before PR is fully ready:
diff_hlID()
to work properlyThere are further diff changes / improvements I would like to work on, but probably will wait to see how this PR goes first. Future potential options include making this work better with linematch, move detection, expose xdiff's
--anchored
feature, better API integration, etc.