Subtleties of colorizing unified diff output

I wanted to colorize unified diff output on the command line: red for deletions and green for additions. As I learned on Stack Overflow, colordiff is the right solution to this problem. But why use an existing solution written by someone else in 400 lines of Perl, when you can use a partial solution of your own in one line of sed?

Wait... don't answer that.

So here's a one-liner that highlights deletions with a red background, additions with a green background, and hunk markers in blue text. (The \x1b escapes rely on GNU sed.)

sed 's/^-/\x1b[41m-/;s/^+/\x1b[42m+/;s/^@/\x1b[34m@/;s/$/\x1b[0m/'

I chose to highlight changes using a background color so that whitespace changes would be more readily apparent. Interestingly, xterm does not display a background color for tab characters. This means that you are able to clearly see tab <-> space indentation changes in a diff. However, it also means that you can't see changes of trailing tabs. Sadly, colordiff does not support background colors.

Filenames are highlighted in the same way as content... for a good reason. You see, to differentiate between a filename line and a content line, you have to fully parse the diff output. Otherwise, if you add a line of text to a file that looks like:

++ vim73/src/ui.c      2013-06-01 14:48:45.012467754 -0500

you will get a line in your unified diff that looks like:

+++ vim73/src/ui.c      2013-06-01 14:48:45.012467754 -0500

which any line-by-line regex-based approach is going to incorrectly see as a diff filename header. The same problem arises when deleting lines that start with "-- ". Since colordiff is also a line-by-line regex-based implementation, it also highlights filenames the same as content. This is one of those cases where you can change your problem specification to make your solution trivial.

Example:

  • evil.orig
    blah blah blah
    humbug
    one two three
    four five six
    -- vim73/src/ui.c      2013-06-01 14:48:45.012467754 -0500
    @@ -1,6 +1,6 @@
    blah blah blah
    one two three
    four five six
    eight nine ten
    zero
    humbug
    
  • evil.new
    blah blah blah
    bah humbug
    one two three
    four five six
    ++ vim73/src/ui.c      2013-06-01 14:48:45.012467754 -0500
    @@ -1,6 +1,6 @@
    blah blah blah
    one two three
    four five six
    seven eight nine ten
    zero
    humbug
    

Diffing these two files yields a misleading unified diff that looks like:

--- evil.orig   2013-06-01 16:18:25.282693446 -0500
+++ evil.new    2013-06-01 16:30:27.535803954 -0500
@@ -1,12 +1,12 @@
 blah blah blah
-humbug
+bah humbug
 one two three
 four five six
--- vim73/src/ui.c      2013-06-01 14:48:45.012467754 -0500
+++ vim73/src/ui.c      2013-06-01 14:48:45.012467754 -0500
 @@ -1,6 +1,6 @@
 blah blah blah
 one two three
 four five six
-eight nine ten
+seven eight nine ten
 zero
 humbug

That one space before the false hunk header is probably the most visually apparent clue that something isn't right. Unless you're paying attention to the actual numbers in the hunk header, that is; but if the hunk is a couple hundred lines long and the false diff portion is only a couple of lines, even that would be hard to notice.
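
To make the failure concrete, here is a minimal Python sketch of a stateless, per-line header check run against the diff above. This is not how colordiff is written; the HEADER pattern and the naive_headers helper are names I made up for illustration.

```python
import re

# The misleading diff shown above, one list entry per line.
DIFF = """\
--- evil.orig   2013-06-01 16:18:25.282693446 -0500
+++ evil.new    2013-06-01 16:30:27.535803954 -0500
@@ -1,12 +1,12 @@
 blah blah blah
-humbug
+bah humbug
 one two three
 four five six
--- vim73/src/ui.c      2013-06-01 14:48:45.012467754 -0500
+++ vim73/src/ui.c      2013-06-01 14:48:45.012467754 -0500
 @@ -1,6 +1,6 @@
 blah blah blah
 one two three
 four five six
-eight nine ten
+seven eight nine ten
 zero
 humbug
""".splitlines()

# Hypothetical pattern: "looks like a filename header" judged one line at a time.
HEADER = re.compile(r"^(---|\+\+\+) ")

def naive_headers(lines):
    """Return the 0-based indices of lines the per-line regex calls headers."""
    return [i for i, line in enumerate(lines) if HEADER.match(line)]

print(naive_headers(DIFF))  # prints [0, 1, 8, 9]
```

Indices 0 and 1 are the real headers. Indices 8 and 9 are a deleted and an added content line, and nothing in the text of those lines alone can tell the two cases apart.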

Colorize the diff (with my sed implementation),

--- evil.orig   2013-06-01 16:18:25.282693446 -0500
+++ evil.new    2013-06-01 16:30:27.535803954 -0500
@@ -1,12 +1,12 @@
 blah blah blah
-humbug
+bah humbug
 one two three
 four five six
--- vim73/src/ui.c      2013-06-01 14:48:45.012467754 -0500
+++ vim73/src/ui.c      2013-06-01 14:48:45.012467754 -0500
 @@ -1,6 +1,6 @@
 blah blah blah
 one two three
 four five six
-eight nine ten
+seven eight nine ten
 zero
 humbug

... and it is slightly less subtle.

Perhaps there is a case here for a diff colorizer built on a real parse of a unified diff?
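
As a rough idea of what that could look like, here is a minimal Python sketch that trusts the line counts in each hunk header instead of guessing from the text of each line. It assumes a plain, well-formed unified diff; anything else (git's extra header lines, for example) just falls through as "other". The names classify, colorize, and COLORS are mine, not taken from any existing tool.

```python
import re

RED_BG, GREEN_BG, BLUE, RESET = "\x1b[41m", "\x1b[42m", "\x1b[34m", "\x1b[0m"
COLORS = {"del": RED_BG, "add": GREEN_BG, "hunk": BLUE}

# "@@ -start,count +start,count @@"; an omitted count means 1.
HUNK = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")

def classify(lines):
    """Yield (kind, line); kind is header, hunk, del, add, context or other."""
    old_left = new_left = 0  # lines the current hunk still owes us
    for line in lines:
        if old_left > 0 or new_left > 0:
            # Inside a hunk only the first character matters, so a
            # deleted "-- foo" line can never be mistaken for a header.
            if line.startswith("\\"):
                kind = "other"  # "\ No newline at end of file"
            elif line.startswith("-"):
                kind, old_left = "del", old_left - 1
            elif line.startswith("+"):
                kind, new_left = "add", new_left - 1
            else:
                kind, old_left, new_left = "context", old_left - 1, new_left - 1
            yield kind, line
            continue
        m = HUNK.match(line)
        if m:
            old_left = int(m.group(1)) if m.group(1) is not None else 1
            new_left = int(m.group(2)) if m.group(2) is not None else 1
            yield "hunk", line
        elif line.startswith(("--- ", "+++ ")):
            yield "header", line
        else:
            yield "other", line

def colorize(lines):
    """Yield each line, wrapped in ANSI color codes where a color applies."""
    for kind, line in classify(lines):
        color = COLORS.get(kind)
        yield f"{color}{line}{RESET}" if color else line

# The misleading diff from this post.
DIFF = """\
--- evil.orig   2013-06-01 16:18:25.282693446 -0500
+++ evil.new    2013-06-01 16:30:27.535803954 -0500
@@ -1,12 +1,12 @@
 blah blah blah
-humbug
+bah humbug
 one two three
 four five six
--- vim73/src/ui.c      2013-06-01 14:48:45.012467754 -0500
+++ vim73/src/ui.c      2013-06-01 14:48:45.012467754 -0500
 @@ -1,6 +1,6 @@
 blah blah blah
 one two three
 four five six
-eight nine ten
+seven eight nine ten
 zero
 humbug
""".splitlines()

for kind, line in classify(DIFF):
    print(f"{kind:8}{line}")
```

With the counts from "@@ -1,12 +1,12 @@" in hand, the two vim73 lines come out as "del" and "add", and only the first two lines of the diff are reported as headers. Since headers are now a separate kind, they can be left uncolored or given their own color. To use it as a filter, pass the lines of sys.stdin (with newlines stripped) to colorize and print the results.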
