Convert diff output to colorized HTML

A web search turns up a number of programs and scripts that convert diff output to HTML. This one is written in bash.

The script expects "unified" diff output (diff -u) on its standard input and produces a self-contained colorized HTML page on its standard output. Consider the two files:

#include <stdio.h>

// main
main()
{
    printf("Hello world\n");
}
and
#include <stdio.h>

main()
{
    printf("Hello World\n");
    printf("Goodbye cruel world\n");
}
Running diff on the two files and piping that through diff2html produces a colorized HTML page:
  $ diff -u v1.c v2.c | diff2html >v1-v2.html
The resulting page is:
--- v1.c	2008-08-27 13:04:40.000000000 -0500
+++ v2.c	2008-08-27 13:04:29.000000000 -0500
@@ -1,8 +1,8 @@
 #include <stdio.h>
 
-// main
 main()
 {
-	printf("Hello world\n");
+	printf("Hello World\n");
+	printf("Goodbye cruel world\n");
 }
 

The script follows:

#!/bin/bash
#
# Convert diff output to colorized HTML.

cat <<XX
<html>
<head>
<title>Colorized Diff</title>
<style>
.diffdiv  { border: solid 1px black;           }
.comment  { color: gray;                       }
.diff     { color: #8A2BE2;                    }
.minus3   { color: blue;                       }
.plus3    { color: maroon;                     }
.at2      { color: lime;                       }
.plus     { color: green; background: #E7E7E7; }
.minus    { color: red;   background: #D7D7D7; }
.only     { color: purple;                     }
</style>
</head>
<body>
<pre>
XX

echo -n '<span class="comment">'

first=1
diffseen=0
lastonly=0

OIFS=$IFS
IFS='
'

# The -r option keeps the backslash from being an escape char.
read -r s

while [[ $? -eq 0 ]]
do
    # Get beginning of line to determine what type
    # of diff line it is.
    t1=${s:0:1}
    t2=${s:0:2}
    t3=${s:0:3}
    t4=${s:0:4}
    t7=${s:0:7}

    # Determine HTML class to use.
    if    [[ "$t7" == 'Only in' ]]; then
        cls='only'
        if [[ $diffseen -eq 0 ]]; then
            diffseen=1
            echo -n '</span>'
        else
            if [[ $lastonly -eq 0 ]]; then
                echo "</div>"
            fi
        fi
        if [[ $lastonly -eq 0 ]]; then
            echo "<div class='diffdiv'>"
        fi
        lastonly=1
    elif [[ "$t4" == 'diff' ]]; then
        cls='diff'
        if [[ $diffseen -eq 0 ]]; then
            diffseen=1
            echo -n '</span>'
        else
            echo "</div>"
        fi
        echo "<div class='diffdiv'>"
        lastonly=0
    elif  [[ "$t3" == '+++'  ]]; then
        cls='plus3'
        lastonly=0
    elif  [[ "$t3" == '---'  ]]; then
        cls='minus3'
        lastonly=0
    elif  [[ "$t2" == '@@'   ]]; then
        cls='at2'
        lastonly=0
    elif  [[ "$t1" == '+'    ]]; then
        cls='plus'
        lastonly=0
    elif  [[ "$t1" == '-'    ]]; then
        cls='minus'
        lastonly=0
    else
        cls=
        lastonly=0
    fi

    # Convert &, <, > to HTML entities.
    s=$(sed -e 's/\&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' <<<"$s")
    if [[ $first -eq 1 ]]; then
        first=0
    else
        echo
    fi

    # Output the line.
    if [[ "$cls" ]]; then
        # Quote the expansion so glob characters in the line are not expanded.
        printf '%s' "<span class=\"${cls}\">${s}</span>"
    else
        printf '%s' "$s"
    fi
    read -r s
done
IFS=$OIFS

if [[ $diffseen -eq 0 ]]; then
    echo -n '</span>'
else
    echo "</div>"
fi
echo

cat <<XX
</pre>
</body>
</html>
XX

# vim: tabstop=4: shiftwidth=4: noexpandtab:
# kate: tab-width 4; indent-width 4; replace-tabs false;
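
The entity conversion in the script runs sed once for every input line. As a possible alternative, here is a minimal sketch that does the same three substitutions with bash parameter expansion. The function name html_escape is made up for this example and is not part of the script above. The replacement strings are quoted so that newer bash versions, which can treat an unquoted & in a replacement as "the matched text", take them literally.

```shell
#!/bin/bash
# html_escape: hypothetical helper, not part of the original script.
# Converts &, <, > in its first argument to HTML entities and prints
# the result without a trailing newline.
html_escape() {
    local s=$1
    # The ampersand must be converted first, otherwise the & in the
    # entities produced below would be escaped a second time.
    s=${s//&/'&amp;'}
    s=${s//</'&lt;'}
    s=${s//>/'&gt;'}
    printf '%s' "$s"
}

# Example call:
html_escape 'if (a < b && c > d)'
echo
```

Inside the main loop this could stand in for the sed line as s=$(html_escape "$s"). The command substitution still forks a subshell, but no external program is started.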

______________________

Mitch Frazier is an Associate Editor for Linux Journal.

Comments

Thanks Richie

jim3e8

Your script was useful. The IFS should be set to newline though so that tabs are not collapsed.

It is also nice to additionally accept diff input on stdin as in Mitch's original script so one can colorize output from svn/hg/cvs diff via a pipe. Just replace your "diff -u $@" with a variable called $CMD, and set $CMD to "cat" when no args are provided.
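
The two suggestions above might be combined roughly like this. It is only a sketch, not Richie's or Mitch's code: the function names run_diff and colorize_stream are invented for illustration, and only + and - lines are colored to keep it short. Clearing IFS for the read and quoting every expansion keeps leading spaces and tabs intact.

```shell
#!/bin/bash
# run_diff and colorize_stream are hypothetical names for this sketch.

# With no arguments, pass an existing diff through from stdin;
# otherwise run "diff -u" on the arguments.
run_diff() {
    if [ "$#" -eq 0 ]; then
        cat
    else
        diff -u "$@"
    fi
}

# Color added lines green and removed lines red.
# "IFS= read -r" keeps leading whitespace and backslashes as they are.
colorize_stream() {
    local s color
    while IFS= read -r s; do
        case $s in
            +*) color=32 ;;
            -*) color=31 ;;
            *)  color= ;;
        esac
        if [ -n "$color" ]; then
            printf '\033[%sm%s\033[0m\n' "$color" "$s"
        else
            printf '%s\n' "$s"
        fi
    done
}
```

A script that ends with run_diff "$@" | colorize_stream could then be called either with two file names or at the end of a pipe such as svn diff | cdiff.sh, where cdiff.sh is a made-up script name.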

Or use vim

Anonymous

You can also open the diff in Vim, choose your preferred colorscheme, and then use :TOhtml.

Not another one!

Richie

I first thought this was going to be a pure output colorizer for diff. When I read on and realized it was yet another converter to generate HTML from diff output, I was disappointed. "Never mind", I thought, "I'll just write my own." Based heavily on the above program this produces pretty much the same output, but in the console, using ANSI color control sequences.

#!/bin/bash
#
# colorize diff output for ANSI terminals
# based on "diff2html" 
# (http://www.linuxjournal.com/content/convert-diff-output-colorized-html)

# absolute color definitions
NOCOL="\e[0m"
BOLD="\e[1m"
RED="\e[31m"
GREEN="\e[32m"
PINK="\e[35m"
CYAN="\e[36m"
WHITE="\e[37m"

# style color definitions
C_COMMENT=$WHITE
C_DIFF=$RED
C_OLDFILE=$CYAN
C_NEWFILE=$RED
C_STATS=$GREEN
C_OLD=$BOLD$RED
C_NEW=$BOLD$GREEN
C_ONLY=$PINK

# check args
[[ $# -lt 2 ]] && echo "Usage: $0 FILE1 FILE2" && exit 1

# The -r option keeps the backslash from being an escape char.
diff -u $@ | while read -r s ; do
	# determine line color
	if [[ "${s:0:7}" == 'Only in' ]]; then color=$C_ONLY
	elif  [[ "${s:0:4}" == 'diff' ]]; then color=$C_DIFF
	elif  [[ "${s:0:3}" == '---'  ]]; then color=$C_OLDFILE
	elif  [[ "${s:0:3}" == '+++'  ]]; then color=$C_NEWFILE
	elif  [[ "${s:0:2}" == '@@'   ]]; then color=$C_STATS
	elif  [[ "${s:0:1}" == '+'    ]]; then color=$C_NEW
	elif  [[ "${s:0:1}" == '-'    ]]; then color=$C_OLD
	else color=
	fi

	# Output the line.
	if [[ "$color" ]]; then
		printf "$color"
		echo -n $s
		printf "$NOCOL\n"
	else echo $s
	fi
done

Tinker with the "style color definitions" to change colors to suit your environment.

In the course of messing with this, I made a couple of improvements that could be rolled back into the original version: 1. only one call to "read", and 2. "diff -u" is now called internally.

I tried not changing IFS, and it seemed to work, so I left those lines out, too. The other thing the original could benefit from is properly named CSS classes! (What on earth do "plus3" and "at2" mean?) As you can see, I've given more meaningful names to my ANSI equivalents.
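
For what it's worth, the prefix tests map onto descriptive names fairly directly. The sketch below shows one possible mapping. The function name diff_line_class and the class names are invented here for illustration, and unlike the original script it also gives unchanged lines a class ("context"). The order of the patterns matters: "---" and "+++" have to be tested before the single "-" and "+".

```shell
#!/bin/bash
# diff_line_class: hypothetical helper; the class names are illustrative
# replacements for only/diff/minus3/plus3/at2/plus/minus.
diff_line_class() {
    case $1 in
        'Only in'*) echo only ;;      # file present in one directory only
        diff*)      echo command ;;   # the "diff ..." line of a recursive diff
        ---*)       echo oldfile ;;   # header naming the old file
        +++*)       echo newfile ;;   # header naming the new file
        @@*)        echo hunk ;;      # hunk header with line ranges
        +*)         echo added ;;
        -*)         echo removed ;;
        *)          echo context ;;   # unchanged line
    esac
}

# Example call:
diff_line_class '@@ -1,8 +1,8 @@'
```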

colordiff

gliks

Hey Richie,

you could just use an existing tool, colordiff:

sudo apt-get install colordiff

great script though.

Cheers.

tkdiff

Anonymous

Perhaps more interactively helpful is a graphical diff viewer such as tkdiff.
