Wiki History Scripts

At the bottom of this page are two Python scripts, history.cgi and diff.cgi. These are CGI scripts that pull data from the C2 wiki and generate a somewhat friendly history listing and version diff, respectively. To use them:

Copy the scripts to files in your web space named history.cgi and diff.cgi.
Configure your web server to treat these files as CGI scripts.
Invoke the history script as http://url/history.cgi?PageName.
Display the difference between individual page versions by selecting the versions and clicking the "Compare" button. The default is to compare the most recent version with the prior version.
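
The request URLs the two scripts expect can be sketched as follows. This is only an illustration written for Python 3: the helper names history_url and diff_url are hypothetical, "http://url" is the placeholder host from the steps above, and the use of a GET query string for diff.cgi is an assumption (the script reads the fields page, v1 and v2 through cgi.FieldStorage, which accepts either method).

```python
from urllib.parse import quote, urlencode

BASE = "http://url"  # placeholder host, as in the steps above


def history_url(page, base=BASE):
    """URL that asks history.cgi for the history of one wiki page."""
    # history.cgi takes the page name as the entire query string.
    return base + "/history.cgi?" + quote(page)


def diff_url(page, v1, v2, base=BASE):
    """URL that asks diff.cgi to compare versions v1 and v2 of a page."""
    # diff.cgi reads the form fields page, v1 and v2.
    query = urlencode([("page", page), ("v1", v1), ("v2", v2)])
    return base + "/diff.cgi?" + query


print(history_url("WardCunningham"))
print(diff_url("WardCunningham", 688, 689))
```

The second call corresponds to the comparison of revisions 688 and 689 mentioned under Samples.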

Observations

The scripts have been used on Python 2.2.2. They may run on Python 2.1, but will not run on 1.x versions.
Some pages (such as WikiWikiSandbox at the moment) were deleted but reinstated before the older versions had vanished from HistoryPages. So some older versions are mixed with newer versions. Because history.cgi sorts by the version number, this can cause an inversion where older instances precede the current instance. With a little bit of effort the script could be changed to sort by timestamp instead. This is left as an exercise for the reader.
Not all versions are present. The C2 wiki collapses together versions by a single author.
It is recommended that you create your own copies of the scripts and use them only for yourself. A public script would be inadvisable because Ward has mechanisms in place to limit flurries of requests from the same IP, and with a public script all requests come from the single IP hosting the script.
It is possible to correlate the output of RecentPosts with the HistoryPages and display even more information about [most] versions. This is not as trivial as it seems, because the timestamps on RecentPosts are the times the edits were saved, while the timestamps on HistoryPages are the times the respective versions were superseded. This feature is left as an exercise for the reader.
The scripts do little error checking and are not UnitTested.
You can create a WikiBookmarklet to take whatever wiki page you're viewing and show the history for that page.
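
As a starting point for the sort-by-timestamp exercise, here is a rough Python 3 sketch. It assumes the [number, timestamp, url] entries that history.cgi builds. The timestamp layout in STAMP_FORMAT is purely a guess made for illustration, and the names STAMP_FORMAT and stamp_key are hypothetical. Relative stamps such as "3 days ago", which history.cgi records for the current revision, are treated as the newest.

```python
from datetime import datetime

# Assumed layout, e.g. "05-Mar-2003 12:34"; adjust to what the wiki really emits.
STAMP_FORMAT = "%d-%b-%Y %H:%M"


def stamp_key(version):
    """Sort key for a [number, timestamp, url] entry."""
    try:
        return datetime.strptime(version[1], STAMP_FORMAT)
    except ValueError:
        # A relative stamp ("3 days ago") belongs to the current revision,
        # so rank it later than every dated version.
        return datetime.max


# Made-up entries showing the inversion: revision 45 is older than revision 11.
versions = [
    [12, "3 days ago", "http://c2.com/cgi/wiki?SomePage"],
    [45, "01-Feb-2003 09:15", "http://c2.com/wiki/history/SomePage/45"],
    [11, "20-Feb-2003 17:40", "http://c2.com/wiki/history/SomePage/11"],
]
versions.sort(key=stamp_key, reverse=True)  # newest first
print([v[0] for v in versions])
```

Sorting on the parsed time rather than the version number keeps a reinstated page's older, higher-numbered versions below its newer ones.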

Samples

You can see a sample snapshot of history.cgi at http://andstuff.org/wiki/history.html?WardCunningham. The "Compare" button for the snapshot is hard-wired to a comparison of revisions 688 and 689, which shows some of the capabilities of the color-coded diff (borrowed from JotEngine and MoinMoin).

history.cgi


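        #!/usr/bin/python
        # This script is public domain

        import urllib
        import os
        import re
        import sys

        # The page name arrives as the raw query string: history.cgi?PageName
        page = os.environ.get('QUERY_STRING', "")

        if page == "":
            print "Content-type: text/plain"
            print
            print "Please indicate a page name"
            sys.exit()

        # Archived versions of the page.
        stream = urllib.urlopen('http://c2.com/wiki/history/' + page)
        history = stream.read()
        stream.close()

        # The quickDiff page supplies the number and age of the current revision.
        stream = urllib.urlopen('http://c2.com/cgi/quickDiff?' + page)
        diff = stream.read()
        stream.close()

        # Each entry is [revision number, timestamp text, URL].
        versions = []
        for number, stamp in re.findall(r'HREF="(\d+)">\d+ +([-0-9A-Za-z]+ [0-9:]+)', history):
            versions.append([int(number), stamp, "http://c2.com/wiki/history/" + page + '/' + number])
        for number, stamp in re.findall(r'Revision (\d+) made ([0-9]+ [a-z]+ ago)', diff):
            versions.append([int(number), stamp, "http://c2.com/cgi/wiki?" + page])
        versions.sort()
        versions.reverse()

        # The HTML below is a plain-markup assumption; only the visible text is fixed.
        print "Content-type: text/html"
        print
        print "<html><head><title>History of " + page + "</title></head>"
        print "<body><h1>History of " + page + "</h1>"
        print "<form action='diff.cgi' method='get'>"
        print "<input type='hidden' name='page' value='" + page + "' />"
        print "<table>"
        count = 0
        for version in versions:
            count = count + 1
            # Preselect the newest revision (v2) and the one before it (v1).
            if count == 1: latestsel = "checked='checked' "
            else:          latestsel = ""
            if count == 2: lastsel = "checked='checked' "
            else:          lastsel = ""
            print "<tr>"
            print "    <td><input type='radio' name='v1' value='" + str(version[0]) + "' " + lastsel + "/></td>"
            print "    <td><input type='radio' name='v2' value='" + str(version[0]) + "' " + latestsel + "/></td>"
            print "    <td>"
            print "<a href='" + version[2] + "'>Revision " + str(version[0]) + "</a> " + version[1] + "</td>"
            print "</tr>"
        print "</table><input type='submit' value='Compare' />"
        print "</form></body></html>"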
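
To see what the two regular expressions in history.cgi pick out, the following Python 3 sketch runs them against made-up input. The sample strings are hypothetical and shaped only to satisfy the patterns; they are not captured from the wiki.

```python
import re

# Hypothetical history listing and quickDiff text, invented to fit the patterns.
sample_history = (
    '<A HREF="688">688  05-Mar-2003 12:34</A>\n'
    '<A HREF="687">687  01-Mar-2003 08:00</A>\n'
)
sample_diff = "Revision 689 made 3 days ago"

# Archived versions: revision number and timestamp text.
archived = re.findall(r'HREF="(\d+)">\d+ +([-0-9A-Za-z]+ [0-9:]+)', sample_history)
# Current revision: number and relative age.
current = re.findall(r'Revision (\d+) made ([0-9]+ [a-z]+ ago)', sample_diff)

versions = [[int(number), stamp] for number, stamp in archived + current]
versions.sort(reverse=True)  # highest revision number first, as in the script
print(versions)
```

The current revision comes from a different page than the archived ones, which is why its "timestamp" is a relative age rather than a date.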

diff.cgi


        #!/usr/bin/python
        # This script is public domain.

        import re
        import cgi, cgitb; cgitb.enable()
        import urllib

        def getcur(page):
            # Fetch the current page text from the wiki's edit form.
            stream = urllib.urlopen('http://c2.com/cgi/wiki?edit=' + page)
            text = stream.read()
            stream.close()
            # Assumed pattern: the page source sits inside the form's textarea.
            match = re.search(r'<textarea[^>]+>(.*)</textarea>', text, re.DOTALL | re.IGNORECASE)
            if match is None: result = ""
            else:             result = match.group(1)
            # Decode entities; &amp; goes last so &amp;lt; is not decoded twice.
            result = result.replace('&lt;', '<')
            result = result.replace('&gt;', '>')
            result = result.replace('&quot;', '"')
            result = result.replace('&amp;', '&')
            return result

        def diff(s1, s2):
            from difflib import SequenceMatcher

            # Assumed highlight markup; any tag pair and colors will do.
            delopen = "<span style='background-color: #ffcccc'>"
            addopen = "<span style='background-color: #ccffcc'>"
            close   = "</span>"

            # Escape both texts so page content cannot inject markup.
            s1 = s1.replace('&', '&amp;')
            s1 = s1.replace('<', '&lt;')
            s2 = s2.replace('&', '&amp;')
            s2 = s2.replace('<', '&lt;')

            seq1 = s1.splitlines()
            seq2 = s2.splitlines()

            seqobj = SequenceMatcher(None, seq1, seq2)
            linematch = seqobj.get_matching_blocks()

            if len(seq1) == len(seq2) and linematch[0] == (0, 0, len(seq1)):   # No differences.
                return 'No differences.'

            lastmatch = (0, 0)

            # The table markup is likewise an assumption.
            result = "<table border='0' width='100%'>\n"
            for match in linematch:               # Print all differences.
                if lastmatch == match[0:2]:       # Starts of pages identical.
                    lastmatch = (match[0] + match[2], match[1] + match[2])
                    continue

                # Header row naming the first differing line on each side.
                result = result \
                       + "<tr><th>" \
                       + "Line " + str(lastmatch[0] + 1) + ", removed:" \
                       + "</th><th>" \
                       + "Line " + str(lastmatch[1] + 1) + ", added:" \
                       + "</th></tr>\n"

                # Collect the lines between the previous match and this one.
                leftpane  = ""
                rightpane = ""
                linecount = max(match[0] - lastmatch[0], match[1] - lastmatch[1])
                for line in range(linecount):
                    if line < match[0] - lastmatch[0]:
                        if line > 0:
                            leftpane += '\n'
                        leftpane += seq1[lastmatch[0] + line]
                    if line < match[1] - lastmatch[1]:
                        if line > 0:
                            rightpane += '\n'
                        rightpane += seq2[lastmatch[1] + line]

                # Compare the two panes character by character.
                charobj   = SequenceMatcher(None, leftpane, rightpane)
                charmatch = charobj.get_matching_blocks()

                if leftpane == "" and rightpane == "":
                    ratio = 1.0
                else:
                    ratio = charobj.ratio()

                if ratio < 0.5:                   # Insufficient similarity.
                    if len(leftpane) != 0:
                        leftresult = delopen + leftpane + close
                    else:
                        leftresult = ""

                    if len(rightpane) != 0:
                        rightresult = addopen + rightpane + close
                    else:
                        rightresult = ""
                else:                             # Some similarities; markup changes.
                    charlast = (0, 0)

                    leftresult  = ""
                    rightresult = ""
                    for thismatch in charmatch:
                        if thismatch[0] - charlast[0] != 0:
                            leftresult = leftresult + delopen + leftpane[charlast[0]:thismatch[0]] + close
                        if thismatch[1] - charlast[1] != 0:
                            rightresult = rightresult + addopen + rightpane[charlast[1]:thismatch[1]] + close
                        leftresult = leftresult + leftpane[thismatch[0]:thismatch[0] + thismatch[2]]
                        rightresult = rightresult + rightpane[thismatch[1]:thismatch[1] + thismatch[2]]
                        charlast = (thismatch[0] + thismatch[2], thismatch[1] + thismatch[2])

                leftpane  = leftresult.replace('\n', '<br>\n')
                rightpane = rightresult.replace('\n', '<br>\n')

                result = result \
                       + "<tr valign='top'><td>" + leftpane + "</td><td>" + rightpane + "</td></tr>\n"

                lastmatch = (match[0] + match[2], match[1] + match[2])

            result = result + '</table>\n'
            return result

        form = cgi.FieldStorage()

        page = form.getfirst('page', "")
        v1 = form.getfirst('v1', "")
        v2 = form.getfirst('v2', "")

        stream = urllib.urlopen('http://c2.com/wiki/history/' + page + '/' + v1)
        v1 = stream.read()
        stream.close()

        stream = urllib.urlopen('http://c2.com/wiki/history/' + page + '/' + v2)
        v2 = stream.read()
        stream.close()

        # A version missing from the history is taken to be the current page.
        if not re.search(r'404 Not Found', v1) is None: v1 = getcur(page)
        if not re.search(r'404 Not Found', v2) is None: v2 = getcur(page)

        print "Content-type: text/html"
        print
        print "<html><head><title>Differences for " + page + "</title></head>"
        print "<body><h1>Differences for " + page + "</h1>"
        print diff(v1, v2)
        print "</body></html>"
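
The character-level step in diff.cgi can be tried in isolation with a small Python 3 sketch. The function name mark_changes and the text markers [-...-] and {+...+} are hypothetical stand-ins for the script's HTML highlighting. The 0.5 similarity threshold is the one used in the script.

```python
from difflib import SequenceMatcher


def mark_changes(old, new, threshold=0.5):
    """Return (old_marked, new_marked) with changed runs wrapped in markers."""
    matcher = SequenceMatcher(None, old, new)
    if matcher.ratio() < threshold:
        # Too little in common: flag each non-empty side wholesale.
        left = ("[-" + old + "-]") if old else ""
        right = ("{+" + new + "+}") if new else ""
        return left, right
    left, right = [], []
    last_a = last_b = 0
    for a, b, size in matcher.get_matching_blocks():
        if a > last_a:   # text present only in the old version
            left.append("[-" + old[last_a:a] + "-]")
        if b > last_b:   # text present only in the new version
            right.append("{+" + new[last_b:b] + "+}")
        # The matching run is copied through unmarked on both sides.
        left.append(old[a:a + size])
        right.append(new[b:b + size])
        last_a, last_b = a + size, b + size
    return "".join(left), "".join(right)


print(mark_changes("the quick brown fox", "the quick red fox"))
print(mark_changes("abc", "xyz"))
```

Below the threshold, marking the whole pane avoids the confetti of tiny highlights that a character diff produces for unrelated text.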

      
      
      
See PageHistory, WikiHistory

CategoryCoding