Working with legacy code

When working in QA or test automation, you are much more likely to be confronted with a legacy application of hundreds of thousands of lines of code, with missing documentation and test cases, than with a well-documented one with high test coverage and beautiful code. In many cases the legacy code has to be refactored to improve its testability. Being able to work with legacy code is therefore a critical skill.

When I start reading source code, I begin from a bird's-eye perspective: first I want to know how big the project I am looking at is.

Counting lines of code (LOC) comes in handy here to get a first general impression of the size of the code base. The sloccount tool breaks the count down by top-level directory and by language.

Measuring lines of code (LOC)

sudo apt-get install sloccount
sloccount openerp-server-5.0.0_rc3

SLOC        Directory       SLOC-by-Language (Sorted)
58241   addons-extra    python=50644,php=7434,sh=163
53168   addons          python=53126,php=42
22288   server          python=22287,sh=1
15599   client          python=15599
12334   web             python=12205,sh=129
72      top_dir         python=72

Totals grouped by language (dominant language first):
python:      153933 (95.20%)
php:           7476 (4.62%)
sh:             293 (0.18%)
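If sloccount is not at hand, the per-directory numbers for Python files can be roughly approximated with the standard library alone. This is a minimal sketch, not sloccount's algorithm: it assumes every non-blank line that does not start with `#` is code, so docstrings are counted and the totals will not match sloccount exactly. The helper names `python_sloc` and `sloc_by_top_dir` are hypothetical.

```python
import os
from collections import Counter


def python_sloc(path):
    """Count lines in one file that are neither blank nor pure '#' comments."""
    count = 0
    with open(path, encoding="utf-8", errors="replace") as source:
        for line in source:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return count


def sloc_by_top_dir(root):
    """Sum Python SLOC per top-level directory below root.

    Files sitting directly in root are reported as 'top_dir',
    mirroring sloccount's label for them.
    """
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "top_dir" if rel == os.curdir else rel.split(os.sep)[0]
        for name in filenames:
            if name.endswith(".py"):
                totals[top] += python_sloc(os.path.join(dirpath, name))
    return totals
```

Calling `sloc_by_top_dir("openerp-server-5.0.0_rc3").most_common()` would return (directory, SLOC) pairs, largest first, comparable to the first two columns of the sloccount report above.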

Measuring complexity (McCabe)

Another very useful metric in this situation is McCabe's cyclomatic complexity, which counts the linearly independent paths through a piece of code. It helps you identify the most complex code, and those complex areas need the most attention when it comes to quality assurance measures (documentation, testing, etc.). They usually contain the most defects, too.

# download the PyMetrics tarball and extract it manually, then run:
PyMetrics.py program.py

Understanding the code structure using call graphs

Almost every time I approach an unfamiliar code base, I am looking for one particular piece of functionality. Once I understand it, I move on to the next one, and so on, until I understand everything I need to. A call graph has proved very handy for identifying the relevant parts of the code: it "lists" the relevant modules and functions and shows how they call each other. I always use the call graph as a "map" to navigate an unfamiliar code base.
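At its core, a call graph is a set of (caller, callee) pairs collected while the program runs. The sketch below records such pairs with Python's `sys.setprofile` hook and prints them as a simple indented map. Tools like pycallgraph also trace calls at runtime, but this is not pycallgraph's code, and `record_call_graph` and `print_map` are hypothetical names.

```python
import sys
from collections import defaultdict


def _frame_name(frame):
    """Label a frame as 'module.function'."""
    return f"{frame.f_globals.get('__name__', '?')}.{frame.f_code.co_name}"


def record_call_graph(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, edges).

    edges maps (caller, callee) name pairs to how often the caller called
    the callee. Only Python-level functions are recorded, not C builtins.
    """
    edges = defaultdict(int)

    def profiler(frame, event, arg):
        # 'call' fires when a Python frame starts (or a generator resumes);
        # frame.f_back is the frame that called it.
        if event == "call" and frame.f_back is not None:
            edges[(_frame_name(frame.f_back), _frame_name(frame))] += 1

    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always remove the hook, even on errors
    return result, dict(edges)


def print_map(edges):
    """Print each caller followed by its callees and call counts."""
    callers = defaultdict(list)
    for (caller, callee), count in sorted(edges.items()):
        callers[caller].append((callee, count))
    for caller, callees in callers.items():
        print(caller)
        for callee, count in callees:
            print(f"    -> {callee} (x{count})")
```

Passing the entry point of the feature you are investigating to `record_call_graph` and then calling `print_map` on the edges gives a text version of the "map" described above.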

Installing pycallgraph (on Ubuntu)

sudo apt-get install pycallgraph

Instead of starting your program with the Python interpreter, you simply execute your script with pycallgraph. For example, to run the scotch recording proxy under pycallgraph:

pycallgraph run-recording-proxy -i 'scotch.*' -e '*.*'

Call graphs grow very quickly because a program usually calls a lot of library functions. For that reason I used the -i (include) and -e (exclude) options to remove everything except the scotch module from the diagram.

Pycallgraph of the scotch recording proxy
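Running a script under a tracer and filtering the result by glob patterns can be sketched the same way: execute the script with `runpy.run_path` while a profile hook is active, and keep only edges whose `module.function` names pass the filter. The precedence rule used here (an include match always wins; otherwise an exclude match drops the name) is an assumption chosen to reproduce "everything except scotch"; pycallgraph's own filter logic may differ. `keep` and `trace_script` are hypothetical helpers.

```python
import fnmatch
import runpy
import sys
from collections import defaultdict


def keep(name, include=(), exclude=()):
    """Assumed rule: an include match always wins; otherwise an exclude match drops the name."""
    if any(fnmatch.fnmatchcase(name, pattern) for pattern in include):
        return True
    return not any(fnmatch.fnmatchcase(name, pattern) for pattern in exclude)


def trace_script(path, include=(), exclude=()):
    """Run the script at path as __main__ and return its filtered call edges."""
    edges = defaultdict(int)

    def name_of(frame):
        return f"{frame.f_globals.get('__name__', '?')}.{frame.f_code.co_name}"

    def profiler(frame, event, arg):
        if event != "call" or frame.f_back is None:
            return
        caller, callee = name_of(frame.f_back), name_of(frame)
        # Record the edge only if both ends survive the include/exclude filter.
        if keep(caller, include, exclude) and keep(callee, include, exclude):
            edges[(caller, callee)] += 1

    sys.setprofile(profiler)
    try:
        # Instead of 'python script.py', execute the script under the profile hook.
        runpy.run_path(path, run_name="__main__")
    finally:
        sys.setprofile(None)
    return dict(edges)
```

Under these assumptions, `trace_script("run-recording-proxy", include=["scotch.*"], exclude=["*.*"])` would mirror the command above, provided the script needs no extra command-line arguments.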

Printing the call graph

Call graphs are usually big, especially when you are working with a non-trivial module. If you do not have a plotter, the program Dia (diagrams, UML, etc.) comes in very handy for printing them. To install it on an Ubuntu box, just type:

sudo apt-get install dia

Dia lets you split a huge diagram across multiple pages, which you then print separately.
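How many sheets such a printout needs is simple arithmetic: divide the diagram size by the page size and round up in each direction. The sketch below computes the page rectangles for that kind of tiling, with an optional overlap to make aligning the printed sheets easier. It only illustrates the idea and is not how Dia implements page splitting; `tile_pages` and its parameters are hypothetical.

```python
import math


def tile_pages(width, height, page_width, page_height, overlap=0):
    """Split a width x height diagram into page-sized tiles.

    Returns (x, y, w, h) rectangles in diagram units, row by row.
    Neighbouring tiles share `overlap` units so sheets can be aligned.
    """
    if overlap >= min(page_width, page_height):
        raise ValueError("overlap must be smaller than the page size")
    step_x = page_width - overlap
    step_y = page_height - overlap
    # Round up: a partly filled last column or row still needs a page.
    cols = max(1, math.ceil((width - overlap) / step_x))
    rows = max(1, math.ceil((height - overlap) / step_y))
    tiles = []
    for row in range(rows):
        for col in range(cols):
            x, y = col * step_x, row * step_y
            tiles.append((x, y, min(page_width, width - x), min(page_height, height - y)))
    return tiles
```

For example, a 1000 x 600 mm diagram on A4 portrait pages (210 x 297 mm) gives `len(tile_pages(1000, 600, 210, 297)) == 15`, that is 5 columns by 3 rows.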
