The PyCon DE 2011 team did an awesome job recording all the conference talks. The quality is extraordinary.
Note: The talk is in German.
Slides:
Mark_Fink_Ajax_PerfTest_Harness_v1.0
Software Quality Assurance and Test Automation in Practice
When I start reading source code I begin from a bird's-eye perspective. I first want to know how big the project I am looking at is.
A lines-of-code (LOC) count comes in handy in this situation to get a first general impression of the scale of the code base. I use sloccount for this:
sudo apt-get install sloccount
sloccount openerp-server-5.0.0_rc3
SLOC Directory SLOC-by-Language (Sorted)
58241 addons-extra python=50644,php=7434,sh=163
53168 addons python=53126,php=42
22288 server python=22287,sh=1
15599 client python=15599
12334 web python=12205,sh=129
72 top_dir python=72
Totals grouped by language (dominant language first):
python: 153933 (95.20%)
php: 7476 (4.62%)
sh: 293 (0.18%)
Another very useful code metric in the situation described above is the McCabe (cyclomatic) complexity metric. A tool such as PyMetrics computes it and helps you to identify the most complex code. The complex areas need the most attention when it comes to quality assurance measures (documentation, testing, etc.). These areas usually contain the most defects, too.
#manually extract tarball
PyMetrics.py program.py
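To get a feeling for what such a metric measures, here is a minimal sketch that approximates McCabe complexity with Python's standard ast module. It is not how PyMetrics works internally; the function names approx_complexity and rank_functions are made up for this illustration, and the set of node types counted as decision points is a simplification.

```python
import ast

# Node types that each add one decision point (a simplification of the metric).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.IfExp, ast.comprehension)


def approx_complexity(func_node):
    """Return 1 plus the number of decision points inside func_node."""
    score = 1
    for node in ast.walk(func_node):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' contributes two additional paths
            score += len(node.values) - 1
    return score


def rank_functions(source):
    """Return (name, complexity) pairs for all functions, most complex first."""
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    return sorted(((f.name, approx_complexity(f)) for f in funcs),
                  key=lambda pair: pair[1], reverse=True)
```

Calling rank_functions(open("program.py").read()) gives a quick list of the functions that deserve a closer look first. Note that branches of nested functions are also counted for the enclosing function in this sketch.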
Almost every time I am approaching a code base unknown to me I am looking for a certain functionality which I am particularly interested in. As soon as I understand the functionality I go to the next one and so on until I understand everything I need to. To identify relevant parts in the code a call graph proved to be very handy. The graph “lists” all the relevant modules and functions and shows in which order they are called. I always use the call graph as a “map” that helps me to navigate the code base unknown to me.
apt-get install pycallgraph
Instead of starting your program with your python interpreter you just use pycallgraph to execute your script. For example run the scotch recording proxy within the pycallgraph tool:
pycallgraph run-recording-proxy -i scotch.* -e *.*
Call graphs get big very fast because the program usually calls a lot of library functions. For that reason I excluded everything from the diagram (-e option) except the scotch module, which I explicitly included (-i option).
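The idea of recording only the calls of the module you care about can be sketched in a few lines with sys.setprofile from the standard library. This is only an illustration of the principle, not the pycallgraph implementation; record_calls and include_prefix are names I made up for this sketch.

```python
import sys
from collections import Counter


def record_calls(func, include_prefix):
    """Run func() and count caller -> callee edges.

    Only calls into modules whose name starts with include_prefix are
    recorded; everything else (e.g. library code) is ignored.
    """
    edges = Counter()

    def profiler(frame, event, arg):
        if event != "call":          # ignore returns and C function events
            return
        callee_module = frame.f_globals.get("__name__", "")
        if not callee_module.startswith(include_prefix):
            return
        caller = frame.f_back
        caller_name = caller.f_code.co_name if caller else "<root>"
        edges[(caller_name, frame.f_code.co_name)] += 1

    sys.setprofile(profiler)
    try:
        func()
    finally:
        sys.setprofile(None)         # always switch the profiler off again
    return edges
```

The resulting edge counts are the raw material of a call graph: each key is one arrow in the diagram, and the prefix filter plays the role of the include and exclude options above.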

Call graphs are usually big, especially if you are working with a non-trivial module. If you want to print them and do not have a plotter, the program Dia (Diagrams, UML, etc.) comes in very handy. To install it on an Ubuntu box just type:
sudo apt-get install dia
Dia helps you to split huge graphics into multiple pages that you then print separately.
Most of the Apache2 configuration I borrowed from this howto:
http://techxplorer.com/2009/04/27/using-apache-and-ubuntu-for-local-websites-with-ssl/
We are building a sample site which you access in your browser with the following url:
First of all you need Apache2 installed (for Ubuntu):
sudo apt-get install apache2
If you want to make it really pretty (which I usually do), add the following FQDN to your /etc/hosts file (you probably have to restart your browser for this to take effect):
127.0.2.1 my.sample.com
Any request for the my.sample.com domain, which does not exist on the internet, will now be sent to the IP address 127.0.2.1, which is part of the loopback range.
Create a new file in the /etc/apache2/sites-available directory with the name of the site, for example: my.sample.com. Use this configuration as a starting point and change the IP address, ServerName, ServerAdmin, DocumentRoot and Directory as appropriate.
<VirtualHost 127.0.2.1:443>
    ServerName my.sample.com
    ServerAdmin admin@sample.com
    DocumentRoot /home/mark/devel/tam/build
    <Directory />
        Options FollowSymLinks
        AllowOverride None
    </Directory>
    <Directory /home/mark/devel/tam/build>
        Options Indexes FollowSymLinks Multiviews
        AllowOverride all
        Order allow,deny
        allow from all
    </Directory>
    ErrorLog /var/log/apache2/error.log
    LogLevel warn
    CustomLog /var/log/apache2/access.log combined
</VirtualHost>
Execute the following command to enable the new site:
sudo a2ensite my.sample.com
Restart Apache by executing the following command:
sudo /etc/init.d/apache2 restart
Test the new website:
Put a sample file (index.html or something like that) into the document root and check that it is visible using your browser.
In order to create certificates you need openssl and some configuration which is described in this section. If you want you can just install the provided sample certificates instead of creating your own (the certificates are contained in testing-software/ssh-sniffer). If you want to skip the certificate creation just jump to install certificates. Make sure you use this only as a local test scenario!
Install or upgrade openssl (in Ubuntu):
sudo apt-get install openssl
In order to use the steps provided below you have to configure openssl (/etc/ssl/openssl.cnf):
#
# OpenSSL example configuration file.
# This is mostly being used for generation of certificate requests.
#
# This definition stops the following lines choking if HOME isn't
# defined.
HOME = .
RANDFILE = $ENV::HOME/.rnd
# Extra OBJECT IDENTIFIER info:
#oid_file = $ENV::HOME/.oid
oid_section = new_oids
# To use this configuration file with the "-extfile" option of the
# "openssl x509" utility, name here the section containing the
# X.509v3 extensions to use:
# extensions =
# (Alternatively, use a configuration file that has only
# X.509v3 extensions in its main [= default] section.)
[ new_oids ]
# We can add new OIDs in here for use by 'ca' and 'req'.
# Add a simple OID like this:
# testoid1=1.2.3.4
# Or use config file substitution like this:
# testoid2=${testoid1}.5.6
####################################################################
[ ca ]
default_ca = sampleCA # The default ca section
####################################################################
[ sampleCA ]
dir = ./CA # Where everything is kept
certs = $dir/certs # Where the issued certs are kept
crl_dir = $dir/crl # Where the issued crl are kept
database = $dir/index.text # database index file.
#unique_subject = no # Set to 'no' to allow creation of
# several certificates with same subject.
new_certs_dir = $dir # default place for new certs.
certificate = $dir/sampleCA.crt # The CA certificate
serial = $dir/serial # The current serial number
crlnumber = $dir/crlnumber # the current crl number
# must be commented out to leave a V1 CRL
crl = $dir/sampleCA.crl # The current CRL
private_key = $dir/sampleCA.key # The private key
RANDFILE = $dir/private/.rand # private random number file
x509_extensions = usr_cert # The extensions to add to the cert
...
We need to create our own CA certificate and CA key, which we will use to sign our web server and our client certificates.
When starting out with a new CA we need to create the following directories and subdirectories:
mkdir ./sampleCA
mkdir ./sampleCA/CA
mkdir ./sampleCA/CA/newcerts
mkdir ./sampleCA/CA/certs
mkdir ./sampleCA/CA/crl
mkdir ./sampleCA/CA/private
and the files:
touch ./sampleCA/CA/index.text
touch ./sampleCA/CA/serial
echo "01" > ./sampleCA/CA/serial
cd ./sampleCA
Create the CA key with 1024 bit (good enough for a local test; use at least 2048 bit for anything else):
openssl genrsa -out ./CA/sampleCA.key 1024
Create CA Certificate Request:
openssl req -new -key ./CA/sampleCA.key -out ./CA/sampleCA.csr
Self-sign CA certificate:
openssl x509 -req -days 365 -in ./CA/sampleCA.csr -out ./CA/sampleCA.crt -signkey ./CA/sampleCA.key
Now we finally have our CA certificate and our CA key, which we will use to sign our webserver and our client certificates.
Check the content of the CA certificate:
openssl x509 -in ./CA/sampleCA.crt -text
For the web server we need to provide the server side certificate and key.
Create the web server key:
openssl genrsa -des3 -out sampleWebServer.key
(pass phrase: test123)
The “-des3” option tells openssl to encrypt the key with a 3DES (Triple-DES) pass phrase. Always keep the pass phrase in mind or store it in a safe place!
If you leave the key encrypted, Apache2 will prompt you for the pass phrase at every server startup, which is a real problem if you want to keep the configuration running unattended. To avoid this, remove the pass phrase from the web server key:
openssl rsa -in sampleWebServer.key -out sampleWebServer.key
Now create the web server certificate request:
openssl req -new -key sampleWebServer.key -out sampleWebServer.csr
IMPORTANT NOTE: During the creation process you will be asked several questions, e.g. for the “Common Name” (CN). As the CN has no default value in openssl.cnf you MUST enter the complete name of the web server (FQDN = Fully Qualified Domain Name), for example my.sample.com! Apache later compares this string to the config directive “ServerName”. If the two are not identical, Apache complains about the mismatch and browsers will not accept the certificate for your site.
Sign the web server's certificate request with the CA key:
openssl ca -in sampleWebServer.csr -cert ./CA/sampleCA.crt -keyfile ./CA/sampleCA.key -out sampleWebServer.crt
Check the generated certificate if you want:
openssl x509 -in sampleWebServer.crt -text
Install the key and the certificates:
sudo cp sampleWebServer.key /etc/ssl/private/
sudo chmod 400 /etc/ssl/private/sampleWebServer.key
sudo cp sampleWebServer.crt /etc/ssl/certs/
sudo cp ./CA/sampleCA.crt /etc/ssl/certs/
Enable the ssl module with the following command:
sudo a2enmod ssl
Use the following configuration as a starting point and add the necessary configuration lines to the configuration file we made before. Change the IP Address, ServerName, ServerAdmin, DocumentRoot and Directory as appropriate.
<IfModule mod_ssl.c>
<VirtualHost 127.0.2.1:443>
    ServerName my.sample.com
    ServerAdmin admin@sample.com
    DocumentRoot /home/mark/devel/tam/build
    <Directory />
        Options FollowSymLinks
        AllowOverride None
    </Directory>
    <Directory /home/mark/devel/tam/build>
        #SSLRequireSSL
        #SSLRequire %{SSL_CLIENT_S_DN_O} eq "Internet Widgits Pty Ltd"
        Options Indexes FollowSymLinks Multiviews
        AllowOverride all
        Order allow,deny
        allow from all
    </Directory>
    ErrorLog /var/log/apache2/error.log
    LogLevel warn
    CustomLog /var/log/apache2/access.log combined
    # SSL
    SSLEngine on
    # Server Certificate
    SSLCertificateFile /etc/ssl/certs/sampleWebServer.crt
    SSLCertificateKeyFile /etc/ssl/private/sampleWebServer.key
    #SSLCACertificateFile /etc/ssl/certs/sampleCA.crt
    #SSLVerifyClient require
    #SSLVerifyDepth 2
    BrowserMatch ".*MSIE.*" \
        nokeepalive ssl-unclean-shutdown \
        downgrade-1.0 force-response-1.0
</VirtualHost>
</IfModule>
Restart Apache:
sudo /etc/init.d/apache2 restart
(if you did not remove the pass phrase from the web server key, you have to provide it here: test123)
Test the new website by entering the following URL into your browser:
https://my.sample.com/
(you probably have to add a security exception for the server certificate to your web browser in order to access your site; your browser will prompt you accordingly)
Make sure that you use https and not just http!
In order to use client authentication you need a client certificate to be installed with your browser. Please note that your CA certificate which is needed to verify the client certificate has already been installed on the server.
We use openssl to create the client certificate, too.
cd ./sampleCA
Create the client key:
openssl genrsa -des3 -out client.key 1024
(pass phrase: test123)
Create user certificate request:
openssl req -new -key client.key -out client.csr
(This time enter your full name as the common name (CN), for example "client")
Sign the user certificate request and create the certificate:
openssl ca -in client.csr -cert ./CA/sampleCA.crt -keyfile ./CA/sampleCA.key -out client.crt
Convert user certificate to p12 format:
openssl pkcs12 -export -clcerts -in client.crt -inkey client.key -out client.p12
In case you also need the pem format (containing both private key and certificate):
openssl pkcs12 -in client.p12 -out client.pem -nodes
At the moment, our Apache web server accepts any SSL connection. Therefore we have to force it to check whether the presented client certificate is valid. You can further enhance security by requiring that the client certificate matches certain conditions, for example the company name.
To make this happen, please uncomment the following Apache configuration options in your my.sample.com file:
SSLRequireSSL
SSLRequire %{SSL_CLIENT_S_DN_O} eq "Internet Widgits Pty Ltd"
and:
SSLCACertificateFile /etc/ssl/certs/sampleCA.crt
SSLVerifyClient require
SSLVerifyDepth 2
Restart the Apache2 web server:
sudo /etc/init.d/apache2 restart
(if you did not remove the pass phrase from the web server key, you have to provide it here: test123)
Please verify that you cannot access the website without the client certificate.

From now on your browser needs to provide a valid client certificate in order to access the website. To import the client.p12 certificate (it is contained in the ./sampleCA folder) follow these instructions.
In Firefox navigate through the menu structure:
Edit > Preferences > Advanced > View Certificates > Your Certificates > Import > Choose file
(I did not set any password)
Verify that you now are able to access your site again. Well done!
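If you want to repeat this check from a script instead of the browser, the following minimal sketch uses only Python's standard library. It assumes the client.pem file created above (certificate plus unencrypted key) and the CA certificate from ./CA/sampleCA.crt; the helper names make_context and fetch_status are made up for this example. Depending on your Python and OpenSSL versions, the small test keys used here may be rejected as too weak.

```python
import ssl
import urllib.request


def make_context(ca_file=None, client_pem=None):
    """Build a TLS context that trusts ca_file and presents client_pem."""
    # Without ca_file the system's default CA certificates are used.
    ctx = ssl.create_default_context(cafile=ca_file)
    if client_pem is not None:
        # client_pem holds both the client certificate and its private key
        ctx.load_cert_chain(certfile=client_pem)
    return ctx


def fetch_status(url, ctx):
    """Return the HTTP status code of url, using the given TLS context."""
    with urllib.request.urlopen(url, context=ctx) as response:
        return response.status
```

Run from within the ./sampleCA folder, fetch_status("https://my.sample.com/", make_context("./CA/sampleCA.crt", "client.pem")) should succeed, while the same call with a context created without client_pem should fail during the TLS handshake once client authentication is enabled.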
In order to optimize the heap configuration for your situation you have to try different settings. To do so you usually need some kind of test environment and a load generator which is able to produce realistic load patterns in a repeatable way. With this setup you can try different heap configurations and monitor the heap utilization and the garbage collector. Best practice is to make an educated guess on the configuration and start optimizing from there. In this article I will show you how to do this.
To be able to optimize the heap utilization of your application you need to understand how the Java heap and garbage collector work and how they are configured.
The heap is split up into the Young-Generation (Eden-Space and two Survivor-Spaces of identical size, usually called From and To), the Old-Generation (Tenured) and the Permanent Space.

The idea behind this organization of the heap is that most objects die in the Young-Generation because they have a very short life cycle.
Cleaning up the whole heap within one garbage collector run would take much longer than restricting the clean-up to just the Young-Generation (Minor-GC). Only the Full-GC works on the complete heap. Usually a Minor-GC is much faster than a Full-GC. The execution time of a GC run is very important to you because most garbage collection strategies stop the world. This means that even if you have multiple processor cores, all application threads are halted during a GC run. The stop-the-world strategies have many negative effects on applications that are optimized for response times. Today most JVMs also provide an implementation of a concurrent mark-and-sweep strategy, which largely avoids these negative effects because it does most of its work without long pauses.
The next step before you can start optimizing the Java heap and garbage collector configuration is to develop a thorough understanding of the configuration parameters.
For server applications you should set the initial and the maximum heap size (-Xms and -Xmx) to the same value because resizing the heap also initiates a Full-GC. With an application under load this could have a severe impact on application performance from the very start. On most operating systems the regained memory is not returned to the operating system.
Over time Sun developed different strategies for the garbage collector. Different applications have different requirements regarding response times. If you optimize for response times or have real-time requirements, you are in a totally different situation than somebody who is optimizing for throughput. In this section I describe the different strategies so that you will be able to select the one that best fits your needs. Different strategies and configuration options are available in the different JVM versions. Check the version of your JVM before you start working.
When the garbage collector runs, all referenced objects in Eden-Space and From-Survivor are copied into the To-Survivor. During this operation objects that have already been kept in the Survivor spaces for some time are moved into the Old-Generation. After that Eden-Space and From-Survivor are emptied, and the From- and To-Survivor spaces switch their roles. If more objects are referenced than fit into the To-Survivor, the remaining objects are moved directly into the Old-Generation. After garbage collection of the Young-Generation is finished, both Eden-Space and the new To-Survivor are empty.
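The copying steps described above can be played through with a small toy model. This is only a sketch to illustrate the bookkeeping, not how the JVM is implemented: objects are plain ids, the capacity of the survivor space is counted in objects instead of bytes, and minor_gc, survivor_capacity and tenuring_threshold are names chosen for this example.

```python
def minor_gc(eden, from_space, old, live, survivor_capacity, tenuring_threshold):
    """Simulate one minor collection of the Young-Generation.

    eden, from_space: dicts mapping object id -> number of collections survived
    old:              set of object ids in the Old-Generation (updated in place)
    live:             set of object ids that are still referenced
    Returns the new (eden, from_space) pair after the spaces switched roles.
    """
    to_space = {}
    for obj, age in list(from_space.items()) + list(eden.items()):
        if obj not in live:
            continue                    # unreferenced objects are not copied
        age += 1
        if age > tenuring_threshold or len(to_space) >= survivor_capacity:
            old.add(obj)                # old enough, or To-Survivor is full
        else:
            to_space[obj] = age         # copy into To-Survivor
    # Eden and From are empty now; the filled To space becomes the new From.
    return {}, to_space
```

Playing with survivor_capacity shows why survivor spaces that are too small push short-lived objects into the Old-Generation, where only a much more expensive collection can remove them.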
For the clean-up of the Old- and Perm-Generations a mark-and-sweep algorithm is used. During the mark phase all objects that are referenced by active objects (for example active thread objects, system classes, local variables, pending exceptions, references from the native stack) are marked. The unmarked objects are removed in the sweep phase. After completion of the sweep phase the markings are removed. The memory is now fragmented. During the compaction phase all remaining objects are moved to the beginning of the generation in order to obtain a contiguous free memory region. This simplifies the allocation of memory for new objects.
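The three phases can be illustrated with a toy heap, modelled here as a Python list in which None stands for a free slot. Again this is just a sketch under simplified assumptions (objects are names, references is a plain dict); the function names mark and sweep_and_compact are made up for the illustration.

```python
def mark(roots, references):
    """Mark phase: return the set of objects reachable from the roots."""
    marked, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(references.get(obj, ()))
    return marked


def sweep_and_compact(heap, marked):
    """Sweep unmarked objects, then slide the survivors to the front."""
    survivors = [obj for obj in heap if obj is not None and obj in marked]
    # All free slots end up in one contiguous region at the end.
    return survivors + [None] * (len(heap) - len(survivors))
```

For example, with the heap ["a", "b", None, "c", "d"], the root "a" and a reference from "a" to "c", only "a" and "c" survive and are moved to the first two slots. The objects "b" and "d" are removed even if they reference each other, because nothing active points to them.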
TODO: describe other strategies
After you have decided on the initial configuration for heap and garbage collector you use a load tool to put your application under realistic load (depending on your application this usually takes 8-10 hours). In order to decide on improvements you need to analyse the memory utilization under load. Many monitoring tools also provide monitoring for the Java memory utilization. Call me old-fashioned but I do not trust them. Many of them have weird algorithms to calculate the values you depend on for your optimization. I have seen their inaccuracy so many times. Therefore I recommend logging the garbage collector information to a logfile (usually you will do this anyway). After your test run is complete you extract the data for your analysis from the logfile.
Important: look at the frequency of the GC runs and the time they consume.
The JVM provides a lot of different options to provide log information about heap utilization and garbage collector performance.
This section gives a brief overview of some tools that assist you in analysing the heap utilization of your JVM. Depending on your environment a particular tool could be more suitable than others. There are hundreds of tools available and this section cannot provide a comprehensive overview of them. Nevertheless it is intended to give a brief overview of the different approaches and to reference supporting tools.
Visualgc by Sun provides instant visual analysis of the GC. Visualgc connects directly to the JVM of your running application. There is no need to write the information to a logfile.
Visualgc is not included in the JDK any more, so download it from http://java.sun.com/performance/jvmstat/
In order to install it on Ubuntu uncompress the download to /usr/local/bin.
Edit ~/.bashrc in order to add the following two lines:
export JVMSTAT_JAVA_HOME='/usr/lib/jvm/java-6-sun'
PATH=$PATH:/usr/local/bin/jvmstat/bin
Run jps in order to look up the JVM id of your running application:
# jps
9599 Jps
9590 Life
# visualgc 9590

For me, connecting to the JVM with visualgc is in most cases not an option for various reasons. Mostly I run lots of performance tests and most of them have long durations (4-10 hours). Therefore I prefer to collect the logfiles after the test has been completed and analyse them during post-processing. For the analysis of logfiles I maintain a Python script which recently had a complete overhaul.
To log information about heap utilization into a logfile the JVM offers you several parameters (see above). For the analysis I describe here I recommend the following configuration in addition to the memory configuration parameters:
-verbose:gc -XX:+PrintGCTimeStamps
-XX:-TraceClassUnloading -XX:+PrintHeapAtGC
The Python script that produced the diagrams for the GC analysis in this article is available here, together with a sample Java application for heap utilization: http://bitbucket.org/markfink/testing-software/src/tip/gcview/
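To give an idea of what such post-processing looks like, here is a minimal sketch that counts the GC runs in a logfile and sums up the time they consumed. It is not the gcview script linked above. It assumes the classic one-line format written by -verbose:gc with -XX:+PrintGCTimeStamps, for example "12.345: [GC 325407K->83000K(776768K), 0.2300771 secs]", and that these lines are not broken up by the additional heap output; the names GC_LINE and summarize are made up for this example.

```python
import re

# timestamp: [GC|Full GC  before->after(total), pause secs]
GC_LINE = re.compile(
    r"(?P<ts>\d+\.\d+): \[(?P<kind>Full GC|GC)\D*?"
    r"(?P<before>\d+)K->(?P<after>\d+)K\((?P<total>\d+)K\), "
    r"(?P<secs>\d+\.\d+) secs\]"
)


def summarize(lines):
    """Return {kind: (number of runs, total pause in seconds)}."""
    stats = {"GC": [0, 0.0], "Full GC": [0, 0.0]}
    for line in lines:
        match = GC_LINE.search(line)
        if match:
            entry = stats[match.group("kind")]
            entry[0] += 1
            entry[1] += float(match.group("secs"))
    return {kind: (count, total) for kind, (count, total) in stats.items()}
```

Used as summarize(open("gc.log")), where gc.log is a placeholder for your own logfile, this yields the two numbers that matter most for a first assessment: how often the collector ran and how much time it took.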
The jstat utility is a statistics monitoring tool. It attaches to a Java VM and collects and logs performance statistics as specified by the command line options.
The jstat utility does not require the Java VM to be started with any special options. This utility is included in the JDK.
The following table lists the jstat command options.
| -class | prints statistics on the behaviour of the class loader |
| -compiler | prints statistics on the behaviour of the Java compiler |
| -gc | prints statistics on the behaviour of the garbage collected heap |
| -gccapacity | prints statistics of the capacities of the generations and their corresponding spaces |
| -gccause | prints the summary of garbage collection statistics with the cause of the last and current (if applicable) garbage collection events |
| -gcnew | prints statistics of the behaviour of the new generation |
| -gcnewcapacity | prints statistics of the sizes of the new generations and their corresponding spaces |
| -gcold | prints statistics of the behaviour of the old and permanent generations |
| -gcoldcapacity | prints statistics of the sizes of the old generation |
| -gcpermcapacity | prints statistics of the sizes of the permanent generation |
| -gcutil | prints a summary of garbage collection statistics |
| -printcompilation | prints Java compilation method statistics |
A complete description of the jstat options plus examples, can be found at:
http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jstat.html
The following example demonstrates the jstat command. It attaches to pid 16945 and takes five samples at 250 millisecond intervals. The -gcnew option specifies that statistics of the behaviour of the new generation are output.
# jps
16945 Life
16954 Jps
# jstat -gcnew 16945 250 5
S0C S1C S0U S1U TT MTT DSS EC EU YGC YGCT
48000.0 48000.0 0.0 48000.0 1 15 24000.0 672000.0 109709.5 1 0.623
48000.0 48000.0 0.0 48000.0 1 15 24000.0 672000.0 123418.2 1 0.623
48000.0 48000.0 0.0 48000.0 1 15 24000.0 672000.0 137127.0 1 0.623
48000.0 48000.0 0.0 48000.0 1 15 24000.0 672000.0 150835.8 1 0.623
48000.0 48000.0 0.0 48000.0 1 15 24000.0 672000.0 164544.6 1 0.623
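Output like this is easy to post-process. The following sketch turns the whitespace-separated table into one dictionary per sample and derives the Eden utilization from the EC (capacity) and EU (used) columns. The function names parse_jstat and eden_utilization are made up for this example, and the sketch assumes that every column is numeric, as in the -gcnew output above.

```python
def parse_jstat(text):
    """Parse jstat output into a list of {column name: value} dicts."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, samples = rows[0], rows[1:]
    return [dict(zip(header, map(float, sample))) for sample in samples]


def eden_utilization(sample):
    """Return the used share of Eden-Space in percent."""
    return 100.0 * sample["EU"] / sample["EC"]
```

For the first sample above this gives an Eden utilization of roughly 16 percent, and comparing consecutive samples shows how quickly Eden fills up between two collections.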
The former JTune is now part of HPJmeter (please do not confuse it with the JMeter open source performance testing tool). You can either connect directly to a running server or read from a logfile. To be able to connect directly you need to start an agent with your application. The HPJmeter user guide describes how to do this: http://docs.hp.com/en/5992-5899/index.html.
What makes HPJmeter particularly interesting for me is that you can read GC logs with it.

HPJmeter also ships with a comprehensive help which describes GC concepts and metrics in detail.
JConsole also provides some basic information on heap utilization.
# jps
16633 Jps
16624 Life
# jconsole 16624

This section describes in detail how to analyse and tune a JVM. As I explained above, an optimized configuration can neither be calculated nor guessed because the heap utilization depends on the application and its usage. We usually start with a suboptimal configuration provided by an experienced developer. In a test environment we put realistic load on the application and measure and analyse its behaviour. From the analysis results we derive an optimized JVM configuration. This configuration is tested and analysed again, and so forth, until we get a satisfying outcome.
In order to demonstrate the optimization of the heap utilization I wrote a small Java application that simulates the creation of two different lifeforms (grass and sheep) at given ratios. This is not a real life simulation. In fact you would be surprised how simple it is. The application's only purpose is to instantiate a lot of small objects with a short lifetime and some bigger objects that have a much longer lifespan. You can find the sample application here: http://bitbucket.org/markfink/testing-software/src/tip/gcview/tuning_sample/.
SCENARIO=grass-sheep
DAYS=5000 # number of days to simulate
SLEEPTIME=10 # slow down the simulation in ms
GRASS_GROWTHRATE=250 # amount of new entities every day
GRASS_MAXAGE=25 # entities die after they reach maxage
GRASS_SIZE=1000 # memory footprint for each entity
SHEEP_GROWTHRATE=5 # amount of new entities every day
SHEEP_MAXAGE=900 # entities die after they reach maxage
SHEEP_SIZE=10000 # memory footprint for each entity
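A rough back-of-the-envelope calculation shows what these settings mean for the heap. The sketch below assumes that every entity lives for exactly MAXAGE days, so that the population levels off at GROWTHRATE times MAXAGE, and it treats SIZE as an abstract unit of memory per entity; steady_state is a name made up for this example and not part of the sample application.

```python
def steady_state(growth_rate, max_age, size):
    """Return (population, footprint) once births and deaths balance out."""
    population = growth_rate * max_age
    return population, population * size


grass = steady_state(growth_rate=250, max_age=25, size=1000)
sheep = steady_state(growth_rate=5, max_age=900, size=10000)
```

Under these assumptions the grass levels off at 6250 short-lived entities with a footprint of 6,250,000 units, while the sheep reach 4500 long-lived entities with a footprint of 45,000,000 units. The long-lived sheep therefore dominate the memory that ends up in the Old-Generation, whereas the grass causes most of the churn in the Young-Generation.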
I visualized the behaviour of the simulation in the following diagram:

My first approach was to run our sample application without any configuration of the JVM. The execution failed with an OutOfMemoryError.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at Lifeform.<init>(Life.java:86)
at Life.<init>(Life.java:61)
at Life.main(Life.java:13)
If we analyse a little further we notice 49 full garbage collections within a few seconds. Note that there are no minor collections.

We realize that the heap is way too small and decide to configure its size. The application now finishes without an error. Analysing the garbage collections we see that only one Full-GC is executed. In the GC diagram we are now able to identify single minor garbage collector runs and see how the Survivor spaces (From and To) are utilized.

It will be tough to further optimize this heap configuration. In the real world I would probably stick with this configuration, but for the sake of demonstration I will try to optimize it further.
I will start out with my standard heap configuration, which turns out to reduce the number of Minor-GC runs by 80 percent. The time spent in GC is now down to 7 sec.
-Xms1500m -Xmx1500m
-XX:NewSize=256m -XX:MaxNewSize=256m
-XX:PermSize=128m -XX:MaxPermSize=128m
-XX:SurvivorRatio=8

Now I start optimizing the size of the survivor spaces. Please note that a bigger SurvivorRatio in the JVM configuration means smaller survivor spaces! I now set the SurvivorRatio to 12. All in all this results in a very small improvement. The only significant effect is that the Full-GCs are now completely gone. The time spent in GC is now down to 4.77 sec.
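Why a bigger SurvivorRatio means smaller survivor spaces becomes obvious when you do the arithmetic. SurvivorRatio is the ratio between Eden-Space and one Survivor space, and the Young-Generation consists of Eden plus two Survivor spaces. The following sketch computes the resulting sizes; young_layout is a name made up for this example, and the real JVM rounds the sizes to its own alignment, so the actual values differ slightly.

```python
def young_layout(new_size_mb, survivor_ratio):
    """Return (eden, survivor) sizes in MB for the given Young-Generation.

    The Young-Generation is split into survivor_ratio parts for Eden
    plus one part for each of the two Survivor spaces.
    """
    survivor = new_size_mb / (survivor_ratio + 2)
    eden = survivor * survivor_ratio
    return eden, survivor


eden_8, survivor_8 = young_layout(256, 8)     # about 204.8 MB and 25.6 MB
eden_12, survivor_12 = young_layout(256, 12)  # about 219.4 MB and 18.3 MB
```

With NewSize=256m, going from a SurvivorRatio of 8 to 12 shrinks each Survivor space from about 25.6 MB to about 18.3 MB and gives the difference to Eden-Space.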

All of the above GC diagrams were created from the logfile with a Python script: http://bitbucket.org/markfink/testing-software/src/tip/gcview/