COVTOOL - Free test coverage analyzer for C++

Introduction

COVTOOL is an open source test coverage analyzer for C++ programs. It lets you dynamically instrument your source code as you compile. An instrumented program keeps track of the lines of code that were executed during its run and produces a log of the same upon program termination. Multiple program runs will produce multiple logs.

You can then use the many log files generated during a suite of tests to analyze test coverage percentages. Most importantly, you can use the coverage information to annotate your source. Thus you can find which lines in which source files did not get executed during your entire suite of tests. Code that has been tested may still be buggy, but you should have almost no confidence in code that is never tested.

Test coverage information is not specific to a given executable. You can annotate your program(s) and perform analysis based on many runs of many programs.

With this information in hand, you can add additional tests to ensure that all your lines of code are actually tested. Presumably the additional tests will check the accuracy of your program's behavior, because without this, the test coverage numbers are meaningless.

See theory of operations for a description of how the instrumentation process works.

Annotated source code example

Once you have run a suite of tests with an instrumented executable, you can annotate source files. The annotation program produces output like the following:

  #include <stdio.h>

  void function(int count)
+ {
+   if(count < 10)
-     printf("too low\n");
    else
+     printf("ok!\n");
  }

The annotator places a '+' in front of lines that were both instrumented and executed. Lines that were instrumented but not executed have a '-' in front. Lines with a blank in front do not contain executable statements. The opening '{' of each function is treated as a statement, but the closing '}' of the function is not.

In the above example, you can see that the count was never less than 10 in all the runs of the function. That means you need to add a test that uses a count which is less than 10 before you ship your product.
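When annotated files get long, it can help to pull out just the '-' lines. The following Python sketch does that. It assumes, as in the example above, that the marker occupies the first column of every annotated line and that annotated lines correspond one-to-one with source lines. The helper name untested_lines is hypothetical and not part of COVTOOL.

```python
def untested_lines(annotated_text):
    """Return (line_number, source_text) for each line the annotator marked '-'.

    Assumption: the '+', '-', or blank marker is the first character of every
    annotated line, so a leading '-' means "instrumented but not executed".
    """
    result = []
    for number, line in enumerate(annotated_text.splitlines(), start=1):
        if line.startswith("-"):
            result.append((number, line[1:].strip()))
    return result


# The annotated example from above, kept as a raw string so "\n" stays literal.
ANNOTATED = r'''  #include <stdio.h>

  void function(int count)
+ {
+   if(count < 10)
-     printf("too low\n");
    else
+     printf("ok!\n");
  }
'''

for number, source in untested_lines(ANNOTATED):
    print(number, source)  # prints: 6 printf("too low\n");
```

Feeding it the output of the annotator for each source file gives a quick list of the lines that still need a test.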

For a much more detailed example, see this example

Summary data

In addition to finding out which lines were not executed, the coverage log files (named *.covexp) can tell you the total number of (instrumented) lines of code linked into your program, as well as the total number that were executed. They can provide this information for directories, files, and your code base as a whole.

Individual log files (*.covexp) contain information about individual runs. When you merge multiple .covexp files into a single 'merged.db' you can find out the total number of lines in your program that were instrumented and executed across all of those runs.

You can merge the 'merged.db' files from many directories into a single master copy. In this way you can partition the data in any way that makes sense to you.

The coverage logs are stored in ASCII and are easily parsed with a Perl script to produce any output format you like. See the next section.

Coverage log and merged database format

Each individual coverage log file (*.covexp) and the merged database file share the same format. While an attempt is made to present the information as attractively as possible, the log files should be viewed as a stream of 'statements' each beginning with a uniquely identifying token:

file: begins a description of a file
el: executed lines in a file
il: instrumented lines in a file
dir: instrumented, executed, and percent covered in a directory and its subdirectories.
totals: total instrumented and executed line info

file statements

A file statement begins with a file: tag and its associated fields. It is then followed by one il: clause and at most one el: clause with their associated fields. The log file is a collection of file: statements ending in a totals: statement. Here is an example:

file: /home/dir/file.c 6 3 50
el: 1 2 6
il: 1 2 3 4 5 6
totals: 6 3 50

In this example, only one file was instrumented, "/home/dir/file.c". That file has 6 instrumented lines. Only 3 were executed, and thus the test coverage percentage is 50%. The executed lines were 1, 2, and 6. The instrumented lines were 1, 2, 3, 4, 5, and 6. Since only one file had instrumentation, the totals: statement is identical to that file's individual statistics.
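As a rough illustration of how little code is needed to read this format, here is a minimal Python sketch that treats a log as a stream of whitespace-separated tokens. It assumes that file names contain no spaces and that only the five tags listed above start a statement. The function names are hypothetical, and the dir: and totals: fields are skipped rather than interpreted.

```python
TAGS = {"file:", "el:", "il:", "dir:", "totals:"}


def split_statements(text):
    """Yield (tag, fields) pairs from a coverage log treated as a token stream."""
    tag, fields = None, []
    for token in text.split():
        if token in TAGS:
            if tag is not None:
                yield tag, fields
            tag, fields = token, []
        elif tag is not None:
            fields.append(token)
    if tag is not None:
        yield tag, fields


def parse_coverage_log(text):
    """Return {path: {"il": set_of_lines, "el": set_of_lines}} for each file: statement."""
    files = {}
    current = None
    for tag, fields in split_statements(text):
        if tag == "file:" and fields:
            # Only the path is kept here; counts can be recomputed from il:/el:.
            current = files.setdefault(fields[0], {"il": set(), "el": set()})
        elif tag in ("il:", "el:") and current is not None:
            current[tag[:-1]].update(int(n) for n in fields)
    return files


EXAMPLE = """\
file: /home/dir/file.c 6 3 50
el: 1 2 6
il: 1 2 3 4 5 6
totals: 6 3 50
"""

cov = parse_coverage_log(EXAMPLE)["/home/dir/file.c"]
print(sorted(cov["il"] - cov["el"]))  # instrumented but never executed: [3, 4, 5]
```

Working on tokens rather than on physical lines means the sketch does not care how the line numbers are wrapped in the file.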

Note that in coverage log files, the full pathname of the instrumented files is kept. This is useful because you can run the various tools from any directory and get identical text appearing in the logs. Thus when you merge logs made in different directories, you will get the answers you expect.

Caveats

Like any program, COVTOOL might have bugs that prevent you from using it. Let me apologize in advance for any bugs you encounter. Please report them through the SourceForge problem reporting mechanism. If your code won't compile after you start using the instrumentor, please submit a .i file with a macro-expanded version of the offending .c file. I can't help without this.

At this time, threads are not supported.

COVTOOL modifies your source code by injecting lots of calls to functions found in covtoolhelper.o AS YOU COMPILE! These function calls slow down the execution of your program -- maybe a lot. A good guess is a slowdown of about O(ln(N)), where N is the number of lines in the file where the source code is found.

COVTOOL does not, at this time anyway, record execution statistics during static initialization and destruction. Execution order can greatly confuse the issue of writing the runtime data collector.

You must also instrument the program at compile time or you won't get any instrumented or executed line information (il: or el:).

The cov++ instrumentation wrapper script uses the file name extension, .c++, for its own private use. It assumes that you are using .c as the file name extension of your C++ source files. You can override these two file name extensions with others of your own choosing -- but your compiler has to accept both as valid C++ code. For example, use the -EXT option like this:

cov++ -EXT .cpp .c++ ....

If you wish to use a different compiler or a linker, use the -CMD option as follows:

The "-CMD A B" option uses A as the compiler, and B as the linker

For example:

-CMD 'cl /TP' cl

You can also override the compiler used by cov++. It is a script; please review the documentation therein.

The true instrumentor program, covtool.exe, makes no assumptions about file naming conventions because it processes stdin to stdout. Few of the examples in the source code distribution actually use cov++. Check out any of the makefiles with a tests:: target in them to see how to use covtool.exe directly.

The g++ -pedantic option causes an outrageous number of warnings when you compile with cov++. This is due to a bug in -pedantic: the g++ -E option produces code that g++ -pedantic doesn't like. Feel free to complain to the GNU folks.

Using the tools

Here are the major steps in using COVTOOL:

develop the source code for your program without using COVTOOL
develop tests that you think cover the code well (make sure you check for correct program outputs).
run the tests and make sure they all give good answers.
recompile your program using the COVTOOL instrumentor (covtool.exe or more easily cov++). You must also link your program using cov++ or if you choose to link manually, you should include covtoolhelper.o as the last object module on the link command line before the C++ runtime libraries -- at least on Linux. Other platforms might require it be the first object not the last.
rerun the tests (if necessary fix bugs that instrumentation has brought to light. Missing return statements and the use of pointers after they have been deleted are bugs that might finally show up when you instrument).
merge the various coverage log files (*.covexp) into a merged.db file
find the files with low coverage percentages and add additional tests to bring those percentages up. You can use the annotation tool (covannotate.exe) to see which lines aren't getting tested. (Actually, using the html generating script, gen_html, is probably the easiest way to get annotated sources. It reads your MERGED coverage database and produces all the html needed to view your coverage situation).
repeat the above until you are satisfied with your coverage situation.

Instrumenting your program

The easiest way to instrument a program is to change the compilation directive from 'g++' to 'cov++'. 'cov++' is a wrapper around both g++ and covtool.exe. Its goal is to encapsulate the ugly steps that would otherwise have to be taken manually. 'cov++' also adds a #define for COVTOOL_INST -- in case you want to do different things when instrumenting -- like turning off threads.

If you have a makefile, and you defined your suffix rules like this:

.SUFFIXES: .c .o
.c.o:
	g++ -c [options] $<

You would change it only slightly:

.SUFFIXES: .c .o
.c.o:
	cov++ -c [options] $<

Actually, you can simplify your life somewhat by making the change permanent in your makefile, but controlled with a flag. Consider:

.SUFFIXES: .c .o

INSTRUMENTATION=0

ifeq ($(INSTRUMENTATION),0)
.c.o:
	g++ -c [options] $<
else
.c.o:
	cov++ -c [options] $<
endif

In this way, you can turn instrumentation on and off just by changing the value of INSTRUMENTATION.

Important note: You must also link your program using cov++. Or, if you choose to link without cov++, you must include covtoolhelper.o as the last object module before the C++ runtime libraries. That is the rule on Linux at least; on other operating systems (if we ever port) that module might have to go first, not last.

Also, you can choose to have cov++ include a debuggable version of the runtime data collector. Add the -CBG option. It will allow you to debug the source found in the installation directory's file covtoolhelper.c.

Running an instrumented program

When running your instrumented program, you do not have to do anything different than you normally do. Each invocation will produce a new *.covexp file. When you have run all your tests, you may want to merge your log files into a single database file. See the next section.

Normally, the runtime data collector names the coverage log files like this:

cov-run-[PID].covexp

where [PID] is a number -- the process id of the program run.

If you would like to group your *.covexp files, you can set an environment variable, COVTOOL_PREFIX, before running your program. This will cause the collector to replace the 'cov-run' part of the file name with something of your choosing. So, bash users can do either of the following:

export COVTOOL_PREFIX=group1
run_your_program

Or you could do this:

COVTOOL_PREFIX=group2 run_your_program

From your makefile, you could do something like this (if you are using gmake ;-)

export COVTOOL_PREFIX

COVTOOL_PREFIX=cov-run

test_group_name:: COVTOOL_PREFIX=this_group_name

test_group_name:: test1
...
test_group_name:: test2
...
test_group_name:: test3

This lets the default prefix be cov-run, but for a specific set of tests, the environment variable is overridden. When test1 through test3 run, they will create log files of the form:

this_group_name-[PID].covexp
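The naming rule is simple enough to model in a few lines. The Python sketch below only mirrors the rule as described in this section; it is not the collector's own code (which lives in covtoolhelper.c), and both function names are hypothetical. The second helper sorts existing log files back into their groups, assuming every log name ends in -[PID].covexp.

```python
import os


def covexp_name(pid=None, environ=None):
    """Build a log file name the way this section describes it."""
    environ = os.environ if environ is None else environ
    prefix = environ.get("COVTOOL_PREFIX", "cov-run")  # default prefix is cov-run
    pid = os.getpid() if pid is None else pid
    return f"{prefix}-{pid}.covexp"


def group_by_prefix(filenames):
    """Map each prefix to the log files that carry it."""
    groups = {}
    for name in filenames:
        if not name.endswith(".covexp"):
            continue
        stem = name[: -len(".covexp")]
        prefix, dash, pid = stem.rpartition("-")  # split at the last '-'
        if dash and pid.isdigit():
            groups.setdefault(prefix, []).append(name)
    return groups


print(covexp_name(pid=1234, environ={}))                            # cov-run-1234.covexp
print(covexp_name(pid=1234, environ={"COVTOOL_PREFIX": "group1"}))  # group1-1234.covexp
print(group_by_prefix(["cov-run-77.covexp", "group1-5.covexp", "group1-9.covexp"]))
```

Splitting at the last '-' keeps prefixes such as cov-run intact even though they contain a dash themselves.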

Merging coverage log files

The program, covmerge.exe, reads a list of coverage log files and prints the merged output file to stdout. Here is how you would merge all the generated coverage log files in a directory tree into a single merged database:

covmerge.exe `find * -name '*.covexp' -print` >merged.db
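To make the idea of merging concrete, here is a small Python sketch of one plausible way to combine logs: take the union of the il: and el: line sets for each file and recompute the counts. This is an illustration only and is not taken from covmerge.exe. It assumes file names without spaces, ignores dir: statements, and truncates percentages to whole numbers, which is an assumption that happens to agree with the examples in this document. The function names are hypothetical.

```python
TAGS = ("file:", "il:", "el:", "dir:", "totals:")


def merge_logs(log_texts):
    """Union the il:/el: line sets per file across several coverage logs."""
    merged = {}  # path -> {"il": set, "el": set}
    for text in log_texts:
        tag, current, expect_path = None, None, False
        for token in text.split():
            if token in TAGS:
                tag, expect_path = token, token == "file:"
            elif expect_path:
                current = merged.setdefault(token, {"il": set(), "el": set()})
                expect_path = False
            elif tag in ("il:", "el:") and current is not None:
                current[tag[:-1]].add(int(token))
    return merged


def percent(executed, instrumented):
    # Assumption: percentages are truncated, not rounded.
    return 100 * executed // instrumented if instrumented else 0


def format_merged(merged):
    """Write the merged data back out in the file:/el:/il:/totals: layout."""
    out, total_il, total_el = [], 0, 0
    for path in sorted(merged):
        il, el = merged[path]["il"], merged[path]["el"]
        total_il += len(il)
        total_el += len(el)
        out.append(f"file: {path} {len(il)} {len(el)} {percent(len(el), len(il))}")
        if el:
            out.append("el: " + " ".join(str(n) for n in sorted(el)))
        out.append("il: " + " ".join(str(n) for n in sorted(il)))
    out.append(f"totals: {total_il} {total_el} {percent(total_el, total_il)}")
    return "\n".join(out) + "\n"


RUN1 = "file: /home/dir/file.c 6 3 50\nel: 1 2 6\nil: 1 2 3 4 5 6\ntotals: 6 3 50\n"
RUN2 = "file: /home/dir/file.c 6 3 50\nel: 1 2 3\nil: 1 2 3 4 5 6\ntotals: 6 3 50\n"
print(format_merged(merge_logs([RUN1, RUN2])), end="")  # last line: totals: 6 4 66
```

Two runs that each cover 50% of the file combine to 4 of 6 lines here, because line 3 was executed only in the second run and line 6 only in the first.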

Annotating source files

Annotating source files is best done using a merged database of all the test runs -- but you can leave them un-merged if you so choose. Here is an example of how to annotate a file:

covannotate.exe file.c *.covexp >file.c.annotated

For an example of an annotated source file, see the example in the introduction

If you want the coverage statistics, you almost certainly need a merged database. See the previous section.

See the section above for a description of the coverage information log/database files. Here is how to get the total coverage information:

grep totals: merged.db

This will produce a line that looks like this:

totals: [total_instrumented_lines] [total_executed_lines] [percent_covered]

for example: "totals: 6 3 50"

To get the coverage statistics for a specific file, do this:

grep /pathname/file.c merged.db

and you should get something like this:

file: /pathname/file.c [instrumented_lines] [executed_lines] [percent]

for example: "file: /home/lboggs/covtool/covtool.c 1228 1190 96"
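If you would rather see the worst-covered files first than grep for them one at a time, a short script can sort the file: statements by percentage. The Python sketch below assumes the four-field file: layout shown above and file names without spaces. The function name lowest_coverage and the sample paths are hypothetical.

```python
def lowest_coverage(db_text, limit=10):
    """Return up to `limit` (percent, path, instrumented, executed) rows, worst first."""
    tokens = db_text.split()
    rows = []
    for i, token in enumerate(tokens):
        # A file: statement is the tag followed by path, il count, el count, percent.
        if token == "file:" and i + 4 < len(tokens):
            path = tokens[i + 1]
            instrumented, executed, percent = (int(t) for t in tokens[i + 2:i + 5])
            rows.append((percent, path, instrumented, executed))
    rows.sort()
    return rows[:limit]


SAMPLE_DB = """\
file: /src/well_tested.c 1228 1190 96
el: 1 2 3
il: 1 2 3 4
file: /src/neglected.c 10 2 20
el: 1 2
il: 1 2 3 4 5 6 7 8 9 10
totals: 1238 1192 96
"""

for percent, path, instrumented, executed in lowest_coverage(SAMPLE_DB):
    print(f"{percent:3d}%  {executed}/{instrumented}  {path}")
```

To run it against real data, read the text with open("merged.db").read() and pass it in place of SAMPLE_DB.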

Once you have merged all the coverage instrumentation data into a merged.db file, you can use the script, gen_html, to build web pages showing all the annotated source files and tabular descriptions of the coverage statistics on a directory by directory basis.

Installing COVTOOL

Downloadable versions of COVTOOL can be found at the SourceForge COVTOOL project download site.

Once you have downloaded the version you desire, use gunzip to extract the tar file from the .gz file. Then use tar xvf to extract the distribution files from the tar ball.

Once downloaded and extracted, see the file README for more details.

The runtime data collector

The test coverage log files are created as your program runs by the runtime data collector. This is a collection of functions defined in the file covtoolhelper.c. As shipped, this program fragment does nothing more than keep track of whether or not specific lines in specific files are instrumented and executed. It does not keep track of how often they were executed -- although this would be a simple change to the tools.

Further, the runtime data collector does not support threads -- so if you need that feature you will have to put a thread lock around the function CvT_record_line____(). Or rather _in_ that function. Data collection is not turned on until main starts running, so as to avoid static initialization problems.
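The following Python sketch is a toy model of the two points above, not the code in covtoolhelper.c. It records only whether a line was executed (a set keeps no counts), and it takes the lock inside the recording function so that every caller is protected. The class and method names are hypothetical, and the real signature of CvT_record_line____() is not shown in this document.

```python
import threading


class LineRecorder:
    """Toy model of a runtime data collector with a lock inside the recorder."""

    def __init__(self):
        self._lock = threading.Lock()
        self._executed = set()  # (filename, line) pairs; membership only, no counts

    def record_line(self, filename, line):
        # Locking here, rather than at each call site, covers all callers at once.
        with self._lock:
            self._executed.add((filename, line))

    def executed_lines(self, filename):
        with self._lock:
            return sorted(line for name, line in self._executed if name == filename)


def worker(recorder):
    for line in (1, 2, 6):
        recorder.record_line("file.c", line)


recorder = LineRecorder()
threads = [threading.Thread(target=worker, args=(recorder,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(recorder.executed_lines("file.c"))  # [1, 2, 6]
```

Replacing the set with a collections.Counter would turn the same model into one that also tracks how often each line ran.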

The runtime data collector could also be modified quite easily to check for the validity of program state. In the absence of Purify, you could add heap validity testing to the data collection associated with every line in your program -- of course the run time would become almost infinite if you did.
