drcov is a DynamoRIO-based tool that collects coverage information from a
binary. Many useful tools, such as Lighthouse, make use of the drcov file
format. The format is not exclusive to drcov: any DBI tool or framework can be
used to collect the necessary information. In fact, Lighthouse contains
experimental scripts that use Frida and Intel PIN to collect the same coverage
information.
As useful as it is, the drcov file format is not officially documented by
the DynamoRIO project. Hopefully, the information presented below will make it
easier to get started with the format, especially if you want to write your own
coverage collection tool with a different DBI.
The file format begins with a header containing some metadata.
DRCOV VERSION: 2
DRCOV FLAVOR: drcov
As Lighthouse only supports version 2 log files, I did not look into the
version 1 log file format. DRCOV FLAVOR is a string that describes the tool
that generated the coverage information and does not actually impact anything.
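As a rough illustration, the two header lines can be read as simple "KEY: value" pairs. The sketch below assumes the header is always exactly these two lines in this order; the function name parse_drcov_header is made up for this example and is not part of drcov or Lighthouse.

```python
def parse_drcov_header(lines):
    """Parse the two leading 'KEY: value' lines of a drcov log (assumed layout)."""
    header = {}
    for line in lines[:2]:
        # Split on the first colon only, so the value may itself contain colons.
        key, _, value = line.partition(":")
        header[key.strip()] = value.strip()
    return header


sample = ["DRCOV VERSION: 2", "DRCOV FLAVOR: drcov"]
header = parse_drcov_header(sample)
version = int(header["DRCOV VERSION"])
flavor = header["DRCOV FLAVOR"]
```

Since the flavor string does not affect anything, a consumer would typically only check that the version is one it understands.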
Next, the log file has the module table that contains a map of the loaded modules in the process that the coverage information is collected from.
Module Table: version 2, count 39
Columns: id, base, end, entry, checksum, timestamp, path
0, 0x10c83b000, 0x10c83dfff, 0x0000000000000000, 0x00000000, 0x00000000, /Users/ayrx/code/frida-drcov/bar
1, 0x112314000, 0x1123f4fff, 0x0000000000000000, 0x00000000, 0x00000000, /usr/lib/dyld
2, 0x7fff5d866000, 0x7fff5d867fff, 0x0000000000000000, 0x00000000, 0x00000000, /usr/lib/libSystem.B.dylib
3, 0x7fff5dac1000, 0x7fff5db18fff, 0x0000000000000000, 0x00000000, 0x00000000, /usr/lib/libc++.1.dylib
4, 0x7fff5db19000, 0x7fff5db2efff, 0x0000000000000000, 0x00000000, 0x00000000, /usr/lib/libc++abi.dylib
5, 0x7fff5f30d000, 0x7fff5fa93fff, 0x0000000000000000, 0x00000000, 0x00000000, /usr/lib/libobjc.A.dylib
8, 0x7fff60617000, 0x7fff60647fff, 0x0000000000000000, 0x00000000, 0x00000000, /usr/lib/system/libxpc.dylib
... snip ...
This is probably the messiest part of the drcov file format, as it has had
quite a few changes. I take the documentation in the Lighthouse project at
face value since my primary goal is to have coverage log files that work in
Lighthouse.
As documented by Lighthouse, the Module Table header has two variations,
both of which contain the number of entries in the module table.
Format used in DynamoRIO v6.1.1 through 6.2.0
    eg: 'Module Table: 11'
Format used in DynamoRIO v7.0.0-RC1 (and hopefully above)
    eg: 'Module Table: version X, count 11'
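A parser that wants to accept both header variations could try the newer form first and fall back to the older one. This is only a sketch based on the two examples above: the function name is hypothetical, and treating the older header as table version 1 is an assumption drawn from the Lighthouse notes rather than anything the header itself states.

```python
import re

# Patterns derived from the two example headers quoted above.
_NEW_HEADER = re.compile(r"^Module Table: version (\d+), count (\d+)$")
_OLD_HEADER = re.compile(r"^Module Table: (\d+)$")


def parse_module_table_header(line):
    """Return (table_version, module_count) for either header variation."""
    line = line.strip()
    match = _NEW_HEADER.match(line)
    if match:
        return int(match.group(1)), int(match.group(2))
    match = _OLD_HEADER.match(line)
    if match:
        # Assumption: the older header carries no version, so report version 1.
        return 1, int(match.group(1))
    raise ValueError("unrecognised module table header: %r" % line)
```

For example, parse_module_table_header("Module Table: version 2, count 39") yields the version and the number of rows to read next.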
Each version has a slightly different table format.
DynamoRIO v6.1.1, table version 1:
    eg: (Not present)
DynamoRIO v7.0.0-RC1, table version 2:
    Windows:
        'Columns: id, base, end, entry, checksum, timestamp, path'
    Mac/Linux:
        'Columns: id, base, end, entry, path'
DynamoRIO v7.0.17594B, table version 3:
    Windows:
        'Columns: id, containing_id, start, end, entry, checksum, timestamp, path'
    Mac/Linux:
        'Columns: id, containing_id, start, end, entry, path'
DynamoRIO v7.0.17640, table version 4:
    Windows:
        'Columns: id, containing_id, start, end, entry, offset, checksum, timestamp, path'
    Mac/Linux:
        'Columns: id, containing_id, start, end, entry, offset, path'
Of the many values, only id, start (or base), end, and path are actually
required when it comes to interoperability with Lighthouse.
id: This is a sequential number assigned when generating the module table. It is later used to map a basic block to a module.
start, base: This is the memory address where the module starts.
end: This is the memory address where the module ends.
path: This is the path where the module is located on disk.
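Because the column set differs between table versions and platforms, one way to stay tolerant is to read the Columns line first and then look values up by name, keeping only the four fields listed above. The following is a minimal sketch under that assumption; both function names are hypothetical, and it relies on path being the last column, as it is in every variation shown.

```python
def parse_columns(line):
    """Turn 'Columns: id, base, ...' into a list of column names."""
    prefix = "Columns:"
    if not line.startswith(prefix):
        raise ValueError("not a Columns line: %r" % line)
    return [name.strip() for name in line[len(prefix):].split(",")]


def parse_module_row(line, columns):
    """Extract id, start, end and path from one module table row."""
    # Split at most len(columns) - 1 times so a comma inside the path survives.
    values = [value.strip() for value in line.split(",", len(columns) - 1)]
    row = dict(zip(columns, values))
    # Table version 2 calls the start address 'base'; later versions use 'start'.
    start_key = "start" if "start" in row else "base"
    return {
        "id": int(row["id"]),
        "start": int(row[start_key], 16),
        "end": int(row["end"], 16),
        "path": row["path"],
    }


columns = parse_columns("Columns: id, base, end, entry, checksum, timestamp, path")
module = parse_module_row(
    "0, 0x10c83b000, 0x10c83dfff, 0x0000000000000000, 0x00000000, 0x00000000, "
    "/Users/ayrx/code/frida-drcov/bar",
    columns,
)
```

Looking fields up by name means the same code handles the extra containing_id and offset columns of later table versions without changes.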
Finally, the log file has a basic block table that contains a list of the
basic blocks that were executed while the coverage information was being
collected. While drcov can dump the basic block table in text format (with the
-dump_text option), it defaults to dumping the table in binary format, which
is what will be most commonly seen.
BB Table: 861 bbs
<binary data>
The table starts with a header that indicates the number of basic blocks in the
table. The binary data that follows the BB Table header is an array of
_bb_entry_t structs, each 8 bytes long. The format of each _bb_entry_t struct
in the table is as follows:
typedef struct _bb_entry_t {
    uint start; /* offset of bb start from the image base */
    ushort size;
    ushort mod_id;
} bb_entry_t;
Each item in the struct is rather self-explanatory.
start: This is the offset from the base of the module where the basic block entry starts.
size: This is the size of the basic block.
mod_id: This is the id of the module where the basic block is found. This corresponds to the id assigned to the module when generating the module table.
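To make the binary layout concrete, here is a small sketch that decodes the BB Table header and the packed entries with Python's struct module. It assumes the entries are little-endian with no padding (a 4-byte unsigned integer followed by two 2-byte unsigned integers), which matches the 8-byte size above on common x86 hosts; the function name parse_bb_table is hypothetical.

```python
import re
import struct

# Assumption: little-endian, unpadded layout of uint start, ushort size, ushort mod_id.
BB_ENTRY = struct.Struct("<IHH")


def parse_bb_table(data):
    """Parse b'BB Table: N bbs' plus N packed entries into (start, size, mod_id) tuples."""
    # Split at the first newline only; the binary body may contain newline bytes.
    header, _, body = data.partition(b"\n")
    match = re.match(rb"^BB Table: (\d+) bbs$", header.strip())
    if match is None:
        raise ValueError("unrecognised BB table header: %r" % header)
    count = int(match.group(1))
    if len(body) < count * BB_ENTRY.size:
        raise ValueError("BB table is truncated")
    return [BB_ENTRY.unpack_from(body, i * BB_ENTRY.size) for i in range(count)]


# Two hand-made entries, used only to exercise the parser.
sample = b"BB Table: 2 bbs\n" + BB_ENTRY.pack(0x1000, 12, 0) + BB_ENTRY.pack(0x2040, 5, 3)
blocks = parse_bb_table(sample)
```

The same BB_ENTRY.pack call can be used in the other direction by a custom collector that needs to emit entries in this layout.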
These 3 items, when combined with the module table, allow us to know which basic blocks were executed when the coverage information was collected.
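Combining the two tables amounts to looking up each entry's mod_id and adding the block offset to that module's start address. The sketch below assumes modules are represented as dictionaries with id, start and path keys and blocks as (start, size, mod_id) tuples; those shapes and the function name are choices made for this example only.

```python
def resolve_blocks(modules, blocks):
    """Map (start, size, mod_id) entries to (path, absolute_start, absolute_end)."""
    by_id = {module["id"]: module for module in modules}
    resolved = []
    for start, size, mod_id in blocks:
        module = by_id.get(mod_id)
        if module is None:
            # This sketch simply skips entries whose module is not in the table.
            continue
        address = module["start"] + start
        resolved.append((module["path"], address, address + size))
    return resolved


modules = [
    {"id": 0, "start": 0x10C83B000, "end": 0x10C83DFFF,
     "path": "/Users/ayrx/code/frida-drcov/bar"},
]
# Hypothetical entries: one inside module 0, one pointing at an unknown module id.
blocks = [(0x1F40, 12, 0), (0x10, 4, 7)]
covered = resolve_blocks(modules, blocks)
```

A tool that works on a static disassembly would usually keep the module-relative offset rather than the absolute address, since the load address changes from run to run.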