RSS Feed

Peter Goodman's blog about PHP, Parsing Theory, C++, Functional Programming, Applications,

Tracking Data with Function Pointers

Recently I presented a poster at OSDI'12. The poster outlined our use of dynamic binary translation (DBT) for analysing operating system (OS) kernel modules. One novelty of our approach is that we ensure that only module code is analysed; non-module kernel code is never translated. This restriction entails taking control when module code executes (so that it can be translated) and relinquishing control when non-module kernel code executes. To regain control when kernel code invokes module code, we proactively search for and change function pointers in shared data structures.

Proactively changing function pointers that potentially point into module code is achieved by interposing on the interface between modules and the kernel. Modules and the kernel share data structures, and those data structures can contain function pointers. Finding and changing function pointers requires recursively applying a replacement function to the fields of data structures, starting from the "root" function arguments. Without guards in place, this recursive process might not terminate (e.g. a cyclic data structure). In the case of deeply linked data structures (e.g. trees), this recursive process might be expensive. To avoid this expense, we apply the replacement function only to those data structures that have changed.

Suppose we have the following code that defines a function called func_name and a function pointer field called func_ptr in a struct foo.

void func_name(void) {
    printf("hello world!n");

struct foo {
    void (*func_ptr)(void);

int main(void) {
    struct foo bar;
    bar.func_ptr = &func;_name;
    return 0;

The code in the main function roughly corresponds to the following objects in memory:

On the left, we have func_ptr, which contains the address (0xBEEF) of the func_name function. When func_ptr is invoked, control transfers into the func_name function. This is signalled by the instruction pointer (%rip) changing to 0xBEEF.

Recall that the goal is to detect changes to data structures. Suppose that we have no control over the allocation (static, stack, or heap) or layout/structure/semantics of the data structures that we want to track. Given these constraints, there does not appear to be a convenient way to embed information inside of a structure.

Two immediate approaches come to mind: i) embed some information in pointers to the data structures that we want to track, or; ii) use a map to associate addresses of data structures to their tracking meta-information.

The first solution is undersirable for four reasons:

  1. One must ensure that all instances of the original pointer are altered.
  2. One must ensure that the altered data structure pointer is correctly used.
  3. The meta-information of distinct instances of the altered pointer might get out of sync.
  4. There is a limited number of useful bits for meta-information in pointers.

The second solution has many desirable properties and would work. However, if one is tracking many data structures, then the cost of maintaining the map might be undesirable.

A third solution exists if we know the types of the fields of the data structures to track. As previously stated, we have no control over the allocation or semantics of a data structure. This implies that we cannot extend the structure to contain meta-information (or a pointer thereof), and that we cannot arbitrarily change values in the structure (lest we break the program semantics). Suppose, however, that we knew that a structure contained a function pointer. Then changing that function is fine, so long as the control-flow behaviour when invoking that function pointer remains the same. Consider the following:

Above, we introduced extra indirection in the form of a jmp instruction at address 0xFEED. When func_ptr is invoked, control transfers to 0xFEED, which then jmps to 0xBEEF.

But, instructions are just another form of data. There is no reason (at least on x86) that we can't just put some meta-information beside the newly inserted jmp. For example:

struct meta {
    byte jmp_code[5] __attribute__((aligned (8)));
} __attribute__((packed));

struct meta *func_name_meta = …;

Suppose that func_name_meta is initialized to a pointer to a struct meta object, and that object is located in executable memory. Futher, suppose that the jmp_code of func_name_meta is intialized to a 5-byte jmp instruction that transfers control to func_name (0xBEEF). Then we can swap func_ptr with func_name_meta and still expect the same control-flow behaviour. Why?

The first five bytes of *func_name_meta are machine code, and the entire structure lives in executable memory. The next N bytes of the struct meta object contain meta-information (used for detecting changes). The address of the struct meta object (func_name_meta) is also the address of the first field (jmp_code) within the object. As a result, replacing a pointer to func_name with func_name_meta is valid insofar as we are changing one code pointer with another. When control transfers to func_name_meta, the jmp instruction in the jmp_code field transfers control to func_name.

Usefully storing and extracting information from the meta-information is convenient:

struct meta *meta_of_bar = (struct meta *) bar.func_ptr;

Again, taking advantage of the layout of struct meta, we can now cast function pointers into struct meta pointers and operate on function pointers as if they were pointers to objects (because they are!).

There is a bit more going on behind the scenes in Granary, in particular: the allocation of the executable code, what meta-information is kept, how untracked objects are detected and handled, and garbage collection of object trackers. However, I think this article has outlined the salient points of the approach, which I believe is more general than simple object tracking. Hopefully you will find this technique as fun/evil as I do!


There are no comments, but you look like you have something on your mind.