If you’ve ever looked at an application from a reverse engineer’s point of view, you know that there are many tools out there capable of pretty good static analysis. Things get trickier when we move on to dynamic analysis, though. When you first give your binary to IDA, it locates the entry point and tries to analyse the whole binary using a flooding-like algorithm. What IDA can’t do is guess the possible values of the registers at each step of the process. This causes all sorts of trouble when we come across self-modifying code or any other binary obfuscation mechanism. In this post I’ll go through one of these problems, possibly the most common and one of the easiest to overcome: virtual functions.
A virtual function is a dynamically bound function for which we usually don’t have an exported symbol. This means that the address of the code a given call will reach is not fixed at compile time, but looked up while the program is running. For this reason, static disassemblers cannot easily resolve the call into a symbolic name, which makes reversing harder. On the other hand, this allows for many cool tricks that object-oriented folks call inheritance, polymorphism and other fancy names that we are not interested in here. In the end it all comes down to a table of pointers (vtable from now on) that gets filled in at some point. When the program needs to call one of the virtual functions it just goes to the corresponding offset, grabs the pointer and makes the call. This is what it looks like in x86 assembly.
    mov eax, [edi]            ; move the address of the vtable into eax
    push edi                  ; push "this" pointer
    call dword ptr [eax+4]    ; call function at offset 4
Most C++ compilers, including Microsoft’s and GNU’s, put a pointer to the vtable at offset 0 in the storage space assigned to the object. Now the assembly code makes sense: the snippet grabs the address of the vtable in the first line and makes the actual call in the third one. The second line is just a push of the address of the object itself. In C++ the (hidden) first argument to every non-static member function is a pointer to the object the function is called on. The standard name for this pointer is “this”.
This looks like a nice trick, so why not implement it in C as well? It turns out Microsoft already thought of that: while playing with these things I found the C interface definitions intertwined with the ones for C++.
EXTERN_C const IID IID_IOplockStorage;
#if defined(__cplusplus) && !defined(CINTERFACE)
    MIDL_INTERFACE("8d19c834-8879-11d1-83e9-00c04fc2c6d4")
    IOplockStorage : public IUnknown
    {
    public:
        virtual HRESULT STDMETHODCALLTYPE CreateStorageEx(
            /* [in] */ LPCWSTR pwcsName,
            /* [in] */ DWORD grfMode,
            /* [in] */ DWORD stgfmt,
            /* [in] */ DWORD grfAttrs,
            /* [in] */ REFIID riid,
            /* [iid_is][out] */ void **ppstgOpen) = 0;
        virtual HRESULT STDMETHODCALLTYPE OpenStorageEx(
            /* [in] */ LPCWSTR pwcsName,
            /* [in] */ DWORD grfMode,
            /* [in] */ DWORD stgfmt,
            /* [in] */ DWORD grfAttrs,
            /* [in] */ REFIID riid,
            /* [iid_is][out] */ void **ppstgOpen) = 0;
    };
#else /* C style interface */
    typedef struct IOplockStorageVtbl
    {
        BEGIN_INTERFACE
        HRESULT ( STDMETHODCALLTYPE *QueryInterface )(
            IOplockStorage * This,
            /* [in] */ REFIID riid,
            /* [iid_is][out] */ void **ppvObject);
        ULONG ( STDMETHODCALLTYPE *AddRef )(
            IOplockStorage * This);
        ULONG ( STDMETHODCALLTYPE *Release )(
            IOplockStorage * This);
        HRESULT ( STDMETHODCALLTYPE *CreateStorageEx )(
            IOplockStorage * This,
            /* [in] */ LPCWSTR pwcsName,
            /* [in] */ DWORD grfMode,
            /* [in] */ DWORD stgfmt,
            /* [in] */ DWORD grfAttrs,
            /* [in] */ REFIID riid,
            /* [iid_is][out] */ void **ppstgOpen);
        HRESULT ( STDMETHODCALLTYPE *OpenStorageEx )(
            IOplockStorage * This,
            /* [in] */ LPCWSTR pwcsName,
            /* [in] */ DWORD grfMode,
            /* [in] */ DWORD stgfmt,
            /* [in] */ DWORD grfAttrs,
            /* [in] */ REFIID riid,
            /* [iid_is][out] */ void **ppstgOpen);
        END_INTERFACE
    } IOplockStorageVtbl;
    interface IOplockStorage
    {
        CONST_VTBL struct IOplockStorageVtbl *lpVtbl;
    };
    /* ... */
#endif /* C style interface */
You can see in the code that the equivalent C interface spells out three extra calls, i.e. QueryInterface, AddRef and Release. These calls are used for introspection and reference counting and we won’t bother with them in this post, but note that they are not really extra: the C++ interface inherits them from IUnknown, so they occupy the first three slots of its vtable as well. At the end of the snippet we can see the definition of IOplockStorage, which is the C structure equivalent to the C++ object instance; its only member, lpVtbl, is the vtable pointer at offset 0. Since both versions share one layout, exactly the same machine code calls the C and the C++ interfaces. With 4-byte pointers the offsets are 0 for QueryInterface, 4 for AddRef, 8 for Release, 0Ch for CreateStorageEx and 10h for OpenStorageEx. Most times we can double-check which method a program is calling by just looking at the series of push instructions that precede the function call.
    push eax
    push ebx
    push ecx
    push eax
    push edx
    push eax
    mov eax, [edi]              ; move the address of the vtable into eax
    push edi                    ; push "this" pointer
    call dword ptr [eax+10h]    ; call function at offset 10h
In this block of code we see 6 pushes before the vtable is loaded. From this we infer that the function has 6 parameters plus the “this” pointer that gets pushed with push edi. Say that we know edi points to an IOplockStorage interface. We just go through the interface definition and find that the entry at offset 10h is the fifth slot, OpenStorageEx. OpenStorageEx takes 6 parameters plus the “this” pointer, which is exactly what we see in the disassembly. Had the offset been 4 we would be looking at AddRef, which takes only the “this” pointer, and the six extra pushes would tell us that our guess about the type of edi was wrong.
To sum up, when you find a chunk of assembly code that calls functions through indirect references you are possibly facing a vtable dereference. At that point you should try and find out where the base pointer comes from. This is usually a call to a statically bound function that IDA knows how to resolve. Once you know the type of the base pointer, grab the interface definition (usually a .h file), find the method that sits at the offset being called and check that its parameter count matches the pushes before the call. After you’ve done all this the analysis turns into a static assembly code review with nicely resolved function calls.
Happy hacking!