Wednesday, October 22, 2014

WinDbg and PE I

Portable Executable Anatomy

In WinDbg, choose File -> Open Executable and select C:\Windows\System32\notepad.exe. The debugger prints the following:

CommandLine: C:\Windows\System32\notepad.exe
Symbol search path is: symsrv*symsrv.dll*C:\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is: 
ModLoad: 00680000 006b0000   notepad.exe
ModLoad: 77580000 776bc000   ntdll.dll
ModLoad: 76d10000 76de4000   C:\Windows\system32\kernel32.dll
ModLoad: 756a0000 756eb000   C:\Windows\system32\KERNELBASE.dll
ModLoad: 77240000 772e0000   C:\Windows\system32\ADVAPI32.dll
ModLoad: 770c0000 7716c000   C:\Windows\system32\msvcrt.dll
ModLoad: 77790000 777a9000   C:\Windows\SYSTEM32\sechost.dll
ModLoad: 76df0000 76e92000   C:\Windows\system32\RPCRT4.dll
ModLoad: 75c10000 75c5e000   C:\Windows\system32\GDI32.dll
ModLoad: 77170000 77239000   C:\Windows\system32\USER32.dll
ModLoad: 76d00000 76d0a000   C:\Windows\system32\LPK.dll
ModLoad: 75a50000 75aed000   C:\Windows\system32\USP10.dll
ModLoad: 772e0000 7735b000   C:\Windows\system32\COMDLG32.dll
ModLoad: 77710000 77767000   C:\Windows\system32\SHLWAPI.dll
ModLoad: 74a20000 74bbe000   C:\Windows\WinSxS\x86_microsoft.windows.common-controls_6595b64144ccf1df_6.0.7601.17514_none_41e6975e2bd6f2b2\COMCTL32.dll
ModLoad: 75c60000 768aa000   C:\Windows\system32\SHELL32.dll
ModLoad: 743e0000 74431000   C:\Windows\System32\WINSPOOL.DRV
ModLoad: 758f0000 75a4c000   C:\Windows\system32\ole32.dll
ModLoad: 76fd0000 7705f000   C:\Windows\system32\OLEAUT32.dll
ModLoad: 74bf0000 74bf9000   C:\Windows\System32\VERSION.dll
(e2c.85c): Break instruction exception - code 80000003 (first chance)

When a PE is opened, the OS loader also loads the modules from which the executable imports functions.
  • Each of these DLLs occupies its own address range. For example, notepad.exe uses ntdll.dll, so ntdll.dll is loaded along with notepad.exe and occupies the range 77580000 to 776bc000.
  • The base address of ntdll.dll is thus 77580000.
  • We can display the loaded modules with the lm command.
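
Each ModLoad line gives a module's start and end address, so its mapped size is the difference between the two. A minimal Python sketch, assuming two lines copied from the log above; parse_modload is a hypothetical helper name, not a WinDbg feature:

```python
# Two ModLoad lines copied from the debugger log above.
LOG = """\
ModLoad: 00680000 006b0000   notepad.exe
ModLoad: 77580000 776bc000   ntdll.dll
"""

def parse_modload(text):
    """Return {module name: (base, end)} for every 'ModLoad:' line."""
    modules = {}
    for line in text.splitlines():
        fields = line.split(None, 3)
        if len(fields) == 4 and fields[0] == "ModLoad:":
            base, end = int(fields[1], 16), int(fields[2], 16)
            modules[fields[3].strip()] = (base, end)
    return modules

modules = parse_modload(LOG)
for name, (base, end) in modules.items():
    print(f"{name}: base={base:08x} size={end - base:#x}")
```

The size computed for notepad.exe (0x30000) matches the "size of image" value that !dh reports further down.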
Every Portable Executable (PE) begins with a DOS header, a structure of type _IMAGE_DOS_HEADER. It is located at the base address of the main module, which is 00680000 for notepad.exe in this example.

0:000>dt _IMAGE_DOS_HEADER 00680000
ntdll!_IMAGE_DOS_HEADER
   +0x000 e_magic          : 0x5a4d  // MZ Signature of the PE
   +0x002 e_cblp           : 0x90
   +0x004 e_cp             : 3
   +0x006 e_crlc           : 0
   +0x008 e_cparhdr        : 4
   +0x00a e_minalloc       : 0
   +0x00c e_maxalloc       : 0xffff
   +0x00e e_ss             : 0
   +0x010 e_sp             : 0xb8
   +0x012 e_csum           : 0
   +0x014 e_ip             : 0
   +0x016 e_cs             : 0
   +0x018 e_lfarlc         : 0x40
   +0x01a e_ovno           : 0
   +0x01c e_res            : [4] 0
   +0x024 e_oemid          : 0
   +0x026 e_oeminfo        : 0
   +0x028 e_res2           : [10] 0
   +0x03c e_lfanew         : 0n224  // Decimal value of the PE File Header Offset

Two important fields:
  • e_magic: Holds the MZ signature, 0x5a4d. The value is stored little-endian (bytes 4D 5A), so every PE begins with the characters "MZ".
  • e_lfanew: Holds the offset of the PE header from the start of the image. WinDbg displays this value in decimal (the 0n prefix), so we convert it to hex first: 0n224 = 0xE0. The PE header is therefore at offset 0xE0 from the image base of notepad.exe (00680000), that is, at 006800E0.
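
The two checks above (MZ signature, then e_lfanew) can be sketched in Python. This is only an illustration: the 64-byte buffer is built by hand from the two values shown by dt, every other field is left at zero as a simplifying assumption, and pe_header_offset is a hypothetical helper name.

```python
import struct

# Build a 64-byte DOS header holding only the two fields of interest:
# e_magic = 0x5a4d at offset 0x00 and e_lfanew = 224 at offset 0x3c.
# All other fields stay zero here (an assumption made for brevity).
header = bytearray(64)
struct.pack_into("<H", header, 0x00, 0x5A4D)
struct.pack_into("<i", header, 0x3C, 224)

def pe_header_offset(dos_header):
    """Validate the MZ signature and return e_lfanew."""
    if dos_header[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<i", dos_header, 0x3C)
    return e_lfanew

image_base = 0x00680000
offset = pe_header_offset(bytes(header))
print(f"e_lfanew = {offset} = {offset:#x}")
print(f"PE header at {image_base + offset:08x}")
```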
Data Directories of Interest

Now let us examine the PE header:

0:000>!dh 00680000 -f
File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
     14C machine (i386)
       4 number of sections
4A5BC60F time date stamp Mon Jul 13 16:41:03 2009

       0 file pointer to symbol table
       0 number of symbols
      E0 size of optional header
     102 characteristics
            Executable
            32 bit word machine

OPTIONAL HEADER VALUES
     10B magic #
    9.00 linker version
    A800 size of code
   22400 size of initialized data
       0 size of uninitialized data
    3689 address of entry point
    1000 base of code
         ----- new -----
00680000 image base
    1000 section alignment
     200 file alignment
       2 subsystem (Windows GUI)
    6.01 operating system version
    6.01 image version
    6.01 subsystem version
   30000 size of image
     400 size of headers
   39741 checksum
00040000 size of stack reserve
00011000 size of stack commit
00100000 size of heap reserve
00001000 size of heap commit
    8140  DLL characteristics
            Dynamic base
            NX compatible
            Terminal server aware
       0 [       0] address [size] of Export Directory
    A048 [     12C] address [size] of Import Directory
    F000 [   1F160] address [size] of Resource Directory
       0 [       0] address [size] of Exception Directory
       0 [       0] address [size] of Security Directory
   2F000 [     E34] address [size] of Base Relocation Directory
    B62C [      38] address [size] of Debug Directory
       0 [       0] address [size] of Description Directory
       0 [       0] address [size] of Special Directory
       0 [       0] address [size] of Thread Storage Directory
    6D58 [      40] address [size] of Load Configuration Directory
     278 [     128] address [size] of Bound Import Directory
    1000 [     400] address [size] of Import Address Table Directory
       0 [       0] address [size] of Delay Import Directory
       0 [       0] address [size] of COR20 Header Directory
       0 [       0] address [size] of Reserved Directory

The list of directories at the bottom is the data directory array. It is an array of IMAGE_DATA_DIRECTORY structures, each with two fields: VirtualAddress and Size. VirtualAddress is a Relative Virtual Address (RVA), that is, an offset from the image base (00680000). Each RVA points to another structure. The data directories we are interested in are:
  • Import Directory
  • Import Address Table Directory
  • Export Directory
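
Turning a data directory RVA into an address we can pass to the debugger is a single addition. A small Python sketch using the values from the !dh output; rva_to_va is a hypothetical helper name:

```python
IMAGE_BASE = 0x00680000

# (RVA, size) pairs copied from the !dh output above.
DATA_DIRECTORIES = {
    "Export Directory": (0x0, 0x0),
    "Import Directory": (0xA048, 0x12C),
    "Import Address Table Directory": (0x1000, 0x400),
}

def rva_to_va(rva, image_base=IMAGE_BASE):
    """An RVA is an offset from the image base."""
    return image_base + rva

for name, (rva, size) in DATA_DIRECTORIES.items():
    if rva == 0 and size == 0:
        # notepad.exe exports nothing, so this directory is empty.
        print(f"{name}: not present")
    else:
        print(f"{name}: VA {rva_to_va(rva):08x}, {size:#x} bytes")
```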
Import Directory (Import Table)

The Import Table at (00680000 + A048 = 0068A048) contains an array of structures of type _IMAGE_IMPORT_DESCRIPTOR. Let's examine this type.

0:000>dt _IMAGE_IMPORT_DESCRIPTOR
ole32!_IMAGE_IMPORT_DESCRIPTOR
   +0x000 Characteristics  : Uint4B      // union with the second
   +0x000 OriginalFirstThunk : Uint4B    // union with the first
   +0x004 TimeDateStamp    : Uint4B
   +0x008 ForwarderChain   : Uint4B
   +0x00c Name             : Uint4B
   +0x010 FirstThunk       : Uint4B

  • Each DLL that the main module imports from has its own _IMAGE_IMPORT_DESCRIPTOR structure.
  • No field tells us how many _IMAGE_IMPORT_DESCRIPTOR structures the import table (array) holds; instead, the end of the table is marked by a structure whose fields are all set to 0.
  • OriginalFirstThunk: RVA of the Import Name Table
  • FirstThunk: RVA of the Import Address Table
  • Unless the image is bound, both tables hold identical entries until the OS loader maps the PE into memory and overwrites the Import Address Table with function addresses.
Let us view the memory at the Import Table virtual address (00680000 + A048 = 0068A048):
0:000> dd 0068A048
0068a048  0000a234 ffffffff ffffffff 0000a224
0068a058  00001000 0000a260 ffffffff ffffffff
0068a068  0000a214 0000102c 0000a380 ffffffff
0068a078  ffffffff 0000a208 0000114c 0000a3dc
0068a088  ffffffff ffffffff 0000a1fc 000011a8
0068a098  0000a50c ffffffff ffffffff 0000a1f0
0068a0a8  000012d8 0000a56c ffffffff ffffffff
0068a0b8  0000a1e0 00001338 0000a594 ffffffff
or
0:000> dt _IMAGE_IMPORT_DESCRIPTOR 0068A048
ole32!_IMAGE_IMPORT_DESCRIPTOR
   +0x000 Characteristics  : 0xa234
   +0x000 OriginalFirstThunk : 0xa234 // points to the Import Name Table
   +0x004 TimeDateStamp    : 0xffffffff
   +0x008 ForwarderChain   : 0xffffffff
   +0x00c Name             : 0xa224
   +0x010 FirstThunk       : 0x1000 // points to the Import Address Table

Let us examine the Import Name Table (00680000 + 0xa234 = 0068a234):
0:000> dd 0068a234
0068a234  0000a634 0000a646 0000a65a 0000a668
0068a244  0000a678 0000a688 0000a698 0000a6ae
0068a254  0000a6c4 0000a6d4 00000000 0000a6e6
0068a264  0000a6f6 0000a704 0000a714 0000a722
0068a274  0000a734 0000a746 0000a756 0000a772
0068a284  0000a77e 0000a78a 0000a79c 0000a7ba
0068a294  0000a7d0 0000a7ec 0000a802 0000a822
0068a2a4  0000a844 0000a856 0000a86a 0000a87a

We get a list of RVAs. The 00000000 entry at 0068a25c ends the list for ADVAPI32.dll; the entries from 0068a260 onward belong to the next DLL. Each entry is an _IMAGE_THUNK_DATA, and each RVA in turn points to a structure of type _IMAGE_IMPORT_BY_NAME:
0:000> dt _IMAGE_IMPORT_BY_NAME
ole32!_IMAGE_IMPORT_BY_NAME
   +0x000 Hint             : Uint2B     // WORD
   +0x002 Name             : [1] UChar  // BYTE
  • Name is the name of the imported function, a NUL-terminated string of variable length.
  • There is therefore a one-to-one correspondence between _IMAGE_THUNK_DATA entries and _IMAGE_IMPORT_BY_NAME structures.
Let us view the first one:
0:000> dt _IMAGE_IMPORT_BY_NAME 0068a634
ole32!_IMAGE_IMPORT_BY_NAME
   +0x000 Hint             : 0x27e
   +0x002 Name             : [1]  "R"
or
0:000> dc 0068a634
0068a634  6552027e 74655367 756c6156 57784565  ~.RegSetValueExW
0068a644  026e0000 51676552 79726575 756c6156  ..n.RegQueryValu
0068a654  57784565 02300000 43676552 65736f6c  eExW..0.RegClose
0068a664  0079654b 6552023c 65724367 4b657461  Key.<.RegCreateK
0068a674  00577965 65520261 65704f67 79654b6e  eyW.a.RegOpenKey
0068a684  00577845 73490180 74786554 63696e55  ExW...IsTextUnic
0068a694  0065646f 6c430057 5365736f 69767265  ode.W.CloseServi
0068a6a4  61486563 656c646e 02240000 72657551  ceHandle..$.Quer
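  • As you can see, the function names are stored one after another.
  • The Hint value is set by the linker. It is a suggested index into the exporting DLL's table of exported names, which lets the loader try that position first.
This is how the function names are laid out in the Import Name Table pointed to by OriginalFirstThunk.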
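
To see how the Hint and Name fields fall out of the raw memory, here is a short Python sketch that decodes the first entry of the dc dump above. The dwords are copied from that dump; read_import_by_name is a hypothetical helper name.

```python
import struct

# First five dwords of the `dc 0068a634` output above.
DWORDS = [0x6552027E, 0x74655367, 0x756C6156, 0x57784565, 0x026E0000]

# WinDbg prints each dword as a number; in memory its bytes are little-endian.
raw = struct.pack("<5I", *DWORDS)

def read_import_by_name(data, offset=0):
    """Return (hint, name) for the _IMAGE_IMPORT_BY_NAME at offset."""
    (hint,) = struct.unpack_from("<H", data, offset)
    end = data.index(b"\x00", offset + 2)  # Name is NUL-terminated
    return hint, data[offset + 2:end].decode("ascii")

hint, name = read_import_by_name(raw)
print(f"hint={hint:#x} name={name}")
```

This also explains why dt shows only "R" for Name: the structure declares a one-character array, and the rest of the string simply follows it in memory.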

Now let us examine the Import Address Table, which FirstThunk (0x1000) in the _IMAGE_IMPORT_DESCRIPTOR points to (00680000 + 0x1000 = 00681000):
0:000> dd 00681000
00681000  77251456 7725462d 7725461d 77251494
00681010  7725460d 7725440e 7725361c 7724b4d7
00681020  7724c9ec 7724ca04 00000000 76d75696
00681030  76d5a235 76d5a293 76d5a345 76d5b0d0
00681040  76d5b2b2 76d5abff 76d52c8b 76d5c470
00681050  775d2dd6 76d5fd8d 76d5bebd 76d5c502
00681060  76d50478 76d64ccc 76da0289 76d4c2f1
00681070  76d5553e 76d5d850 76d5dd72 76d5eff2

We see that it is already populated with virtual addresses. Let us look up the first two:
0:000> ln 77251456 
(77251456)   ADVAPI32!RegSetValueExWStub   |  (77251494)   ADVAPI32!RegCreateKeyW
Exact matches:
    ADVAPI32!RegSetValueExWStub = <no type information>
0:000> ln 7725462d
(7725462d)   ADVAPI32!RegQueryValueExWStub   |  (7725463d)   ADVAPI32!RegEnumKeyExW
Exact matches:
    ADVAPI32!RegQueryValueExWStub = <no type information>

So, these are the virtual addresses of the functions our PE imports from ADVAPI32.dll. The table is already populated because the OS loader has already loaded our PE and filled the Import Address Table with function pointers.
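
As a cross-check alongside ln, each Import Address Table entry should fall inside the address range of some loaded module. A minimal Python sketch, assuming three ranges copied from the ModLoad lines and three entries copied from the dd 00681000 output; owning_module is a hypothetical helper name:

```python
# Address ranges copied from the ModLoad lines at the top of the log.
MODULES = {
    "ntdll.dll": (0x77580000, 0x776BC000),
    "kernel32.dll": (0x76D10000, 0x76DE4000),
    "ADVAPI32.dll": (0x77240000, 0x772E0000),
}

# A few Import Address Table entries copied from `dd 00681000`.
IAT_ENTRIES = [0x77251456, 0x7725462D, 0x76D75696]

def owning_module(address):
    """Return the module whose [base, end) range contains address."""
    for name, (base, end) in MODULES.items():
        if base <= address < end:
            return name
    return None

for entry in IAT_ENTRIES:
    print(f"{entry:08x} -> {owning_module(entry)}")
```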

We will now focus on the Name field of the _IMAGE_IMPORT_DESCRIPTOR. This field holds the RVA of the DLL's name (0xa224 for the first descriptor, so 00680000 + 0xa224 = 0068a224). We can check the name:
0:000> dc 0068a224
0068a224  41564441 32334950 6c6c642e 90909000  ADVAPI32.dll....

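Similarly, we can parse the Import Table to locate the next descriptor. Each descriptor is 0x14 bytes long (five 4-byte fields), so the second one starts at 0068A048 + 4 * 5: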
0:000> dd 0068A048
0068a048  0000a234 ffffffff ffffffff 0000a224
0068a058  00001000 0000a260 ffffffff ffffffff
0068a068  0000a214 0000102c 0000a380 ffffffff
0068a078  ffffffff 0000a208 0000114c 0000a3dc
0068a088  ffffffff ffffffff 0000a1fc 000011a8
0068a098  0000a50c ffffffff ffffffff 0000a1f0
0068a0a8  000012d8 0000a56c ffffffff ffffffff
0068a0b8  0000a1e0 00001338 0000a594 ffffffff
or
0:000> dt _IMAGE_IMPORT_DESCRIPTOR 0068A048 + 4 * 5
ole32!_IMAGE_IMPORT_DESCRIPTOR
   +0x000 Characteristics  : 0xa260
   +0x000 OriginalFirstThunk : 0xa260
   +0x004 TimeDateStamp    : 0xffffffff
   +0x008 ForwarderChain   : 0xffffffff
   +0x00c Name             : 0xa214
   +0x010 FirstThunk       : 0x102c

Let us view the name of the next imported module (00680000 + 0xa214 = 0068a214):
0:000> dc 0068a214
0068a214  4e52454b 32334c45 6c6c642e 90909000  KERNEL32.dll....

As mentioned before, the end of the _IMAGE_IMPORT_DESCRIPTOR array is marked by a structure whose fields are all zero. In the dump below, the five zero dwords start at 0068a160:
0:000> dd 0068A048 L50
0068a048  0000a234 ffffffff ffffffff 0000a224
0068a058  00001000 0000a260 ffffffff ffffffff
0068a068  0000a214 0000102c 0000a380 ffffffff
0068a078  ffffffff 0000a208 0000114c 0000a3dc
0068a088  ffffffff ffffffff 0000a1fc 000011a8
0068a098  0000a50c ffffffff ffffffff 0000a1f0
0068a0a8  000012d8 0000a56c ffffffff ffffffff
0068a0b8  0000a1e0 00001338 0000a594 ffffffff
0068a0c8  ffffffff 0000a1d4 00001360 0000a5b8
0068a0d8  ffffffff ffffffff 0000a1c4 00001384
0068a0e8  0000a5c8 ffffffff ffffffff 0000a1b8
0068a0f8  00001394 0000a5e4 ffffffff ffffffff
0068a108  0000a1ac 000013b0 0000a5f0 ffffffff
0068a118  ffffffff 0000a19c 000013bc 0000a604
0068a128  ffffffff ffffffff 0000a18c 000013d0
0068a138  0000a610 ffffffff ffffffff 0000a180
0068a148  000013dc 0000a624 ffffffff ffffffff
0068a158  0000a174 000013f0 00000000 00000000
0068a168  00000000 00000000 00000000 53524556
0068a178  2e4e4f49 006c6c64 6c64746e 6c642e6c
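
The walk over the descriptor array can be sketched in Python by feeding it the dd output above as text. This is an illustration only; parse_dd and walk_descriptors are hypothetical helper names, and the dump lines are copied from the debugger session.

```python
# Output of `dd 0068A048 L50` above, up to and including the terminator.
DUMP = """\
0068a048  0000a234 ffffffff ffffffff 0000a224
0068a058  00001000 0000a260 ffffffff ffffffff
0068a068  0000a214 0000102c 0000a380 ffffffff
0068a078  ffffffff 0000a208 0000114c 0000a3dc
0068a088  ffffffff ffffffff 0000a1fc 000011a8
0068a098  0000a50c ffffffff ffffffff 0000a1f0
0068a0a8  000012d8 0000a56c ffffffff ffffffff
0068a0b8  0000a1e0 00001338 0000a594 ffffffff
0068a0c8  ffffffff 0000a1d4 00001360 0000a5b8
0068a0d8  ffffffff ffffffff 0000a1c4 00001384
0068a0e8  0000a5c8 ffffffff ffffffff 0000a1b8
0068a0f8  00001394 0000a5e4 ffffffff ffffffff
0068a108  0000a1ac 000013b0 0000a5f0 ffffffff
0068a118  ffffffff 0000a19c 000013bc 0000a604
0068a128  ffffffff ffffffff 0000a18c 000013d0
0068a138  0000a610 ffffffff ffffffff 0000a180
0068a148  000013dc 0000a624 ffffffff ffffffff
0068a158  0000a174 000013f0 00000000 00000000
0068a168  00000000 00000000 00000000 53524556
"""

FIELDS = 5  # OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name, FirstThunk

def parse_dd(text):
    """Flatten `dd` output into a list of dwords, skipping the address column."""
    dwords = []
    for line in text.splitlines():
        dwords.extend(int(field, 16) for field in line.split()[1:])
    return dwords

def walk_descriptors(dwords):
    """Collect 5-dword descriptors until the all-zero terminator."""
    descriptors = []
    for i in range(0, len(dwords) - FIELDS + 1, FIELDS):
        entry = tuple(dwords[i:i + FIELDS])
        if not any(entry):
            break
        descriptors.append(entry)
    return descriptors

descriptors = walk_descriptors(parse_dd(DUMP))
for oft, _stamp, _chain, name_rva, first_thunk in descriptors:
    print(f"INT={oft:#x} Name={name_rva:#x} IAT={first_thunk:#x}")
print(f"{len(descriptors)} descriptors")
```

The walk finds 14 descriptors. With the terminator that makes 15 entries of 0x14 bytes each, which agrees with the Import Directory size of 12C reported by !dh.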

Discussion:
How does a compiler translate the API calls made in a program into machine code?

Assume we call kernel32!GetSystemTimeAsFileTime(). The compiler replaces this call with the following instruction:

CALL DWORD PTR DS:[010010EC]

DS:[010010EC] = 7C8017E9

  • Instead of hard-coding the function's address in the machine code, the instruction refers to the memory location where that address will be stored.
  • The advantage is that if we call this API from multiple places in our program, none of those call sites has to change when the function's address changes, for example in a newer version of the DLL. Only the stored pointer is updated.
  • 010010EC is the memory location that holds the address of GetSystemTimeAsFileTime() imported from kernel32.dll. It lies inside the Import Address Table of the main module.
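
The indirection can be modelled with a toy Python dictionary standing in for memory. This is only a conceptual sketch: the three call sites and the replacement address 0x7C801800 are made up for illustration, and call_target is a hypothetical helper name.

```python
# Toy model of memory: address -> 32-bit value stored there.
IAT_SLOT = 0x010010EC
memory = {IAT_SLOT: 0x7C8017E9}  # slot holds the address of GetSystemTimeAsFileTime

def call_target(slot):
    """Address that CALL DWORD PTR DS:[slot] would transfer control to."""
    return memory[slot]

# Three hypothetical call sites; each one encodes the slot address,
# not the function address.
call_sites = [IAT_SLOT, IAT_SLOT, IAT_SLOT]
before = [call_target(slot) for slot in call_sites]

# Suppose another DLL build places the function elsewhere (made-up address).
# Only the slot is rewritten; the call sites themselves are untouched.
memory[IAT_SLOT] = 0x7C801800
after = [call_target(slot) for slot in call_sites]

print([f"{a:08x}" for a in before])
print([f"{a:08x}" for a in after])
```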

























