pedbg: Windows 7 Memory Management IV

Heap Manager

Most applications allocate smaller blocks than the 64-KB minimum allocation granularity possible using page granularity functions such as VirtualAlloc and VirtualAllocExNuma.

Allocating such a large area for relatively small allocations is not optimal from a memory usage and performance standpoint.
To address this need, Windows provides a component called the heap manager, which manages allocations inside larger memory areas reserved using the page granularity memory allocation functions.

The allocation granularity in the heap manager is relatively small: 8 bytes on 32-bit systems, and 16 bytes on 64-bit systems.
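
As a rough illustration of what this granularity means for a request size, the sketch below rounds a size up to the next multiple of 8 or 16 bytes. The function name round_to_granularity is made up for this example, and the per-block header that the real heap manager adds is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round a request up to the next multiple of the
 * heap granularity (8 on 32-bit systems, 16 on 64-bit systems per the text).
 * The granularity must be a non-zero power of two. */
size_t round_to_granularity(size_t size, size_t granularity)
{
    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    return (size + granularity - 1) & ~(granularity - 1);
}
```

With these assumptions, a 9-byte request occupies 16 bytes of block space under either granularity, while a 17-byte request occupies 24 bytes with 8-byte granularity and 32 bytes with 16-byte granularity.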

The heap manager exists in two places: Ntdll.dll and Ntoskrnl.exe. The C runtime (CRT) uses the heap manager when using functions such as malloc, free, and the C++ new operator.
The most common Windows heap functions are:

HeapCreate or HeapDestroy. Creates or deletes, respectively, a heap. The initial reserved and committed size can be specified at creation.
HeapAlloc. Allocates a heap block.
HeapFree. Frees a block previously allocated with HeapAlloc.
HeapReAlloc. Changes the size of an existing allocation (grows or shrinks an existing block).
HeapLock or HeapUnlock. Controls mutual exclusion to the heap operations.
HeapWalk. Enumerates the entries and regions in a heap.

+ Types of Heaps

Each process has at least one heap: the default process heap. The default heap is created at process startup and is never deleted during the process's lifetime. It defaults to 1 MB in size, but it can be made bigger by specifying a starting size in the image file by using the /HEAP linker flag. This size is just the initial reserve, however - it will expand automatically as needed.

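An application can get a handle to the default process heap by calling GetProcessHeap. It can also create additional private heaps with HeapCreate and release them, along with their virtual address space, with HeapDestroy. An array with all heaps is maintained in each process, and a thread can query them with the Windows function GetProcessHeaps.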

A heap can manage allocations either in large memory regions reserved from the memory manager via VirtualAlloc or from memory mapped file objects mapped in the process address space.

The latter approach is rarely used in practice, but it's suitable for scenarios where the content of the blocks needs to be shared between two processes or between a kernel-mode and a user-mode component.
The Win32 GUI subsystem driver (Win32k.sys) uses such a heap for sharing GDI and User objects with user mode.
If a heap is built on top of a memory mapped file region, certain constraints apply with respect to the component that can call heap functions.

First, the internal heap structures use pointers, and therefore do not allow remapping to different addresses in other processes.
Second, the synchronization across multiple processes or between a kernel component and a user process is not supported by the heap functions. Also, in the case of a shared heap between user mode and kernel mode, the user-mode mapping should be read-only to prevent user-mode code from corrupting the heap's internal structures, which would result in a system crash. The kernel-mode driver is also responsible for not putting any sensitive data in a shared heap to avoid leaking it to user mode.
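
The first constraint can be modeled in portable C. In the sketch below, two arrays stand in for two views of the same shared region mapped at different addresses. The Node type and both helper functions are hypothetical and are not the real heap structures. A raw pointer stored in one view still points into that view after the bytes are copied to the other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block header that links to its neighbor with a raw pointer. */
typedef struct Node {
    struct Node *next;
    int value;
} Node;

/* Link the two nodes of one view together using raw pointers. */
void link_view(Node view[2])
{
    assert(view != NULL);
    view[0].next = &view[1];
    view[0].value = 1;
    view[1].next = NULL;
    view[1].value = 2;
}

/* Return 1 if p points at one of the two nodes of this view, else 0. */
int points_into(const Node view[2], const Node *p)
{
    assert(view != NULL);
    return p == &view[0] || p == &view[1];
}
```

If view a is linked and then copied into view b with memcpy, b[0].next still refers to a[1]. A process that sees the region only at b's address would follow a pointer that is meaningless in its own address space.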

+ Heap Manager Structure

                Application
                     |
   Windows heap APIs                        --+
   (HeapAlloc, HeapFree, LocalAlloc, ...)     |
                     |                        |-> heap manager
   Front-end heap layer (optional)            |
   Core heap layer                          --+
                     |
               Memory Manager

For user-mode heaps only, an optional front-end heap layer can exist on top of the existing core functionality. The only front-end supported on Windows is the Low Fragmentation Heap (LFH). Only one front-end layer can be used for one heap at one time.

+ Heap Synchronization

The heap manager supports concurrent access from multiple threads by default. However, if a process is single threaded or uses an external mechanism for synchronization, it can tell the heap manager to avoid the overhead of synchronization by specifying HEAP_NO_SERIALIZE either at heap creation or on a per-allocation basis.

A process can also lock the entire heap with HeapLock, preventing other threads from performing heap operations, when it needs a consistent state across multiple heap calls (for example, while enumerating blocks with HeapWalk).
If heap synchronization is enabled, there is one lock per heap that protects all internal heap structures.

+ The Low Fragmentation Heap (LFH)

The LFH manages allocated blocks in predetermined block-size ranges called buckets. When a process allocates memory from the heap, the LFH chooses the bucket that maps to the smallest block large enough to hold the required size.

To address scalability, the LFH expands the frequently accessed internal structures to a number of slots that is twice the current number of processors on the machine. Even if the LFH is enabled as a front-end heap, the less frequent allocation sizes may still use the core heap functions to allocate memory, while the most popular allocation classes are served from the LFH.
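
The bucket choice described above can be sketched as a search over an ascending table of sizes. The table below is a small made-up example, not the real LFH bucket layout, and the names pick_bucket and lfh_slot_count are hypothetical. In this sketch a request larger than every bucket returns -1, standing in for "leave it to the core heap".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bucket sizes in bytes, in ascending order.
 * These values are illustrative only, not the real LFH table. */
static const size_t kBucketSizes[] = { 8, 16, 32, 64, 128, 256 };
#define BUCKET_COUNT (sizeof kBucketSizes / sizeof kBucketSizes[0])

/* Return the index of the smallest bucket that can hold `size`,
 * or -1 if no bucket is large enough. */
int pick_bucket(size_t size)
{
    for (size_t i = 0; i < BUCKET_COUNT; i++) {
        if (size <= kBucketSizes[i])
            return (int)i;
    }
    return -1;
}

/* Slot count as stated in the text: twice the number of processors. */
unsigned lfh_slot_count(unsigned processors)
{
    assert(processors > 0);
    return 2u * processors;
}
```

A 100-byte request lands in the 128-byte bucket here, so up to 28 bytes of the block go unused. That is the price paid for keeping every block in a bucket the same size.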

+ Heap Security Features

The metadata used by the heap for internal management is packed with a high degree of randomization to make it difficult for an attempted exploit to patch the internal structures to prevent crashes or conceal the attack attempt. These blocks are also subject to an integrity check mechanism on the header to detect simple corruptions such as buffer overruns. Finally, the heap also uses a small degree of randomization of the base address (or handle).

+ Heap Debugging Features

The heap manager leverages the 8 bytes used to store internal metadata as a consistency checkpoint, which makes potential heap usage errors more obvious. It also includes several features to help detect bugs, which are exposed through the following options:

Enable tail checking. The end of each block carries a signature that is checked when the block is released. If a buffer overrun destroyed the signature entirely or partially, the heap will report this error.
Enable free checking. A free block is filled with a pattern that is checked at various points when the heap manager needs to access the block (such as at removal from the free list to satisfy an allocate request). If the process continued to write to the block after freeing it, the heap manager will detect changes in the pattern and the error will be reported.
Parameter checking. This function consists of extensive checking of the parameters passed to the heap functions.
Heap validation. The entire heap is validated at each heap call.
Heap tagging and stack traces support. This function supports specifying tags for allocation and/or captures user-mode stack traces for the heap calls to help narrow the possible causes of a heap error.

The first three options are enabled by default if the loader detects that a process is started under the control of a debugger. (A debugger can override this behavior and turn off these features.) Enabling heap debugging options affects all heaps in the process. Also, if any of the heap debugging options are enabled, the LFH is disabled automatically and the core heap is used with the required debugging options enabled.
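
To make tail checking and free checking more concrete, here is a minimal model built on malloc. It is not how the Windows heap implements these options. The signature length, both fill bytes, and all function names are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TAIL_BYTES 8     /* hypothetical signature length */
#define TAIL_FILL  0xAB  /* hypothetical tail signature byte */
#define FREE_FILL  0xFE  /* hypothetical free-block pattern byte */

/* Allocate user_size bytes followed by a tail signature. */
unsigned char *dbg_alloc(size_t user_size)
{
    unsigned char *p = malloc(user_size + TAIL_BYTES);
    if (p != NULL)
        memset(p + user_size, TAIL_FILL, TAIL_BYTES);
    return p;
}

/* Return 1 if the tail signature after the user area is intact. */
int dbg_tail_intact(const unsigned char *p, size_t user_size)
{
    assert(p != NULL);
    for (size_t i = 0; i < TAIL_BYTES; i++) {
        if (p[user_size + i] != TAIL_FILL)
            return 0;
    }
    return 1;
}

/* Mark a block as free by filling the user area with the free pattern.
 * The memory is kept so the pattern can be checked later. */
void dbg_mark_free(unsigned char *p, size_t user_size)
{
    assert(p != NULL);
    memset(p, FREE_FILL, user_size);
}

/* Return 1 if nothing wrote to the block after it was marked free. */
int dbg_free_pattern_intact(const unsigned char *p, size_t user_size)
{
    assert(p != NULL);
    for (size_t i = 0; i < user_size; i++) {
        if (p[i] != FREE_FILL)
            return 0;
    }
    return 1;
}
```

In this model a one-byte overrun past the user area makes dbg_tail_intact return 0, and a write into a block after dbg_mark_free makes dbg_free_pattern_intact return 0. Both checks find the damage only when they run, which may be long after the offending write.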

+ Page Heap

Because the tail and free checking options might be discovering corruptions that occurred well before the problem was detected, an additional heap debugging capability, called pageheap, is provided that directs all or part of the heap calls to a different heap manager. When enabled, the heap manager places allocations at the end of pages and reserves the immediately following page. Since reserved pages are not accessible, if a buffer overrun occurs it will cause an access violation, making it easier to detect the offending code. Optionally, page heap allows placing the blocks at the beginning of the pages, with the preceding page reserved, to detect buffer underrun problems. The page heap can also protect freed pages against any access to detect references to heap blocks after they have been freed.

Note that using the page heap can result in running out of address space because of the significant overhead added for small allocations.
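
The placement rule and its cost can be worked out with simple arithmetic. The sketch below assumes one reserved guard page after each allocation and ignores any alignment of the block start. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of where a block lands under a page-heap-like scheme. */
typedef struct {
    size_t committed_pages; /* pages that hold the block itself */
    size_t block_offset;    /* block start, measured from the first committed page */
    size_t address_space;   /* committed pages plus one reserved guard page, in bytes */
} PageHeapLayout;

/* Place a block so that its last byte is the last byte before the guard page. */
PageHeapLayout page_heap_layout(size_t size, size_t page_size)
{
    PageHeapLayout layout;
    assert(size > 0 && page_size > 0);
    layout.committed_pages = (size + page_size - 1) / page_size;
    layout.block_offset = layout.committed_pages * page_size - size;
    layout.address_space = (layout.committed_pages + 1) * page_size;
    return layout;
}
```

With 4096-byte pages, a 16-byte allocation starts at offset 4080 and consumes 8192 bytes of address space, which is 512 times the requested size. Many small allocations can therefore exhaust the address space quickly.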

+ Fault Tolerant Heap (FTH)

The fault tolerant heap is implemented in two primary components: the detection component, or FTH server, and the mitigation component, or FTH client. The detection component is a DLL, Fthsvc.dll, that is loaded by the Windows Security Center service (Wscsvc.dll, which in turn runs in one of the shared service processes under the local service account). The FTH client is an application shim. The shim mechanism has been used since Windows XP to allow applications that depend on particular behavior of older Windows systems to run on later systems.

Wednesday, November 19, 2014
