pedbg: Windows 7 Memory Management II

Services provided by the memory manager

allocate and free virtual memory
share memory between processes
map files into memory
flush virtual pages to disk
retrieve information about a range of virtual pages
change the protection of virtual pages
lock the virtual pages into memory
Windows API:

heap functions (Heapxxx, Localxxx, and Globalxxx)
virtual memory functions (Virtualxxx)
memory mapped file functions (CreateFileMapping, CreateFileMappingNuma, MapViewOfFile, MapViewOfFileEx, and MapViewOfFileExNuma)

+ Large and Small Pages

the virtual address space is divided into pages, because the hardware memory management unit (MMU) translates addresses at page granularity

   Arch   Small page size   Large page size     Small pages per large page
   x86    4KB               4MB (2MB in PAE)    1,024 (512 in PAE)
   x64    4KB               2MB                 512
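
The last column of the table is just the large page size divided by the small page size. A minimal sketch of that arithmetic (the dictionary layout and function name are illustrative, not from any Windows API):

```python
KB = 1024
MB = 1024 * KB

# Page sizes taken from the table above.
PAGE_SIZES = {
    "x86": {"small": 4 * KB, "large": 4 * MB},
    "x86 (PAE)": {"small": 4 * KB, "large": 2 * MB},
    "x64": {"small": 4 * KB, "large": 2 * MB},
}


def small_pages_per_large_page(arch):
    """Return how many small pages fit in one large page."""
    sizes = PAGE_SIZES[arch]
    return sizes["large"] // sizes["small"]


for arch in PAGE_SIZES:
    print(arch, small_pages_per_large_page(arch))
```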

Large page

speed up address translation for references to other data within the large page
why? because the first reference to any byte within a large page will cause the hardware's translation look-aside buffer (TLB) to have in its cache the information necessary to translate references to any other byte within the large page.
If small pages are used, more TLB entries are needed for the same range of virtual addresses, thus increasing recycling of entries as new virtual addresses require translation.
This means having to go back to the page table when references are made to virtual addresses outside the scope of a small page whose translation has been cached.
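
To make the TLB argument concrete, the sketch below counts how many distinct pages (and so how many TLB entries, assuming one entry per page) are needed to cover a range of virtual addresses. The function name is hypothetical and the model ignores real TLB details such as associativity or separate large-page TLBs.

```python
KB = 1024
MB = 1024 * KB


def tlb_entries_needed(start, length, page_size):
    """Count the distinct pages touched by [start, start + length)."""
    first_page = start // page_size
    last_page = (start + length - 1) // page_size
    return last_page - first_page + 1


# Covering one aligned 2MB range on x64:
print(tlb_entries_needed(0, 2 * MB, 4 * KB))  # 512 entries with small pages
print(tlb_entries_needed(0, 2 * MB, 2 * MB))  # 1 entry with a large page
```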

Large page usage

core operating system images (Ntoskrnl.exe and Hal.dll)
core operating system data (initial part of non-paged pool and the data structures that describe the state of each physical memory page)
allow applications to map their images, private memory, and page-file-backed sections with large pages

Large page details

must occupy a significant number of physically contiguous small pages
must begin on a large page boundary (e.g., on x64, physical pages 0 ~ 511 can be used as a large page, as can pages 512 ~ 1,023, but not pages 10 ~ 521)
physical memory becomes fragmented as the system runs. This is not a problem for small-page allocations, but large-page allocations may fail because no aligned, contiguous run of free pages is left
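
The two requirements above (contiguous and aligned) can be checked with simple arithmetic on physical page numbers. A small sketch using the x64 value of 512 small pages per large page; the function name is made up for illustration:

```python
SMALL_PAGES_PER_LARGE_PAGE = 512  # x64, from the table earlier


def can_form_large_page(first_page, last_page,
                        pages_per_large=SMALL_PAGES_PER_LARGE_PAGE):
    """True if physical pages first_page..last_page are exactly one
    large page's worth and start on a large page boundary."""
    count = last_page - first_page + 1
    return count == pages_per_large and first_page % pages_per_large == 0


print(can_form_large_page(0, 511))     # True
print(can_form_large_page(512, 1023))  # True
print(can_form_large_page(10, 521))    # False: right size, wrong alignment
```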

Large page side effects

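a single protection applies to the entire large page, so a large page that contains both read-only code and read/write data must be marked read/write. The code therefore becomes writable.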
memory is non-pageable, because the page file system does not support large pages
because non-pageable, it is not considered part of the process working set
large page allocations are not subject to job-wide limits on virtual memory usage
If small pages are used to map the OS's kernel-mode code, the read-only portions of Ntoskrnl.exe and Hal.dll can be mapped as read-only pages. (Protection.)

+ Reserving and Committing Pages

Pages in a process virtual space are free, reserved, committed (private), or shareable. Committed and shareable pages are pages that, when accessed, ultimately translate to valid pages in physical memory.
Private pages are allocated by VirtualAlloc, VirtualAllocEx, and VirtualAllocExNuma. The intermediate "reserved" state allows the thread to set aside a range of contiguous virtual addresses for possible future use, and then commit portions of the reserved space as needed as the application runs.
Access to free or reserved memory results in an exception because the page isn't mapped to any storage that can resolve the reference.
Committed private pages are created at the time of first access as zero-initialized pages (demand zero).

committed pages may be written to the paging file when necessary.

Shared pages are usually mapped to a view of a section.

a section is part or all of a file, but may instead represent a portion of page file space.
sections are exposed in the Windows API as file mapping objects.
when a shared page is first accessed by any process, it is read from the associated mapped file (unless the section is associated with the paging file, in which case it is created as a zero-initialized page).

Pages are written back to disk through "modified page writing"

This occurs as pages are moved from a process's working set to a systemwide list called the modified page list.
This list will be written to disk.

De-commit and/or release address space with VirtualFree or VirtualFreeEx

de-committed memory is still reserved
released memory has been freed
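
The page states described in this section can be modelled as a small state machine. The following is a toy sketch only: the class and method names are hypothetical and merely mirror the roles of VirtualAlloc (reserve, commit) and VirtualFree (de-commit, release). It does not call or reproduce the Windows API.

```python
from enum import Enum


class PageState(Enum):
    FREE = "free"
    RESERVED = "reserved"
    COMMITTED = "committed"


class AccessViolation(Exception):
    """Raised when a free or reserved page is touched."""


class ToyAddressSpace:
    """Hypothetical model of per-page states in one process."""

    PAGE_SIZE = 4096

    def __init__(self):
        self.state = {}  # page number -> PageState (missing means FREE)
        self.data = {}   # page number -> bytearray, created on first access

    def _state(self, page):
        return self.state.get(page, PageState.FREE)

    def reserve(self, first, count):
        pages = range(first, first + count)
        if any(self._state(p) is not PageState.FREE for p in pages):
            raise ValueError("range is not entirely free")
        for p in pages:
            self.state[p] = PageState.RESERVED

    def commit(self, first, count):
        pages = range(first, first + count)
        if any(self._state(p) is PageState.FREE for p in pages):
            raise ValueError("this model commits only reserved pages")
        for p in pages:
            self.state[p] = PageState.COMMITTED

    def touch(self, page):
        if self._state(page) is not PageState.COMMITTED:
            raise AccessViolation(f"page {page} is {self._state(page).value}")
        # Demand zero: storage appears only on first access.
        return self.data.setdefault(page, bytearray(self.PAGE_SIZE))

    def decommit(self, first, count):
        for p in range(first, first + count):
            if self._state(p) is PageState.COMMITTED:
                self.state[p] = PageState.RESERVED  # still reserved
                self.data.pop(p, None)

    def release(self, first, count):
        for p in range(first, first + count):
            self.state.pop(p, None)  # back to free
            self.data.pop(p, None)
```

Reserving 16 pages and committing only 2 of them leaves `data` empty until a committed page is touched, which is the point of the two-step scheme: address space is set aside cheaply and storage is consumed late.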

Benefits of this two-step process of reserving and then committing:

defers adding to the system "commit charge" until needed, but keeps the convenience of virtual contiguity.
reserving is an inexpensive operation because it consumes very little actual memory (only need to update the small internal data structures that represent the state of the process address space).

One extremely common use for reserving a large space and committing portions of it as needed:

user-mode stack
when a thread is created, a stack is created by reserving a contiguous portion of the process address space (1MB by default).
The initial page in the stack is committed and the next page is marked as a guard page.
The guard page is not committed. It traps references beyond the end of the committed portion of the stack so that the stack can be expanded.
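
A toy sketch of the guard-page mechanism follows. It assumes 4KB pages and the 1MB default reservation mentioned above, and it indexes pages from the start of the stack (index 0 is the initially committed page). The class is hypothetical and ignores details such as stack growth direction and how stack overflow is handled near the end of the reservation.

```python
class ToyStack:
    """Hypothetical model: pages 0..committed-1 are committed,
    and the page at index `committed` acts as the guard page."""

    def __init__(self, reserved_bytes=1024 * 1024, page_size=4096):
        self.total_pages = reserved_bytes // page_size
        self.committed = 1  # only the initial page is committed

    def touch(self, page_index):
        if page_index < self.committed:
            return "ok"
        if page_index == self.committed and page_index < self.total_pages:
            # Guard page hit: commit it and move the guard one page on.
            self.committed += 1
            return "expanded"
        raise MemoryError("reference beyond the guard page")


stack = ToyStack()
print(stack.total_pages)  # 256 pages reserved
print(stack.touch(0))     # ok
print(stack.touch(1))     # expanded
```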

+ Commit Limit

definition: system-wide limit on the amount of committed virtual memory that can exist at any one time.
corresponds to current total size of all paging files + RAM that is usable by the OS.
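
As a worked example of the definition above, the sketch below adds usable RAM and the paging file sizes and checks whether a new commit request fits under the limit. The sizes are made-up illustration values and the function names are hypothetical.

```python
GB = 1024 ** 3


def commit_limit(usable_ram, paging_file_sizes):
    """Commit limit as described above: usable RAM plus all paging files."""
    return usable_ram + sum(paging_file_sizes)


def can_commit(current_charge, request, limit):
    """True if committing `request` more bytes stays within the limit."""
    return current_charge + request <= limit


limit = commit_limit(4 * GB, [2 * GB, 1 * GB])
print(limit // GB)                        # 7
print(can_commit(6 * GB, 2 * GB, limit))  # False
```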

+ Locking Memory

Windows applications can call VirtualLock to lock pages in their process working set.

remain in memory until explicitly unlocked or the process that locked them terminates
The number of pages a process can lock can't exceed its minimum working set size minus eight pages.
If a process needs to lock more pages, it can increase its working set minimum by calling SetProcessWorkingSetSizeEx.
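
The locking limit is simple arithmetic. In the sketch below, the 50-page minimum working set is only an assumed example value, and both function names are hypothetical:

```python
LOCK_RESERVE_PAGES = 8  # the "minus eight pages" from the rule above


def max_lockable_pages(min_working_set_pages):
    """Upper bound on pages a process can lock with VirtualLock."""
    return max(0, min_working_set_pages - LOCK_RESERVE_PAGES)


def min_working_set_needed(pages_to_lock):
    """Smallest working set minimum that allows locking this many pages."""
    return pages_to_lock + LOCK_RESERVE_PAGES


print(max_lockable_pages(50))       # 42
print(min_working_set_needed(100))  # 108
```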

Device drivers can call the kernel-mode functions MmProbeAndLockPages, MmLockPagableCodeSection, MmLockPagableDataSection, or MmLockPagableSectionByHandle.

remain in memory until explicitly unlocked

+ Shared Memory and Mapped Files

shared memory is memory that is present in more than one process virtual address space (e.g., the code of a DLL loaded by several processes)

     Process 1 virtual memory
      +----+
      +----+ -----------------------+      physical memory
      +----+                        |      +----+
                                    |      +----+
     Process 2 virtual memory       |====> +----+ DLL code
      +----+                        |      +----+
      +----+ -----------------------+
      +----+

code pages in executable images (.exe and .dll) are mapped as execute-only and writable pages are mapped as copy-on-write
shared memory is implemented with section objects, which are exposed as file mapping objects in the Windows API.
section object

can be opened by one process or by many (don't necessarily equate to shared memory)
can be connected to an open file on disk (called a mapped file) or to committed memory (to provide shared memory, page-file-backed section)
To create a section object, call CreateFileMapping or CreateFileMappingNuma.
a section object can refer to files that are much larger than can fit in the address space of a process
To access a large section object, a process can map only the portion of the section object that it requires (called a view of the section) by calling MapViewOfFile, MapViewOfFileEx, or MapViewOfFileExNuma and then specifying the range to map.

Windows applications can use mapped files to conveniently perform I/O to files by simply making them appear in their address space.
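
The idea of mapping only a view of a larger file can be tried with Python's standard mmap module, used here as a portable stand-in for calling CreateFileMapping and MapViewOfFile directly. This is a sketch: the helper name `read_view` and the temporary file contents are made up for the example. The offset of a view must be a multiple of `mmap.ALLOCATIONGRANULARITY`.

```python
import mmap
import os
import tempfile


def read_view(path, offset, length):
    """Map only [offset, offset + length) of the file and return its bytes."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), length,
                       access=mmap.ACCESS_READ, offset=offset) as view:
            return view[:length]


granularity = mmap.ALLOCATIONGRANULARITY

# Build a file with two regions, then map a view of the second one only.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"A" * granularity + b"B" * granularity)
    path = tmp.name

try:
    second = read_view(path, granularity, 16)
finally:
    os.remove(path)

print(second)
```

The file is read through ordinary memory access (slicing the view) instead of explicit read calls, which is the convenience the note above refers to.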

+ Protecting Memory

System-wide data structures and memory pools used by kernel-mode system components can be accessed only while in kernel mode. User-mode threads cannot access these pages.
Each process has a separate, private address space, protected from being accessed by any thread belonging to another process. Shared memory is not an exception because each process accesses the shared regions using addresses that are part of its own virtual address space. The only exception is if another process has virtual memory read or write access to the process object (or holds SeDebugPrivilege) and thus can use ReadProcessMemory or WriteProcessMemory.
All processors supported by Windows provide some form of hardware-controlled memory protection (read/write, read only, etc).

PAGE_NOACCESS, PAGE_READONLY, PAGE_READWRITE, PAGE_EXECUTE, PAGE_EXECUTE_READ, PAGE_EXECUTE_READWRITE, PAGE_WRITECOPY, PAGE_EXECUTE_WRITECOPY, PAGE_GUARD, PAGE_NOCACHE, PAGE_WRITECOMBINE.
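
The constants above are bit values defined in the Windows headers. The sketch below lists those values and decodes a protection value into names. The helper functions are hypothetical conveniences, not part of the Windows API.

```python
# Base protections (exactly one is set in a valid protection value).
BASE_PROTECTIONS = {
    0x01: "PAGE_NOACCESS",
    0x02: "PAGE_READONLY",
    0x04: "PAGE_READWRITE",
    0x08: "PAGE_WRITECOPY",
    0x10: "PAGE_EXECUTE",
    0x20: "PAGE_EXECUTE_READ",
    0x40: "PAGE_EXECUTE_READWRITE",
    0x80: "PAGE_EXECUTE_WRITECOPY",
}

# Modifiers that are OR-ed with a base protection.
MODIFIERS = {
    0x100: "PAGE_GUARD",
    0x200: "PAGE_NOCACHE",
    0x400: "PAGE_WRITECOMBINE",
}


def describe_protection(value):
    """Return the symbolic names that make up a protection value."""
    base = value & 0xFF
    if base not in BASE_PROTECTIONS:
        raise ValueError(f"invalid base protection: {base:#x}")
    names = [BASE_PROTECTIONS[base]]
    names += [name for bit, name in MODIFIERS.items() if value & bit]
    return " | ".join(names)


def is_executable(value):
    """True if the base protection is one of the PAGE_EXECUTE_* values."""
    return bool(value & 0xF0)


print(describe_protection(0x104))  # PAGE_READWRITE | PAGE_GUARD
print(is_executable(0x04))         # False
```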

Shared memory section objects have standard Windows access control lists (ACL) that are checked when processes attempt to open them, thus limiting access of shared memory to those processes with the proper rights.

+ No Execute Page Protection

Aka data execution prevention, or DEP.
It causes an attempt to transfer control to an instruction in a page marked as "no execute" to generate an access fault.

If an attempt is made in kernel mode to execute code in a page marked as no execute, the system will crash with ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY bugcheck code.
In user mode, a STATUS_ACCESS_VIOLATION (0xc0000005) exception is delivered to the thread that attempted the illegal reference.
If a process allocates memory that needs to be executable, it must explicitly mark such pages by specifying the PAGE_EXECUTE, PAGE_EXECUTE_READ, PAGE_EXECUTE_READWRITE, or PAGE_EXECUTE_WRITECOPY flags on the page granularity memory allocation functions.

architecture dependent

On 32-bit x86 systems that support DEP, bit 63 in the page table entry (PTE) is used to mark a page as non-executable. Therefore, the DEP feature is available only when the processor is running in Physical Address Extension (PAE) mode, without which page table entries are only 32 bits wide.
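
The dependency on PAE follows directly from the bit position: bit 63 does not exist in a 32-bit entry. A minimal sketch with made-up PTE values and hypothetical helper names:

```python
NX_BIT_INDEX = 63
NX_BIT = 1 << NX_BIT_INDEX


def is_no_execute(pte):
    """True if the no-execute bit (bit 63) is set in a 64-bit PTE value."""
    return bool(pte & NX_BIT)


def bit_fits(bit_index, pte_width_bits):
    """True if a PTE of the given width has room for this bit."""
    return bit_index < pte_width_bits


print(is_no_execute(0x8000000012345067))  # True
print(is_no_execute(0x0000000012345067))  # False
print(bit_fits(NX_BIT_INDEX, 32))         # False: non-PAE 32-bit PTE
print(bit_fits(NX_BIT_INDEX, 64))         # True: PAE PTE
```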

The OS loader automatically loads the PAE kernel (%SystemRoot%\System32\Ntkrnlpa.exe) on 32-bit systems that support hardware DEP. To force the non-PAE kernel to load on a system that supports hardware DEP, the BCD option nx must be set to AlwaysOff, and the pae option must be set to ForceDisable.
On 32-bit Windows, execution protection is applied only to thread stacks and user-mode pages, not to paged pool and session pool.

On 64-bit versions of Windows, execution protection is always applied to all 64-bit processes and device drivers and can be disabled only by setting the nx BCD option to AlwaysOff.

execution protection is applied to thread stacks (both user and kernel mode), user-mode pages not specifically marked as executable, kernel paged pool, and kernel session pool.

Even if you force DEP to be enabled, there are still other methods through which applications can disable DEP for their own images.

E.g., the image loader will verify the signature of the executable against known copy-protection mechanisms (such as SafeDisc and SecuROM) and disable execution protection to provide compatibility with older copy-protected software such as computer games.

+ Software Data Execution Prevention

For older processors that do not support hardware no execute protection, Windows supports limited software data execution prevention (DEP).

+ Copy-on-Write

When a process maps a copy-on-write view of a section object that contains read/write pages, instead of making a process private copy at the time the view is mapped, the memory manager defers making a copy of the pages until the page is written to.
If a thread in either process writes to a page, a memory management fault is generated. The memory manager sees that the write is to a copy-on-write page, so instead of reporting the fault as an access violation, it allocates a new read/write page in physical memory, copies the contents of the original page to the new page, updates the corresponding page-mapping information in this process to point to the new location, and dismisses the exception, thus causing the instruction that generated the fault to be re-executed.
The newly copied page is now private to the process that did the writing and isn't visible to the other process still sharing the copy-on-write page.
Each new process that writes to that same shared page will also get its own private copy.
Application: breakpoint support in debuggers

code pages start out as execute-only
The debugger first changes the protection on the page to PAGE_EXECUTE_READWRITE and then changes the instruction stream.
Because the code page is part of a mapped section, the memory manager creates a private copy for the process with the breakpoint set, while other processes continue using the unmodified code page.

        Process 1                                Process 2
        +--------------+                         +--------------+
        |              |----> page 1 <-----------|              |
        +--------------+                         +--------------+
        |original data |----> page 2         +---|modified data |
        +--------------+                     |   +--------------+
        |              |----> page 3 <-------|---|              |
        +--------------+                     |   +--------------+
                        copy of page 2 <-----+
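
The copy-on-write behaviour can be sketched with a toy model in which two processes start out pointing at the same physical frames and a write triggers a private copy. All class names are hypothetical. The byte 0xCC stands in for the breakpoint instruction a debugger would write into a code page.

```python
class Frame:
    """A physical page holding some bytes."""

    def __init__(self, data):
        self.data = bytearray(data)


class ToyProcess:
    """Hypothetical process with a copy-on-write view of shared frames."""

    def __init__(self, shared_frames):
        self.pages = dict(enumerate(shared_frames))  # page number -> Frame
        self.private = set()  # pages this process has already copied

    def read(self, page):
        return bytes(self.pages[page].data)

    def write(self, page, offset, value):
        if page not in self.private:
            # "Fault": copy the shared frame and remap this process to it.
            self.pages[page] = Frame(self.pages[page].data)
            self.private.add(page)
        self.pages[page].data[offset] = value


frames = [Frame(b"\x90" * 4) for _ in range(3)]
p1 = ToyProcess(frames)
p2 = ToyProcess(frames)

p2.write(1, 0, 0xCC)  # process 2 modifies page 2 of the diagram (index 1)

print(p1.read(1))                    # original bytes, unchanged
print(p2.read(1))                    # private copy with the modification
print(p1.pages[0] is p2.pages[0])    # True: untouched pages stay shared
```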

+ Address Windowing Extensions

Each 32-bit user process has by default only a 2-GB virtual address space.
AWE allows a process to allocate more physical memory than can be represented in its virtual address space. It can then access the physical memory by mapping a portion of its virtual address space into selected portions of the physical memory at various times.

1. Allocating the physical memory to be used (AllocateUserPhysicalPages or AllocateUserPhysicalPagesNuma).
2. Creating one or more regions of virtual address space to act as windows to map views of the physical memory (Win32 VirtualAlloc, VirtualAllocEx, or VirtualAllocExNuma with the MEM_PHYSICAL flag).
3. The preceding steps are initialization steps. To actually use the memory, the application uses MapUserPhysicalPages or MapUserPhysicalPagesScatter to map a portion of the physical region allocated in step 1 into one of the virtual regions, or windows, allocated in step 2.

       4 GB +------------+        64 GB +------------+
            |            |              |            |
            +------------+              +------------+
            |            |              |            | <-------+
            +------------+              +------------+         |
            |            |              |            |         |
       2 GB +------------+              +------------+         |
            |            |              |            | <----+  |
            +------------+--------------+------------+      |  |
            | AWE Window |              |            | <-- AWE memory
            +------------+--------------+------------+      |  |
            |            |              |            | <----+  |
          0 +------------+              +------------+         |
       server application               |            |         |
       address space                    +------------+         |
                                        |            | <-------+
                                      0 +------------+
                                         physical memory
                  Using AWE to map physical memory
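
The three AWE steps can be sketched as a toy model: allocate physical pages, create a small virtual window, then map different physical pages into the window over time. The class and its methods are hypothetical and only mirror the roles of AllocateUserPhysicalPages, VirtualAlloc with MEM_PHYSICAL, and MapUserPhysicalPages. They are not the Windows API.

```python
class ToyAwe:
    """Hypothetical model of an AWE window over a larger physical region."""

    PAGE_SIZE = 4096

    def __init__(self, window_slots):
        self.physical = []                   # step 1: allocated physical pages
        self.window = [None] * window_slots  # step 2: slot -> physical page

    def allocate_physical(self, count):
        first = len(self.physical)
        self.physical.extend(bytearray(self.PAGE_SIZE) for _ in range(count))
        return list(range(first, first + count))

    def map(self, slot, page):
        # A physical page may be mapped at only one virtual address.
        if page in self.window and self.window[slot] != page:
            raise ValueError("physical page is already mapped")
        self.window[slot] = page             # step 3

    def unmap(self, slot):
        self.window[slot] = None

    def view(self, slot):
        page = self.window[slot]
        if page is None:
            raise LookupError("window slot is not mapped")
        return self.physical[page]


# A 2-page window is enough to reach all 8 physical pages, one at a time.
awe = ToyAwe(window_slots=2)
pages = awe.allocate_physical(8)
for page in pages:
    awe.map(0, page)
    awe.view(0)[0] = page + 1
    awe.unmap(0)

awe.map(0, 3)
print(awe.view(0)[0])  # 4
```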

AWE functions exist on all editions of Windows and are usable regardless of how much physical memory a system has.
Because AWE memory is never paged out, the data in AWE memory can never have a copy in the paging file that someone could examine by rebooting into an alternate operating system.
Restrictions on memory allocated and mapped by AWE

Pages can't be shared between processes.
The same physical page can't be mapped to more than one virtual address in the same process.
Page protection is limited to read/write, read-only, and no access.

Wednesday, October 15, 2014
