|
|
Faster Method to Enumerate Heaps on Windows |
|
|
|
|
This article explains a fast method
of enumerating heaps on Windows. Traditional Windows heap
enumeration functions are slower and takes lot of time while
traversing large number of heap blocks. This article uncovers the
reason behind it and shows you a new efficient way of
enumerating process heaps based on reverse engineering of Windows heap
API functions. |
|
|
|
Some days back I was doing password strength related research on
Yahoo Messenger. It used to store the password on the heap and I wrote
an sample code using normal heap functions to locate and retrieve the
password. The password was basically stored on one of the heap block
which was near the end of 60,000th block. So I had to traverse all the
60,000 heap blocks to read the target block and the execution took more
than 10 minutes..! I tried running the program on multiple machines but
no change. Naturally I was getting irritated as I had to wait for so
long just to see the results.
To find a way around this problem, I tried looking on the internet for
answers but found nothing. Then I finally resort to finding the truth
myself and started reverse engineering the Windows heap
functions. Finally after few hours of work, I discovered the reason
behind the delay and wrote my own
tool which enumerated the complete heaps within few seconds. |
|
|
|
Typical heap traversing process on Windows involves following
steps
|
- Initialization
CreateToolhelp32Snapshot function used to take snapshot of the specified
process. This is the first function called before using any of the heap
enumeration functions.
- Listing heap nodes of process
After the initialization, Heap32ListFirst & Heap32ListNext functions are
used to retrieve all heap nodes within the process.
- Enumerating heap blocks within each node
Heap32First & Heap32Next functions lists all the heap blocks within each
node
|
|
This enumeration method does not present any problem when listing
small number of heap blocks. However when it comes to enumerating large
number of heap blocks (more than 50,000) in each node then it takes more
than 10 minutes. This delay cannot be justified in today's computing
world. |
|
|
|
The reason lies in the poor implementation of Heap32First and
Heap32Next functions. Code for both the functions are almost similar
except some arithmetic routines used to move to right block. Here is the
assembly code for Heap32Next function from kernel32.dll.Let us see what
is happening behind the screen... |
|
.text:7C863BF8 ; BOOL __stdcall
Heap32Next(LPHEAPENTRY32 lphe)
.text:7C863BF8 public _Heap32Next@4
.text:7C863BF8 _Heap32Next@4 proc near
.text:7C863BF8
.text:7C863BF8 lphe = dword ptr 8
.text:7C863BF8
.text:7C863BF8 mov edi, edi
.text:7C863BFA push ebp
.text:7C863BFB mov ebp, esp
.text:7C863BFD push ebx
.text:7C863BFE push esi
.text:7C863BFF mov esi, [ebp+lphe]
.text:7C863C02 test esi, esi
.text:7C863C04 push edi
.text:7C863C05 jz loc_7C863D0C
.text:7C863C0B cmp dword ptr [esi], 24h
.text:7C863C0E jnz loc_7C863D0C
.text:7C863C14 push 0
.text:7C863C16 push 0
.text:7C863C18 call ds:__imp__RtlCreateQueryDebugBuffer@8
.text:7C863C1E mov ebx, eax
.text:7C863C20 test ebx, ebx
.text:7C863C22 jnz short loc_7C863C2E
.text:7C863C24 mov eax, 0C0000001h
.text:7C863C29 jmp loc_7C863D18 |
|
|
|
First it invokes RtlCreateQueryDebugBuffer(0, FALSE) which allocates
buffer for storing heap data.On success, it returns pointer to allocated
debug buffer which is stored in ebx. If the function fails to allocate
buffer then it jumpts to the end of the function otherwise continues
with the code below. |
|
.text:7C863C2E
.text:7C863C2E loc_7C863C2E: ; CODE XREF: Heap32Next(x)+2Aj
.text:7C863C2E push ebx ; Debug Buffer pointer
.text:7C863C2F push 14h ; PDI_HEAPS | PDI_HEAP_BLOCKS
.text:7C863C31 push dword ptr [esi+1Ch] ; Process id
.text:7C863C34 call ds:__imp__RtlQueryProcessDebugInformation@12
.text:7C863C3A mov edi, eax
.text:7C863C3C test edi, edi
.text:7C863C3E jge short loc_7C863C4D ; continue next
.text:7C863C40 push ebx
.text:7C863C41 call ds:__imp__RtlDestroyQueryDebugBuffer@4
.text:7C863C47 push edi
.text:7C863C48 jmp loc_7C863D11 ; exit |
|
|
|
Next RtlQueryProcessDebugInformation is called to load all heap
blocks of the process. This function in fact loads entire heap nodes and
corresponding heap blocks of the process. |
|
.text:7C863C4D
.text:7C863C4D loc_7C863C4D: ; CODE XREF: Heap32Next(x)+46j
.text:7C863C4D mov ecx, [ebx+38h] ; ecx = heap information structure
.text:7C863C50 mov edx, [ecx] ;edx = count of heap nodes within process
.text:7C863C52 xor eax, eax
.text:7C863C54 test edx, edx
.text:7C863C56 jbe short loc_7C863C6A
.text:7C863C58 mov edi, [esi+20h]
.text:7C863C5B add ecx, 4 ;ecx now point to first heap node |
|
|
|
Debug Buffer contains the pointer to heap information
structure at offset 0x38. First parameter of this heap structure is node
count and after that it contains array of DEBUG_HEAP_INFORMATION
structure which represent each heap node. The heap information structure
is shown below. |
|
Struct HeapInfo
{
DWORD heapNodeCount;
DEBUG_HEAP_INFORMATION heapNode[heapNodeCount];
}; |
|
|
|
Once the node count is retrieved ecx is incremented by 4 to point to
the first heap node. |
|
.text:7C863C5E
.text:7C863C5E loc_7C863C5E: ; CODE XREF: Heap32Next(x)+70j
.text:7C863C5E cmp [ecx], edi
.text:7C863C60 jz short loc_7C863C6A
.text:7C863C62 inc eax
.text:7C863C63 add ecx, 40h ; Move to the next heap node
.text:7C863C66 cmp eax, edx ; Check if current count > total heap node
count
.text:7C863C68 jb short loc_7C863C5E
|
|
|
|
This is simple loop which checks if we are at the right heap node.
If not then ecx is incremented by size of DEBUG_HEAP_INFORMATION to move
to the next heap node. Here is its structure information. |
|
// It represents each heap node
typedef struct _DEBUG_HEAP_INFORMATION
{
ULONG Base; // 0x00
ULONG Flags; // 0x04
USHORT Granularity; // 0x08
USHORT Unknown; // 0x0A
ULONG Allocated; // 0x0C
ULONG Committed; // 0x10
ULONG TagCount; // 0x14
ULONG BlockCount; // 0x18
ULONG Reserved[7]; // 0x1C
PVOID Tags; // 0x38
PVOID Blocks; // 0x3C Heap block pointer for this
node.
} DEBUG_HEAP_INFORMATION, *PDEBUG_HEAP_INFORMATION;
|
|
.text:7C863C6A
.text:7C863C6A loc_7C863C6A: ; CODE XREF: Heap32Next(x)+5Ej
.text:7C863C6A ; Heap32Next(x)+68j
.text:7C863C6A cmp eax, edx
.text:7C863C6C jnb short loc_7C863CB8 ; did not find the right node, so
exit
.text:7C863C6E inc dword ptr [esi+18h] ; HEAPENTRY32->dwResvd++
.text:7C863C71 mov ecx, [esi+18h]
.text:7C863C74 shl eax, 6
.text:7C863C77 mov edi, eax
.text:7C863C79 mov eax, [ebx+38h]
.text:7C863C7C lea edx, [edi+eax]
.text:7C863C7F cmp ecx, [edx+1Ch] ; Check if current count > total block
count
.text:7C863C82 jnb short loc_7C863CB8 ; take the exit
.text:7C863C84 mov eax, ecx
.text:7C863C86 shl eax, 4
.text:7C863C89 add eax, [edx+40h] ; now eax points to current heap block
.text:7C863C8C test byte ptr [eax+4], 2
.text:7C863C90 jz short loc_7C863CC8 |
|
|
|
Once the right heap node is found, it uses the curHeapNode->Blocks
pointer to traverse the heap blocks to find the current heap block. Here
HEAPENTRY32-> dwResvd is internally used to keep track of block count. |
|
.text:7C863C92
.text:7C863C92 loc_7C863C92: ; CODE XREF: Heap32Next(x)+BEj
.text:7C863C92 mov edx, [ebx+38h]
.text:7C863C95 movzx edx, word ptr [edi+edx+0Ch] ; edx = curHeapNode->Granularity
.text:7C863C9A add edx, [eax+0Ch] ; edx += curBlock->address
.text:7C863C9D mov [esi+8], edx ; HEAPENTRY32->dwAddress = edx;
.text:7C863CA0 mov edx, [ebx+38h]
.text:7C863CA3 cmp ecx, [edi+edx+1Ch] ; check block count
.text:7C863CA7 jnb short loc_7C863CB8 ; Go to exit
.text:7C863CA9 inc ecx
.text:7C863CAA add eax, 10h ; move to next block
.text:7C863CAD mov [esi+18h], ecx ; HEAPENTRY32->dwResvd = ecx
.text:7C863CB0 test byte ptr [eax+4], 2 ; curBlock->flag & 2
.text:7C863CB4 jz short loc_7C863CCE
.text:7C863CB6 jmp short loc_7C863C92 ; continue until valid block found |
|
|
|
The above code looks for valid heap block based on certain
flags. Each heap block has following structure |
struct HeapBlock
{
DWORD size;
DWORD flag;
DWORD unknown;
DWORD address;
}; |
|
.text:7C863CB8
.text:7C863CB8 loc_7C863CB8: ; CODE XREF: Heap32Next(x)+74j
.text:7C863CB8 ; Heap32Next(x)+8Aj ...
.text:7C863CB8 push ebx
.text:7C863CB9 call ds:__imp__RtlDestroyQueryDebugBuffer@4
.text:7C863CBF push 12h ; dwErrCode
.text:7C863CC1 call _SetLastError@4 ; SetLastError(x)
.text:7C863CC6 jmp short loc_7C863D16 |
|
|
|
The above code is error handler which just destroys the buffer
created at the beginning, sets the last error code and returns to the
caller. |
|
.text:7C863CC8
.text:7C863CC8 loc_7C863CC8: ; CODE XREF: Heap32Next(x)+98j
.text:7C863CC8 mov ecx, [esi+0Ch] ; ecx = HEAPENTRY32->dwBlockSize
.text:7C863CCB add [esi+8], ecx ; HEAPENTRY32->dwAddress += ecx |
|
|
|
For the first block, if the value curBlock->flag & 2 != 2 then the
current block address is computed by adding previos block address and
block size. |
|
.text:7C863CCE
.text:7C863CCE loc_7C863CCE: ; CODE XREF: Heap32Next(x)+BCj
.text:7C863CCE mov ecx, [eax]
.text:7C863CD0 xor edi, edi
.text:7C863CD2 mov [esi+0Ch], ecx ; HEAPENTRY32->dwBlockSize = ecx
.text:7C863CD5 mov ax, [eax+4]
.text:7C863CD9 inc edi
.text:7C863CDA test al, 0F1h
.text:7C863CDC jnz short loc_7C863CFE
.text:7C863CDE test ah, 2
.text:7C863CE1 jnz short loc_7C863CFE
.text:7C863CE3 test al, 20h
.text:7C863CE5 jz short loc_7C863CF0
.text:7C863CE7 mov dword ptr [esi+10h], 4 ; HEAPENTRY32->dwFlags=4
.text:7C863CEE jmp short loc_7C863D01
.text:7C863CF0
.text:7C863CF0 loc_7C863CF0: ; CODE XREF: Heap32Next(x)+EDj
.text:7C863CF0 test ah, 1
.text:7C863CF3 jz short loc_7C863D01
.text:7C863CF5 mov dword ptr [esi+10h], 2 ; HEAPENTRY32->dwFlags=2
.text:7C863CFC jmp short loc_7C863D01
.text:7C863CFE
.text:7C863CFE loc_7C863CFE: ; CODE XREF: Heap32Next(x)+E4j
.text:7C863CFE ; Heap32Next(x)+E9j
.text:7C863CFE mov [esi+10h], edi ; HEAPENTRY32->dwFlags=4 |
|
|
|
Once the right heap block is found, its type is determined
which can be either LF32_FIXED, LF32_FREE or LF32_MOVEABLE. Using this
information, HEAPENTRY32->dwFlags value is updated. |
|
.text:7C863D01
.text:7C863D01 loc_7C863D01: ; CODE XREF: Heap32Next(x)+F6j
.text:7C863D01 ; Heap32Next(x)+FBj ...
.text:7C863D01 push ebx
.text:7C863D02 call ds:__imp__RtlDestroyQueryDebugBuffer@4
.text:7C863D08 mov eax, edi
.text:7C863D0A jmp short loc_7C863D18 |
|
|
|
At the end, the allocated buffer is destroyed by calling the
function RtlDestroyQueryDebugBuffer. |
|
.text:7C863D0C
.text:7C863D0C loc_7C863D0C: ; CODE XREF: Heap32Next(x)+Dj
.text:7C863D0C ; Heap32Next(x)+16j
.text:7C863D0C push 0C0000004h
.text:7C863D11
.text:7C863D11 loc_7C863D11: ; CODE XREF: Heap32Next(x)+50j
.text:7C863D11 call _BaseSetLastNTError@4 ; BaseSetLastNTError(x)
.text:7C863D16
.text:7C863D16 loc_7C863D16: ; CODE XREF: Heap32Next(x)+CEj
.text:7C863D16 xor eax, eax
.text:7C863D18
.text:7C863D18 loc_7C863D18: ; CODE XREF: Heap32Next(x)+31j
.text:7C863D18 ; Heap32Next(x)+112j
.text:7C863D18 pop edi
.text:7C863D19 pop esi
.text:7C863D1A pop ebx
.text:7C863D1B pop ebp
.text:7C863D1C retn 4
.text:7C863D1C Heap32Next@4 endp |
|
|
|
If there is an error then function BaseSetLastNTError is
called to set last error. Otherwise function returns normally.
To summarize, following action takes place during execution of
Heap32Next function
|
- First RtlCreateQueryDebugBuffer is invoked to allocate large
buffer to store entire process heap data.
- Next it calls RtlQueryProcessDebugInformation to load the all the heap
data into the buffer.
- After that it traverses the heap nodes to find the right heap node.
- Then it traces through the all the heap blocks in that heap node to
find the next valid heap block.
- At the end RtlDestroyQueryDebugBuffer is used to free the allocated
buffer.
|
|
This sequence is executed every time Heap32Next is called.
Assume that you are enumerating large number of heap blocks (say 50,000)
then for every heap block, buffer is created, entire process heap is
loaded, traversed and finally the buffer is destroyed. This is the
reason behind the long delay while enumerating thousands of heap blocks.
Clearly, its the poor design, one could have just done the allocation &
storing heaps during initialization and whenever heap32next is called,
it can just traverse the blocks to return the next valid heap block.
Using the later approach any large number of heaps can be enumerated in
few seconds. |
|
|
|
Based on above information found during my reverse engineering of
Heap32Next and several other heap functions, I wrote my own heap
enumeration function which is much more faster and efficient than using
Windows heap functions. Here are the complete details of the
implementation
To keep the code simple & readable, I have removed all error checking
and other unnecessary instructions. |
|
|
//Debug Buffer used by RtlCreateQueryDebugBuffer
typedef struct _DEBUG_BUFFER
{
HANDLE SectionHandle;
PVOID SectionBase;
PVOID RemoteSectionBase;
ULONG SectionBaseDelta;
HANDLE EventPairHandle;
ULONG Unknown[2];
HANDLE RemoteThreadHandle;
ULONG InfoClassMask;
ULONG SizeOfInfo;
ULONG AllocatedSize;
ULONG SectionSize;
PVOID ModuleInformation;
PVOID BackTraceInformation;
PVOID HeapInformation;
PVOID LockInformation;
PVOID Reserved[8];
} DEBUG_BUFFER, *PDEBUG_BUFFER;
//This represents each heap node
typedef struct _DEBUG_HEAP_INFORMATION
{
ULONG Base; // 0x00
ULONG Flags; // 0x04
USHORT Granularity; // 0x08
USHORT Unknown; // 0x0A
ULONG Allocated; // 0x0C
ULONG Committed; // 0x10
ULONG TagCount; // 0x14
ULONG BlockCount; // 0x18
ULONG Reserved[7]; // 0x1C
PVOID Tags; // 0x38
PVOID Blocks; // 0x3C
} DEBUG_HEAP_INFORMATION, *PDEBUG_HEAP_INFORMATION;
//Internal structure used to store heap block information.
struct HeapBlock
{
PVOID dwAddress;
DWORD dwSize;
DWORD dwFlags;
ULONG reserved;
}; |
|
|
|
Here is the sample code demonstrating the traversing through all the
heap nodes of a process. It starts with creating debug buffer and then
loading entire process heaps using RtlQueryProcessDebugInformation
function.Then it enumerates through all the heap nodes and displays the
relevant information. At the end, internal buffer is destroyed. |
|
void DisplayHeapNodes(DWORD pid)
{
// Create debug buffer
PDEBUG_BUFFER db = RtlCreateQueryDebugBuffer(0, FALSE);
// Get process heap data
RtlQueryProcessDebugInformation( pid, PDI_HEAPS | PDI_HEAP_BLOCKS, db);
ULONG heapNodeCount = db->HeapInformation ? *PULONG(db->HeapInformation):0;
PDEBUG_HEAP_INFORMATION heapInfo = PDEBUG_HEAP_INFORMATION(PULONG(db->
HeapInformation) + 1);
// Go through each of the heap nodes and dispaly the information
for (unsigned int i = 0; i < heapNodeCount; i++)
{
printf("\n Base Address = 0x%.8x", heapInfo[i].Base);
printf("\n Block count = %d", heapInfo[i].BlockCount);
printf("\n Committed Size= 0x%.8x", heapInfo[i].Committed);
printf("\n Allocated Size = 0x%.8x", heapInfo[i].Allocated);
printf("\n Flags = 0x%.8x", heapInfo[i].Flags);
}
// Clean up the buffer
RtlDestroyQueryDebugBuffer( db );
}
|
|
|
|
The code below illustrates the enumerating heap blocks within the
specified node for the process. |
|
void DisplayHeapBlocks(DWORD pid, DWORD
nodeAddress)
{
HeapBlock hb ={0,0,0,0};
// Create debug buffer
PDEBUG_BUFFER db = RtlCreateQueryDebugBuffer(0, FALSE);
// Get process heap data
LONG ret = RtlQueryProcessDebugInformation( pid, PDI_HEAPS |
PDI_HEAP_BLOCKS, db);
ULONG heapNodeCount = db->HeapInformation ? *PULONG(db->HeapInformation)
: 0;
PDEBUG_HEAP_INFORMATION heapInfo = PDEBUG_HEAP_INFORMATION(PULONG(db->HeapInformation)
+ 1);
// Go through each of the heap nodes
for (unsigned int i = 0; i < heapNodeCount; i++)
{
if(heapInfo[i].Base == nodeAddress)
{
// Now enumerate all blocks within this heap
node...
int c = 0;
memset(&hb,0,sizeof(hb));
if( GetFirstHeapBlock(&heapInfo[i] , &hb) )
{
do
{
printf("\n Block count =
%d", c+1);
printf("\n Block address
= 0x%.8x", hb.dwAddress);
printf("\n Block Size =
0x%.8x", hb.dwSize);
if( hb.dwFlags ==
LF32_FREE )
printf("\n
Status = Free");
else
printf("\n
Status = Allocated");
c++;
}
while( GetNextHeapBlock( &heapInfo[i],
&hb) );
}
break;
}
}
// Clean up the buffer
RtlDestroyQueryDebugBuffer( db );
} |
|
|
|
This function initialize heaps and then traverses through all the
heap nodes to find the specified heap node. Next it enumerates all the
heap blocks within this node using GetFirstHeapBlock & GetNextHeapBlock
functions and displays the information. |
|
BOOL GetFirstHeapBlock( PDEBUG_HEAP_INFORMATION
curHeapNode, HeapBlock *hb)
{
int *block;
hb->reserved = 0;
hb->dwAddress = 0;
hb->dwFlags = 0;
block = (int*) curHeapNode->Blocks;
while( ( *(block+1) & 2 ) == 2 )
{
hb->reserved++;
hb->dwAddress = (void *) ( *(block+3) + curHeapNode->Granularity
);
block = block+4;
hb->dwSize = *block;
}
// Update the flags...
USHORT flags = *(block+1);
if( ( flags & 0xF1 ) != 0 || ( flags & 0x0200 ) != 0 )
hb->dwFlags = 1;
else if( (flags & 0x20) != 0 )
hb->dwFlags = 4;
else if( (flags & 0x0100) != 0 )
hb->dwFlags = 2;
return TRUE;
}
BOOL GetNextHeapBlock( PDEBUG_HEAP_INFORMATION curHeapNode, HeapBlock *hb)
{
int *block;
hb->reserved++;
block = (int*) curHeapNode->Blocks;
// Make it point to next block address entry
block = block + hb->reserved * 4;
if( ( *(block+1) & 2 ) == 2 )
{
do
{
// new address = curBlockAddress + Granularity ;
hb->dwAddress = (void *) ( *(block+3) +
curHeapNode->Granularity );
// If all the blocks have been enumerated....exit
if( hb->reserved > curHeapNode->BlockCount)
return FALSE;
hb->reserved++;
blockAddress = block + 4; //move to next block
hb->dwSize = *block;
}
while( ( *(block+1) & 2 ) == 2 );
}
else
{
// New Address = prev Address + prev block size ;
hb->dwAddress = (void*) ( (int)hb->dwAddress + hb->dwSize );
hb->dwSize = *block;
}
// Update the flags...
USHORT flags = *( block+1);
if( ( flags & 0xF1 ) != 0 || ( flags & 0x0200 ) != 0 )
hb->dwFlags = 1;
else if( (flags & 0x20) != 0 )
hb->dwFlags = 4;
else if( (flags & 0x0100) != 0 )
hb->dwFlags = 2;
return TRUE;
} |
|
|
|
Much of the functionality for the above functions is derived from
the assembly code of Heap32First and Heap32Next functions.
GetFirstHeapBlock returns the first valid heap block in the current heap
node. GetNextHeapBlock does the similar task to find next valid heap
block. Information about the previous heap block is stored in the
HeapBlock structure and it should not be modified by the caller.
If you want to implement heap enumeration functionality in your project,
then you can use the following template. |
|
1. InitHeap()
This will call RtlCreateQueryDebugBuffer &
RtlQueryProcessDebugInformation to initialize the process heap data.
2. GetHeapNode(int index)
This function traverses the list of nodes and returns the node at
specified index.
3. GetFirstHeapBlock(PDEBUG_HEAP_INFORMATION curHeapNode, HeapBlock *hb)
Return the first valid heap block within the specified node.
4. GetNextHeapBlock(PDEBUG_HEAP_INFORMATION curHeapNode, HeapBlock *hb)
Return the next valid heap block within the specified node. HeapBlock
will contain the information about the previous heap block.
5. DestroyHeap()
This function invokes RtlDestroyQueryDebugBuffer to free the internal
buffer. |
|
|
|
|
This article gives insight on implementation of Windows heap API
functions and explains reason for their slower execution. Then it
presents a new way to implement faster and efficient heap enumeration
functionality. |
|
|
|
Process Heap Viewer: Faster heap
enumeration tool based on this article. |
NetShareMonitor: Monitor your file
shares from intruders. |
Exposing the covert way to find the reference count of DLL. |
Uncovering hidden process on Windows system. |
Recover Windows password in
seconds using Rainbow crack. |
|
|
|
|
|
|
|
|
|