Nerdworks logo "The nerd shall inherit the earth."

Nerdworks Blogorama

Nerdspeak

On appreciating music
Irrelevant Stuff
9/30/2006 1:17:37 PM  

There is a certain superior kind of person who is frequently fond of making scandalous remarks in order to appear unconventional and witty, with the equally frequent consequence of appearing merely impertinently conspicuous. He[1] insists upon being decidedly disagreeable, counting solely upon the possibility that the contrary opinion will often prove to be the correct one. While this is found to be so in the general case (and all generalizations are, of course, false, not excluding this particular one), given that most of those who are party to an argument participate in it merely on account of a habitual inclination towards exercising their vocal cords rather than a genuine desire to express original opinions, one would err to rely entirely on it if one wishes to maintain credibility.

When the topic under discussion is music, for instance, this superior person will be found waxing eloquent in resolute criticism of artistes the public in general considers masters of their art. He will further expound upon the virtues of other, sometimes lesser known equivalents, who in his opinion are more deserving of the public adulation. Now, it goes without saying that we live in an unfair world where all too frequently factors that have little to do with music or with one's proficiency in it are the ones that in the end contribute towards an artiste's success (success by popular definition, that is). It is certainly tragic that deserving practitioners are often not paid their due. That is, however, no justification for the vilification of artistes who do become successful on account of genuine musical ability.

It is the bane of mankind (and sometimes a blessing too!) to occasionally err. An artiste in this respect is no different. But to seize upon such instances paying little attention to all of their other innumerable successes is, to say the least, being shockingly myopic apart from also being a dreadful display of lack of compassion.

Appreciation of music, in my opinion, demands a certain kind of open-mindedness that is willing to acknowledge excellence where it is found. If the patron is unable to influence mediocrity in a positive manner then he would do well to ignore it. Now, this is not to be mistaken for an appeal for toleration of incompetence (as might happen if you were made to put up with my singing, for instance); merely that a greater display of compassion is in order when considering brilliant but fallible priests of that most divine of art forms - music!

[1] No gender bias is intended by the use of the pronoun "he"; it is used here merely as a matter of convenience. The reader is free to read all instances of it as "she" if she so chooses.

 
System API call hooking
Technobabble
9/24/2006 1:38:49 PM  

I have for some time been meaning to investigate how exactly one sets about hooking system API calls, i.e., intercepting/instrumenting calls to Win32 APIs made by any given process on the system. Surprisingly, there are quite a few good, informed articles on the subject. Here are links to a few of them:

  • API hooking revealed: a good article that covers all the options available to achieve this.
  • Process-wide API spying - an ultimate hack: describes Import Address Table (IAT) patching in fair detail.
  • Three Ways to Inject Your Code into Another Process: another API spying DLL injection article.
  • Windows NT System-Call Hooking: a great article from Mark Russinovich and Bryce Cogswell of Sysinternals fame detailing interception of system calls by patching system call dispatch tables from kernel mode.
  • Tracing NT Kernel-Mode Calls: talks about intercepting kernel mode APIs such as IoAllocateIrp and IoCallDriver.

My primary interest was in being able to intercept calls to APIs like CopyFile, MoveFile and DeleteFile. Having recently developed an interest in kernel mode programming I initially figured that I would write this as some sort of kernel mode filter driver and roll a super-cool interception system. But I came to realise in the end that this was not going to be possible without writing some fairly intricate and basically shaky code. As the articles linked above indicate, it is quite possible to do this with a lot less fuss from user mode itself.

To avoid duplicating information already available in these articles I'll just briefly describe the approach I took:

  • I created a DLL that would hook routines that I am interested in from DllMain.
  • I would then inject this DLL into the process that I am interested in using the CreateRemoteThread technique.
  • The injected DLL would call back to the EXE whenever the relevant APIs were called by sending WM_COPYDATA messages.

That's all! One thing that I did not do, however, is implement the fancy hooking code myself. I used the Microsoft Research Detours library for this, which does the job in a very clean, structured fashion (strictly speaking, Detours intercepts a function by rewriting its first few instructions in memory rather than by patching the IAT). And finally, the whole thing will work only on systems running Windows 2000 and later (who uses Windows 95, 98 and ME anyway!).
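To make the import-table idea from the linked articles a little more concrete, here is a portable toy model in plain C. It is only a sketch: the table, the names (import_table, delete_file_real, delete_file_hook, install_hook, app_delete) and the logging are all made up for illustration, and none of this is how Detours or the real Windows loader is implemented. It simply shows that when callers always go through a table of function pointers, swapping one entry is enough to intercept every call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an import table: callers never call
   delete_file_real() directly, they go through this table. */
typedef int (*delete_fn)(const char *path);

static int delete_file_real(const char *path)
{
    (void)path;               /* pretend the file was deleted */
    return 1;
}

static delete_fn import_table[1] = { delete_file_real };

/* State kept by the hook (hypothetical, for illustration only). */
static delete_fn original_delete = NULL;
static char      last_path[260];
static int       hook_calls = 0;

/* The hook records the call and then forwards it to the original. */
static int delete_file_hook(const char *path)
{
    assert(original_delete != NULL);
    hook_calls++;
    strncpy(last_path, path, sizeof(last_path) - 1);
    last_path[sizeof(last_path) - 1] = '\0';
    return original_delete(path);
}

/* "Patch" the table: remember the old entry and swap in the hook. */
static void install_hook(void)
{
    if (import_table[0] == delete_file_hook)
        return;               /* already hooked */
    original_delete = import_table[0];
    import_table[0] = delete_file_hook;
}

/* Application code: unaware of the hook, calls through the table. */
static int app_delete(const char *path)
{
    return import_table[0](path);
}
```

Calling app_delete before install_hook reaches delete_file_real directly; after install_hook the same call passes through delete_file_hook first, which is the point at which a real spy DLL would notify its controlling EXE.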

Here's a screen shot of what the UI for this program that I wrote looks like:

[Screenshot: IOSpy]

And here're the binaries and the source code should you feel like taking a look. Please note that I haven't included the Detours library here. You'll have to download it from the link given here yourself (it's only 519 KB in size) and set your build environment up so that the compiler and the linker can find the "detours.h", "detours.lib" and the "detoured.lib" files.

 
Ultimate list of developer/power user tools
Technobabble
9/11/2006 6:27:24 PM  

Find Scott Hanselman's 2006 Ultimate Developer and Power Users Tool List for Windows here:

http://www.hanselman.com/tools

Who is Scott Hanselman eh? Err.. I don't really know but the tools that he lists I know (well, some of them at least)! From his top 10 life/work changing utilities I am already using 5 (Notepad++, Lutz's Reflector for .NET, Google Desktop, ZoomIt and various other Sysinternals tools) and I am going to give his other suggestions a try as they sound like they're going to be equally super cool!

Put simply these are tools that you cannot afford to leave home without!

 
Self deleting executables
Technobabble
9/10/2006 5:52:57 PM  

I read an interesting article the other day that spoke about the various mechanisms a Win32 application can employ for deleting itself from the disk once execution completes. The basic issue is of course that while the module is being executed the operating system has the file locked. So something like this will just not work:

    TCHAR szModule[MAX_PATH];
    GetModuleFileName( NULL, szModule, MAX_PATH );
    DeleteFile( szModule );

Of the various options available, the author of the said article had suggested the following approach as being the definitive one as it has the added benefit of functioning correctly on all versions of Microsoft Windows (starting with '95).

Now would be a good time to hop over to the article and see what it's about (and while you're there make sure you look at some of the other articles - pretty neat). Here's the link:

http://www.catch22.net/tuts/selfdel.asp

And here's the approach in brief:

  • When it's time to delete ourselves we first spawn an external process that is guaranteed to exist on all Windows computers (explorer.exe for example) in the suspended state. We do this by calling CreateProcess passing CREATE_SUSPENDED for the dwCreationFlags parameter. Note that when a process is launched this way there's really no telling at what point the primary thread of the process will get suspended. But it does appear to get suspended long before the entry point gets invoked and in fact it occurs even before the Win32 environment for the process has been fully initialized.

  • After this we get the CONTEXT data (basically, the CPU register state) for the suspended primary thread (in the remote process) via GetThreadContext.

  • We then manipulate the stack pointer (ESP) to allocate some space on the remote stack for storing some of our data (like the path to the executable to be deleted). After this we plonk the binary code for a local routine that we've written for deleting files over to the remote process (along with the data it needs) by calling WriteProcessMemory.

  • Next we mess around with the instruction pointer (EIP) so that it points to the binary code we've copied to the remote process and update the suspended thread's context (via SetThreadContext).

  • And finally, we resume execution of the remote process (via ResumeThread). Since the EIP in the remote thread is now pointing to our code, it executes it; which of course, happily deletes the original executable. And that's it!
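The stack pointer adjustment in the third step is just arithmetic, and can be sketched in portable C. This is my own illustration, not code from the article: the function name reserve_on_stack and the 4-byte alignment are assumptions. The x86 stack grows downwards, so reserving space means subtracting from ESP; the result is both the new ESP value and the address at which the data would be written into the remote process.

```c
#include <assert.h>
#include <stdint.h>

/* Reserve `size` bytes below the current stack pointer and return the
   new stack pointer. The stack grows towards lower addresses, so we
   subtract, then round down to a 4-byte boundary (an assumption made
   here to keep the stack DWORD-aligned). */
static uint32_t reserve_on_stack(uint32_t esp, uint32_t size)
{
    assert(size <= esp);
    return (esp - size) & ~(uint32_t)3;
}
```

For example, reserving 260 bytes (0x104) below an ESP of 0x0012FFC4 gives 0x0012FEC0.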

While this approach does get the job done, the fact that our deletion code executes in the remote process even before Windows has had a chance to initialize it fully places some restrictions on the kind of APIs that we can invoke. It so turns out that APIs like DeleteFile and ExitProcess do work while the process is in this half-baked state, but there is no telling whether others will. So I figured I'd modify the approach somewhat so that it allows us to call any API we want from our injected code. Here's what I did:

  • As before we launch the external process in a suspended state. However, instead of plonking our code at the location that ESP happens to be pointing at when it got suspended, we put it over the executable's entry-point routine, i.e., we replace the remote process's entry point with our own injected code. And when the entry-point code executes we can be pretty sure that the Win32 environment is fully initialized and primed for use!

  • Figuring out where the entry point of a module lives requires us to parse PE file format structures. In your own program for example, the following code would give you a pointer to the entry point routine in the process's executable image:

#pragma pack( push, 1 )

/* typedefs so that the bare names used below compile as C as well as C++ */
typedef struct coff_header
{
    unsigned short machine;
    unsigned short sections;
    unsigned int timestamp;
    unsigned int symboltable;
    unsigned int symbols;
    unsigned short size_of_opt_header;
    unsigned short characteristics;
} coff_header;

/* only the leading fields of the optional header, up to the ones we need */
typedef struct optional_header
{
    unsigned short magic;
    char linker_version_major;
    char linker_version_minor;
    unsigned int code_size;
    unsigned int idata_size;
    unsigned int udata_size;
    unsigned int entry_point;
    unsigned int code_base;
} optional_header;

#pragma pack( pop )

//
// get the module address
//
char *module = (char *)GetModuleHandle( NULL );

//
// get the sig
//
int *offset = (int*)( module + 0x3c );
char *sig = module + *offset;

//
// get the coff header
//
coff_header *coff = (coff_header *)( sig + 4 );

//
// get the optional header
//
optional_header *opt = (optional_header *)( (char *)coff + sizeof( coff_header ) );

//
// get the entry point
//
char *entry_point = (char *)module + opt->entry_point;

  • The entry point that you define by the way - main or WinMain - isn't the actual entry point routine. The compiler inserts its own entry point which in turn calls our function. This entry point typically does stuff like CRT initialization and cleanup. In an ANSI console app for instance the actual entry point routine is something called mainCRTStartup.
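The same header walk can be tried out without a live process by running it over a plain byte buffer. The sketch below is my own portable variant, not the code from the download: the names read_le32 and pe_entry_point_rva are made up, it reads fields by offset instead of through packed structs, and it only checks the "PE\0\0" signature (it does not validate the "MZ" header or the optional header magic).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value byte by byte, so the code does not
   depend on alignment or on the host's byte order. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Stores the entry point RVA in *rva and returns 1 on success; returns 0
   if the buffer is too small or the PE signature is missing. */
static int pe_entry_point_rva(const unsigned char *image, size_t size,
                              uint32_t *rva)
{
    uint32_t pe_offset;
    size_t   opt_offset;

    if (size < 0x40)
        return 0;

    /* offset 0x3c of the DOS header holds the offset of the PE signature */
    pe_offset = read_le32(image + 0x3c);

    /* need: signature (4) + COFF header (20) + 20 bytes of optional header */
    if (pe_offset > size || size - pe_offset < 4 + 20 + 20)
        return 0;
    if (memcmp(image + pe_offset, "PE\0\0", 4) != 0)
        return 0;

    /* the optional header follows the signature and the COFF header;
       the entry point field sits 16 bytes into it */
    opt_offset = (size_t)pe_offset + 4 + 20;
    *rva = read_le32(image + opt_offset + 16);
    return 1;
}
```

The value returned is a relative virtual address; as in the snippet above, it has to be added to the image's base address to get the actual entry point.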

  • It appears logical that we should be able to find the entry point routine in remote processes also in a similar fashion using ReadProcessMemory. While that is so, finding the equivalent of the module variable in the code given above for remote processes turned out to be trickier than anticipated. The problem is that there is no convenient GetModuleHandle routine that'll work for remote processes.

  • As it turns out GetModuleHandle only works for the calling process, and the address it returns is meaningful only within that process's address space. ReadProcessMemory, on the other hand, needs an address that is valid in the remote process's address space. So the question is, how do we get to know the remote process's base address in memory? The solution as it turned out requires us to dig deep into the OS's internals! The credit for this solution goes to Ashkbiz Danehkar whose article called Injective Code inside Import Table on Code Project outlines a method for finding this.

  • In brief, the operating system maintains a user-mode data structure called the Thread Environment Block (TEB) for every thread in the system. It describes pretty much everything you'd want to know about the thread, including a pointer to another data structure called the Process Environment Block (PEB). The PEB, as may be apparent, describes the process and includes, happily for us, a pointer to the image's base address in memory! These structures are not however documented (by Microsoft that is ;). But some very very clever folks at http://undocumented.ntinternals.net/ managed to figure out the layout for these structures all by themselves!

  • So all we need to do is:

    • Figure out where the TEB for the primary thread lives in the remote process. On 32-bit x86 the thread's FS segment register refers to it: we get the FS selector from the thread's context (via GetThreadContext) and look up that segment's base address with the GetThreadSelectorEntry API.
    • Read the PEB using the pointer to it in the thread's TEB via ReadProcessMemory.
    • Use the pointer to the image's base address in the PEB and parse the PE structures till we are left with a reference to the remote process's entry point routine.
    • Phew!

    Here's the code that achieves this:

//
// Gets the address of the entry point routine given a
// handle to a process and its primary thread.
//
DWORD GetProcessEntryPointAddress( HANDLE hProcess, HANDLE hThread )
{
    CONTEXT             context;
    LDT_ENTRY           entry;
    TEB                 teb;
    PEB                 peb;
    DWORD               read;
    DWORD               dwFSBase;
    DWORD               dwImageBase, dwOffset;
    DWORD               dwOptHeaderOffset;
    optional_header     opt;
    
    //
    // get the current thread context
    //
    context.ContextFlags = CONTEXT_FULL | CONTEXT_DEBUG_REGISTERS;
    GetThreadContext( hThread, &context );
    
    //
    // use the segment register value to get a pointer to
    // the TEB
    //
    GetThreadSelectorEntry( hThread, context.SegFs, &entry );
    dwFSBase = ( entry.HighWord.Bits.BaseHi << 24 ) |
                     ( entry.HighWord.Bits.BaseMid << 16 ) |
                     ( entry.BaseLow );
    
    //
    // read the teb
    //
    ReadProcessMemory( hProcess, (LPCVOID)dwFSBase,
                       &teb, sizeof( TEB ), &read );
    
    //
    // read the peb from the location pointed at by the teb
    //
    ReadProcessMemory( hProcess, (LPCVOID)teb.Peb,
                       &peb, sizeof( PEB ), &read );
    
    //
    // figure out where the entry point is located;
    //
    dwImageBase = (DWORD)peb.ImageBaseAddress;
    ReadProcessMemory( hProcess, (LPCVOID)( dwImageBase + 0x3c ),
                       &dwOffset, sizeof( DWORD ), &read );
    
    dwOptHeaderOffset = ( dwImageBase + dwOffset + 4 + sizeof( coff_header ) );
    ReadProcessMemory( hProcess, (LPCVOID)dwOptHeaderOffset,
                       &opt, sizeof( optional_header ), &read );
    
    return ( dwImageBase + opt.entry_point );
}

  • If you're wondering what the weird code initializing dwFSBase means all I can do is direct you to the documentation for the LDT_ENTRY data structure in MSDN. Structures of this kind are partly the reason why system programmers tend to go bald early in life.
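For what it's worth, the dwFSBase expression boils down to gluing three pieces back together: the descriptor stores the 32-bit segment base as a 16-bit low part, an 8-bit middle part and an 8-bit high part. Here is that arithmetic on its own, in portable C; the function name descriptor_base is mine, and the parameters merely mirror the BaseLow, BaseMid and BaseHi fields used above.

```c
#include <assert.h>
#include <stdint.h>

/* Reassemble a 32-bit segment base from the three pieces stored in a
   descriptor table entry:
     base_low : bits  0..15
     base_mid : bits 16..23
     base_hi  : bits 24..31 */
static uint32_t descriptor_base(uint16_t base_low, uint8_t base_mid,
                                uint8_t base_hi)
{
    return ((uint32_t)base_hi << 24) |
           ((uint32_t)base_mid << 16) |
           (uint32_t)base_low;
}
```

With base_low = 0xF000, base_mid = 0xFD and base_hi = 0x7F, for example, the pieces combine to 0x7FFDF000.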

  • Now that we know where the entry point lives in the remote process it should be really straightforward right? Wrong! There still is that itsy bitsy problem of figuring out how we are to pass data to the remote process!

    The routine that deletes our executable looks like this:

#pragma pack(push, 1)

//
//  Structure to inject into remote process. Contains 
//  function pointers and code to execute.
//
typedef struct _SELFDEL
{
    HANDLE  hParent;                // parent process handle
    FARPROC fnWaitForSingleObject;
    FARPROC fnCloseHandle;
    FARPROC fnDeleteFile;
    FARPROC fnSleep;
    FARPROC fnExitProcess;
    FARPROC fnRemoveDirectory;
    FARPROC fnGetLastError;
    FARPROC fnLoadLibrary;
    FARPROC fnGetProcAddress;
    BOOL    fRemDir;
    TCHAR   szFileName[MAX_PATH];   // file to delete
} SELFDEL;

#pragma pack(pop)

//
//  Routine to execute in remote process. 
//
static void remote_thread(SELFDEL *remote)
{
    // wait for parent process to terminate
    remote->fnWaitForSingleObject(remote->hParent, INFINITE);
    remote->fnCloseHandle(remote->hParent);

    // try to delete the executable file 
    while(!remote->fnDeleteFile(remote->szFileName))
    {
        // failed - try again in one second's time
        remote->fnSleep(1000);
    }

    // finished! exit so that we don't execute garbage code
    remote->fnExitProcess(0);
}

  • As you might have noticed the function remote_thread makes all system calls via function pointers instead of calling them directly. This is done because, in the normal course, the compiler generates tiny stubs whenever calls to routines in dynamically loaded DLLs are made from a program. This stub jumps to a function pointer stored in a table initialized by the operating system's loader at runtime. Since we don't want these fancy stubs generated for code that is meant to be injected into a remote process, we deal exclusively with function pointers.

    Fortunately for us, the system APIs (in kernel32, user32 etc.) always get loaded at the same virtual address in all processes. So all we need to do is initialize a data structure with pointers to all the system calls we want to make from the remote process and pass this structure along also. With our entry-point overwrite strategy, however, the injected routine is no longer called with a pointer to this structure as an argument, so how are we to do this? To make a long story short, I settled for the following approach.

  • First, I modified remote_thread to look like this:

//
//  Routine to execute in remote process. 
//
static void remote_thread()
{
    //
    // this will get replaced with a
    // real pointer to the data when it
    // gets injected into the remote
    // process
    //
    SELFDEL *remote = (SELFDEL *)0xFFFFFFFF;

    //
    // wait for parent process to terminate
    //
    remote->fnWaitForSingleObject(remote->hParent, INFINITE);
    remote->fnCloseHandle(remote->hParent);

    //
    // try to delete the executable file 
    //
    while(!remote->fnDeleteFile(remote->szFileName))
    {
        //
        // failed - try again in one second's time
        //
        remote->fnSleep(1000);
    }

    //
    // finished! exit so that we don't execute garbage code
    //
    remote->fnExitProcess(0);
}

  • I then converted this into shellcode (the exact mechanics of which I'll outline in another post) to arrive at what looks like this (this is just representative shellcode and not the one that got generated for the routine shown above):

char shellcode[] = {
    '\x55', '\x8B', '\xEC', '\x83', '\xEC', 
    '\x10', '\x53', '\xC7', '\x45', '\xF0',
    '\xFF', '\xFF', '\xFF', '\xFF',   // replace these 4 bytes
                                      // with actual address
    '\x8B', '\x45', '\xF0', '\x8B', '\x48',
    '\x20', '\x89', '\x4D', '\xF4', '\x8B',
    '\x55', '\xF0', '\x8B', '\x42', '\x24',
    '\x89', '\x45', '\xFC', '\x6A', '\xFF',
    /* ... more shell code here ... */
};

  • Shellcode, if you didn't know, is the technical term used (in security circles) to refer to binary machine code that is typically used as the payload in exploits. As it turns out, in our case the value 0xFFFFFFFF that we initialized the pointer remote with in remote_thread shows up the exact same way in the shellcode also. Since we know where the entry point lives in the remote process, all we need to do is to first replace 0xFFFFFFFF in the shellcode with the actual pointer to the data before overwriting the entry point. Here's how this looks:

STARTUPINFO             si = { sizeof(si) };
PROCESS_INFORMATION     pi;
SELFDEL                 local;
DWORD                   data;
DWORD                   oldProt;    // receives the previous page protection
TCHAR                   szExe[MAX_PATH] = _T( "explorer.exe" );
DWORD                   process_entry;

//
// this shellcode self-deletes and then shows a messagebox
//
char shellcode[] = {
    '\x55', '\x8B', '\xEC', '\x83',
    '\xEC', '\x10', '\x53', '\xC7',
    '\x45', '\xF0',                   // bytes 8 and 9
    '\xFF', '\xFF', '\xFF', '\xFF',   // bytes 10 to 13: replace these
                                      // 4 bytes with actual address
    '\x8B', '\x45', '\xF0', '\x8B',
    '\x48', '\x20', '\x89', '\x4D',

    /* ... snipped lots of meaningless shellcode here! ... */

    '\xFF', '\xD0', '\x5B', '\x8B',
    '\xE5', '\x5D', '\xC3'
};

//
// initialize the SELFDEL object
//
local.fnWaitForSingleObject     = (FARPROC)WaitForSingleObject;
local.fnCloseHandle             = (FARPROC)CloseHandle;
local.fnDeleteFile              = (FARPROC)DeleteFile;
local.fnSleep                   = (FARPROC)Sleep;
local.fnExitProcess             = (FARPROC)ExitProcess;
local.fnRemoveDirectory         = (FARPROC)RemoveDirectory;
local.fnGetLastError            = (FARPROC)GetLastError;
local.fnLoadLibrary             = (FARPROC)LoadLibrary;
local.fnGetProcAddress          = (FARPROC)GetProcAddress;

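//
// launch the host process in a suspended state
//
CreateProcess( NULL, szExe, NULL, NULL, FALSE,
               CREATE_SUSPENDED, NULL, NULL, &si, &pi );

//
// Give remote process a copy of our own process handle
//
DuplicateHandle(GetCurrentProcess(), GetCurrentProcess(), 
    pi.hProcess, &local.hParent, 0, FALSE, 0);
GetModuleFileName(0, local.szFileName, MAX_PATH);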

//
// get the process's entry point address
//
process_entry = GetProcessEntryPointAddress( pi.hProcess, pi.hThread );

//
// replace the address of the data inside the
// shellcode (bytes 10 to 13)
//
data = process_entry + sizeof( shellcode );
shellcode[13] = (char)( data >> 24 );
shellcode[12] = (char)( ( data >> 16 ) & 0xFF );
shellcode[11] = (char)( ( data >> 8 ) & 0xFF );
shellcode[10] = (char)( data & 0xFF );

//
// copy our code+data at the exe's entry-point
//
VirtualProtectEx( pi.hProcess,
                  (PVOID)process_entry,
                  sizeof( local ) + sizeof( shellcode ),
                  PAGE_EXECUTE_READWRITE,
                  &oldProt );
WriteProcessMemory( pi.hProcess,
                    (PVOID)process_entry,
                    shellcode,
                    sizeof( shellcode ), 0);
WriteProcessMemory( pi.hProcess,
                    (PVOID)data,
                    &local,
                    sizeof( local ), 0);

//
// Let the process continue
//
ResumeThread(pi.hThread);
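The listing above patches bytes 10 to 13 because that is where the placeholder happens to sit in this particular shellcode. A slightly less brittle variation, sketched here in portable C, is to search for the placeholder instead of hard-coding its offset. This is my own illustration and not part of the downloadable source: the name patch_placeholder is made up, and it assumes that the first run of four 0xFF bytes in the code really is the placeholder, which you would want to verify for your own shellcode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Find the first run of four 0xFF bytes in `code` and overwrite it with
   `value` in little-endian byte order (the order x86 expects).
   Returns the offset that was patched, or -1 if no placeholder exists. */
static long patch_placeholder(unsigned char *code, size_t len,
                              uint32_t value)
{
    size_t i;

    for (i = 0; i + 4 <= len; i++) {
        if (code[i] == 0xFF && code[i + 1] == 0xFF &&
            code[i + 2] == 0xFF && code[i + 3] == 0xFF) {
            code[i]     = (unsigned char)(value & 0xFF);
            code[i + 1] = (unsigned char)((value >> 8) & 0xFF);
            code[i + 2] = (unsigned char)((value >> 16) & 0xFF);
            code[i + 3] = (unsigned char)(value >> 24);
            return (long)i;
        }
    }
    return -1;
}
```

Run over the first few bytes of the representative shellcode shown earlier, this finds the placeholder at offset 10, which matches the hard-coded indices used above.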

There! That's all there is to it. Please find the code for a self-deleting executable (that among other things also displays a message box from the remote process's hijacked entry point) here:

myselfdel.c
ntundoc.h