stone1.htm: In memory patching: three approaches

In memory patching: three approaches
(how to introduce breakpoints in an automated debugger and other marvels)
by Stone

(20 March 1997)

Advanced Cracking	Papers
Courtesy of fravia's page of reverse engineering
A very good essay by Stone, a great cracker and one of the few fine reversers around that produces his own VERY GOOD TOOLS.
This essay has a very high theoretical value and should IMO be read by ALL reversers: you'll find inside it matters like "how it's possible to introduce breakpoints in an automated debugger", "making the target load a DLL for me"... and other marvels. Stone intends to update this work in fieri, therefore your contributions on all these matters are welcomed. Enjoy! (Beginners shouldn't touch this stuff IMO)
In memory patching

Three approaches
by Stone
20 March 1998


After reading MadMax's essay on kernel patching I decided that perhaps
it was time for an essay on "in memory patching". Contrary to the general
+HCU philosophy my approach will be purely theoretical - the sourcecode I
provide will serve as an example for you to build on. 

Is something preventing a patch? Is your target encrypted, packed,
CRC'ed or you need the program to run sometimes with the patch applied 
sometimes without (A game-trainer for instance).Wouldn't you just love 
if you could patch the program in memory after it loaded, unpacked, did 
the CRC checks etc.? 
You can. In the dos days we had TSR's to do this job. In the
windows world it's a bit more difficult as the programming interface (Win32 
API) is dynamic in contrast to dos's static interupt system. However new 
methods which in many ways are similar to TSR's are now avaible.

Kernel patching as MadMax pointed out is generally a bad idea. We need a
more gentle approach. Which critereas would we like our solution to conform
to?
The critereas I'll use are:
1) The approach should perform ok in terms of compatability. That is
work on both NT and 95 and hopefully on future versions as well.

2) The operating system should not suffer any long term effects of the
crack. That is after termination of the target the OS should be left 
unchanged.

3) Only ring 3 measures should be used. (Some of the API-functions I'll
use from ring 3 will actually switch to ring 0, but atleast there will be 
no foreign code introduced at ring 0)


Common ground
Our immediate problem is that in a preemptive operating system like
windows each process runs in it's own addressing space. Each time that the
operating system switches to another process the virtual mapping is changed 
to fit that of the current process. 
The whole idea with memory patching is providing means of patching the target 
in it's addressing space at a certain time (after unpacking, CRC'ing or 
whatever is done). However since a criterium of the memory patch is that we 
can't patch the operating system nor the program itself we need to find a way 
of gaining access to the target addressing space from another process.

The next problem we got is one of timing. Obviously the target needs to
be patched after the CRC check has been performed or after it is unpacked
in memory. And possibly it needs to be unpatched again to pass later
checks. In other words we need a reliable trigger mechanism. It is in this 
respect that the three methods I'll present here differ.


The loader approach
The critical assumption I'll make here is that the USER of the program
can tell us how to time the patch thru another program. 
This basically means we assume that the user can:
1) Identify when patching is appropriate.
2) Switch to another program to activate.

About the first assumption it can be said - if it's a trainer this will
never be a problem. Obviously the user will know when he want's to have
infinate lives. Often a messagebox or some other visable sign shows itself 
when a patch is needed. E.g. A messagebox saying "Insert correct CD in 
drive and press OK" 
It'd be easy to write a doc saying that when this occurs the
dear user should press OK in another window first, and then in the
target's obnoxious messagebox. However this is a serious shortcomming. 
Who said the program will actually let the user make a retry? Most 30-day 
trials tell the user the program has expired and the just exit or get into 
trial mode or whatever.
Perhaps many different locations has to be patch at many different time
making user-controlled patching a cumbersome solution.

On assumption 2 can it be said that many games don't like switching
tasks and it's not likely that users will enjoy having to switch out of 
their game to get a new handful of bullets or whatever.

Let's get a bit more technical. Windows is so nice to provide us with an
interface to write in other processes addressing space. The API needed
is: kernel32!WriteProcessMemory
Taking a closer look at this you'll find that what it actually does
utilize Windows's int 2eh interface to switch to ring 0 meaning that it
has ring 0 priveledges and thus is able to override the page protection. 
However the interface has build in a security feature so you cannot override
ring 0 data/code. (The int 2eh interface is for NT - I figure Windows 95 
does something similar but I havn't checked it. Anyways the result is the
same)

For WriteProcessMemory to work we need to identify by handle which
process we want patched. IMHO the best to find such a handle is to create 
the target process yourself - that is do a good old fashioned EXEC from 
within your patch/trainer code. 

The API is Kernel32!CreateProcessA
Ofcause there are different means of finding process handles.

To synthetize a in-memory-patcher of this kind:
CreateProcessA (Target)
Wait for the user to say apply patch - e.g. amessageboxWriteProcessMemory

Sourcecodes at:
http://www.one.se/~stone/general/trainnt.zip (or something)
----------------------------------------------------------------------

The API-Hook/Debug Approach

Obviously the assumptions made for the Loader Approach can be too 
restrictive. For instance 30-day trials often exit prior to offering the
user any obvious point of introducing a patch. So does a dongle. Players
might not like to switch task out of their beloved game to get another
10 bullets or whatever. What we really need is the target to trigger the
patch and this section is a way of doing this.

The whole idea here is to hook an API-call, and make it perform to our
desire. That can be return fake values under certain circumstances
it could be to patch the main program or it's dll's in memory. In short
what we wish to do is to let the api-call the program performs be
surrounded by our code so that we can make it perform in every way we wish. 
Certain side benefits will come along as well. The code I present will
show how it's possible to introduce breakpoints in an automated debugger
which is indeed something very useful for the creation of for instance
unpackers.

Again let's get down to it. A PE-file "imports" the functions it wishes
to make use of. Because MS-developers decieded on a dynamic structure for
API's it's obviously neasesary for each program to declare what functions it
uses.
This is done in a so called import table. Let's now take a deeper look
into what takes place between the importtable in the PE-file and the
execution of an API call by the target.


3 basic types of information is stored in the importtable. The first is
DLL names, the second is function names and the third is a Thunk-RVA.
The information is stored in a structure that looks something like this:

	DLL1-Name
		Function1-from-dll1- name or ordinal
			Thunk-RVA of Function 1 of DLL 1
		Function2-from-1dll-name or ordinal
			Thunk-RVA of Function 2 of DLL 1
		....

	DLL2-Name
		Function1-from-dll2- name or ordinal
			Thunk-RVA of Function 1 of DLL 2
		Function2-from-1dl2-name or ordinal
			Thunk-RVA of Function 2 of DLL 2
		....
	...

What windows does while loading the PE-file is traverse thru this table 
following this "pseudo code":

  While more DLL's do
     { Load DLL into process addressing space 
         While More Functions imported from current DLL do 
            { Find address of Function and write this to the Thunk-VA
for
              this Function 
             } 
      }
	END Load Imports

The function may be listed by name or something called ordinal. In every
DLL each function that it exports for use by other programs is listed in an 
export directory (which is where windows find the address of the
imported function) in this list each DLL is assigned a number and usually 
a name too.
The number is called ordinal. Importing can be done either by
referencing this ordinal value or by using the name.
             
What the program then does when it's in need of the API-function it is
this: CALL Dword ptr [Thunk VA of needed function]

Lets for a second imagine that we could stop execution of the target
process right before it started and then inject our own code in to it's
addressing space. Then we could simply replace the value at any Thunk-VA 
with a pointer to our own code and our code would be executed every time 
the program decieded to use this API. 

We could even save the old pointer and use this to chain
the original intended API-code. Weeeeeee.. "Isn't this just great?" as
Oprah Winfrey would say. "No, it is not", as I would reply.
  
We are left with a new problem. Or rather two. The first is stopping 
Execution of the target process before the program runs the first 
instruction so that we can be sure that our new pointers are in order.
Second we're left the great problem of having code in the target's addressing
space.

Solving a problem at the time we start by examining how we can stop our 
target process. Many people always state that windows is overbloated and
perhaps they are right - but in this case I'd say that it's damn
convinient that MS-engeneers made a full-featured debug interface while 
designing API calls so that we could with the greatest of ease program a 
debugger without having to do the low-level work ourselves. 

Infact they made it so that not one line of ring 0 code has to be 
written to make an application debugger.
"Isn't this just great?" as Oprah would phrase it? "Yes it is, maam" as
I would reply. Because it get's even better. Windows engineers must've 
actually been thinking the day they made windows. What good is a 
full-featured debug interface if the poor programmer has to make a
PE-loader before he can even start debugging. Hey after all they already 
made a loader and they decieded to be helpful. CreateProcessA can open a
process in Debug mode. 
This means that inside of most windows's procedures hides status
breakpoints that'll turn over the control to our debugger thru that
interface. One of these status breakpoints triggers just before windows
is about to turn over control to the just loaded PE-file. Convinient!

Obviously if a process is in debug mode execution is suspended everytime
a debug event occurs. A debug event is any non-handled exception.
Pagefaults, breakpoints, division overflows, etc. 

And there are 6 different types of status breakpoints inside windows 
that'll be triggering like Rambo in Iraq.
So basically we need to send a message from our debugger process that
it's ok to continiue every time we have encountered such an event. Ofcause if
it's the event we've been looking for we need to do whatever it is we wish to
do before giving the green light to run on. This is the reason behind the
loop of kernel32!WaitForDebugEvent and kernel32!ContiniueDebugEvent
in my code.

So now we know how to stop the program before it actually started. If
you read the previous section you'll know how to exchange pointers. This
leaves us with a grave problem. Injecting our code into the target's 
addressing space.
Now this can be done in many ways indeed. We'll just be looking the one
I chose.

What I'll try to obtain is making the program load a DLL for me. This
ofcause isn't something th program is willing to do without force. Fortunately
for the moment I'm President Clinton and the security counsil has agreed to
bomb the target until it conforms to my ideas. The scene is set at the status
breakpoint just before the target is about to start execution. It is
fully loaded and ready to go. However we're sitting comfortably with it
suspended far far away in our own addressing space. The first thing we got to
agree on is how it is we actually want's the target to do. Load OUR dll, find
the process address of OUR function, replace the one found at the THunk-VA
of the original. We now constuct code that will do just that in deltaoffset
so that it can be inserted anywhere. Prior to actually running the program
we found a page within the target that allowed execution. Most pages in the
target allows execution but we just need one. We now read the page out
the Process space of the target into our own and stores it safely. This is
done thru another subfunction of INT 2eh which ofcause also overrides
pageprotection etc. The API is: kernel32!ReadProcessMemory
See Natzguls essay for a more thourough breakdown of this function. Now
we write our own code that loads a DLL, finds the address of our function
and replaces the Thunk-VA entry of the function with ours. 

Now were ready to go? No. We're left with the problem that execution
should be left otherwise unchanged so that we've written a page somewhere is
bad news.  So in addition to the code we appended we add an INT 3 which will
when executed cause a debug event and once more suspend the target allowing
us to restore the page. Unfortunately EIP of the target does not
neassesarely point to our page, further we use all the registers and those 
needs to be restored too. 

So where do we turn? Windows internal knowledge. Upon creation
that is prior to running any actual program code any one process has one
and one thread only. Further windows allows debuggers to fetch the Context
of a thread. That is all relevant information about the threads current
status. Such a context was originally intended for preemptive multitasking so
that when ever the OS suspended execution of the thread to do another the
context was saved, the address space swapped and another threads context was
restored it's process's address space swapped in place and it was allowed to
continue.
One should be aware that while a thread indeed has full context it's
partly shared with that of the other threads in the process. E.g. the FPU is
shared between threads in a process. Since we only got one thread in our
process the terms of thread and process is incidental.
We ofcause now read the context of our target's single thread, saves it
then changes the EIP in it an resets it to point to our page of code in the 
target processspace. Ofcause our code will now execute till the int 3 we
inserted is reached, then it's suspended and control is back with us. We
now reset the context of the thread and restore the page we abused for
our code. Then we simply let it run.

There is one last unfortunate thing about letting it run. If a process
was created in Debug mode it stays in debug mode till it's terminated. That
means that we need to stay in a loop of WaitForDebugEvent/COntiniueDebugEvent
until that time where the process is actually terminated or the program will 
suspend itself and wait for our instructions. This wasn't too smart MS!

Practical notes on the debug approach
A last side note should be mentioned here. While I was doing this code I
encountered a bug in windows NT workstation 4.0 build 1381. It might
exist on other versions too. Code inside windows looks like this:
mov eax, [offset of Context Storeing space in debugger code]
		; this is obvioulsy a parameter
mov ebx, [Temporaly variable containing ring level of debugger]
test eax,ebx
jnz insuficient_security
everything Ok.
Obviously this is wrong. To overcome this bug make sure that the offset
where you store your context and'ed with 3 is 0.

Further finding the ChunkVA of an imported function can easily be done
by dumping the PE-file with Matt Pietreks PE-dump or similar. He gives the
first chunk for each DLL, if your function isn't the first you add 4 bytes
each time you need to move a line down to find our function.

The sourcecodes for this can be found at:
http://www.one.se/~stone/general/stnapih.arj

---------------------------------------------------------------------

The MessageHook Approach

Forthcoming


source is forthcoming 

---------------

Literature

MadMax! (1998) - madmasu.htm: Cracking useing kernel32??, by MadMax Feb 1998.
	       @ http://fravia.org
Natzgul (1998) - natz_mp2.htm: How to access the memory of a process, a Tutorial, 
               by Natzgul Feb 1998, @ http://fravia.org

Pietrek, Matt - Windows 95 System Programming Secrets, IDG books 1995.


Various sourcecodes by Me :).. all can be found on my page
http://www.one.se/~stone

Thanks must go to:
Patriarch / PWA, friend roomate and local expert.
Random / Xforce, God of the PE-format
Net Walker / Brazil
United Cracking Force, my personal benefactor.
All of which I had many enlightning discussions with.


email: stone(at)one(point)se
http://www.one.se
Stone/UCF'98
2nd&mi!
-----
doc end

kind regards

Stone / United Cracking Force '98
You are deep inside fravia's page of reverse engineering, choose your way out:
Back to Advanced cracking
Back to the Papers section
links
+ORC
tools
Is reverse engineering legal?