I wish, as a decompiler writer, there would be no loops. Programmers use thousands of goto statements with if statements. But as it is not the case I must understand how to parse MSIL instructions that were generated from the loops.
Of the three types of most common loops (for, while, do-while) the while loop is the basic one. The block that is generated from any type of these loops usually has a conditional jump (usually a brtrue.s
Wednesday, June 6, 2007
Monday, June 4, 2007
MISL to C# (Sharp) -> Iff (if and only if)
Today I’ll talk about the most basic and useful if-then structure. The compiler generates a conditional branch. We create a block of instructions for the structure. The block is separated by the branch instruction (like brtrue) and the branch label (like IL_0019 - where the code jumps). And our condition is on the stack. If we find a true condition branch we negate it and put it as a if statement condition. The block is initially a block of MSIL that we will convert to C# code later (
may need recursion here??). Please note that the labels are not stored in MSIL. It is just the byte offset of the MSIL in a method.
If you have not already read you are requested to read my previous post.
We take a simple method to test our theory.
public int IfStructure(int a, int b)
{
bool CS_4_0001;
if (a < b)
{
System.Console.Write("Condition is true");
}
return b;
}
Here is the MSIL code generated by Visual Studio 2005 compiler.
.method public hidebysig instance int32 IfStructure(int32 a, int32 b) cil managed
{
// Code size 31 (0x1f)
.maxstack 2
.locals init ([0] int32 CS$1$0000,
[1] bool CS$4$0001)
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldarg.2
IL_0003: clt
IL_0005: ldc.i4.0
IL_0006: ceq
IL_0008: stloc.1
IL_0009: ldloc.1
IL_000a: brtrue.s IL_0019
IL_000c: nop
IL_000d: ldstr "Condition is true"
IL_0012: call void [mscorlib]System.Console::Write(string)
IL_0017: nop
IL_0018: nop
IL_0019: ldarg.2
IL_001a: stloc.0
IL_001b: br.s IL_001d
IL_001d: ldloc.0
IL_001e: ret
} // end of method ControlStructures::IfStructure
We now start parsing:
-------------------------------------------------------------------------------------
.method public hidebysig instance int32 IfStructure(int32 a, int32 b) cil managed
{
// Code size 31 (0x1f)
.maxstack 2
.locals init ([0] int32 CS$1$0000,
[1] bool CS$4$0001)
--
These lines generate output that we do without any processing of MSIL. There is method definition and local variables. We change the local variable names to C# current names without conflict. For simplicity here we just replace '$' with '_'. One thing to evaluate the MSIL instructions we must keep a map of local variables with variable number. In this example CS$1$0000 is local variable 0 of type int. For clarity we do not show the map here. A simple STL map should work.
Output:
public int IfStructure(int a, int b)
{
int32 CS_1_0000;
bool CS_4_0001;
Stack: [Empty]
-------------------------------------------------------------------------------------
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldarg.2
--
So we push method argument 1 and 2 to stack:
Output: [None]
Stack:
a,b
-------------------------------------------------------------------------------------
IL_0003: clt
--
This instructs us if stack top-1 is less than stack top. The two elements are popped from stack and result goes to stack. No output of course.
Output: [None]
Stack:
a<b
-------------------------------------------------------------------------------------
IL_0005: ldc.i4.0
--
Load (aka push) constant integer of value 0 on stack.
Output: [None]
Stack: a<b,0
-------------------------------------------------------------------------------------
IL_0006: ceq
--
Check if stack top-1 equals stack top. Result goes to stack.
Output: [None]
Stack: a < b == 0
-------------------------------------------------------------------------------------
IL_0008: stloc.1
IL_0009: ldloc.1
--
What else? Store stack top in local variable 1 and load that on tack again. We decided previously when we store some value in a local variable we use assignment to that variable and output that code. For clarity I added parentheses:
Output:CS_4_0001 = (a < b == 0)
Stack:CS_4_0001
-------------------------------------------------------------------------------------
IL_000a: brtrue.s IL_0019
--
We have got a conditional branch. We create a block starting from here to IL_0019. And put them in curly braces. And our condition is on the stack. We find a true condition branch so we negate it and put it as if structure as I said at the beginning.
Output:
if(!CS_4_0001)
{
IL_000c: nop
IL_000d: ldstr "Condition is true"
IL_0012: call void [mscorlib]System.Console::Write(string)
IL_0017: nop
IL_0018: nop
}
Stack: [Empty]
-------------------------------------------------------------------------------------
IL_0019: ldarg.2
IL_001a: stloc.0
IL_001b: br.s IL_001d
IL_001d: ldloc.0
IL_001e: ret
We do not parse them here. They are very simple to understand.
===========================================================
Ok we now can work on little more complex code3s than easiest. This will also produce codes that was generated by "for structure" but in a funny way. If we add a little more intelligence to produce "goto" output code for special branching that we cannot handle with if, we get following result.
The code like this:
for(int i=0;i<10;i++)
{
...
}
...
will be converted to:
int i;
i=0;
label_1:
if(i<10)
{
---
i++;
goto label_1;
}
...
For is not our today’s material. We'll look at loops next time.
=====================================================================================
Now you may find that our theory generates a funny code block like:
CS$4$0001=((a<b)==0);
if(!CS$4$0001)
{
---
}
Here the optimization comes to scene. But we skip them for future.
may need recursion here??). Please note that the labels are not stored in MSIL. It is just the byte offset of the MSIL in a method.
If you have not already read you are requested to read my previous post.
We take a simple method to test our theory.
public int IfStructure(int a, int b)
{
bool CS_4_0001;
if (a < b)
{
System.Console.Write("Condition is true");
}
return b;
}
Here is the MSIL code generated by Visual Studio 2005 compiler.
.method public hidebysig instance int32 IfStructure(int32 a, int32 b) cil managed
{
// Code size 31 (0x1f)
.maxstack 2
.locals init ([0] int32 CS$1$0000,
[1] bool CS$4$0001)
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldarg.2
IL_0003: clt
IL_0005: ldc.i4.0
IL_0006: ceq
IL_0008: stloc.1
IL_0009: ldloc.1
IL_000a: brtrue.s IL_0019
IL_000c: nop
IL_000d: ldstr "Condition is true"
IL_0012: call void [mscorlib]System.Console::Write(string)
IL_0017: nop
IL_0018: nop
IL_0019: ldarg.2
IL_001a: stloc.0
IL_001b: br.s IL_001d
IL_001d: ldloc.0
IL_001e: ret
} // end of method ControlStructures::IfStructure
We now start parsing:
-------------------------------------------------------------------------------------
.method public hidebysig instance int32 IfStructure(int32 a, int32 b) cil managed
{
// Code size 31 (0x1f)
.maxstack 2
.locals init ([0] int32 CS$1$0000,
[1] bool CS$4$0001)
--
These lines generate output that we do without any processing of MSIL. There is method definition and local variables. We change the local variable names to C# current names without conflict. For simplicity here we just replace '$' with '_'. One thing to evaluate the MSIL instructions we must keep a map of local variables with variable number. In this example CS$1$0000 is local variable 0 of type int. For clarity we do not show the map here. A simple STL map should work.
Output:
public int IfStructure(int a, int b)
{
int32 CS_1_0000;
bool CS_4_0001;
Stack: [Empty]
-------------------------------------------------------------------------------------
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldarg.2
--
So we push method argument 1 and 2 to stack:
Output: [None]
Stack:
a,b
-------------------------------------------------------------------------------------
IL_0003: clt
--
This instructs us if stack top-1 is less than stack top. The two elements are popped from stack and result goes to stack. No output of course.
Output: [None]
Stack:
a<b
-------------------------------------------------------------------------------------
IL_0005: ldc.i4.0
--
Load (aka push) constant integer of value 0 on stack.
Output: [None]
Stack: a<b,0
-------------------------------------------------------------------------------------
IL_0006: ceq
--
Check if stack top-1 equals stack top. Result goes to stack.
Output: [None]
Stack: a < b == 0
-------------------------------------------------------------------------------------
IL_0008: stloc.1
IL_0009: ldloc.1
--
What else? Store stack top in local variable 1 and load that on tack again. We decided previously when we store some value in a local variable we use assignment to that variable and output that code. For clarity I added parentheses:
Output:CS_4_0001 = (a < b == 0)
Stack:CS_4_0001
-------------------------------------------------------------------------------------
IL_000a: brtrue.s IL_0019
--
We have got a conditional branch. We create a block starting from here to IL_0019. And put them in curly braces. And our condition is on the stack. We find a true condition branch so we negate it and put it as if structure as I said at the beginning.
Output:
if(!CS_4_0001)
{
IL_000c: nop
IL_000d: ldstr "Condition is true"
IL_0012: call void [mscorlib]System.Console::Write(string)
IL_0017: nop
IL_0018: nop
}
Stack: [Empty]
-------------------------------------------------------------------------------------
IL_0019: ldarg.2
IL_001a: stloc.0
IL_001b: br.s IL_001d
IL_001d: ldloc.0
IL_001e: ret
We do not parse them here. They are very simple to understand.
===========================================================
Ok we now can work on little more complex code3s than easiest. This will also produce codes that was generated by "for structure" but in a funny way. If we add a little more intelligence to produce "goto" output code for special branching that we cannot handle with if, we get following result.
The code like this:
for(int i=0;i<10;i++)
{
...
}
...
will be converted to:
int i;
i=0;
label_1:
if(i<10)
{
---
i++;
goto label_1;
}
...
For is not our today’s material. We'll look at loops next time.
=====================================================================================
Now you may find that our theory generates a funny code block like:
CS$4$0001=((a<b)==0);
if(!CS$4$0001)
{
---
}
Here the optimization comes to scene. But we skip them for future.
Labels:
Conditional branching
Monday, May 28, 2007
MSIL to C# -> The theory development begins
Welcome to my journey of writting a .NET assembly decompiler.
First of all I am trying to develop a theory to decompile MSIL. I just do whatever a MSIL instruction ask me to do. But I do it keeping in mind that I am decompiling MSIL. So when it asks to push me a value of a variable I push the name of that varuiable on stack.
To understand the code it is required that you know or have a reference of what each isntruction
of MSIL actually does.
Here is sample program to test if our concept works:
namespace DisasmIL
{
class Math
{
public int add(int x, int y)
{
return x + y;
}
}
class Program
{
static void Main(string[] args)
{
Math m;
int a, b;
m = new Math();
a = 20;
b = 50;
int p = m.add(a, b);
}
}
}
We only check one method. The Main method.When the Main method is
compiled it takes following form.
.method private hidebysig static void Main(string[] args) cil managed
{
.entrypoint
// Code size 23 (0x17)
.maxstack 3
.locals init (
[0] class DisasmIL.Math m,
[1] int32 a,
[2] int32 b,
[3] int32 p
)
IL_0000: nop
IL_0001: newobj instance void DisasmIL.Math::.ctor()
IL_0006: stloc.0
IL_0007: ldc.i4.s 20
IL_0009: stloc.1
IL_000a: ldc.i4.s 50
IL_000c: stloc.2
IL_000d: ldloc.0
IL_000e: ldloc.1
IL_000f: ldloc.2
IL_0010: callvirt instance int32 DisasmIL.Math::'add'(int32,int32)
IL_0015: stloc.3
IL_0016: ret
} // end of method Program::Main
=============================================================
We parse line by line:
------------------------------------------------------------------------
.method private hidebysig static void Main(string[] args) cil managed
It is method declaration with default starting curly brace.
Output code:
static void Main(string[] args)
{
Stack: [empty]
------------------------------------------------------------------------
.entrypoint
// Code size 23 (0x17)
.maxstack 3
.locals init (
[0] class DisasmIL.Math m,
[1] int32 a,
[2] int32 b,
[3] int32 p
)
Need not to be explained. They are self explanatory. Declare variables.
Output code:
DisasmIL.Math m;
int a;
int b;
int p;
Stack:
------------------------------------------------------------------------
IL_0000: nop:
Does nothing (nop).
Output code: [none]
Stack: [empty]
------------------------------------------------------------------------
IL_0001: newobj instance void DisasmIL.Math::.ctor()
Create new instance of DisasmIL.Math using default constructor
so we push "new DisasmIL.Math()" on our stack.
Output code:
Stack: new DisasmIL.Math()
------------------------------------------------------------------------
IL_0006: stloc.0
So we pop top of stack and assign it to local variable 0.
Output code:
m = new DisasmIL.Math();
Stack: [empty]
------------------------------------------------------------------------
IL_0007: ldc.i4.s 20
What we do is push constant 20 on stack.
Output code: [none]
Stack: 20
------------------------------------------------------------------------
IL_0009: stloc.1
So we pop top value and assign it to local variable 1.
Output code:
a = 20;
Stack: [empty]
------------------------------------------------------------------------
IL_000a: ldc.i4.s 50
We push constant 50 on stack.
Output code: [none]
Stack: 50
------------------------------------------------------------------------
IL_000c: stloc.2
We pop top value and assign it to local variable 2.
Output code:
b = 50;
Stack: [empty]
------------------------------------------------------------------------
IL_000d: ldloc.0
IL_000e: ldloc.1
IL_000f: ldloc.2
Push local variable 0, 1 and 2 on stack.
Output code:
Stack: m, a, b
-------------------------------------------------------------------------
IL_0010: callvirt instance int32 DisasmIL.Math::'add'(int32,int32)
IL_0015: stloc.3
We call add method with values top-1, top of stack for instance of
top-2. For any method call if it returns value it is returned on
stack. So check next instruction. If it is a stloc then we assign the
return value. We assign return value to local variable 3.
Output code: p=m.add(a,b);
Stack:
--------------------------------------------------------------------------
IL_0016: ret
Return void. So no code except closing curly brace.
Output code:
}
Stack: [empty]
Now if you add the output codes together you'll find the original C# code is generated. This works for simple cases. Need to test if it works for complex situations. Miles to go....
First of all I am trying to develop a theory to decompile MSIL. I just do whatever a MSIL instruction ask me to do. But I do it keeping in mind that I am decompiling MSIL. So when it asks to push me a value of a variable I push the name of that varuiable on stack.
To understand the code it is required that you know or have a reference of what each isntruction
of MSIL actually does.
Here is sample program to test if our concept works:
namespace DisasmIL
{
class Math
{
public int add(int x, int y)
{
return x + y;
}
}
class Program
{
static void Main(string[] args)
{
Math m;
int a, b;
m = new Math();
a = 20;
b = 50;
int p = m.add(a, b);
}
}
}
We only check one method. The Main method.When the Main method is
compiled it takes following form.
.method private hidebysig static void Main(string[] args) cil managed
{
.entrypoint
// Code size 23 (0x17)
.maxstack 3
.locals init (
[0] class DisasmIL.Math m,
[1] int32 a,
[2] int32 b,
[3] int32 p
)
IL_0000: nop
IL_0001: newobj instance void DisasmIL.Math::.ctor()
IL_0006: stloc.0
IL_0007: ldc.i4.s 20
IL_0009: stloc.1
IL_000a: ldc.i4.s 50
IL_000c: stloc.2
IL_000d: ldloc.0
IL_000e: ldloc.1
IL_000f: ldloc.2
IL_0010: callvirt instance int32 DisasmIL.Math::'add'(int32,int32)
IL_0015: stloc.3
IL_0016: ret
} // end of method Program::Main
=============================================================
We parse line by line:
------------------------------------------------------------------------
.method private hidebysig static void Main(string[] args) cil managed
It is method declaration with default starting curly brace.
Output code:
static void Main(string[] args)
{
Stack: [empty]
------------------------------------------------------------------------
.entrypoint
// Code size 23 (0x17)
.maxstack 3
.locals init (
[0] class DisasmIL.Math m,
[1] int32 a,
[2] int32 b,
[3] int32 p
)
Need not to be explained. They are self explanatory. Declare variables.
Output code:
DisasmIL.Math m;
int a;
int b;
int p;
Stack:
------------------------------------------------------------------------
IL_0000: nop:
Does nothing (nop).
Output code: [none]
Stack: [empty]
------------------------------------------------------------------------
IL_0001: newobj instance void DisasmIL.Math::.ctor()
Create new instance of DisasmIL.Math using default constructor
so we push "new DisasmIL.Math()" on our stack.
Output code:
Stack: new DisasmIL.Math()
------------------------------------------------------------------------
IL_0006: stloc.0
So we pop top of stack and assign it to local variable 0.
Output code:
m = new DisasmIL.Math();
Stack: [empty]
------------------------------------------------------------------------
IL_0007: ldc.i4.s 20
What we do is push constant 20 on stack.
Output code: [none]
Stack: 20
------------------------------------------------------------------------
IL_0009: stloc.1
So we pop top value and assign it to local variable 1.
Output code:
a = 20;
Stack: [empty]
------------------------------------------------------------------------
IL_000a: ldc.i4.s 50
We push constant 50 on stack.
Output code: [none]
Stack: 50
------------------------------------------------------------------------
IL_000c: stloc.2
We pop top value and assign it to local variable 2.
Output code:
b = 50;
Stack: [empty]
------------------------------------------------------------------------
IL_000d: ldloc.0
IL_000e: ldloc.1
IL_000f: ldloc.2
Push local variable 0, 1 and 2 on stack.
Output code:
Stack: m, a, b
-------------------------------------------------------------------------
IL_0010: callvirt instance int32 DisasmIL.Math::'add'(int32,int32)
IL_0015: stloc.3
We call add method with values top-1, top of stack for instance of
top-2. For any method call if it returns value it is returned on
stack. So check next instruction. If it is a stloc then we assign the
return value. We assign return value to local variable 3.
Output code: p=m.add(a,b);
Stack:
--------------------------------------------------------------------------
IL_0016: ret
Return void. So no code except closing curly brace.
Output code:
}
Stack: [empty]
Now if you add the output codes together you'll find the original C# code is generated. This works for simple cases. Need to test if it works for complex situations. Miles to go....
Labels:
Initial
Subscribe to:
Comments (Atom)