Wednesday, June 6, 2007

MISL to C# (Sharp) -> Loops - the simple cases

I wish, as a decompiler writer, there would be no loops. Programmers use thousands of goto statements with if statements. But as it is not the case I must understand how to parse MSIL instructions that were generated from the loops.

Of the three types of most common loops (for, while, do-while) the while loop is the basic one. The block that is generated from any type of these loops usually has a conditional jump (usually a brtrue.s

Monday, June 4, 2007

MISL to C# (Sharp) -> Iff (if and only if)

Today I’ll talk about the most basic and useful if-then structure. The compiler generates a conditional branch. We create a block of instructions for the structure. The block is separated by the branch instruction (like brtrue) and the branch label (like IL_0019 - where the code jumps). And our condition is on the stack. If we find a true condition branch we negate it and put it as a if statement condition. The block is initially a block of MSIL that we will convert to C# code later (
may need recursion here??). Please note that the labels are not stored in MSIL. It is just the byte offset of the MSIL in a method.

If you have not already read you are requested to read my previous post.


We take a simple method to test our theory.

public int IfStructure(int a, int b)
{
bool CS_4_0001;
if (a < b)
{
System.Console.Write("Condition is true");
}
return b;
}

Here is the MSIL code generated by Visual Studio 2005 compiler.

.method public hidebysig instance int32 IfStructure(int32 a, int32 b) cil managed
{
// Code size 31 (0x1f)
.maxstack 2
.locals init ([0] int32 CS$1$0000,
[1] bool CS$4$0001)
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldarg.2
IL_0003: clt
IL_0005: ldc.i4.0
IL_0006: ceq
IL_0008: stloc.1
IL_0009: ldloc.1
IL_000a: brtrue.s IL_0019

IL_000c: nop
IL_000d: ldstr "Condition is true"
IL_0012: call void [mscorlib]System.Console::Write(string)
IL_0017: nop
IL_0018: nop

IL_0019: ldarg.2
IL_001a: stloc.0
IL_001b: br.s IL_001d
IL_001d: ldloc.0
IL_001e: ret
} // end of method ControlStructures::IfStructure

We now start parsing:
-------------------------------------------------------------------------------------
.method public hidebysig instance int32 IfStructure(int32 a, int32 b) cil managed
{
// Code size 31 (0x1f)
.maxstack 2
.locals init ([0] int32 CS$1$0000,
[1] bool CS$4$0001)
--

These lines generate output that we do without any processing of MSIL. There is method definition and local variables. We change the local variable names to C# current names without conflict. For simplicity here we just replace '$' with '_'. One thing to evaluate the MSIL instructions we must keep a map of local variables with variable number. In this example CS$1$0000 is local variable 0 of type int. For clarity we do not show the map here. A simple STL map should work.

Output:
public int IfStructure(int a, int b)
{
int32 CS_1_0000;
bool CS_4_0001;

Stack: [Empty]

-------------------------------------------------------------------------------------
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldarg.2
--
So we push method argument 1 and 2 to stack:

Output: [None]

Stack:
a,b

-------------------------------------------------------------------------------------
IL_0003: clt
--

This instructs us if stack top-1 is less than stack top. The two elements are popped from stack and result goes to stack. No output of course.

Output: [None]

Stack:
a<b

-------------------------------------------------------------------------------------
IL_0005: ldc.i4.0
--

Load (aka push) constant integer of value 0 on stack.

Output: [None]
Stack: a<b,0
-------------------------------------------------------------------------------------
IL_0006: ceq
--

Check if stack top-1 equals stack top. Result goes to stack.
Output: [None]
Stack: a < b == 0

-------------------------------------------------------------------------------------
IL_0008: stloc.1
IL_0009: ldloc.1
--

What else? Store stack top in local variable 1 and load that on tack again. We decided previously when we store some value in a local variable we use assignment to that variable and output that code. For clarity I added parentheses:

Output:CS_4_0001 = (a < b == 0)
Stack:CS_4_0001

-------------------------------------------------------------------------------------
IL_000a: brtrue.s IL_0019
--

We have got a conditional branch. We create a block starting from here to IL_0019. And put them in curly braces. And our condition is on the stack. We find a true condition branch so we negate it and put it as if structure as I said at the beginning.

Output:
if(!CS_4_0001)
{
IL_000c: nop
IL_000d: ldstr "Condition is true"
IL_0012: call void [mscorlib]System.Console::Write(string)
IL_0017: nop
IL_0018: nop
}

Stack: [Empty]

-------------------------------------------------------------------------------------
IL_0019: ldarg.2
IL_001a: stloc.0
IL_001b: br.s IL_001d
IL_001d: ldloc.0
IL_001e: ret

We do not parse them here. They are very simple to understand.
===========================================================

Ok we now can work on little more complex code3s than easiest. This will also produce codes that was generated by "for structure" but in a funny way. If we add a little more intelligence to produce "goto" output code for special branching that we cannot handle with if, we get following result.

The code like this:


for(int i=0;i<10;i++)
{
...
}
...

will be converted to:

int i;
i=0;
label_1:

if(i<10)
{
---
i++;
goto label_1;
}
...

For is not our today’s material. We'll look at loops next time.

=====================================================================================
Now you may find that our theory generates a funny code block like:

CS$4$0001=((a<b)==0);

if(!CS$4$0001)
{

---

}


Here the optimization comes to scene. But we skip them for future.

Monday, May 28, 2007

MSIL to C# -> The theory development begins

Welcome to my journey of writting a .NET assembly decompiler.

First of all I am trying to develop a theory to decompile MSIL. I just do whatever a MSIL instruction ask me to do. But I do it keeping in mind that I am decompiling MSIL. So when it asks to push me a value of a variable I push the name of that varuiable on stack.

To understand the code it is required that you know or have a reference of what each isntruction
of MSIL actually does.

Here is sample program to test if our concept works:

namespace DisasmIL
{
class Math
{
public int add(int x, int y)
{
return x + y;
}
}

class Program
{
static void Main(string[] args)
{
Math m;
int a, b;
m = new Math();
a = 20;
b = 50;
int p = m.add(a, b);
}
}
}

We only check one method. The Main method.When the Main method is
compiled it takes following form.
.method private hidebysig static void Main(string[] args) cil managed
{
.entrypoint
// Code size 23 (0x17)
.maxstack 3
.locals init (
[0] class DisasmIL.Math m,
[1] int32 a,
[2] int32 b,
[3] int32 p
)
IL_0000: nop
IL_0001: newobj instance void DisasmIL.Math::.ctor()
IL_0006: stloc.0
IL_0007: ldc.i4.s 20
IL_0009: stloc.1
IL_000a: ldc.i4.s 50
IL_000c: stloc.2
IL_000d: ldloc.0
IL_000e: ldloc.1
IL_000f: ldloc.2
IL_0010: callvirt instance int32 DisasmIL.Math::'add'(int32,int32)
IL_0015: stloc.3
IL_0016: ret
} // end of method Program::Main
=============================================================

We parse line by line:
------------------------------------------------------------------------
.method private hidebysig static void Main(string[] args) cil managed

It is method declaration with default starting curly brace.
Output code:
static void Main(string[] args)
{
Stack: [empty]
------------------------------------------------------------------------
.entrypoint
// Code size 23 (0x17)
.maxstack 3
.locals init (
[0] class DisasmIL.Math m,
[1] int32 a,
[2] int32 b,
[3] int32 p
)

Need not to be explained. They are self explanatory. Declare variables.
Output code:
DisasmIL.Math m;
int a;
int b;
int p;
Stack:
------------------------------------------------------------------------
IL_0000: nop:

Does nothing (nop).
Output code: [none]
Stack: [empty]
------------------------------------------------------------------------
IL_0001: newobj instance void DisasmIL.Math::.ctor()

Create new instance of DisasmIL.Math using default constructor
so we push "new DisasmIL.Math()" on our stack.
Output code:
Stack: new DisasmIL.Math()
------------------------------------------------------------------------
IL_0006: stloc.0

So we pop top of stack and assign it to local variable 0.
Output code:
m = new DisasmIL.Math();
Stack: [empty]
------------------------------------------------------------------------
IL_0007: ldc.i4.s 20

What we do is push constant 20 on stack.
Output code: [none]
Stack: 20
------------------------------------------------------------------------
IL_0009: stloc.1

So we pop top value and assign it to local variable 1.
Output code:
a = 20;
Stack: [empty]
------------------------------------------------------------------------
IL_000a: ldc.i4.s 50

We push constant 50 on stack.
Output code: [none]
Stack: 50
------------------------------------------------------------------------
IL_000c: stloc.2

We pop top value and assign it to local variable 2.
Output code:
b = 50;
Stack: [empty]
------------------------------------------------------------------------
IL_000d: ldloc.0
IL_000e: ldloc.1
IL_000f: ldloc.2

Push local variable 0, 1 and 2 on stack.
Output code:
Stack: m, a, b
-------------------------------------------------------------------------
IL_0010: callvirt instance int32 DisasmIL.Math::'add'(int32,int32)
IL_0015: stloc.3

We call add method with values top-1, top of stack for instance of
top-2. For any method call if it returns value it is returned on
stack. So check next instruction. If it is a stloc then we assign the
return value. We assign return value to local variable 3.

Output code: p=m.add(a,b);
Stack:
--------------------------------------------------------------------------
IL_0016: ret

Return void. So no code except closing curly brace.
Output code:
}

Stack: [empty]

Now if you add the output codes together you'll find the original C# code is generated. This works for simple cases. Need to test if it works for complex situations. Miles to go....