Wednesday, June 6, 2007

MISL to C# (Sharp) -> Loops - the simple cases

I wish, as a decompiler writer, there would be no loops. Programmers use thousands of goto statements with if statements. But as it is not the case I must understand how to parse MSIL instructions that were generated from the loops.

Of the three types of most common loops (for, while, do-while) the while loop is the basic one. The block that is generated from any type of these loops usually has a conditional jump (usually a brtrue.s

Monday, June 4, 2007

MISL to C# (Sharp) -> Iff (if and only if)

Today I’ll talk about the most basic and useful if-then structure. The compiler generates a conditional branch. We create a block of instructions for the structure. The block is separated by the branch instruction (like brtrue) and the branch label (like IL_0019 - where the code jumps). And our condition is on the stack. If we find a true condition branch we negate it and put it as a if statement condition. The block is initially a block of MSIL that we will convert to C# code later (
may need recursion here??). Please note that the labels are not stored in MSIL. It is just the byte offset of the MSIL in a method.

We take a simple method to test our theory.

public int IfStructure(int a, int b)
{
bool CS_4_0001;
if (a < b)
{
System.Console.Write("Condition is true");
}
return b;
}

Here is the MSIL code generated by Visual Studio 2005 compiler.

.method public hidebysig instance int32 IfStructure(int32 a, int32 b) cil managed
{
// Code size 31 (0x1f)
.maxstack 2
.locals init ([0] int32 CS$1$0000,
[1] bool CS$4$0001)
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldarg.2
IL_0003: clt
IL_0005: ldc.i4.0
IL_0006: ceq
IL_0008: stloc.1
IL_0009: ldloc.1
IL_000a: brtrue.s IL_0019

IL_000c: nop
IL_000d: ldstr "Condition is true"
IL_0012: call void [mscorlib]System.Console::Write(string)
IL_0017: nop
IL_0018: nop

IL_0019: ldarg.2
IL_001a: stloc.0
IL_001b: br.s IL_001d
IL_001d: ldloc.0
IL_001e: ret
} // end of method ControlStructures::IfStructure

We now start parsing:
-------------------------------------------------------------------------------------
.method public hidebysig instance int32 IfStructure(int32 a, int32 b) cil managed
{
// Code size 31 (0x1f)
.maxstack 2
.locals init ([0] int32 CS$1$0000,
[1] bool CS$4$0001)
--

These lines generate output that we do without any processing of MSIL. There is method definition and local variables. We change the local variable names to C# current names without conflict. For simplicity here we just replace '$' with '_'. One thing to evaluate the MSIL instructions we must keep a map of local variables with variable number. In this example CS$1$0000 is local variable 0 of type int. For clarity we do not show the map here. A simple STL map should work. Output: public int IfStructure(int a, int b) { int32 CS_1_0000; bool CS_4_0001; Stack: [Empty] ------------------------------------------------------------------------------------- IL_0000: nop IL_0001: ldarg.1 IL_0002: ldarg.2 -- So we push method argument 1 and 2 to stack: Output: [None] Stack: a,b ------------------------------------------------------------------------------------- IL_0003: clt -- This instructs us if stack top-1 is less than stack top. The two elements are popped from stack and result goes to stack. No output of course. Output: [None] Stack: a<b ------------------------------------------------------------------------------------- IL_0005: ldc.i4.0 -- Load (aka push) constant integer of value 0 on stack. Output: [None] Stack: a<b,0 ------------------------------------------------------------------------------------- IL_0006: ceq -- Check if stack top-1 equals stack top. Result goes to stack. Output: [None] Stack: a < b == 0 ------------------------------------------------------------------------------------- IL_0008: stloc.1 IL_0009: ldloc.1 -- What else? Store stack top in local variable 1 and load that on tack again. We decided previously when we store some value in a local variable we use assignment to that variable and output that code. For clarity I added parentheses: Output:CS_4_0001 = (a < b == 0) Stack:CS_4_0001 ------------------------------------------------------------------------------------- IL_000a: brtrue.s IL_0019 -- We have got a conditional branch. We create a block starting from here to IL_0019. And put them in curly braces. And our condition is on the stack. We find a true condition branch so we negate it and put it as if structure as I said at the beginning. Output: if(!CS_4_0001) { IL_000c: nop IL_000d: ldstr "Condition is true" IL_0012: call void [mscorlib]System.Console::Write(string) IL_0017: nop IL_0018: nop } Stack: [Empty] ------------------------------------------------------------------------------------- IL_0019: ldarg.2 IL_001a: stloc.0 IL_001b: br.s IL_001d IL_001d: ldloc.0 IL_001e: ret We do not parse them here. They are very simple to understand. =========================================================== Ok we now can work on little more complex code3s than easiest. This will also produce codes that was generated by "for structure" but in a funny way. If we add a little more intelligence to produce "goto" output code for special branching that we cannot handle with if, we get following result. The code like this: for(int i=0;i<10;i++) { ... } ... will be converted to: int i; i=0; label_1: if(i<10) { --- i++; goto label_1; } ... For is not our today’s material. We'll look at loops next time. ===================================================================================== Now you may find that our theory generates a funny code block like: CS$4$0001=((a<b)==0); if(!CS$4\$0001)
{

---

}

Here the optimization comes to scene. But we skip them for future.