Thursday, February 22, 2007

Need to call System.dll internal functions?

Occasionally you will discover a nice class or function you would like to use. I stumbled across [mscorcfg]Microsoft.CLRAdmin.Fusion.AddAssemblytoGac(string strAssembly). Oddly enough, I could not find an obvious way to add an assembly to the GAC from .NET code. This would be great for MSBuild tasks, or even just for installers.

If you try to use this class or function directly, you get this result.


c:\dev\test\private test.cs(10,9) : error CS0122: 'Microsoft.CLRAdmin.Fusion' is inaccessible due to its protection level



Turns out using this function is pretty easy in theory. .NET only checks permissions at link time. You could get around this the hard way by using ILAsm, or Reflection in C#.

Here's how you would do this using Reflection.



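// Requires: using System; using System.Reflection;
public static Int32 AddAssemblyToGac(string strAssembly)
{
    // FusionType is assumed to hold the Type object for Microsoft.CLRAdmin.Fusion (from mscorcfg.dll).
    object[] args = new object[] { strAssembly };
    // Named form of (BindingFlags)314.
    BindingFlags bindingFlags = BindingFlags.DeclaredOnly | BindingFlags.Static | BindingFlags.Public
        | BindingFlags.NonPublic | BindingFlags.InvokeMethod;
    return (Int32)FusionType.InvokeMember("AddAssemblytoGac", bindingFlags, null, null, args);
}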
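
As a self-contained illustration of the same idea, here is a minimal sketch that invokes a non-public static method through reflection. The `Hidden` class and its `Add` method are made up for the demo and only stand in for a real internal framework type; the flag combination is the named equivalent of the numeric value 314.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for an internal framework class.
internal static class Hidden
{
    private static int Add(int a, int b) { return a + b; }
}

public static class ReflectionDemo
{
    // Named form of (BindingFlags)314.
    public const BindingFlags Flags =
        BindingFlags.DeclaredOnly | BindingFlags.Static | BindingFlags.Public |
        BindingFlags.NonPublic | BindingFlags.InvokeMethod;

    public static int CallHiddenAdd(int a, int b)
    {
        Type hidden = typeof(Hidden);
        object[] args = new object[] { a, b };
        // The target argument is null because the method is static.
        return (int)hidden.InvokeMember("Add", Flags, null, null, args);
    }

    public static void Main()
    {
        Console.WriteLine((int)Flags);          // numeric value of the flag set
        Console.WriteLine(CallHiddenAdd(2, 3)); // result of the private method
    }
}
```

The compiler never sees a direct reference to `Add`, so the accessibility error does not come up; the price is that a misspelled method name only fails at run time.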

Well, that just opens up about a million possibilities. Just try to "fix" one class from the ASP.NET framework and you have to drag in 12 million interfaces. Now you are set: just use the same ones that it was using. This is going to save me tons of time. There's no way I'm writing a million Reflection proxy interfaces. Never mind the fact that you can just call the interface using IL. There just has to be a good way to do this from C#. Worst case, we could lie/cheat to the compiler.

The C# team has added a nice Attribute for us to do this.

[InternalsVisibleTo("AndrewsAssembly, PublicKeyToken=0b00fde735121dcc")]

You can read up on it in the InternalsVisibleToAttribute documentation.

So run ILDasm on System.Web.dll, or your favorite assembly, add this CustomAttribute to the IL, and reassemble it with ILAsm. Compile your assembly against the result and you are off and running.

Here's a view from Lutz Roeder's Reflector of my test app using an internal System.Web enum.

Tuesday, February 20, 2007

CustomAttributes with ILASM 2.0

There are some great debugging attributes you can add to your .Net project. Sadly I have not found a good way yet to get the debugger to use them on "system" assemblies that have private or internal classes. You can view all this information in the debugger, so I can't say it makes a lot of sense.



[assembly: DebuggerDisplay("{{_completionCallback.Method}}", Target=typeof(System.Web.HttpApplication.SyncEventExecutionStep))]



This fine code gets you this error message.


autoexp.cs(18,103): error CS0122: 'System.Web.HttpApplication.SyncEventExecutionStep' is inaccessible due to its protection level


Lame, very lame.

You can work around this by using ILASM. I mean, we just need the type after all; it can't be that hard. Here's the ILDasm output for a working custom attribute.


.custom instance void [mscorlib]System.Diagnostics.DebuggerDisplayAttribute::.ctor(string) = ( 01 00 0C 5C 7B 7B 4D 65 73 73 61 67 65 7D 7D 01 // ...\{{Message}}.
00 54 50 06 54 61 72 67 65 74 7A 53 79 73 74 65 // .TP.TargetzSyste
6D 2E 52 65 66 6C 65 63 74 69 6F 6E 2E 52 65 66 // m.Reflection.Ref
6C 65 63 74 69 6F 6E 54 79 70 65 4C 6F 61 64 45 // lectionTypeLoadE
78 63 65 70 74 69 6F 6E 2C 20 6D 73 63 6F 72 6C // xception, mscorl
69 62 2C 20 56 65 72 73 69 6F 6E 3D 32 2E 30 2E // ib, Version=2.0.
30 2E 30 2C 20 43 75 6C 74 75 72 65 3D 6E 65 75 // 0.0, Culture=neu
74 72 61 6C 2C 20 50 75 62 6C 69 63 4B 65 79 54 // tral, PublicKeyT
6F 6B 65 6E 3D 62 37 37 61 35 63 35 36 31 39 33 // oken=b77a5c56193
34 65 30 38 39 ) // 4e089



This is just a joy to edit and try and use. Luckily with v2.0 it's much better.


.custom instance void [mscorlib]System.Diagnostics.DebuggerDisplayAttribute::.ctor(string)
= {string('\\{{_completionCallback.Method}}')
   property type 'Target' = type(class 'System.Web.HttpApplication.SyncEventExecutionStep, System.Web, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a')}

There you go, now it's editable. You just have to build it into your DLL of choice, and you can almost tell what's happening.
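
For comparison, when the target type is accessible the C# compiler accepts the same kind of attribute directly. The sketch below uses a made-up `Step` class (not a System.Web type) and reads the assembly-level attribute back through reflection, to show what ends up in the metadata.

```csharp
using System;
using System.Diagnostics;

[assembly: DebuggerDisplay("{Name}", Target = typeof(Step))]

// Hypothetical type used only for this demo.
public class Step
{
    public string Name = "demo";
}

public static class DebuggerDisplayDemo
{
    // Finds the assembly-level DebuggerDisplay attribute that targets Step.
    public static DebuggerDisplayAttribute Find()
    {
        object[] attrs = typeof(Step).Assembly.GetCustomAttributes(
            typeof(DebuggerDisplayAttribute), false);
        foreach (DebuggerDisplayAttribute attr in attrs)
        {
            if (attr.Target == typeof(Step)) return attr;
        }
        return null;
    }

    public static void Main()
    {
        DebuggerDisplayAttribute attr = Find();
        Console.WriteLine(attr.Value + " -> " + attr.Target);
    }
}
```

The IL route is only needed because `typeof(...)` on an inaccessible type does not compile; the attribute data itself is just a format string plus a type name.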

Monday, February 12, 2007

Some good .Net debugging info

http://blogs.msdn.com/vancem/archive/2006/09/05/742062.aspx


Vance Morrison's Weblog



Vance Morrison is currently an Architect on the .NET Runtime Team, specializing in performance issues with the runtime or managed code in general.

I have let my blog lapse for too long. I am back to blogging. I realized recently that we have simply not written down many interesting facts about how the runtime actually works. I want to fix this. Coming up in future blogs I am going to be doing a bit of an 'architectural overview' which describes the differences between managed and unmanaged code, but before I do that I realized that I have not even finished a blog entry I started in March.

In my blog How to use Visual Studio to investigate code generation questions in managed code, I talk about how to configure Visual Studio so that you can actually look at optimized code in the debugger (which sadly is not as trivial as you would like), and showed how to look at the disassembly of managed code. Unfortunately managed code is hard to read without a guide, and so in this blog I will show you some very useful tips for reading managed assembly code.

In this blog entry I will show you the instructions that ACTUALLY need to get executed to do something as simple as assigning a string to a field of a class. Note that I am assuming a familiarity with X86 assembly code. If you are the type who never wants to read assembly code, you should stop reading now, because most of this blog is a step-by-step explanation of it.

I have attached the file InspectingManageCode.zip, which contains a (trivial) project that I used for this example. You are STRONGLY encouraged to open it (you can browse it; the main file is Program.cs). Copy the files (simply drag the 'InspectingManagedCode' directory inside the ZIP to a directory of your choosing), launch the InspectingManagedCode.sln file and run the example. While the project is already set to build and run optimized code, you will still need to turn off ‘just my code’ and turn on JIT optimization as described in my previous blog to follow along.

The code in the attached example is pretty trivial.
class Program
{
    string myString;
    private Program()
    {
        myString = "foo";
    }
    static void Main(string[] args)
    {
        Program p = new Program();
    }
}

If you were to follow the instructions in the previous blog to see what code was generated for the body of ‘Main’ you would find the following code.

00000000  push        esi
00000001  mov         ecx,9181F4h
00000006  call        FFCB1264
0000000b  mov         esi,eax
0000000d  mov         eax,dword ptr ds:[0227307Ch]
00000013  lea         edx,[esi+4]
00000016  call        79222B78
0000001b  pop         esi
0000001c  ret


At first glance this code has little similarity to the source code: the original source has a call to the constructor ‘Program’ and the assembly code has two calls to strange hex addresses. There are also references to magical numbers like 9181F4H and 0227307CH. In this case the disassembly has not proven to be very valuable. What can we do?

Sadly, if we try to peer into these CALL instructions we cannot; the debugger comes back with the very unhelpful message ‘There is no code at the specified location’. Actually Visual Studio is LYING to you. There really is code there, but it simply will not show it to you. I will show you techniques to get around this.

The key to unlocking the mysteries of managed code is a debug helper called SOS.DLL (it is a DLL that is shipped with the runtime). The DLL is what is called a ‘debugger extension’. Basically it adds commands to the debugger that are useful for debugging the code associated with it (in this case the runtime). Other bloggers have also commented on the use of this DLL (do a web search of SOS.DLL for more).

In Visual Studio, you load SOS.DLL by opening the immediate window (Ctrl-D I) and typing

.load SOS.dll

If you do this you may get the message

SOS not available while Managed only debugging.
To load SOS, enable unmanaged debugging in your project properties.

This message is actually reasonably helpful. By stopping the debugger (Shift F5), going to Solution Explorer (right hand pane), right clicking on the InspectingManagedCode project file, and selecting Properties, you will get the properties pane for the project. If you select the ‘Debug’ tab on the left side you will find 3 check boxes at the bottom, one of which is labeled ‘Enable unmanaged code debugging’. If you check this, you put the debugger into a mode where it can debug both managed and unmanaged code (which means you can then use SOS.DLL).

I have already done this on the InspectingManagedCode project, but you will have to repeat this any time you need to use SOS. (Sadly the instructions for setting the debugger mode are different for C++.)

Note that running the debugger to debug both managed and unmanaged code will slow the debugger down a bit (it loads the symbols for all the unmanaged DLLs), so you probably only want to do this on projects like this one where you want to use SOS.DLL.

Now you should be able to set a breakpoint in Main(), run the program (F5), and go to the immediate window (CTRL-D I) and type
.load SOS.dll

extension C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos.dll loaded.

If you are curious, SOS.DLL has reasonably good help. If you type the command

!Help

it will give you a list of commands, and you can get help on individual commands by specifying the name, e.g.

!Help u

will give you help on the ‘u’ (unassemble) command. All SOS commands need to be prefixed by a ! character so that the Visual Studio Debugger knows that it is an SOS command and not an immediate value to be interpreted (the normal meaning of text typed in the immediate window).

The unassemble SOS command is the command we are interested in. It will disassemble a managed routine, but do a much better job than Visual Studio presently does. Unfortunately, we need the address of the routine we want to disassemble, and Visual Studio goes to some length to hide this information. If you look at the disassembly for the code (CTRL-ALT-D), you will see that the address of the routine is never given, only the offset from the beginning of the method.

The way around this is to use the ‘Registers window’ (Ctrl-D R). I happen to like to put this window just above the immediate window and shrink it so that only the two lines that actually show values are showing. One of the registers is ‘EIP’, which stands for ‘Extended Instruction Pointer’. It holds the address of the current instruction. In my particular invocation EIP has the value of 00DE0071, so I can do the command

!u 00DE0071

Which will disassemble the ENTIRE routine that the address 00DE0071 lives in.  I like to right click in the immediate window and select ‘Clear All’ before I do this so the only thing in that window is the disassembly.   On my machine I get the result


Normal JIT generated code
Program.Main(System.String[])
Begin 00de0070, size 1d
00DE0070 56 push esi
00DE0071 B904309100 mov ecx,913004h
00DE0076 E8A11FB2FF call 0090201C (JitHelp: CORINFO_HELP_NEWSFAST)
00DE007B 8BF0 mov esi,eax
00DE007D 8B053C302B02 mov eax,dword ptr ds:[022B303Ch]
00DE0083 8D5604 lea edx,[esi+4]
00DE0086 E8A5380979 call 79E73930
00DE008B 5E pop esi
00DE008C C3 ret

It is not unlike the version that Visual Studio produced, but there are differences:

1. You will note that the ‘call’ instruction is annotated with ‘JitHelp: CORINFO_HELP_NEWSFAST’, which makes it at least a bit clearer that this helper is used to create a new object (and is the fast version; we have many variations).

2. It printed the whole routine that 00DE0071 lives in and prints a >>> on the instruction corresponding to the 00DE0071 address.

3. While it did not print the name for the ‘call 79E73930’, notice that the HEX value is different from the value in the Visual Studio disassembly (79222B78). The value in the VS disassembly is simply WRONG (it is a bug no one bothered to fix).
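
The raw bytes SOS prints next to each instruction make it possible to check the call targets by hand. An `E8` opcode is a near call whose four following bytes are a little-endian signed displacement, relative to the address of the instruction after the call. A minimal sketch of that arithmetic (the `CallTarget` helper is my own name, not part of SOS):

```csharp
using System;

public static class CallDecoder
{
    // Decodes the target of a 5-byte "E8 rel32" near call.
    public static uint CallTarget(uint instructionAddress, byte[] encoding)
    {
        if (encoding.Length != 5 || encoding[0] != 0xE8)
            throw new ArgumentException("expected a 5-byte E8 call");
        // Little-endian signed 32-bit displacement.
        int rel32 = encoding[1] | (encoding[2] << 8) | (encoding[3] << 16) | (encoding[4] << 24);
        // The displacement is relative to the instruction that follows the call.
        long next = (long)instructionAddress + 5;
        return unchecked((uint)(next + rel32));
    }

    public static void Main()
    {
        // 00DE0076 E8A11FB2FF and 00DE0086 E8A5380979 from the SOS listing.
        uint first = CallTarget(0x00DE0076, new byte[] { 0xE8, 0xA1, 0x1F, 0xB2, 0xFF });
        uint second = CallTarget(0x00DE0086, new byte[] { 0xE8, 0xA5, 0x38, 0x09, 0x79 });
        Console.WriteLine(first.ToString("X8"));
        Console.WriteLine(second.ToString("X8"));
    }
}
```

Run against the two calls in the listing, this yields 0090201C and 79E73930, which agrees with the SOS output rather than with the 79222B78 shown by Visual Studio.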

So let’s take a look at the first two instructions.

00DE0071 B904309100 mov ecx,913004h
00DE0076 E8A11FB2FF call 0090201C (JitHelp: CORINFO_HELP_NEWSFAST)

I mentioned that this helper call creates a new object from the GC heap. To do so it needs to know the type of the object to be created. This is what the magic number 913004 is for. Internally in the runtime, types are described by a structure called a MethodTable, and 913004 is the address of the MethodTable of the type to create. We can find out what type 913004 corresponds to by using the !DumpMT (dump Method Table) SOS command.

!DumpMT 913004h

Produces the output

EClass: 00911254
Module: 00912c14
Name: Program
mdToken: 02000002  (C:\Documents and Settings\vancem\My Documents\Visual Studio 2005\Projects\InspectingManagedCode\bin\Release\InspectingManagedCode.exe)
BaseSize: 0xc
ComponentSize: 0x0
Number of IFaces in IFaceMap: 0
Slots in VTable: 6

The only output of this that is interesting at this point is the ‘Name’ field, which, as you can see, indicates that 913004 corresponds to the ‘Program’ type. Thus these first two instructions create a Program object. This Program object comes back from the helper with all its fields zeroed, so the next instructions in the program are the body of the constructor (the Program() constructor has been inlined into the body of Main()).
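
If you want to cross-check the magic number from managed code, a type's runtime handle is a reasonable place to look. The sketch below rests on an assumption about runtime internals rather than a documented guarantee: on the CLR, `typeof(T).TypeHandle.Value` for an ordinary class is generally understood to be the address of its MethodTable. The `MethodTableDemo` class is a made-up stand-in for the sample's `Program` class.

```csharp
using System;

public class MethodTableDemo
{
    string myString;

    private MethodTableDemo() { myString = "foo"; }

    // Returns the runtime's handle for this type as a hex string.
    public static string TypeHandleHex()
    {
        // Assumption: on the CLR this handle is the MethodTable address
        // that !DumpMT expects. The value differs from run to run.
        IntPtr handle = typeof(MethodTableDemo).TypeHandle.Value;
        return handle.ToInt64().ToString("X");
    }

    public static void Main()
    {
        MethodTableDemo p = new MethodTableDemo();
        Console.WriteLine(TypeHandleHex() + " " + p.myString);
    }
}
```

If the assumption holds on your runtime, passing the printed value to !DumpMT should report the same ‘Name’ as the type you asked about.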

The next instructions

00DE007B 8BF0 mov esi,eax
00DE007D 8B053C302B02 mov eax,dword ptr ds:[022B303Ch]
00DE0083 8D5604 lea edx,[esi+4]
00DE0086 E8A5380979 call 79E73930

Basically implement the statement ‘myString = "foo"’. The helper returns a pointer to the not-yet-constructed object in the EAX register. The mov saves this into the ESI register. EAX is then loaded with what is at the address 022B303Ch. This happens to be the string “foo” (more on how it got there in a later blog). You can confirm this by going to the disassembly code, setting a breakpoint right after the mov eax,dword ptr ds:[022B303Ch] instruction and looking at the value of the EAX register in the ‘registers’ window. In my example it happens to be the value 012B1D44. You can then use the command

!DumpObj 012B1D44

Which will dump the managed object at this address. This will print:

DumpObj 012B1D44
Name: System.String
MethodTable: 790fa3e0
EEClass: 790fa340
Size: 24(0x18) bytes
(C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: foo
Fields:
MT        Field    Offset  Type           VT  Attr           Value  Name
790fed1c  4000096  4       System.Int32   0   instance       4      m_arrayLength
790fed1c  4000097  8       System.Int32   0   instance       3      m_stringLength
790fbefc  4000098  c       System.Char    0   instance       66     m_firstChar
790fa3e0  4000099  10      System.String  0   shared static         Empty >> Domain:Value 0014c550:790d6584 <<
79124670  400009a  14      System.Char[]  0   shared static         WhitespaceChars >> Domain:Value 0014c550:012b186c <<

Again, most of the output is uninteresting at this point, except the Name field (which says it's a string), and the ‘String’ field (which shows the string value is ‘foo’). So we have confirmed that this instruction loads up the address of the ‘foo’ string into the EAX register. What is left is


00DE0083 8D5604 lea edx,[esi+4]
00DE0086 E8A5380979 call 79E73930

The first instruction, ‘LEA’, may not be familiar to you. It is Load Effective Address (LEA). Basically it works just like a MOV instruction, but instead of moving what was AT the memory specified, it loads the ADDRESS of the memory. Another way of looking at this is to imagine a MOV instruction with the [] dropped (the brackets represent a memory fetch). Thus

00DE0083 8D5604 lea edx,[esi+4]

Can be thought of as

00DE0083 8D5604 mov edx, esi+4

That is, it adds 4 to ESI and places it in EDX. Now remember ESI points at our newly created ‘Program’ object. We could find out all the fields of this object by dumping it. In my debugger ESI has the value of 012B1D5C so I can do

!DumpObj 012B1D5C

And get
Name: Program
MethodTable: 00913004
EEClass: 00911254
Size: 12(0xc) bytes
(C:\Documents and Settings\vancem\My Documents\Visual Studio 2005\Projects\InspectingManagedCode\bin\Release\InspectingManagedCode.exe)
Fields:
MT Field Offset Type VT Attr Value Name
790fa3e0 4000001 4 System.String 0 instance 00000000 myString

Which tells us that ESI points at a ‘Program’ object and that the total size of the object is 12 (more on that in a later blog), and that at offset 4 there is a field called ‘myString’ of type System.String that currently has the value of 0 (null). So now we can make a pretty good guess that the LEA instruction is setting EDX to the address of the ‘myString’ field of the Program object. EAX has been set to the ‘foo’ string, and next comes the mysterious

00DE0086 E8A5380979 call 79E73930

Ideally SOS would have annotated this helper. It is what we call a ‘WriteBarrier’. More on exactly what a write barrier is later, but for now the important thing to know is that ALL updates to OBJECT REFERENCES that live in the GC heap need to be done by calling a write barrier helper. Since the Program object lives in the heap, and we are updating an object reference pointer inside it, we need to use the write barrier.

The runtime actually has many write barriers. All the write barriers have an unusual calling convention. They all take the address to be updated in the EDX register. Then, depending on the write barrier, they take the value to store in some other register (this particular write barrier is the most commonly used, and takes its argument in the EAX register). Logically all the write barrier does is (*EDX = EAX) (that is, update what EDX points at to be the value in EAX).
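
In C# terms, the lea/call pair behaves like passing a field by reference to a helper that performs the store. The sketch below models only the logical effect (*EDX = EAX); the `WriteBarrier` method and the `Holder` class are illustrations of mine, and the method does none of the GC bookkeeping that a real barrier exists for.

```csharp
using System;

public class Holder
{
    public string myString;   // plays the role of Program.myString
}

public static class WriteBarrierSketch
{
    // location ~ EDX (address of the field), value ~ EAX (the reference to store).
    public static void WriteBarrier(ref string location, string value)
    {
        location = value;     // logically *EDX = EAX
    }

    public static void Main()
    {
        Holder p = new Holder();                 // the 'new' helper call
        WriteBarrier(ref p.myString, "foo");     // lea edx,[esi+4] then call barrier
        Console.WriteLine(p.myString);
    }
}
```

Taking `ref p.myString` is the managed counterpart of the LEA: it produces the address of the field inside the object without reading the field.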

That is about it for this example. The only instructions we did not cover are the PUSH ESI and POP ESI at the beginning and end of the routine. As anyone who deals with assembly code knows, this is simply saving and restoring ESI since we used it in the routine itself.

To recap here are the instructions that actually got executed in the ‘Main’ program and what they do. 

push esi                          // save ESI
mov ecx,913004h                   // ECX = MethodTable(Program)
call 0090201C                     // EAX = New Object (Program)
mov esi,eax                       // ESI = this (new object)
mov eax,dword ptr ds:[022B303Ch]  // EAX = “foo”
lea edx,[esi+4]                   // EDX = &this.myString
call 79E73930                     // this.myString = EAX (“foo”)
pop esi                           // restore ESI
ret                               // return.

We just understood very deeply EXACTLY what happens when a particular piece of managed code executes. Hopefully that wasn’t so bad. Next time we will dig a bit into what this WriteBarrier is and exactly what it does (how expensive is it?). We will also dig into exactly what went on inside the ‘New’ helper. In later blogs I will go into how exactly other runtime features get converted to native code.

I hope you are enjoying this peek under the hood of the .NET Runtime.