Some performance notes that I got from MSDN.
The JIT compiler performs the following optimizations in the small amount of time it has:
•Constant and copy propagation
•Common subexpression elimination
•Code motion of loop invariants
•Dead store and dead code elimination
•Loop unrolling (small loops with small bodies)
Value types, including integral types, floating point types, enums, and structs, typically live on the STACK.
Reference types and boxed value types live in the HEAP. They are addressed by object references, which are simply machine pointers just like object pointers in C/C++.
NGEN is a tool that compiles CIL "ahead of time" into native code assemblies.
9 million allocations
Type        Size of Allocation             Execution Time
string      575,783                        00:00:2739811
int         8620 bytes - 40 instances      00:00:2515444
short       10472 bytes - 238 instances    00:00:2538425
employee    1,440,457,523                  00:04:3698716
A virtual method call incurs two additional loads compared to an instance call: one to fetch the method table address (always found at *(this+0)), and another to fetch the appropriate virtual method address from the method table before calling it.
An interface call typically has to fetch the method table, its interface map, and the interface's entry in that map, and then call indirectly through the appropriate entry in the interface's section of the method table.
Reference types are instantiated and then initialized:
1. The CLR performs default initialization for you, so you don't need to initialize anything yourself unless a non-default value is required. The method table pointer is set, and an array also gets its length field.
2. Then the constructor is called.
3. Lastly, any user-defined initialization runs.
Deeply derived types: if E extends D, D extends C, C extends B, and B extends A, then creating an E involves five method calls. Large types fill up the Gen 0 heap faster.
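To make the five calls visible, here is a minimal sketch of such a chain. The class names A to E come from the note above; the Log list and the CtorChainDemo helper are hypothetical names added only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical five-level hierarchy mirroring the A..E chain in the notes.
public class A
{
    // Shared log used only to record the order in which constructors run.
    public static readonly List<string> Log = new List<string>();
    public A() { Log.Add("A"); }
}
public class B : A { public B() { Log.Add("B"); } }
public class C : B { public C() { Log.Add("C"); } }
public class D : C { public D() { Log.Add("D"); } }
public class E : D { public E() { Log.Add("E"); } }

public static class CtorChainDemo
{
    // Creates one E and returns the order of the constructor bodies that ran.
    public static string Run()
    {
        A.Log.Clear();
        new E();
        return string.Join(",", A.Log);
    }
}
```

Each constructor body runs only after its base class constructor has finished, so the log is filled from the base type down to the most derived type. The JIT may inline some of these calls, so this shows the logical call chain rather than the exact machine code.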
A cast from a derived type to a base type is always safe and free. A cast from a base type to a derived type must be type-checked. The C# as keyword is the more forgiving option: it returns null rather than throwing an exception if type safety is violated.
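The difference between the three kinds of cast can be sketched as follows. Animal, Dog and CastDemo are hypothetical names used only for this example.

```csharp
using System;

public class Animal { }
public class Dog : Animal { }

public static class CastDemo
{
    // Derived-to-base conversion: implicit, no runtime type check needed.
    public static Animal Upcast(Dog d)
    {
        return d;
    }

    // 'as' performs the type check and yields null on a mismatch.
    public static bool AsYieldsNull(Animal a)
    {
        Dog d = a as Dog;
        return d == null;
    }

    // An explicit downcast performs the same check but throws on a mismatch.
    public static bool CastThrows(Animal a)
    {
        try
        {
            Dog d = (Dog)a;
            GC.KeepAlive(d); // keep the local in use
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

Passing a plain Animal to AsYieldsNull and CastThrows makes both return true; passing a Dog makes both return false.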
Property accesses cost the same as method calls, because they compile to calls to the get_ and set_ accessor methods.
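One way to see the accessor methods is through reflection. This is a small sketch; the Employee class and its Name property are hypothetical and exist only for this example.

```csharp
using System;
using System.Reflection;

// Hypothetical type with one auto-implemented property.
public class Employee
{
    public string Name { get; set; }
}

public static class PropertyDemo
{
    // Returns the name of the compiler-generated getter method.
    public static string GetterName()
    {
        PropertyInfo p = typeof(Employee).GetProperty("Name");
        return p.GetGetMethod().Name;
    }

    // Returns the name of the compiler-generated setter method.
    public static string SetterName()
    {
        PropertyInfo p = typeof(Employee).GetProperty("Name");
        return p.GetSetMethod().Name;
    }
}
```

For a property called Name, the accessors are named get_Name and set_Name. Simple accessors like these are usually good candidates for JIT inlining, in which case the call overhead disappears.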
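Loading and storing array elements of a value type is faster than for a reference type. Storing an object reference into an array is the slow case because a) the bounds need to be checked, b) the stored object's type needs to be checked against the array's element type, and c) a write barrier must be performed.
A write barrier is how the runtime keeps track of references stored from an older generation into a newer one. For example, object A (in generation 1) may hold a reference to object B (in generation 0). The barrier records that store, so the GC can still find B when it collects only the younger generation.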
Array load and store times (ns)
Avg Min Primitive
1.9 1.9 load int array elem
1.9 1.9 store int array elem
2.5 2.5 load obj array elem
16.0 16.0 store obj array elem
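The stored-type check that makes object array stores expensive exists because of array covariance. The sketch below shows the check firing; ArrayStoreDemo is a hypothetical name used only for this example.

```csharp
using System;

public static class ArrayStoreDemo
{
    // Returns true if the runtime rejects a store of the wrong element type.
    public static bool StoreIsTypeChecked()
    {
        // Array covariance: a string[] may be referenced as an object[].
        object[] items = new string[1];
        try
        {
            // Compiles fine, but the runtime checks the stored type on every
            // reference store and finds that a boxed int is not a string.
            items[0] = 42;
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }
}
```

An int[] needs no such check, since value-type arrays are not covariant, which is part of why the int rows in the table are cheaper.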
Boxing and Unboxing
To "box" a value type is to create a reference type object that holds a copy of its value. This is conceptually the same as creating a class with an unnamed instance field of the same type as the value type. To unbox is to do the reverse: copy the value out of the box into a new value type instance. Boxing an integer is a really expensive task.
Table 9 Box and Unbox int Times (ns)
Avg Min Primitive
29.0 21.6 box int
3.0 3.0 unbox int
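A short sketch of what boxing and unboxing look like in C#. BoxingDemo and its method names are hypothetical and used only for illustration.

```csharp
using System;

public static class BoxingDemo
{
    // Returns { value after reassignment, value read back from the box }.
    public static int[] Run()
    {
        int value = 42;
        object boxed = value;      // box: allocates a heap object holding a copy
        value = 7;                 // the box keeps its own copy, so it still holds 42
        int unboxed = (int)boxed;  // unbox: type check, then copy the value out
        return new[] { value, unboxed };
    }

    // Unboxing must name the exact boxed type; int -> long is rejected.
    public static bool UnboxToWrongTypeThrows()
    {
        object boxed = 42;
        try
        {
            long wide = (long)boxed;
            GC.KeepAlive(wide); // keep the local in use
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

The allocation on the boxing side is what dominates the cost in Table 9, while unboxing is only a type check and a copy.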
Delegates (derived from MulticastDelegate)
Delegate invocation is much slower, with an average invocation time of 41.1 ns.
Reflection calls in .NET are rather slow.
Low-level considerations (several being C# (default TypeAttributes.SequentialLayout) and x86 specific):
•The size of a value type is generally the total size of its fields, with 4-byte or smaller fields aligned to their natural boundaries.
•It is possible to use [StructLayout(LayoutKind.Explicit)] and [FieldOffset(n)] attributes to implement unions.
•The size of a reference type is 8 bytes plus the total size of its fields, rounded up to the next 4-byte boundary, and with 4-byte or smaller fields aligned to their natural boundaries.
•In C#, enum declarations may specify an arbitrary integral base type (except char)—so it is possible to define 8-bit, 16-bit, 32-bit, and 64-bit enums.
•As in C/C++, you can often shave a few tens of percent of space off a larger object by sizing your integral fields appropriately.
•You can inspect the size of an allocated reference type with the CLR Profiler.
•Large objects (many dozens of KB or more) are managed in a separate large object heap, to preclude expensive copying.
•Finalizable objects take an additional GC generation to reclaim—use them sparingly and consider using the Dispose Pattern.
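Two of the points above, unions via explicit layout and enums with a chosen base type, can be sketched as follows. IntOrFloat, SmallFlag, BigId and LayoutDemo are hypothetical names. Note that Marshal.SizeOf reports the marshaled (unmanaged) size, which for these simple types is assumed to match the managed size.

```csharp
using System;
using System.Runtime.InteropServices;

// Both fields start at offset 0, so they overlap like a C union.
[StructLayout(LayoutKind.Explicit)]
public struct IntOrFloat
{
    [FieldOffset(0)] public int AsInt;
    [FieldOffset(0)] public float AsFloat;
}

// Enums with explicitly chosen base types.
public enum SmallFlag : byte { Off, On }
public enum BigId : long { None }

public static class LayoutDemo
{
    // Size of the union: one 4-byte slot shared by both fields.
    public static int UnionSize()
    {
        return Marshal.SizeOf(typeof(IntOrFloat));
    }

    // Reinterprets the bits of a float as an int through the union.
    public static int FloatBits(float f)
    {
        IntOrFloat u = new IntOrFloat();
        u.AsFloat = f;
        return u.AsInt;
    }

    // Size in bytes of an enum's underlying integral type.
    public static int EnumSize(Type enumType)
    {
        return Marshal.SizeOf(Enum.GetUnderlyingType(enumType));
    }
}
```

With these definitions, UnionSize() is 4, FloatBits(1.0f) is 0x3F800000, and EnumSize gives 1 for SmallFlag and 8 for BigId, which is the kind of saving the note about sizing integral fields refers to.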
Big picture considerations:
•Each AppDomain currently incurs a substantial space overhead. Many runtime and Framework structures are not shared across AppDomains.
•Within a process, jitted code is typically not shared across AppDomains. If the runtime is specifically hosted, it is possible to override this behavior. See the documentation for CorBindToRuntimeEx and the STARTUP_LOADER_OPTIMIZATION_MULTI_DOMAIN flag.
•In any event, jitted code is not shared across processes. If you have a component that will be loaded into many processes, consider precompiling with NGEN to share the native code.