Archive

Posts Tagged ‘recursion’

Performance of IronPython vs. C#

October 21, 2009 25 comments

I’ve been using IronPython for a while, and every now and then I do a cursory check to see how it’s performs against the equivalent C# code. C# is generally a touch faster at most logic, but IronPython is faster at dynamic invocation of methods than reflection in C#. In general it’s a wash…over the course of an application you dabble in things that different languages are better optimized to handle. When performance is a wash, you get the freedom to choose entirely based on the features of each language, which is nice.

This was going well, until I had a need to do some recursion. I found that the recursive IronPython code was executing extremely slowly – code that was taking milliseconds in C# was taking minutes in IronPython. That’s not a wash anymore…it’s a few orders of magnitude. So to really figure out what’s going on with these recursive calls, I made two implementations of calculating Fibonacci numbers – one in C# and one in IronPython.

Updated 10/23 – I really need to rework this code a bit. The more I look at it, I don’t think I’m comparing apples to apples here, partly because Fibonacci isn’t all that representative of my actual problem where I’m recursing through complex types rather than primitives. After I do a little more research, maybe I can find out what performance metrics to look at when IPy (and dynamic code in general) executes appreciably slower than statically-typed C# code. The problem now is, I can get this code to run without a lot of GC, but it still does a lot of something else that doesn’t seem to show up in perfmon.

IronPython:

def fib(num):
   if num == 0:
      return 0
   elif num == 1:
      return 1
   else:
      return fib(num-1) + fib(num-2)

for i in range(0, 41): print "%i %i" % (i, fib(i))

C#:

class testfib
{
   private static int fib(int num)
   {
      if(num == 0)
         return 0;
      else if(num == 1)
         return 1;
      else
         return fib(num - 1) + fib(num - 2);
   }

   public static void Main(string[] args)
   {
      int i;
      for(i = 0; i < 41; i++)
         System.Console.WriteLine("{0} {1}", i, fib(i));
   }
}

I ran the C# version – 5.5 seconds on average. I ran the IronPython version, 12 2-1/2 minutes. What’s going on here?

Updated 10/23 – Perfmon turned out to tell me very little in compared to some profiling.

Function Time Consumed(units) Hit Count Time Consumed(%)
IronPython.Compiler.PythonScriptCode::Run 5876406555 14 27.7169028
IronPython.Runtime.PythonFunction.FunctionCaller::Call1 5870624528 37860134 27.68963105
IronPython.Runtime.Operations.PythonOps::GetLocal 2794453317 37860205 13.18043438
IronPython.Runtime.Operations.PythonOps::GetVariable 2532912650 37860205 11.94684082
IronPython.Runtime.Binding.PythonBinaryOperationBinder::IntSubtract 607137805 37860099 2.863651344
IronPython.Runtime.Operations.PythonOps::GetGlobalContext 325908336 37860134 1.53719277
IronPython.Runtime.Operations.Int32Ops::Subtract 320598896 37860099 1.512150045
IronPython.Runtime.Operations.PythonOps::GetParentContextFromFunction 315597320 37860134 1.488559404
IronPython.Runtime.Binding.PythonBinaryOperationBinder::IntAdd 282632451 18930035 1.333075937
MS.Win32.HwndSubclass::SubclassWndProc 260663364 4048 1.22945563
MS.Win32.HwndSubclass::DispatcherCallbackOperation 258788088 4048 1.220610625
MS.Win32.HwndWrapper::WndProc 258788088 4048 1.220610625

Compare this to my C# version, which has a lot less going on, since all it’s work is in the single fib method:

Function Time Consumed(units) Hit Count Time Consumed(%)
testfib::Main 1182320483 1 53.01288057
testfib::fib 1047930863 133691744 46.98711938
testfib::.cctor 0 1 0

The information from profiling turns out to be much more useful than the GC numbers I get from perfmon. What’s really going on? First off, IronPython is making millions of calls to PythonFunction.FunctionCaller.Call1. That casts the function to a PythonFunction, then does a second cast to a Func delegate, which then gets invoked.

After that, IronPython is making millions of calls to PythonOps.GetLocal, which calls PythonOps.GetVariable, which goes to the CodeContext and starts trying to lookup the variable in a few dictionaries, which appears to include the global dictionary by the number of calls to PythonOps.GetGlobalContext. The variables are local to the fib() function, but the fib() function itself is in the global context.

What else is in here? Int32Ops.Add and Int32Ops.Subtract are obviously called a lot to find Fibonacci numbers. These both have an overflow check in them in case the result is out of the range of an Int32, in which the result will be a boxed Int64 value. If the result happens to fit in a Int32, it will be performed again, with BigIntegerOps. All of this is to hide the work of handling integer overflow.

All this extra work – the casting, invoking, and dictionary lookups – are all by-products of working with a dynamic language. There isn’t a lot you can do to avoid it, but one thing that can cut down tremendously is to avoid recursion here. With an iterative solution, IronPython doesn’t have to continually lookup the functions to call, and you’re not going to be moving between function scopes so there is no need for looking up variables in dictionaries. The iterative solution in IronPython is much, much faster:

def fib(n):
   previous = -1
   result = 1
   for i in range(0,n+1):
      sum = result + previous
      previous = result
      result = sum
   return result

On with my post, and the GC numbers that are more relevant if recursing through objects, but less relevant to the code I presented with value types.

I fired up perfmon.exe and started adding various counters, but the one where you see the most visible difference is in the .NET CLR Memory, specifically the “# Gen 0 Collections”. In the IronPython version, the generation 0 collections numbered around 4 million 30,000! On my C# implementation, there actually were no collections. Everything was done on the stack. To be fair to IronPython which has to box everything in PythonType, I changed the C# code to box everything in objects, like the following:

public static object fib(object num)

The boxed C# implementation averaged 22 seconds to run, and also had about 12 6,000 generation 0 garbage collections. It’s still nowhere near the 4 million 30,000 collections taken by IronPython.

Two good things I learned:
1) Be careful with recursion in IronPython, as those PythonType objects carry some baggage that can keep the GC pretty busy.
2) Perfmon is a very important part of your toolbox if you’re fixing performance problems in IronPython code, or just .NET in general.

And, just for those Python purists out there, running it in CPython averaged 3 minutes, 18 seconds. Yes, much faster than IronPython, much slower than C#, and I have no idea how to figure out why (I have to get back to you on the CPython numbers). I also made a C version, and it averaged 3.8 6.5 seconds to execute. Obviously no garbage collection, but mysteriously slower than the C# version. Hopefully nobody cares about the C version, but here it is in case you want to try it for yourself:

int fib(int num)
{
   if(num == 0)
      return 0;
   else if(num == 1)
      return 1;
   else
      return fib(num - 1) + fib(num - 2);
}

int main(void)
{
   int i;
   for(i = 0; i < 41; i++)
      printf("%i %i\r\n", i, fib(i));
   return 0;
}