Home > DLR, IronPython > Performance of IronPython vs. C#

Performance of IronPython vs. C#

I’ve been using IronPython for a while, and every now and then I do a cursory check to see how it’s performs against the equivalent C# code. C# is generally a touch faster at most logic, but IronPython is faster at dynamic invocation of methods than reflection in C#. In general it’s a wash…over the course of an application you dabble in things that different languages are better optimized to handle. When performance is a wash, you get the freedom to choose entirely based on the features of each language, which is nice.

This was going well, until I had a need to do some recursion. I found that the recursive IronPython code was executing extremely slowly – code that was taking milliseconds in C# was taking minutes in IronPython. That’s not a wash anymore…it’s a few orders of magnitude. So to really figure out what’s going on with these recursive calls, I made two implementations of calculating Fibonacci numbers – one in C# and one in IronPython.

Updated 10/23 – I really need to rework this code a bit. The more I look at it, I don’t think I’m comparing apples to apples here, partly because Fibonacci isn’t all that representative of my actual problem where I’m recursing through complex types rather than primitives. After I do a little more research, maybe I can find out what performance metrics to look at when IPy (and dynamic code in general) executes appreciably slower than statically-typed C# code. The problem now is, I can get this code to run without a lot of GC, but it still does a lot of something else that doesn’t seem to show up in perfmon.

IronPython:

def fib(num):
   if num == 0:
      return 0
   elif num == 1:
      return 1
   else:
      return fib(num-1) + fib(num-2)

for i in range(0, 41): print "%i %i" % (i, fib(i))

C#:

class testfib
{
   private static int fib(int num)
   {
      if(num == 0)
         return 0;
      else if(num == 1)
         return 1;
      else
         return fib(num - 1) + fib(num - 2);
   }

   public static void Main(string[] args)
   {
      int i;
      for(i = 0; i < 41; i++)
         System.Console.WriteLine("{0} {1}", i, fib(i));
   }
}

I ran the C# version – 5.5 seconds on average. I ran the IronPython version, 12 2-1/2 minutes. What’s going on here?

Updated 10/23 – Perfmon turned out to tell me very little in compared to some profiling.

Function Time Consumed(units) Hit Count Time Consumed(%)
IronPython.Compiler.PythonScriptCode::Run 5876406555 14 27.7169028
IronPython.Runtime.PythonFunction.FunctionCaller::Call1 5870624528 37860134 27.68963105
IronPython.Runtime.Operations.PythonOps::GetLocal 2794453317 37860205 13.18043438
IronPython.Runtime.Operations.PythonOps::GetVariable 2532912650 37860205 11.94684082
IronPython.Runtime.Binding.PythonBinaryOperationBinder::IntSubtract 607137805 37860099 2.863651344
IronPython.Runtime.Operations.PythonOps::GetGlobalContext 325908336 37860134 1.53719277
IronPython.Runtime.Operations.Int32Ops::Subtract 320598896 37860099 1.512150045
IronPython.Runtime.Operations.PythonOps::GetParentContextFromFunction 315597320 37860134 1.488559404
IronPython.Runtime.Binding.PythonBinaryOperationBinder::IntAdd 282632451 18930035 1.333075937
MS.Win32.HwndSubclass::SubclassWndProc 260663364 4048 1.22945563
MS.Win32.HwndSubclass::DispatcherCallbackOperation 258788088 4048 1.220610625
MS.Win32.HwndWrapper::WndProc 258788088 4048 1.220610625

Compare this to my C# version, which has a lot less going on, since all it’s work is in the single fib method:

Function Time Consumed(units) Hit Count Time Consumed(%)
testfib::Main 1182320483 1 53.01288057
testfib::fib 1047930863 133691744 46.98711938
testfib::.cctor 0 1 0

The information from profiling turns out to be much more useful than the GC numbers I get from perfmon. What’s really going on? First off, IronPython is making millions of calls to PythonFunction.FunctionCaller.Call1. That casts the function to a PythonFunction, then does a second cast to a Func delegate, which then gets invoked.

After that, IronPython is making millions of calls to PythonOps.GetLocal, which calls PythonOps.GetVariable, which goes to the CodeContext and starts trying to lookup the variable in a few dictionaries, which appears to include the global dictionary by the number of calls to PythonOps.GetGlobalContext. The variables are local to the fib() function, but the fib() function itself is in the global context.

What else is in here? Int32Ops.Add and Int32Ops.Subtract are obviously called a lot to find Fibonacci numbers. These both have an overflow check in them in case the result is out of the range of an Int32, in which the result will be a boxed Int64 value. If the result happens to fit in a Int32, it will be performed again, with BigIntegerOps. All of this is to hide the work of handling integer overflow.

All this extra work – the casting, invoking, and dictionary lookups – are all by-products of working with a dynamic language. There isn’t a lot you can do to avoid it, but one thing that can cut down tremendously is to avoid recursion here. With an iterative solution, IronPython doesn’t have to continually lookup the functions to call, and you’re not going to be moving between function scopes so there is no need for looking up variables in dictionaries. The iterative solution in IronPython is much, much faster:

def fib(n):
   previous = -1
   result = 1
   for i in range(0,n+1):
      sum = result + previous
      previous = result
      result = sum
   return result

On with my post, and the GC numbers that are more relevant if recursing through objects, but less relevant to the code I presented with value types.

I fired up perfmon.exe and started adding various counters, but the one where you see the most visible difference is in the .NET CLR Memory, specifically the “# Gen 0 Collections”. In the IronPython version, the generation 0 collections numbered around 4 million 30,000! On my C# implementation, there actually were no collections. Everything was done on the stack. To be fair to IronPython which has to box everything in PythonType, I changed the C# code to box everything in objects, like the following:

public static object fib(object num)

The boxed C# implementation averaged 22 seconds to run, and also had about 12 6,000 generation 0 garbage collections. It’s still nowhere near the 4 million 30,000 collections taken by IronPython.

Two good things I learned:
1) Be careful with recursion in IronPython, as those PythonType objects carry some baggage that can keep the GC pretty busy.
2) Perfmon is a very important part of your toolbox if you’re fixing performance problems in IronPython code, or just .NET in general.

And, just for those Python purists out there, running it in CPython averaged 3 minutes, 18 seconds. Yes, much faster than IronPython, much slower than C#, and I have no idea how to figure out why (I have to get back to you on the CPython numbers). I also made a C version, and it averaged 3.8 6.5 seconds to execute. Obviously no garbage collection, but mysteriously slower than the C# version. Hopefully nobody cares about the C version, but here it is in case you want to try it for yourself:

int fib(int num)
{
   if(num == 0)
      return 0;
   else if(num == 1)
      return 1;
   else
      return fib(num - 1) + fib(num - 2);
}

int main(void)
{
   int i;
   for(i = 0; i < 41; i++)
      printf("%i %i\r\n", i, fib(i));
   return 0;
}
Advertisements
  1. October 22, 2009 at 11:44 am

    The big question for real world performance is how representative of your production code is that benchmark? If C# is *really* fast (compared to Python / IronPython) on code paths that account for < 1% of your application time then it isn't particularly relevant…

    • loosexaml
      October 22, 2009 at 2:39 pm

      The majority of my production code isn’t using recursion at all, so in the big scheme of things, this isn’t all that relevant. But if I were writing some code that had a need for a lot of recursion for some reason, I’d probably avoid IronPython unless I can figure this out. It seems like a silly reason to eliminate a great language from my development toolbox.

  2. October 22, 2009 at 11:46 am

    However… your garbage collection point is well made and I’m sure that there is plenty that IronPython can do to improve on performance. (It should at least be as fast as CPython.)

    What version of IronPython were you using? If you weren’t using 2.6 can you try it with that and post the results – if you were using 2.6 can you compare with 2.0. I can post the results, and a link to this blog entry, on the IronPython list for the IPy team to see…

    • loosexaml
      October 22, 2009 at 3:12 pm

      I ran these tests under 2.6 on a testing server at work. No access to that right now, so I’m running these on my development machine.

      IPy 2.6 – 2 minutes, 6 seconds
      IPy 2.0 – 3 minutes, 20 seconds
      C# – 3.397 seconds

      While my dev machine runs the tests quite a bit faster, we’re still looking at several orders of magnitude slower than C#.

    • loosexaml
      October 23, 2009 at 1:52 pm

      After some profiling, it turns out the GC isn’t the only problem. It definitely kicks in if I’m doing a lot of recursion with complex types (which is the real-world problem). However, the bigger problem in this code comes from the variable and function lookups in dictionaries (like you said, a by-product of dynamic typing). I updated the post with my profiling data in case you’re interested.

  3. October 22, 2009 at 12:42 pm

    Thanks for the interesting post!

    By the way, pay attention that the IronPython code does 41 loops while C# does 40.

    I just tried that in IronRuby (V0.9.1). 41 loops took 11.5 minutes, 40 loops took less than 6 minutes.

    The code (it runs 41 times – from 0 to 40 including 40):
    def fib(num)
    if num == 0
    return 0
    elsif num == 1
    return 1
    else
    return fib(num-1) + fib(num-2)
    end
    end

    (0..40).each do |i|
    puts “#{i} #{fib(i)}”
    end

    Shay.

    • loosexaml
      October 22, 2009 at 3:23 pm

      Good catch. I updated my C# code to use <= 40, and the C# code executes in 5.5 seconds now. Still, it seems like could be dangerous to do recursion in dynamic languages.

  4. October 22, 2009 at 1:29 pm

    When I ran your code 10 times using the Python timeit module (under IronPython 2.6 on a Windows 7 VM) it took 1483 seconds for ten repetitions. That’s under 2 1/2 minutes per iteration – faster than CPython. I think you’re seeing an insanely slow first time because the code isn’t JITed.

    • loosexaml
      October 22, 2009 at 3:16 pm

      Isn’t the code going to be JITed on the first recursion? If a lot of this time was due to JITing, I’d expect it to be pretty bad just for a range of low numbers that won’t recurse as much, like range(0,10). But calculating fibs for 0 to 10 happens very quickly. It’s the higher numbers that have a lot of recursion where things start to really slow down. Do you think there more JITing going on after the first recursion?

    • October 22, 2009 at 8:38 pm

      These numbers would certainly be more interesting with startup time removed. The slow [Iron]Python startup time is certainly annoying, but it doesn’t seem relevant here, since you’re unlikely to be starting a new IronPython process just to do the recursion task.

  5. October 22, 2009 at 3:16 pm

    Right – but if IronPython 2.6 is in fact faster than CPython then it’s about par for the course for a dynamic language (and probably about twice as fast as Ruby for example).

    (By the way, all my comments are still awaiting moderation so no-one else can see them.)

    • loosexaml
      October 22, 2009 at 3:36 pm

      Sorry about that. Comments approved.

      And yes after trying a few more languages, I agree it’s on par for a dynamic language. CPython, Javsscript, Applescript, they all seem to do this on the order of minutes. Do you know of any other gotchas people might run into? Recursion seems to be the worst, but are there other programming techniques that you’ve seen work well in static-typed languages but perform poorly in dynamic languages?

      • October 22, 2009 at 5:54 pm

        I don’t think your raw performance numbers are particularly interesting (except to geeks like me who always like benchmarks).

        If you look at the Programming Languages Shootout (for example) it should come as no surprise that statically typed compiled languages tend to run faster than dynamically typed languages.

        In most programs you still may be database, network or I/O bound rather than computationally bound – and as I’ve seen work out practically at Resolver Systems, even where you are computationally bound there are huge wins to be had improving your algorithms rather than switching language.

        However I think your performance numbers may be interesting to the IronPython team – and even more so the data you gathered on garbage collection. I’m sure there are *still* plenty of opportunities for performance improvements in the DLR and Iron* languages even though it is something that has been concentrated on recently (it is nice to see the big improvement between IronPython 2.0 and 2.6 for example).

      • loosexaml
        October 23, 2009 at 9:15 am

        I’ll do a little more research and do a follow-up post. I did a cursory run in IPy 2.6 for CLR 4.0 Beta 2, and it performed a bit slower than either 2.0 or 2.6, so I’d really like to see what’s happening there.

        And yes, the iterative solution is ultimately much faster. If I switch to iteration, I get a sub-second response. This is fine for my contrived problem here of finding Fibonacci numbers, but in reality, my app is using recursion in a WPF app to find all child UIElement’s below my root. The iterative solution in this case is ugly, confusing to maintain, and not noticeably slower to the end user at this point, so in consideration of everything, recursion is still a sensible solution.

        Also, I’ve noticed that performance of my recursive fib() is much better (5x) in IPy console over using embedding IronPython in my app (basically using the sample from your book). My fib() function isn’t updating the UI, so I don’t really see why it would execute so much slower. I can probably dig through the code from the CodePlex project and figure out what they’re doing that I’m not, but maybe you’ve got some recommendations?

      • loosexaml
        October 23, 2009 at 10:31 am

        Actually, scratch that last part about it being slower. I had SetTrace set on my ScriptEngine instance…that’s causing a lot of extra work on every recursive call. If I take SetTrace off, it performs just like IPy.exe.

  6. October 22, 2009 at 3:18 pm

    Right, each iteration calls the function many times – I’d forgotten that. With IronPython 2.6 it should be compiled and JITed after the first few calls.

  7. Jan Rouvillain
    October 23, 2009 at 7:30 am

    I do not understand, why C is so slow. A guess could be the printf(…) call. It is normally very time consuming.

    Which C compiler are you using, and which paramters are given to the compiler?

    May be you have to specify the correct CPU type or 32 or 64 bit word alignment.

    • loosexaml
      October 23, 2009 at 8:17 am

      Using the Microsoft C compiler (cl.exe), no parameters. The printf call is made 40 times whereas the fib function is called once per loop in the first and second times (i=0 and 1), but after that, the fib function gets called many, many more times. For i=2, fib gets called 3 times, i=3, fib gets called 5 times, i=4, fib gets called 9 times, and so on. By i=10, fib is getting called 97 times. The fib call is made a total of 866,988,831 times throughout the life of this code, so the performance impact of printf is negated as the number of iterations increases. You’re welcome to try it for yourself on your favorite C compiler with other options.

      Of course, as a few people have pointed out, the best way to improve performance in this case is not to use recursion at all, which can work for finding Fibonacci numbers where there is an iterative solution. The iterative solution to this problem executes in under a second, even in a dynamic language.

      • October 23, 2009 at 8:21 am

        Well, for every recursive algorithm there is *always* an iterative solution. Recursion is a technique that can be elegant and expressive, so just because you don’t *have* to use it doesn’t mean you shouldn’t.

  8. l0r3nz1
    December 2, 2010 at 5:32 am

    hello,

    there is problem with garbage collector and ironpython. you have to free memory yourself with a: object.finalize and i think free memory is not compacted as in c#. finaly there is most and most use of memory.

    i like ironpython but i have no solution yet. have you one?

    • loosexaml
      January 19, 2011 at 11:48 pm

      I haven’t run into this problem. If I create a lot of objects, they do seem to get collected. Can you post an example of your code?

  9. l0r3nz1
    January 20, 2011 at 5:12 am

    hello,
    I had to change job, so i haven’t exemple of this code anymore. sorry.

    the problem was: i had to correct a code from another people, object where create and add in virtual memory. that wasn’t good.

    i had solve this problem with “destroying”, objects that program need not anymore.

    • Samuel Warren
      May 2, 2011 at 10:50 am

      This sounds like they were objects that had some reference to unmanaged code. If an object references any unmanaged code, then the references for the object are considered valid by the GC, and it doesn’t collect them.

  10. Anonymous
    January 28, 2012 at 1:57 am

    Here’s a waaayyyyy faster version of fibonacci sequence in python:
    def fib(a):
    x,y=0,1
    c=0
    while c<a:
    c=c+1
    x,y=y,x+y
    return x

    and for loop version:
    def fib(a):
    x,y=0,1
    for c in range(a):
    x,y=y,x+y
    return x

    if you use the recursion way, each new number's calculation time is slightly more than the sum of the calculation time of the two previous numbers. Moreover, python has limited recursion depth. The max recursion is about 995 times. So next time better use while or for loop. And with the for loop, I tried fib(1000) and got the answer in a flash…that is a fraction of a second. If you calculate fib(1000) with the recursion version, it will simply return an error.

    • loosexaml
      January 28, 2012 at 8:59 am

      Right, there is a penalty to the recursive method, and the iterative solutions (while & for) are much faster. However, the point here is understanding *why* the recursive version is so much slower in python than in C or C#. Dynamic languages incur a penalty for method and variable lookup, which an iterative solution will avoid.

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: