16 April 2011

Looking where Nuitka stands

In case you wonder what Nuitka is, [look here](/pages/overview.html). Over the 0.3.x release cycle, I have mostly looked at its performance with “pystone”. I merely wanted to have a target to look at and to enjoy the progress we have made there.

In the context of the Windows port, Khalid Abu Bakr then used pybench on Windows, and that got me interested. It’s a nice collection of micro benchmarks, quite obviously aimed at CPython implementations only. As such, it’s quite good for checking where Nuitka does well and where it can still improve for the milestone 2 work.

Enhancements to PyBench

  • The pybench refused to accept that Nuitka could use so little time on some tests, so I needed to hack it to allow that.

  • Then it had “ZeroDivisionError” exceptions, because Nuitka may not run fully predictable code at all, giving a time of 0ms, which makes for interesting factors.

  • Also, there are many results and we are going to care about regressions only, so there is now an option to output only tests with negative values.
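
The last two points can be sketched in a few lines. This is not the actual patch to pybench, only a minimal illustration; the function names `percent_diff` and `regressions` are made up, and the formula (CPython time divided by Nuitka time, minus one) is an assumption that happens to be consistent with the results further down.

```python
def percent_diff(cpython_ms, nuitka_ms):
    """Return the diff in percent; positive means Nuitka is faster.

    Assumed formula: (cpython / nuitka - 1) * 100.
    """
    if nuitka_ms == 0:
        # A fully optimized away test takes no measurable time. Report
        # infinity instead of failing with ZeroDivisionError.
        return float("inf")
    return (cpython_ms / float(nuitka_ms) - 1.0) * 100.0


def regressions(results):
    """Keep only the tests with a negative diff, i.e. the slowdowns."""
    selected = []
    for name, cpython_ms, nuitka_ms in results:
        diff = percent_diff(cpython_ms, nuitka_ms)
        if diff < 0:
            selected.append((name, diff))
    return selected


# A few (name, min CPython, min Nuitka) samples in milliseconds.
results = [
    ("CompareFloats", 79, 0),
    ("ConcatStrings", 98, 99),
    ("ForLoops", 47, 15),
    ("TryRaiseExcept", 64, 610),
]

for name, diff in regressions(results):
    print("%-20s %+.1f%%" % (name, diff))
```

Because the sample times are rounded to whole milliseconds, the computed percentages will not match the published ones to the last digit.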

The Interesting Parts

  • Nuitka currently has some areas where optimizations are already so effective as to render the whole benchmark pointless. Long term, most of PyBench will not be looked at anymore: where the factor becomes “infinity”, there is little point in looking at it. We will likely just use it as a test that optimizations didn’t suddenly regress. Publishing the numbers will not be as interesting.

  • Then there are slowdowns. These I take seriously, because of course I expect Nuitka to only ever be faster than CPython. Sometimes the implementation of some rarely used features in Nuitka is subpar though. I color coded these in red in the table below.

  • ComplexPythonFunctionCalls: These are twice as slow, which is a tribute to the fact that the code in this domain is only as good as it needs to be. Of course function calls are very important, and this needs to be addressed.

  • TryRaiseExcept: This is much slower because of the cost of the raise statement, which is currently extremely high. For every raise, a frame object with a specific code object is created, so that the traceback will point to the correct location. This is very inefficient and wasteful. We need to be able to create code objects that can be used for all lines needed; then we can re-use them and have only one frame object per function, which can itself be re-used. There is already some work on that in [current git](/doc/download.html) (0.3.9 pre 2), but it’s not complete at all yet.

  • WithRaiseExcept: Same problem as TryRaiseExcept: raising the exception is too expensive.

  • Note also that -90% is in fact much worse than +90% is good: a test at -90% takes about ten times as long, while one at +90% is not even twice as fast. The “diff” numbers from pybench make improvements look much better than regressions do. You can also check out the comparison on the new [benchmark pages](https://speedcenter.nuitka.net) that I am just creating. They are based on codespeed, which I will blog about separately.
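
To make the TryRaiseExcept point in the list above a bit more concrete, here is a minimal sketch in plain CPython (not Nuitka's generated code) of what a traceback entry carries: a frame, that frame's code object, and a line number. The function names `failing` and `collect_traceback` are made up for illustration. The line number is available on the traceback entry itself, so in CPython one code object per function is enough to report any line in it, which is roughly the kind of re-use described there.

```python
import sys


def failing():
    raise ValueError("boom")  # the traceback has to point at this line


def collect_traceback():
    """Return (function name, raising line, first line of function) per entry."""
    try:
        failing()
    except ValueError:
        tb = sys.exc_info()[2]
        entries = []
        while tb is not None:
            # The code object belongs to the whole function, not to one line.
            code = tb.tb_frame.f_code
            entries.append((code.co_name, tb.tb_lineno, code.co_firstlineno))
            tb = tb.tb_next
        return entries


entries = collect_traceback()
for name, lineno, first_line in entries:
    print("%s: line %d (function starts at line %d)" % (name, lineno, first_line))
```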

Look at this table of results as produced by pybench:

Benchmark Results

| Test Name | min CPython | min Nuitka | diff |
|---|---|---|---|
| BuiltinFunctionCalls | 76ms | 54ms | +41.0% |
| BuiltinMethodLookup | 57ms | 47ms | +22.1% |
| CompareFloats | 79ms | 0ms | +inf% |
| CompareFloatsIntegers | 75ms | 0ms | +inf% |
| CompareIntegers | 76ms | 0ms | +inf% |
| CompareInternedStrings | 68ms | 32ms | +113.0% |
| CompareLongs | 60ms | 0ms | +inf% |
| CompareStrings | 86ms | 62ms | +38.2% |
| CompareUnicode | 61ms | 50ms | +21.9% |
| ComplexPythonFunctionCalls | 86ms | 179ms | -52.3% |
| ConcatStrings | 98ms | 99ms | -0.6% |
| ConcatUnicode | 127ms | 124ms | +2.3% |
| CreateInstances | 76ms | 52ms | +46.8% |
| CreateNewInstances | 58ms | 47ms | +22.1% |
| CreateStringsWithConcat | 85ms | 90ms | -6.5% |
| CreateUnicodeWithConcat | 74ms | 68ms | +9.5% |
| DictCreation | 58ms | 36ms | +60.9% |
| DictWithFloatKeys | 67ms | 44ms | +51.7% |
| DictWithIntegerKeys | 64ms | 30ms | +113.8% |
| DictWithStringKeys | 60ms | 26ms | +130.6% |
| ForLoops | 47ms | 15ms | +216.2% |
| IfThenElse | 67ms | 16ms | +322.5% |
| ListSlicing | 69ms | 70ms | -0.9% |
| NestedForLoops | 72ms | 25ms | +187.4% |
| NestedListComprehensions | 87ms | 42ms | +105.9% |
| NormalClassAttribute | 62ms | 77ms | -18.9% |
| NormalInstanceAttribute | 56ms | 24ms | +129.7% |
| PythonFunctionCalls | 72ms | 34ms | +116.1% |
| PythonMethodCalls | 84ms | 38ms | +120.0% |
| Recursion | 97ms | 56ms | +73.1% |
| SecondImport | 61ms | 47ms | +31.6% |
| SecondPackageImport | 66ms | 29ms | +125.4% |
| SecondSubmoduleImport | 86ms | 32ms | +172.0% |
| SimpleComplexArithmetic | 74ms | 62ms | +18.3% |
| SimpleDictManipulation | 65ms | 35ms | +89.7% |
| SimpleFloatArithmetic | 77ms | 56ms | +39.3% |
| SimpleIntFloatArithmetic | 58ms | 39ms | +48.3% |
| SimpleIntegerArithmetic | 59ms | 37ms | +57.7% |
| SimpleListComprehensions | 75ms | 33ms | +128.7% |
| SimpleListManipulation | 57ms | 27ms | +109.4% |
| SimpleLongArithmetic | 68ms | 57ms | +19.9% |
| SmallLists | 69ms | 41ms | +66.6% |
| SmallTuples | 66ms | 98ms | -32.2% |
| SpecialClassAttribute | 63ms | 49ms | +29.1% |
| SpecialInstanceAttribute | 130ms | 24ms | +434.5% |
| StringMappings | 67ms | 62ms | +8.5% |
| StringPredicates | 69ms | 59ms | +16.6% |
| StringSlicing | 73ms | 47ms | +54.8% |
| TryExcept | 57ms | 0ms | +3821207.1% |
| TryFinally | 65ms | 26ms | +153.4% |
| TryRaiseExcept | 64ms | 610ms | -89.5% |
| TupleSlicing | 76ms | 67ms | +12.7% |
| UnicodeMappings | 88ms | 91ms | -2.9% |
| UnicodePredicates | 64ms | 59ms | +8.8% |
| UnicodeProperties | 69ms | 63ms | +8.8% |
| UnicodeSlicing | 80ms | 68ms | +17.6% |
| WithFinally | 84ms | 26ms | +221.2% |
| WithRaiseExcept | 67ms | 1178ms | -94.3% |
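
For reading the diff column, it can help to turn the percentages back into plain factors. The sketch below assumes that the diff is the CPython time divided by the Nuitka time, minus one, expressed in percent. That assumption matches the rows above but is not taken from the pybench source, and the helper names `speed_ratio` and `describe` are made up.

```python
def speed_ratio(diff_percent):
    """CPython time divided by Nuitka time, assuming diff = (ratio - 1) * 100."""
    return 1.0 + diff_percent / 100.0


def describe(diff_percent):
    """Express a diff value as a plain 'N times faster/slower' factor."""
    ratio = speed_ratio(diff_percent)
    if ratio <= 0.0:
        raise ValueError("a diff of -100% or below has no finite factor")
    if ratio >= 1.0:
        return "%.2fx faster" % ratio
    return "%.2fx slower" % (1.0 / ratio)


# +90% and -90% for comparison, then the two worst rows of the table.
for diff in (90.0, -90.0, -89.5, -94.3):
    print("%+6.1f%% -> %s" % (diff, describe(diff)))
```

Negative values are squeezed into the range between 0% and -100%, while positive values are unbounded, which is why the regressions look milder than they are.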