Nuitka Release 0.5.11

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

The last release represented a significant change and introduced a few regressions, which got addressed with hot fix releases. This release also focuses on cleaning up open optimization issues that were postponed in the last release.

New Features

  • The filenames of source files as found in the __file__ attribute are now made relative for all modes, not just standalone mode.

    This makes it possible to put data files alongside compiled modules in a deployment. This solves Issue#170.
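With relative __file__ values, a deployment can locate data files next to a module using the usual os.path idiom. This is a minimal, generic Python sketch, not Nuitka code. The helper name data_path and the file name settings.json are hypothetical, and a relative result is assumed to be resolved against the current working directory, as open() does.

```python
import os


def data_path(module_file, filename):
    """Return the path of a data file deployed next to the module at module_file.

    module_file is normally the module's __file__. If it is relative, the result
    is relative too, and open() resolves it against the current working directory.
    """
    return os.path.join(os.path.dirname(module_file), filename)


# Typical use inside a module (hypothetical file name):
#   with open(data_path(__file__, "settings.json")) as handle:
#       settings = handle.read()
print(data_path(os.path.join("package", "module.py"), "settings.json"))
```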

Bug Fixes

  • Local functions that reference themselves were not released. They now are.

    def someFunction():
        def f():
            f() # referencing 'f' in 'f' caused the garbage collection to fail.
    

    Recent changes to code generation attached closure variable values to the function object, so now they can be properly visited. This corrects Issue#45. Fixed in 0.5.10.1 already.

  • Python2.6: The complex constants with real or imaginary parts -0.0 were collapsed with constants of value 0.0. This became more evident after we started to optimize the complex built-in. Fixed in 0.5.10.1 already.

    complex(0.0, 0.0)
    complex(-0.0, -0.0) # Could be confused with the above.
    
  • Complex call helpers could leak references to their arguments. This was a regression. Fixed in 0.5.10.1 already.

  • Parameter variables offered as closure variables were not properly released, only the cell object was, but not the value. This was a regression. Fixed in 0.5.10.1 already.

  • Compatibility: Accessing a local variable value that is not yet initialized must raise NameError in a closure taking function, and UnboundLocalError for accesses in the providing function. Fixed in 0.5.10.1 already.

  • Fix support for "venv" on systems where the system Python uses symbolic links too. This is the case at least on Mageia Linux. Fixed in 0.5.10.2 already.

  • Python3.4: On systems where long and Py_ssize_t are different (e.g. Win64) iterators could be corrupted if used by uncompiled Python code. Fixed in 0.5.10.2 already.

  • Fix, generator objects didn't release weak references to them properly. Fixed in 0.5.10.2 already.

  • Compatibility: The __closure__ attribute of functions was so far not supported, although it was rarely missed. Recent changes made it easy to expose, so it has now been added. This corrects Issue#45.

  • MacOS: A linker warning about deprecated linker option -s was solved by removing the option.

  • Compatibility: Nuitka was enforcing the __doc__ attribute to be a string object, and gave a misleading error message. This check must not be done, as __doc__ can be of any type in Python. This corrects Issue#177.
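The NameError versus UnboundLocalError distinction that compiled code has to match can be observed in CPython with a short sketch. The function names are made up for illustration. Since UnboundLocalError is a subclass of NameError, the exact exception type is recorded and compared.

```python
def read_in_closure_taker():
    def inner():
        return value  # free variable of inner, not yet assigned in the outer function

    try:
        inner()
    except NameError as error:
        caught = type(error)
    value = 1
    return caught


def read_in_provider():
    def inner():
        return value  # makes "value" a cell variable of read_in_provider

    try:
        value  # read in the function that provides the cell, before assignment
    except NameError as error:
        caught = type(error)
    value = 1
    return caught


print(read_in_closure_taker().__name__)  # NameError
print(read_in_provider().__name__)       # UnboundLocalError
```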

New Optimization

  • Variables that need not be shared, because the uses in closure taking functions were eliminated, no longer use cell objects.

  • The try/except and try/finally statements now both have actual merging for SSA, allowing for better optimization of the code following them.

    def f():
    
        try:
            a = something()
        except:
            return 2
    
        # Since the above exception handling cannot continue the code flow,
        # we do not have to invalidate the trace of "a", and e.g. do not have
        # to generate code to check if it's assigned.
        return a
    

    Since try/finally is used in almost all re-formulations of complex Python constructs this is improving SSA application widely. The uses of try/except in user code will no longer degrade optimization and code generation efficiency as much as they did.

  • The try/except statement now reduces the scope of the tried block if possible. When no statement could raise, the handling was already removed, but leading and trailing statements that cannot raise were not considered.

    def f():
    
        try:
            b = 1
            a = something()
            c = 1
        except:
            return 2
    

    This is now optimized to:

    def f():
    
        b = 1
        try:
            a = something()
        except:
            return 2
        c = 1
    
    The impact on execution speed may be marginal, but it is definitely going to improve the branch merging to be added later. Note that the assignment to c can only be moved out because the exception handler is aborting.
    
  • The creation of code objects, for standalone mode and now for all code objects, was creating a distinct filename object for every function in a module, despite them having the same content. This was wasteful for module loading. Now it's done only once.

    Also, when having multiple modules, the code to build the run time filename used for code objects, was calling import logic, and doing lookups to find os.path.join again and again. These are now cached, speeding up the use of many modules as well.
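The rule for shrinking the tried block of a try/except can be modelled in a few lines. This is a hypothetical sketch, not Nuitka's implementation: the Stmt class and its may_raise flag stand in for whatever raise analysis the compiler actually performs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stmt:
    """Hypothetical statement: source text plus "may this raise?" knowledge."""
    text: str
    may_raise: bool


def narrow_try(body: List[Stmt], handler_aborts: bool) -> Tuple[List[Stmt], List[Stmt], List[Stmt]]:
    """Split a try body into (hoisted before the try, still tried, moved after it)."""
    start = 0
    while start < len(body) and not body[start].may_raise:
        start += 1
    if start == len(body):
        # Nothing can raise: the handler is dead and the whole body runs unguarded.
        return list(body), [], []
    end = len(body)
    if handler_aborts:
        # Trailing statements may only move behind the try/except if the handler
        # never falls through; otherwise they would also run after an exception.
        while not body[end - 1].may_raise:
            end -= 1
    return body[:start], body[start:end], body[end:]


body = [Stmt("b = 1", False), Stmt("a = something()", True), Stmt("c = 1", False)]
before, tried, after = narrow_try(body, handler_aborts=True)
print([s.text for s in before], [s.text for s in tried], [s.text for s in after])
# ['b = 1'] ['a = something()'] ['c = 1']
```

With handler_aborts=False, the trailing "c = 1" stays inside the tried block, matching the note that c can only be moved because the handler aborts.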

Cleanups

  • Nuitka used to have "variable usage profiles" and still used them to decide if a global variable is written to, in which case it refrains from optimizing it to built-in lookups, and later built-in calls.

    These have been replaced by "global variable traces", which collect the traces to a variable across all modules and functions. While for now this is only a replacement that gets rid of old code, being based on SSA it will later also allow it to become more correct and more optimized.

  • The standalone mode now queries its hidden dependencies from a plugin framework, which will become an interface to Nuitka internals in the future.
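To illustrate why writes have to be collected from all modules before a global name may be treated as a built-in, here is a hypothetical registry. The class and method names are invented for this sketch and do not reflect Nuitka's internal API.

```python
import builtins


class GlobalVariableTraces:
    """Hypothetical registry of writes to module globals, fed from every module."""

    def __init__(self):
        self._writes = {}  # (module, name) -> list of write locations

    def add_write(self, module, name, location):
        # A write may come from the module body, one of its functions, or
        # another module assigning e.g. "app.open = ...".
        self._writes.setdefault((module, name), []).append(location)

    def may_use_builtin(self, module, name):
        """Treat the name as the built-in only if no write to the global exists."""
        return hasattr(builtins, name) and not self._writes.get((module, name))


traces = GlobalVariableTraces()
traces.add_write("app", "open", "other_module.py:12")
print(traces.may_use_builtin("app", "open"))  # False
print(traces.may_use_builtin("app", "len"))   # True
```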

Testing

  • The use of deep hashing of constants allows us to check if constants become mutated during the run-time of a program. This allows us to discover corruption, should we encounter it.
  • The tests of CPython are now also run with Python in debug mode, but only on Linux, enhancing reference leak coverage.
  • The CPython test parts which had been disabled due to reference cycles involving compiled functions, or usage of __closure__ attribute, were reactivated.
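The idea of deep hashing can be sketched in Python: hash a constant by its content, recursing into mutable containers, then compare the hash again later. This is an illustrative assumption of how such a check could work, not the C code Nuitka generates.

```python
import hashlib


def deep_hash(value):
    """Hash a value by content, recursing into lists, tuples, dicts and sets."""
    digest = hashlib.sha256()
    _feed(digest, value)
    return digest.hexdigest()


def _feed(digest, value):
    # Prefix every value with its type name so that e.g. 1, 1.0 and "1" differ.
    digest.update(type(value).__name__.encode() + b"\0")
    if isinstance(value, (list, tuple)):
        digest.update(b"%d\0" % len(value))
        for item in value:
            _feed(digest, item)
    elif isinstance(value, dict):
        # Insertion order is part of the hash, which is fine for detecting mutation.
        digest.update(b"%d\0" % len(value))
        for key, item in value.items():
            _feed(digest, key)
            _feed(digest, item)
    elif isinstance(value, (set, frozenset)):
        # Set iteration order is arbitrary, so combine sorted member hashes.
        for member_hash in sorted(deep_hash(member) for member in value):
            digest.update(member_hash.encode())
    else:
        # Leaves are hashed via repr(), which keeps -0.0 distinct from 0.0.
        digest.update(repr(value).encode() + b"\0")


constants = {"names": ["a", "b"], "zero": -0.0}
before = deep_hash(constants)
constants["names"].append("c")  # simulated corruption of a "constant"
assert deep_hash(constants) != before
```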

Organizational

  • Since Google Code has shut down, it has been removed from the Nuitka git mirrors.

Summary

This release brings exciting new optimization with the focus on the try constructs, which are now handled more optimally. It is also a maintenance release, bringing compatibility improvements, important bug fixes, and important usability features for the deployment of modules and packages that further expand the use cases of Nuitka.

The git flow had to be applied this time to get out fixes for the regressions that the big change of the last release brought, so this release also consolidates these and the other corrections into a full release before more invasive changes are made.

The cleanups are leading the way to expanded SSA applied to global variable and shared variable values as well. Already the built-in detection is now based on global SSA information, which was an important step ahead.

Nuitka Release 0.5.10

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release has a focus on code generation optimization. With major changes moving away from "C++-ish" code towards "C-ish" code, many constructs got looked at and optimized, and are now faster.

Bug Fixes

  • Compatibility: The variable name in locals for the iterator provided to the generator expression should be .0, now it is.
  • Generators could leak frames until program exit, these are now properly freed immediately.
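In CPython, a generator expression is compiled into its own code object, whose single argument (the iterator created from the outermost iterable) carries the name .0. This CPython implementation detail can be checked from Python:

```python
gen = (x * 2 for x in range(3))

# The generator expression's code takes one argument, named ".0".
print(gen.gi_code.co_argcount)  # 1
print(gen.gi_code.co_varnames)  # ('.0', 'x')
```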

New Optimization

  • Faster exception save and restore functions that might be in-lined by the backend C compiler.

  • Faster error checks for many operations, where these errors are expected, e.g. instance attribute lookups.

  • Do not create traceback and locals dictionary for frame when StopIteration or GeneratorExit are raised. These tracebacks were wasted, as they were immediately released afterwards.

  • Closure variables to functions and parameters of generator functions are now attached to the function and generator objects.

  • The creation of functions with closure taking was accelerated.

  • The creation and destruction of generator objects was accelerated.

  • The re-formulation for in-place assignments got simplified and got faster doing so.

  • In-place operations of str were always copying the string, even if it was not necessary. This corrects Issue#124.

    a += b # Was not re-using the storage of "a" in case of strings
    
  • Python2: Additions of int values are now even faster.

  • Access to local variable values got slightly accelerated at the expense of closure variables.

  • Added support for optimizing the complex built-in.

  • Unused temporary and local variables resulting from optimization are now removed; previously these still allocated storage.

Cleanup

  • The use of C++ classes for variable objects was removed. Closure variables are now attached as PyCellObject to the function objects owning them.
  • The use of C++ context classes for closure taking and generator parameters has been replaced with attaching values directly to functions and generator objects.
  • The indentation of code template instantiations spanning multiple lines was not always proper. We were using a mix of emission objects, which handle new lines in code, and mere list objects, which don't. Now only the emission objects are used.
  • Some templates with C++ helper functions that had no variables got changed to be properly formatted templates.
  • The internal API for handling of exceptions is now more consistent and used more efficiently.
  • The printing helpers got cleaned up and moved to static code, removing any need for forward declaration.
  • The use of INCREASE_REFCOUNT_X was removed, it got replaced with proper Py_XINCREF usages. The function was once required before "C-ish" lifted the need to do everything in one function call.
  • The use of INCREASE_REFCOUNT got reduced. See above for why that is an improvement. The idea is that Py_INCREF must be good enough, and that we want to avoid the C function it was, even if in-lined.
  • The assertObject function that checks if an object is not NULL and has positive reference count, i.e. is sane, got turned into a preprocessor macro.
  • Deep hashes of constant values are now created in --debug mode. They also cover mutable values and attempt to depend on the actual content. These are checked at program exit for corruption. This may help uncover bugs.
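The benefit of using only emission objects can be illustrated with a minimal, hypothetical emitter that indents every line it receives, including each line of a multi-line template instantiation. The class, its methods, and the emitted C fragment are made up for this sketch.

```python
from contextlib import contextmanager


class Emitter:
    """Hypothetical emission object that keeps multi-line code properly indented."""

    def __init__(self, indent="    "):
        self._indent = indent
        self._level = 0
        self._lines = []

    def emit(self, code):
        # Split first, so every physical line gets the indentation, not just the first.
        for line in code.split("\n"):
            self._lines.append(self._indent * self._level + line if line else "")

    @contextmanager
    def indented(self):
        self._level += 1
        try:
            yield self
        finally:
            self._level -= 1

    def text(self):
        return "\n".join(self._lines)


emitter = Emitter()
emitter.emit("if (result == NULL) {")
with emitter.indented():
    emitter.emit("Py_DECREF(tmp);\ngoto error_exit;")  # one multi-line template
emitter.emit("}")
print(emitter.text())
# if (result == NULL) {
#     Py_DECREF(tmp);
#     goto error_exit;
# }
```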

Organizational

  • Speedcenter has been enhanced with better graphing and has more benchmarks now. More work will be needed to make it useful.
  • Updates to the Developer Manual, reflecting the current near finished state of "C-ish" code generation.

Tests

  • New reference count tests to cover generator expressions and their usage got added.
  • Many new construct based tests got added, these will be used for performance graphing, and serve as micro benchmarks now.
  • Again, more basic tests are directly executable with Python3.

Summary

This is the next evolution of "C-ish" coming to pass. The use of C++ has for all practical purposes vanished. It will remain an ongoing activity to clean up the rest and become real C. The C++ classes were a huge roadblock to many things that will now become simpler. One example is in-place operations, which can now be dealt with easily.

Also, lots of polishing and tweaking was done while adding construct benchmarks that were made to check the impact of these changes. Here, generators probably stand out the most, as some of the missed optimization got revealed and then addressed.

The resulting speed increases will be visible in programs that depend a lot on generators.

This release is clearly major in that the most important issues got addressed. Future releases will provide more tuning and completeness, but structurally the "C-ish" migration has succeeded, and now we can reap the benefits in the coming releases. More work will be needed for all in-place operations to be accelerated.

More work will also be needed to complete the migration, but it's good that it is coming to an end, so we can focus on SSA based optimization for the major gains to be had.

Nuitka progress 2014

Again, not much has happened publicly to Nuitka, except for some releases, so it's time for a kind of status post about the really exciting news there is, looking back at 2014 for Nuitka, and forward of course.

I meant to post this basically since last year, but never got around to it, therefore the 2014 in the title.

SSA (Static Single Assignment Form)

For a long, long time already, each release of Nuitka has worked towards enabling "SSA" usage in Nuitka. There is a component called "constraint collection", which is tasked with driving the optimization, and collecting variable traces.

Based on these traces, optimizations can be made. Having SSA or not, is (to me) the difference between Nuitka as a compiler, and Nuitka as an optimizing compiler.
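To make the idea concrete: in SSA form every assignment creates a new version of a variable, and every read refers to exactly one version. A minimal sketch using Python's ast module (Python 3.9+ for ast.unparse) renames straight-line code that way. It is an illustration only, handles no branches, loops or nested scopes, and is unrelated to Nuitka's constraint collection code.

```python
import ast


class SSARenamer(ast.NodeTransformer):
    """Give every assignment a fresh version and point reads at the latest one."""

    def __init__(self):
        self.version = {}  # variable name -> latest version number

    def visit_Assign(self, node):
        # The right-hand side reads the versions current *before* this store.
        node.value = self.visit(node.value)
        for target in node.targets:
            if not isinstance(target, ast.Name):
                raise NotImplementedError("only simple name targets in this sketch")
            self.version[target.id] = self.version.get(target.id, 0) + 1
            target.id = f"{target.id}_{self.version[target.id]}"
        return node

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load) and node.id in self.version:
            node.id = f"{node.id}_{self.version[node.id]}"
        return node


def to_ssa(source):
    tree = ast.parse(source)
    for statement in tree.body:
        if not isinstance(statement, (ast.Assign, ast.Expr)):
            raise NotImplementedError("straight-line assignments and expressions only")
    return ast.unparse(SSARenamer().visit(tree))


print(to_ssa("a = 1\na = a + 1\nb = a * 2"))
# a_1 = 1
# a_2 = a_1 + 1
# b_1 = a_2 * 2
```

Once control flow merges, for example after an if statement, an implementation has to combine the versions coming from each path; this sketch deliberately stops before that point.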

The news is, SSA has carried the day. It has been used throughout code generation for some time now and gave minor improvements. It has been applied to the temporary and local variable values.

And currently, work is underway to expand it to module and shared variables, which can get invalidated quite easily, as soon as unknown code is executed. An issue there is to identify all those spots reliably.

And this spring, we are finally going to see the big jump that is happening, once Nuitka starts to use that information to propagate things.

Still, right now, this code assigns to a local variable, then reads from it to return. But not much longer.

def f():
    a = 1
    return a

This is going to give instant gains and, more importantly, will enable analysis that leads to e.g. avoiding the creation of function objects for local functions, being able to in-line, etc.
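The propagation described here can be sketched for the simplest case: forward-propagating constants through a function made only of assignments and returns. This is a hypothetical illustration with Python's ast module (Python 3.9+), not Nuitka's optimizer. The now-unused "a = 1" is left in place; removing it would be a separate dead code step.

```python
import ast


class _SubstituteConstants(ast.NodeTransformer):
    """Replace reads of names with known constant values by the constants."""

    def __init__(self, known):
        self.known = known

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load) and node.id in self.known:
            return ast.copy_location(ast.Constant(value=self.known[node.id]), node)
        return node


def propagate_constants(source):
    """Forward-propagate constants in a single straight-line function."""
    module = ast.parse(source)
    function = module.body[0]
    known = {}  # name -> constant value valid at the current statement
    for stmt in function.body:
        if not isinstance(stmt, (ast.Assign, ast.Return)):
            raise NotImplementedError("sketch handles only assignments and returns")
        if any(isinstance(n, (ast.Lambda, ast.comprehension, ast.NamedExpr)) for n in ast.walk(stmt)):
            raise NotImplementedError("nested scopes and := are not handled")
        if stmt.value is not None:
            stmt.value = _SubstituteConstants(known).visit(stmt.value)
        if isinstance(stmt, ast.Assign):
            for target in stmt.targets:
                if not isinstance(target, ast.Name):
                    raise NotImplementedError("only simple name targets")
                if isinstance(stmt.value, ast.Constant):
                    known[target.id] = stmt.value.value
                else:
                    known.pop(target.id, None)  # value no longer known
    return ast.unparse(module)


print(propagate_constants("def f():\n    a = 1\n    return a"))
# def f():
#     a = 1
#     return 1
```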

Improved Code Generation

Previously, under the title "C-ish", Nuitka moved away from generating C++ based code towards generating more C-ish code. This trend continues, and has led to more code generation improvements.

The important recent change was to remove the last blocking holdouts, the C++ classes used for local variables and closure taking, and to do their release manually instead.

This enabled special code generation for in-place operations, which are the most significant improvements of the upcoming release. These had been held back because, with C++ destructors doing the release, it's practically impossible to deal with values suddenly becoming illegal. Transfer of object ownership needs to be more fluid than C++ objects allow.

Currently, this allows speeding up string in-place operations, which, very importantly, can then potentially avoid a memcpy of large values. This is about catching up to CPython in this regard. After that, we will likely be able to expand it to cases where CPython could never do it, e.g. also int objects.

Scalability

The scalability of Nuitka depends much on generated code size. With it being less stupid, the generated code is now not only faster, but definitely smaller, and with more optimization, it will only become more practical.

Removing the many C++ classes already gave the backend compiler an easier time. But we need to do more, e.g. have generic parameter parsing instead of specialized code per function, and module exclusive constants should not be pre-created, but created in the module body when they are used.
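Generic parameter parsing means one shared routine binds call arguments to parameter slots, instead of emitting specialized parsing code for each function. A hypothetical Python model of such a routine, limited to positional-or-keyword parameters (no *args, **kwargs or keyword-only parameters), could look like this:

```python
_MISSING = object()  # sentinel for "slot not filled yet"


def parse_args(param_names, defaults, args, kwargs):
    """Bind positional and keyword arguments to the slots of param_names."""
    if len(args) > len(param_names):
        raise TypeError(
            f"takes {len(param_names)} positional arguments but {len(args)} were given"
        )
    slots = list(args) + [_MISSING] * (len(param_names) - len(args))
    for name, value in kwargs.items():
        if name not in param_names:
            raise TypeError(f"got an unexpected keyword argument {name!r}")
        index = param_names.index(name)
        if slots[index] is not _MISSING:
            raise TypeError(f"got multiple values for argument {name!r}")
        slots[index] = value
    for index, name in enumerate(param_names):
        if slots[index] is _MISSING:
            if name not in defaults:
                raise TypeError(f"missing required argument {name!r}")
            slots[index] = defaults[name]
    return slots


# Equivalent of calling "def f(a, b, c=3)" as f(1, b=2):
print(parse_args(("a", "b", "c"), {"c": 3}, (1,), {"b": 2}))  # [1, 2, 3]
```

The same function serves every signature; only the parameter names and defaults differ per function.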

Compatibility

There is not a whole lot to gain in the compatibility domain anymore. Nothing important certainly. But there are these minor things.

Cells for Closure

However, since we now use PyCell objects for closure, we could start to provide a real __closure__ value, which could even be writable. We could start supporting that easily.
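For reference, this is what CPython exposes through __closure__: one cell object per free variable, in the order of co_freevars. Since Python 3.7, cell_contents is also assignable. The function names are illustrative.

```python
def make_counter():
    count = 0

    def counter():
        return count

    return counter


counter = make_counter()
print(counter.__code__.co_freevars)  # ('count',)
cell = counter.__closure__[0]
print(cell.cell_contents)            # 0
cell.cell_contents = 41              # assignable since Python 3.7
print(counter())                     # 41
```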

Local Variable Storage

Currently, local variables use stack storage. Were we to use storage attached to the function object or frame object, we could provide frame locals that actually work. This may be as simple as putting those in an array on the stack and using a pointer to it.

Suddenly locals would become writable. I am not saying this is useful, just that it's possible to do this.

Performance

Graphs and Benchmarks

The work on automated performance graphs has made progress, and they are supposed to show up on Nuitka Speedcenter each time the master, develop or factory git branches change.

There currently is no structure to these graphs. There are no explanations or comments, and there are no trend indicators. All of which makes it basically useless to everybody except me, and even for me it is harder to use than necessary.

At least it's updated to the latest Nikola, and uses PyGal for the graphics now, so it's easier to expand. The plan here is to integrate with special pages from a Wiki, making it easy to provide comments.

Standalone

The standalone mode of Nuitka is pretty good, and as usual it only continued to improve.

The major improvements came from handling case collisions between modules and packages. One can have Module.py and module/__init__.py, and they are both expected to be different, even on Windows, where filenames are case insensitive.

So, giving up on imp and similar, we finally have our own code that scans the file system in a compatible way and makes these determinations, whereas the library code exposing this functionality doesn't handle all cases in really the proper way.
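The core trick for case collisions can be sketched as follows: compare against the real spelling returned by os.listdir() instead of asking os.path.exists(), which on a case-insensitive file system would happily accept module.py for Module. This hypothetical helper only knows .py modules and regular packages, and ignores extension modules, bytecode files and namespace packages.

```python
import os


def find_module_exact(directory, name):
    """Locate module "name" in directory, comparing filenames case-sensitively."""
    entries = set(os.listdir(directory))  # real on-disk spelling of each entry
    package_dir = os.path.join(directory, name)
    # A regular package takes precedence over a plain module of the same name.
    if name in entries and os.path.isdir(package_dir):
        if "__init__.py" in os.listdir(package_dir):
            return package_dir
    if name + ".py" in entries:
        return os.path.join(directory, name + ".py")
    return None
```

With both Module.py and module/__init__.py present, find_module_exact(root, "Module") returns the file and find_module_exact(root, "module") returns the package, on any file system.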

Other Stuff

Funding

Nuitka receives some, but not quite enough donations. There is no support from organizations like e.g. the PSF, and it seems I better not hold my breath for it. I will travel to EuroPython 2015, and would ask you to support me with that, as it's going to be expensive.

In 2014, with donations, I bought a "Cubox i4-Pro", which is an ARM based machine with 4 cores and 2GB RAM. It works from flash, and with the eSATA disk attached, it works nicely for continuous integration, which helps me a lot to deliver extremely high quality releases. It's pretty nice, except that when using all 4 cores, it gets too hot. So "systemd" came to the rescue: I limited the Buildbot slave's service to use at most 3 cores of CPU, and now it runs stably.

Also with donations I bought a Terabyte SSD, which I use on the desktop to speed up hosting the virtual machines, and my work in general.

And probably more important, the host of "nuitka.net" became a real machine with real hardware last year, and lots more RAM, so I can spare myself optimizing e.g. MySQL for low memory usage. The monthly fee for that is substantial, but supported by your donations. Thanks a lot!

Collaborators

Things are coming along nicely. When I started out, I was fully aware that the project is something that I can do on my own if necessary, and that has not changed. Things are going slower than necessary though, but that's probably very typical.

But you can join and should do so now: just follow this link or become part of the mailing list and help me there with requests I make, e.g. review posts of mine, test out things, pick up small jobs, answer questions of newcomers, you know the drill probably.

Nuitka is about to make break through progress. And you can be a part of it. Now.

Future

So, there are multiple things going on:

  • More "C-ish" code generation

    The next release is going to be more "C-ish" than before, and we can start to actually migrate to really "C" language. You can help out if you want to, this is fairly standard cleanups. Just pop up on the mailing list and say so.

    This prong of action is coming to a logical end. The "C-ish" project, while not planned from the outset, turns out to be a full success. Initially, I would not have started Nuitka had I faced the full complexity of code generation that there is now. So it was good to start with "C++", but it's a better Nuitka now.

  • More SSA usage

    The previous releases consolidated on SSA. A few missing optimizations were found, because SSA didn't realize things, which were then highlighted by code generation being too good, e.g. not using exception variables.

    We seem to have an SSA that can be fully trusted now, and while it can be substantially improved (e.g. the try/finally removes all knowledge, although it only needs to partially remove knowledge for the finally block, and not at all for afterwards), it will already allow for many nice things to happen.

    Once we take it to that next level, Nuitka will be able to speed up some things by much more than the factor it basically has provided for 2 years now, and it's probably going to happen before summer, or so I hope.

  • Value propagation

    Starting out with simple cases, Nuitka will forward propagate variable values, and start to eliminate variable usages entirely, where they are not needed.

    That will make many things much more compact, and faster at run time. We will then try and build "gates" for statements that they cannot pass, so we can e.g. optimize constant things outside of loops, that kind of thing.

When these 3 things come to fruition, Nuitka will make a huge step ahead. I look forward to demoing function call in-lining at EuroPython 2015, or at least avoiding the argument parsing by making direct calls, which will be way faster than normal calls.

From then on, a boatload of work remains. With the infrastructure in place, there is still going to be plenty of work to optimize more and more things concretely.

Let me know, if you are willing to help. I really need that help to make things happen faster.

Nuitka Release 0.5.9

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release is mostly a maintenance release, bringing out minor compatibility improvements, and some standalone improvements. Also new options to control the recursion into modules are added.

Bug Fixes

  • Compatibility: Checks for iterators were using PyIter_Check, which is buggy when running outside of the Python core, because it compares against pointers we don't see. Replaced with a HAS_ITERNEXT helper, which compares against the pointer as extracted from a real non-iterator object.

    class Iterable:
        def __init__(self):
            self.consumed = 2
    
        def __iter__(self):
            return Iterable()
    
    iter(Iterable()) # This is supposed to raise TypeError, but didn't with Nuitka
    
  • Python3: Errors when creating class dictionaries raised by the __prepare__ dictionary (e.g. enum classes with wrong identifiers) were not immediately raised, but only by the type call. This was not observable, but might have caused issues potentially.

  • Standalone MacOS: Shared libraries and extension modules didn't have their DLL load paths updated, but only the main binary. This is not sufficient for more complex programs.

  • Standalone Linux: Shared libraries copied into the .dist folder were read-only, and executing chrpath could then potentially fail. This has not been observed, but is a conclusion from the MacOS fix.

  • Standalone: When freezing standard library, the path of Nuitka and the current directory remained in the search path, which could lead to looking at the wrong files.

New Optimization

  • The getattr built-in is now optimized for compile time constants if possible, even in the presence of a default argument. This is more a cleanup than actually useful yet.
  • The calling of PyCFunction from normal Python extension modules got accelerated, especially for the no or single argument cases where Nuitka now avoids building the tuple.

New Features

  • Added the option --recurse-pattern to include modules per filename, which for Python3 is the only way to not have them in a package automatically.

  • Added the option --generate-c++-only to only generate the C++ source code without starting the compiler.

    Mostly used for debugging and testing coverage. In the latter case we do not want the C++ compiler to create any binary, but only to measure what would have been used.

Organizational

  • Renamed the debug option --c++-only to --recompile-c++-only to make its purpose more clear and there now is --generate-c++-only too.

Tests

  • Added support for taking coverage of Nuitka in a test run on a given input file.
  • Added support for taking coverage for all Nuitka test runners, migrating them all to common code for searching.
  • Added uniform way of reporting skipped tests, not generally used yet.

Summary

This release marks progress towards having coverage testing. Recent releases had made it clear that not all code of Nuitka is actually used at least once in our release tests. We aim at identifying these.

Another direction was to catch cases, where Nuitka leaks exceptions or is subject to leaked exceptions, which revealed previously unnoticed errors.

Important changes have been delayed, e.g. the closure variables do not yet use proper PyCellObject instead of C++ objects to share storage, which would improve compatibility and approach a more "C-ish" status. There is unfinished code that does this. And the forward propagation of values is again not enabled yet either.

So this is an interim step to get the bug fixes and improvements accumulated out. Expect more actual changes in the next releases.

Nuitka Release 0.5.8

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release mainly focuses on cleanups and compatibility improvements. It also advances standalone support and brings a few optimization improvements, but it mostly is a maintenance release, attacking long standing issues.

Bug Fixes

  • Compatibility Windows MacOS: Fix importing on case insensitive systems.

    It was not always working properly, if there was both a package Something and something, by merit of having files Something/__init__.py and something.py.

  • Standalone: The search path was preferring system directories and therefore could have conflicting DLLs. Issue#144.

  • Fix, the optimization of getattr with predictable result was crashing the compilation. This was a regression, fixed in 0.5.7.1 already.

  • Compatibility: The name mangling inside classes also needs to be applied to global variables.

  • Fix, providing clang++ as CXX was mistakenly treated as g++, with version checks made on it.

  • Python3: Declaring __class__ global is now a SyntaxError before Python3.4.

  • Standalone Python3: Making use of module state in extension modules was not working properly.
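The name mangling fix for globals can be checked against plain CPython: inside a class body, a name like __secret is mangled even when it refers to a module global, so the lookup actually goes to _Holder__secret. The names here are hypothetical.

```python
_Holder__secret = "module level"  # a module global that already has the mangled name


class Holder:
    def read(self):
        # "__secret" is mangled to "_Holder__secret", even though this is a
        # global lookup and not an attribute access.
        return __secret


print(Holder().read())  # module level
```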

New Features

  • The filenames of source files as found in the __file__ attribute are now made relative in standalone mode.

    This should make it more apparent if things outside of the distribution folder are used, at the cost of tracebacks. Expect the default ability to copy the source code along in an upcoming release.

  • Added experimental standalone mode support for PyQt5. At least headless mode should be working, plug-ins (needed for anything graphical) are not yet copied and will need more work.

Cleanup

  • No longer using imp.find_module. To solve the casing issues we finally needed to make our own module finding implementation.
  • The name mangling was handled during code generation only. Moved to tree building instead.
  • More code generation cleanups. The compatible line numbers are now attached during tree building and therefore better preserved, and that code no longer pollutes code generation as much.

Organizational

  • No more packages for openSUSE 12.1/12.2/12.3 and Fedora 17/18/19 as requested by the openSUSE Build Service.
  • Added RPM packages for Fedora 21 and CentOS 7 on openSUSE Build Service.

Tests

  • Lots of test refinements for the CPython test suites to be run continuously in Buildbot for both Windows and Linux.

Summary

This release brings about two major changes, each with the risk to break things.

One is that we finally started to have our own import logic, which has the risk to cause breakage, but so far appears to have rather improved compatibility. The case issues were not fixable with standard library code.

The second one is that the __file__ attribute in standalone mode no longer points to the original install and therefore will expose missing stuff sooner. This will have to be followed up with code to scan for missing "data" files later on.

For SSA based optimization, there are cleanups in here, esp. the one removing the name mangling, allowing to remove special code for class variables. This makes the SSA tree more reliable. Hope is that the big step (forward propagation through variables) can be made in one of the next releases.

Article about Nuitka Standalone Mode

There is a really well written article about Nuitka written by Tom Sheffler.

It inspired me to finally clean up the __file__ attributes in standalone mode. Currently they point to where your source was when things were compiled. In the future (in standalone mode; for accelerated mode that continues to be good), they will point into the .dist folder, so that the SWIG workaround may no longer be necessary.

Thanks Tom for sharing your information, and good article.

Yours, Kay

Nuitka Release 0.5.7

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release brings a newly supported platform, bug fixes, and again lots of cleanups.

Bug Fixes

  • Fix, creation of dictionary and set literals with non-hashable keys or elements did not raise an exception.

    {[]: None} # This is now a TypeError
    

New Optimization

  • Calls to the dict built-in with only keyword arguments are now optimized to mere dictionary creations. This is new only for the case of non-constant arguments, of course.

    dict(a = b, c = d)
    # equivalent to
    {"a" : b, "c" : d}
    
  • Slice del operations with indexable arguments now use optimized code that avoids creating Python objects too. This was already done for slice look-ups.

  • Added support for bytearray built-in.
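Here, "indexable" arguments are objects usable as slice bounds through __index__, not only int values. These are the Python-level semantics the optimized slice del code has to preserve; the class name is illustrative.

```python
class Two:
    """An "indexable" object: usable as a slice bound through __index__."""

    def __index__(self):
        return 2


data = [0, 1, 2, 3, 4]
del data[Two():4]  # same as del data[2:4]
print(data)        # [0, 1, 4]
```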

Organizational

  • Added support for OpenBSD with fiber implementation from library, as it has no context support.

Cleanups

  • Moved slicing solutions for Python3 to the re-formulation stage. So far, the slice nodes were used for both, and only at code generation time was a distinction made between Python2 and Python3 for them. Now these nodes are purely Python2, and slice objects are used universally for Python3.
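The reason slice objects can be used universally for Python3 is visible from Python itself: a slicing like x[1:3] is simply a subscript whose key is a slice object. The Recorder class is made up for illustration.

```python
class Recorder:
    """Returns whatever key the subscript operation passes in."""

    def __getitem__(self, key):
        return key


print(Recorder()[1:3])     # slice(1, 3, None)
print(Recorder()[1:3, 5])  # (slice(1, 3, None), 5)
```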

Tests

  • The test runners now have common code to scan for the first file to compile, an implementation of the search mode. This will allow to introduce the ability to search for pattern matches, etc.
  • More tests are directly executable with Python3.
  • Added recurse_none mode to test comparison, making extra options for that purpose unnecessary.

Summary

This solves long standing issues with slicing and subscript not being properly distinguished in the Nuitka code. It also contains major bug fixes for really problematic issues. Due to the involved nature of these fixes, they are made in this new release.

Nuitka Release 0.5.6

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release brings bug fixes, important new optimization, newly supported platforms, and important compatibility improvements. Progress on all fronts.

Bug Fixes

  • Closure taking of global variables in member functions of classes that had a class variable of the same name was binding to the class variable as opposed to the module variable.

  • Overwriting a compiled function's __doc__ attribute more than once could corrupt the old value, leading to crashes. Issue#156. Fixed in 0.5.5.2 already.

  • Compatibility Python2: The exec statement and execfile were changing local variables even when locals() was given as an argument.

    def function():
       a = 1
    
       exec code in locals() # Cannot change local "a".
       exec code in None     # Can change local "a"
       exec code
    

    Previously Nuitka treated all 3 variants the same.

  • Compatibility: Empty branches with a condition were reduced to only the condition, but in fact they also need to check the truth value:

    if condition:
        pass
    # must be treated as
    bool(condition)
    # and not (bug)
    condition
    
  • Detection of Windows virtualenv was not working properly. Fixed in 0.5.5.2 already.

  • Large enough constant structures are now unstreamed via the marshal module, which avoids pointlessly generating large amounts of code. Fixed in 0.5.5.2 already.

  • Windows: Pressing CTRL-C gave two stack traces, one of them from the re-execution of Nuitka, which was rather pointless. Fixed in 0.5.5.1 already.

  • Windows: Searching for virtualenv environments didn't terminate in all cases. Fixed in 0.5.5.1 already.

  • During installation from PyPI with Python3 versions, there were errors given for the Python2 only scons files. Issue#153. Fixed in 0.5.5.3 already.

  • Fix, the arguments of yield from expressions could be leaked.

  • Fix, closure taking of a class variable could happen in a sub class where the module variable was meant.

    var = 1
    
    class C:
       var = 2
    
       class D:
          def f():
             # was C.var, now correctly addressed top level var
             return var
    
  • Fix, setting the CXX environment variable, because the version of the installed gcc is too low, wasn't affecting the version check at all.

  • Fix, on Debian/Ubuntu with hardening-wrapper installed the version check was always failing, because these report a shortened version number to Scons.
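Two of the compatibility fixes above can be checked against plain Python3. The first is that an empty branch still evaluates the truth value of its condition. The second is that a function nested in a class inside a class looks up the module variable, not the class variable. The Noisy class is a hypothetical helper. It uses Python3's __bool__, whereas Python2 would use __nonzero__.

```python
class Noisy:
    """Counts how often its truth value is asked for."""

    def __init__(self):
        self.checks = 0

    def __bool__(self):
        self.checks += 1
        return False


cond = Noisy()
if cond:
    pass
# The empty branch still evaluates the truth value, i.e. calls __bool__ once.
assert cond.checks == 1

var = 1

class C:
    var = 2

    class D:
        def f():
            # Class bodies are not enclosing scopes for functions,
            # so this is the module level "var".
            return var

assert C.D.f() == 1
```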

New Optimization

  • Local variables that must be assigned also have no side effects, making use of SSA. This allows a host of optimizations to be applied to them as well, often yielding simpler access/assign code, and discovering in more cases that frames are not necessary.
  • Micro optimization to dict built-in for simpler code generation.

Organizational

  • Added support for ARM "hard float" architecture.
  • Added package for Ubuntu 14.10 for download.
  • Added package for openSUSE 13.2 for download.
  • Donations were used to buy a Cubox-i4 Pro. It got Debian Jessie installed on it, and will be used to run an even larger amount of tests.
  • Made it clearer in the user documentation that the .exe suffix is used for all platforms, and why.
  • Generally updated information in user manual and developer manual about the optimization status.
  • Using Nikola 7.1 with external filters instead of our own, outdated branch for the web site.

Cleanups

  • PyLint clean for the first time ever. We now have a Buildbot driven test that this stays that way.
  • Massive indentation cleanup of keyword argument calls. We have a rule to align the keywords, but as this was done manually, it could easily get out of sync. Now, with an "autoformat" tool based on RedBaron, it's correct. Also, spacing around arguments is now automatically corrected. More to come.
  • For exec statements, copying back to local variables is now an explicit node in the tree. This leads to cleaner code generation, as it now uses normal variable assignment code generation.
  • The MaybeLocalVariables became explicit about which variable they might be, and contribute to its SSA trace as well, which was incomplete before.
  • Removed some cases of code duplication that were marked as TODO items. This often resulted in cleanups.
  • Do not use replaceWith on child nodes that were potentially re-used during their computation.

Summary

The release is mainly the result of consolidation work. While the previous release contained many important enhancements, this is another important step towards full SSA. It closes one loophole (class variables and exec functions) and also applies SSA to local variables, largely extending its use.

The amount of cleanups is tremendous, in huge part due to infrastructure problems that repeatedly prevented a release. This reduces the technical debt considerably.

More importantly, it would appear that eliminating unnecessary local and temporary variables is now only a small step away. But as usual, while this may be easy to implement now, it will uncover more bugs in existing code that we need to address before we continue.

Nuitka Release 0.5.5

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release is finally making full use of SSA analysis knowledge for code generation, leading to many enhancements over previous releases.

It also adds support for Python3.4, which has been longer in the making, due to many rather subtle issues. In fact, even more work will be needed to fully solve remaining minor issues, but these should affect no real code.

And then there is much improved support for using standalone mode together with virtualenv. This combination was not previously supported, but should work now.

New Features

  • Added support for Python3.4

    This means support for the clear method of frames to close generators, dynamic __qualname__ affected by global statements, tuples as yield from arguments, improved error messages, additional checks, and many more detail changes.
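One of these detail changes is easy to observe in plain Python3.4 or later. A function that a global statement binds at module level gets a plain __qualname__ rather than the "<locals>" path. The function names here are arbitrary choices for this sketch.

```python
def outer():
    global helper

    def helper():
        pass

    def local():
        pass

    return local


local = outer()

# Bound to a module level name via "global", so the qualified name is plain.
assert helper.__qualname__ == "helper"
# A truly local function keeps the "<locals>" path.
assert local.__qualname__ == "outer.<locals>.local"
```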

New Optimization

  • Using SSA knowledge, local variable assignments no longer need to check whether they must release previous values. In most cases, they know this for certain.

    def f():
        a = 1 # This used to check if old value of "a" needs a release
        ...
    
  • Using SSA knowledge, local variable references no longer need to check whether to raise exceptions, let alone produce exceptions, in cases where that cannot happen.

    def f():
        a = 1
        return a # This used to check if "a" is assigned
    
  • Using SSA knowledge, it is now known whether local variable references can raise the UnboundLocalError exception. This allows eliminating frame usage in many cases, including the above example.

  • Using less memory for keeping variable information.

  • Also using less memory for constant nodes.
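The three situations that SSA has to tell apart for a local variable reference can be shown in plain Python. Only the "may be assigned" case really needs a run-time check, and with it the frame for the exception. The function names are arbitrary choices for this sketch.

```python
def always():
    a = 1
    return a        # "a" must be assigned: no check needed


def maybe(flag):
    if flag:
        a = 1
    return a        # "a" may be assigned: the check has to stay


def never():
    return a        # "a" cannot be assigned here: always raises
    a = 1           # unreachable, but it makes "a" local to the function


def raises_unbound(func, *args):
    """Return True if calling func raises UnboundLocalError."""
    try:
        func(*args)
    except UnboundLocalError:
        return True
    return False


assert always() == 1
assert maybe(True) == 1
assert raises_unbound(maybe, False)
assert raises_unbound(never)
```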

Bug Fixes

  • The standalone freezing code was reading Python source as UTF-8 and was not using the code that handles the Python encoding properly. On some platforms, there are files in the standard library that are not encoded like that.

  • The fiber implementation for Linux amd64 was not working with glibc from RHEL 5. It now uses multiple int values to pass pointers as necessary. It also uses uintptr_t instead of intptr_t to transport pointers, which may be more optimal.

  • Line numbers for exceptions were corrupted by with statements due to setting line numbers even for statements marked as internal.

  • Partial support for win32com by adding support for its hidden __path__ change.

  • Python3: Finally figured out proper chaining of exceptions, giving proper context messages for exceptions raised during the handling of exceptions.

  • Corrected C++ memory leak for each closure variable taken, each time a function object was created.

  • Python3: Raising exceptions with tracebacks already attached did not always use those tracebacks, but produced new ones instead.

  • Some constants could cause errors, as they cannot be handled with the marshal module as expected, e.g. (int,).

  • Standalone: Make sure to propagate sys.path to the Python instance used to check for standard library import dependencies. This is important for virtualenv environments, which need site.py to set the path, which is not executed in that mode.

  • Windows: Added support for different path layout there, so using virtualenv should work there too.

  • The code object flag "optimized" (fast locals as opposed to a locals dictionary) for functions was wrongly set to the value of the parent, while frames inside the function used a code object with the correct value. This led to more code objects than necessary and to false co_flags values attached to the function.

  • Options passed to nuitka-python could get lost.

    nuitka-python program.py argument1 argument2 ...
    

    The above is supposed to compile program.py, execute it immediately and pass the arguments to it. But when Nuitka decides to restart itself, it would forget these options. It does so to e.g. disable hash randomization as it would affect code generation.

  • Raising tuples as exceptions was not compatible (Python2) or leaked references (Python3).
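Two of the fixes above concern standard library behaviour that can be checked directly. The encoding part sketches the general technique of honouring a PEP 263 declaration via tokenize.detect_encoding. It is not Nuitka's actual freezing code. The marshal part shows why a constant like (int,) needs handling outside of marshal.

```python
import io
import marshal
import tokenize

# A source file declaring a non-UTF-8 encoding; byte 0xE9 is "é" in Latin-1.
source_bytes = b"# -*- coding: latin-1 -*-\ntext = '\xe9'\n"

# Honour the declared encoding instead of assuming UTF-8.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
source_text = source_bytes.decode(encoding)

assert encoding == "iso-8859-1"
assert "text = '\u00e9'" in source_text

# Reading it as UTF-8 fails, since a lone 0xE9 byte is not valid UTF-8.
try:
    source_bytes.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
assert utf8_failed

# marshal round-trips plain constants, but rejects a tuple containing a type.
assert marshal.loads(marshal.dumps((1, "two", 3.0))) == (1, "two", 3.0)
try:
    marshal.dumps((int,))
    marshal_failed = False
except ValueError:
    marshal_failed = True
assert marshal_failed
```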

Tests

  • Running 2to3 is now avoided for tests that are already running on both Python2 and Python3.
  • Made XML based optimization tests work with Python3 too. Previously these were only working on Python2.
  • Added support for ignoring messages that come from linking against self compiled Pythons.
  • Added test case for threaded generators that tortures the fiber layer a bit and exposed issues on RHEL 5.
  • Made reference count test of compiled functions generic. No more code duplication, and automatic detection of shared stuff. Also a more clear interface for disabling test cases.
  • Added Python2 specific reference counting tests, so the other cases can be executed with Python3 directly, making debugging them less tedious.

Cleanups

  • Really important removal of "variable references". They didn't solve any problem anymore, and their complexity was not helpful either. Removing them finally made SSA usable, and removed a lot of code.
  • Removed special code generation for parameter variables and their dedicated classes. These are no longer needed, as every variable access code is now optimized like this.
  • Stopped using C++ class methods at all. Now only the destructor of local variables is actually supposed to do anything, and there are no methods anymore. The unused var_name got removed, and setVariableValue is now done manually.
  • Moved assertions for the fiber layer to a common place in the header, so they are executed on all platforms in debug mode.
  • As usual, also a bunch of cleanups for PyLint were applied.
  • The locals built-in code now uses code generation for accessing local variable values instead of having its own code for it.

Organizational

  • The Python version 3.4 is now officially supported. There are a few problems open, that will be addressed in future releases, none of which will affect normal people though.

  • Major cleanup of Nuitka options.

    • Windows specific stuff is now in a dedicated option group. This includes options for icon, disabling console, etc.
    • There is now a dedicated group for controlling backend compiler choices and options.
  • Also pick up g++44 automatically, which makes using Nuitka on CentOS5 more automatic.

Summary

This release represents a very important step ahead. Using SSA for real stuff will allow us to build the trust necessary to take the next steps. Using the SSA information, we could start implementing more optimizations.

Nuitka shaping up

Not much has happened publicly to Nuitka, so it's time to make a kind of status post, about the exciting news there is.

SSA (Static Single Assignment Form)

For a long, long time already, each release of Nuitka has worked towards enabling "SSA" usage in Nuitka. There is a component called "constraint collection", which is tasked with driving the optimization, and collecting variable traces.

Based on these traces, optimizations could be made. Having SSA or not is (to me) the difference between Nuitka as a compiler and Nuitka as an optimizing compiler.

The news is, SSA is shaping up, and will be used in the next release. Not yet to drive variable based optimization (reserved for a release after it), but to aid the code generation to avoid useless checks.

Improved Code Generation

Previously, under the title "C-ish", Nuitka moved away from C++ based code generation towards generating less C++ based and more C-ish code. This trend continues, and has led to even more code cleanups.

The more important change comes from the SSA derived knowledge. Now Nuitka knows whether a variable must be assigned, cannot be assigned, or may be assigned, based on its SSA traces.

Let's check out an example:

def f():
    a = 1
    return a

Never mind that the variable a can obviously be removed, and that this could be transformed to statically return 1. That is the next step (and easy if SSA is working properly). For now, we are looking at what has changed.

This is the code as generated now, with the current 0.5.5pre5:

tmp_assign_source_1 = const_int_pos_1;
assert( var_a.object == NULL );
var_a.object = INCREASE_REFCOUNT( tmp_assign_source_1 );

tmp_return_value = var_a.object;

Py_INCREF( tmp_return_value );
goto function_return_exit;

There are still some things wrong with it. For one, var_a is still a C++ object, which we directly access. But the good thing is that we can assert that it starts out uninitialized before we overwrite it. The stable release as of now, 0.5.4, generates code like this:

tmp_assign_source_1 = const_int_pos_1;
if (var_a.object == NULL)
{
    var_a.object = INCREASE_REFCOUNT( tmp_assign_source_1 );
}
else
{
    PyObject *old = var_a.object;
    var_a.object = INCREASE_REFCOUNT( tmp_assign_source_1 );
    Py_DECREF( old );
}
static PyFrameObject *cache_frame_function = NULL;
MAKE_OR_REUSE_FRAME( cache_frame_function, codeobj_4e03e5698a52dd694c5c263550d71551, module___main__ );
PyFrameObject *frame_function = cache_frame_function;

// Push the new frame as the currently active one.
pushFrameStack( frame_function );

// Mark the frame object as in use, ref count 1 will be up for reuse.
Py_INCREF( frame_function );
assert( Py_REFCNT( frame_function ) == 2 ); // Frame stack

// Framed code:
tmp_return_value = var_a.object;

if ( tmp_return_value == NULL )
{

    exception_type = INCREASE_REFCOUNT( PyExc_UnboundLocalError );
    exception_value = UNSTREAM_STRING( &constant_bin[ 0 ], 47, 0 );
    exception_tb = NULL;

    frame_function->f_lineno = 4;
    goto frame_exception_exit_1;
}

Py_INCREF( tmp_return_value );
goto frame_return_exit_1;

As you can see, the assignment to var_a.object was checking if it was NULL (which we now statically know it is), and if it was not, it would release the old value. Next, before returning, the value of var_a.object needed to be checked for NULL. In that case, we would need to create a Python exception, and in order to do so, we needed a frame object, which, even if cached, costs time and code size.

So, that is the major change to code generation. The SSA information is now used in it, and doing so has uncovered a bunch of issues in how that information is built, e.g. in nested branches, that kind of stuff.

In a future release, local variables will stop being C++ classes and will be managed as temporary variables instead, reducing code complexity further. Were var_a a temporary variable already, the Py_INCREF on the constant 1, which implies a later Py_DECREF, could be avoided entirely.
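The following hypothetical Python model makes the three states concrete. It is not Nuitka's actual classes. In this model, a variable's trace is UNASSIGNED, ASSIGNED or MAYBE. Two branches merge to MAYBE unless they agree, and each state decides what the generated C code for an assignment or a reference has to do. Applied to f(), it reproduces the difference between the two generated snippets above.

```python
from enum import Enum


class State(Enum):
    """What an SSA trace can say about a local variable at one point."""
    UNASSIGNED = "cannot be assigned"
    ASSIGNED = "must be assigned"
    MAYBE = "may be assigned"


def merge(left, right):
    """State after two branches join: only agreement stays definite."""
    return left if left is right else State.MAYBE


def assignment_code(state):
    """Roughly what an assignment must do about the previous value."""
    if state is State.UNASSIGNED:
        return "assert NULL, store"
    if state is State.ASSIGNED:
        return "release old value, store"
    return "check NULL, release if set, store"


def reference_code(state):
    """Roughly what a reference must do before using the value."""
    if state is State.ASSIGNED:
        return "use value"
    if state is State.UNASSIGNED:
        return "raise UnboundLocalError"
    return "check NULL, maybe raise UnboundLocalError"


# def f(): a = 1; return a
assert assignment_code(State.UNASSIGNED) == "assert NULL, store"
assert reference_code(State.ASSIGNED) == "use value"

# "if cond: a = 1" without an else branch, followed by "return a"
after_if = merge(State.ASSIGNED, State.UNASSIGNED)
assert after_if is State.MAYBE
assert reference_code(after_if).startswith("check NULL")
```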

Scalability

The scalability of Nuitka hinges much on generated code size. With it being less stupid, the generated code is now not only faster, but definitely smaller, and with more optimization, it will only become more practical.

Compatibility

Python2 exec statements

A recent change in CPython 2.7.8+, which is supposed to become 2.7.9 one day, highlighted an issue with exec statements in Nuitka. These were considered to be fully compatible, but apparently are not entirely.

def f():
   exec a in b, c
   exec(a, b, c)

The above two are supposed to be identical. So far, CPython reconciled them at run time, but apparently the parser is now tasked with it, so Nuitka now sees exec a in b, c for both lines. Which is good.

However, as it stands, Nuitka handles exec in locals() the same as exec in None for plain functions (which is OK for classes and modules), and that is totally a bug.

I have been working on an enhanced re-formulation (it needs to track whether the value was None, and then sync back to the locals from the provided dictionary). But the change breaks execfile in classes, which was implemented piggy-backing on exec. It now requires locals to be a dictionary that is written to immediately.

Anyway, consider exec to be working well already. The non-working cases are really corner cases that, obviously, nobody has come across so far.
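In Python3, exec is a function, so the difference the re-formulation has to model can be observed directly. Writes made by exec go into the dictionary it is given, not into the function's local variables. Any "sync back" from that dictionary has to be an explicit step. The function names are hypothetical, and the explicit copy only illustrates the idea, not Nuitka's generated code.

```python
def with_locals():
    a = 1
    # locals() only hands out a dictionary; exec writes to it,
    # not to the local variable "a".
    exec("a = 2", globals(), locals())
    return a


def with_sync_back():
    a = 1
    namespace = {"a": a}
    exec("a = 2", globals(), namespace)
    # Explicit sync back from the provided dictionary to the local variable.
    a = namespace["a"]
    return a


assert with_locals() == 1
assert with_sync_back() == 2
```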

Python3 classes

Incidentally, that execfile issue will be solved as soon as a bug is fixed that was exposed by new abilities of Python3 metaclasses. These were first observed with Python3.4 enum classes.

import enum

class MyEnum(enum.Enum):
   red  = 1
   blue = 2
   red  = 3 # error, "red" is already defined

Currently, Nuitka delays the building of the dictionary (except for the execfile built-in), and that is not allowed. In fact, immediate writes to the mapping given by __prepare__ of the metaclass will be required. Only then can the enum class raise an error for the second assignment to red.

So that area now hinges on code generation to learn different local variable codes for classes, centered around the notion of using the locals dictionary immediately.
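The following plain Python3 sketch shows why the writes must be immediate. A metaclass's __prepare__ returns the mapping that the class body writes into, one statement at a time. Here that mapping rejects a second binding of the same name while the body is still running, much like enum does. The NoRebinding and StrictMeta names are hypothetical.

```python
class NoRebinding(dict):
    """Class namespace that refuses to bind the same public name twice."""

    def __setitem__(self, key, value):
        if key in self and not key.startswith("__"):
            raise TypeError("attempted to reuse key: %r" % key)
        super().__setitem__(key, value)


class StrictMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # The class body writes into this mapping as it executes.
        return NoRebinding()

    def __new__(mcls, name, bases, namespace, **kwargs):
        return super().__new__(mcls, name, bases, dict(namespace))


class Colors(metaclass=StrictMeta):
    red = 1
    blue = 2

assert (Colors.red, Colors.blue) == (1, 2)

try:
    class Broken(metaclass=StrictMeta):
        red = 1
        red = 3   # rejected while the class body is still running
    rejected = False
except TypeError:
    rejected = True
assert rejected
```

A compiler that first collects the class body into its own dictionary and only hands it over at the end would never trigger this error.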

Python3.4

The next release will no longer warn you if you use Python3.4, as many of the remaining problems have been sorted out. Many small things were found, and in some cases these highlighted general Python3 problems.

Nuitka for Python3 is not yet much in focus in terms of performance, but correctness has become much better. Most prominently, the exception context is now correct in most cases.

The main focus of Nuitka is Python2, but the incompatibilities of Python3 are largely not much of an issue for Nuitka. The re-formulations of just about everything to lower level operations mean that, for the largest part, there is not much trouble in supporting a version of Python that is mostly only slightly different.

The gain is mostly that new tests are added in new releases. These sometimes find things that affect Nuitka in all versions, or at least in some others. And this could be a mere reference leak.
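What "exception context" means can be seen in plain Python3. An exception raised inside an except block records the exception being handled as its __context__. This is the information behind the "During handling of the above exception" message.

```python
try:
    try:
        raise KeyError("original")
    except KeyError:
        # Raised while handling the KeyError, so it becomes the context.
        raise ValueError("while handling")
except ValueError as error:
    caught = error

assert isinstance(caught.__context__, KeyError)
assert caught.__cause__ is None          # no explicit "raise ... from ..."
assert caught.__suppress_context__ is False
```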

Consider this:

try:
   raise (TypeError, ValueError)
except TypeError:
   pass

So, that is working with Python2, but comes from a Python3 test. Python2 is supposed to unwrap the tuple, take the first element, and raise that. Nuitka didn't do that so far. Granted, it is an obscure feature, but still an incompatibility. For Python3, a TypeError should be raised, complaining that tuple is not derived from BaseException.

It turned out that, in that case, a reference leak also occurred: the wrong exception was not released, and therefore memory leaked. Should that happen a lot during a program's lifetime, it could become an issue, as it also keeps the frames on the traceback alive.

So this led to a compatibility fix and a reference leak fix. And it was found by the Python3.4 test suite, which checks that exception objects are properly released, and that the proper kind of exception is raised in the no longer supported case.
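Under Python3, the example above can be checked directly. The tuple is not unwrapped. Instead, raising it produces a TypeError that complains about BaseException. The check on the message text assumes CPython's wording.

```python
try:
    raise (TypeError, ValueError)
except TypeError as error:
    rejection = error

# Python3 does not unwrap the tuple; it rejects it as a non-exception.
assert type(rejection) is TypeError
assert "BaseException" in str(rejection)
```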

Performance

Graphs and Benchmarks

I had been working on automated performance graphs, and they are supposed to show up on Nuitka Speedcenter already, but currently it's broken and outdated.

Sad state of affairs. One reason is that I found it too ugly to publish unless it was updated to the latest Nikola, and I didn't take the time for that. I intend to fix it, potentially before the release though.

Incremental Assignments

Consider the following code:

a += "bbb"

If a is a str, and if (and only if) it's the only reference being held, then CPython reuses the object instead of creating a new object and copying a over. Well, Nuitka doesn't do this, although the problem has been known for quite some time.

With SSA in place, and "C-ish" code generation complete, this will be solved, but I am not going to solve this before.

Standalone

The standalone mode of Nuitka is pretty good, and in the pre-release it was again improved. For instance, virtualenv and standalone should work now, and more modules are supported.

However, there are known issues with win32com and a few other packages, which need to be debugged. Mostly these are modules doing nasty things that make Nuitka not automatically detect imports.

As usual, this has only so much priority for me. I work on it on some occasions, as a kind of interesting puzzle to solve. Most of the time, it just works though, with wxpython being the most notable exception. I am going to work on that though.

The standalone compilation exhibits scalability problems of Nuitka the most, and while it has been getting better, the recent and future improvements will lead to smaller code, which in turn means not only smaller executables, but also faster compilation. Again, wxpython is a major offender there, due to its many constants, global variables, etc. in the bindings, while Qt, PySide, and GTK are apparently already good.

Other Stuff

Funding

Nuitka doesn't receive enough donations. There is no support from organizations like e.g. the PSF, which recently backed several projects by doubling donations given to them.

I remember talking to a PSF board member during Europython 2013 about this, and the reaction was fully in line with the Europython 2012 feedback towards me from the dictator. They wouldn't help Nuitka in any way before it is successful.

I have never officially applied to them for help with funding, though. I am going to choose to take pride in that, I suppose.

Collaborators

My quest to find collaborators for Nuitka is largely failing. Aside from the standalone mode, there have been too few contributions. The hope is that this will change in the future, once the significant speed gains arrive. And it might be my fault, both for not asking for help more and for coming to terms with that state of things.

Not being endorsed by the Python establishment is clearly limiting the visibility of the project.

Anyway, things are coming along nicely. When I started out, I was fully aware that the project is something that I can do on my own if necessary, and that has not changed. Things are going slower than necessary though, but that's probably very typical.

But you can join now: become part of the mailing list and help me there with requests I make, e.g. review posts of mine, test out things, pick up small jobs, answer questions of newcomers, you know the drill probably.

Future

So, there are multiple things going on:

  • More "C-ish" code generation

    The next release is going to be more "C-ish" than before, generating less complex code. It also removes the previous optimizations, which were a lot of code, e.g. to detect parameter variables without del statements.

    This prong of action will have to continue, as it unblocks further changes that lead to more compatibility and correctness.

  • More SSA usage

    The next release did and will find bugs in the SSA tracing of Nuitka. It purposely only uses SSA to add assert statements for the checks it no longer performs. These will trigger in tests or cause crashes, which then can be fixed.

    We had better know that SSA is flawless in its tracking before we use it to make optimizations, which then have no chance to assert anything at all anymore.

    Once we take it to that next level, Nuitka will be able to speed up some things by more than the factor it basically has provided for 2 years now, and it's probably going to happen this year.

  • More compatibility

    The new exec code makes the dictionary synchronization explicit. For example, even the check for whether the synchronization is needed is now optimized away if we are in a module or a class, or if it can be known.

    That means faster exec, but more importantly, a better understood exec, with an improved ability to do SSA traces for it. Being able to in-line exec, or to know the limits of its impact, will help to establish more invariants for that code.

When these 3 things come to fruition, Nuitka will be a huge, huge step ahead towards being truly a static optimizing compiler (so far, it is mostly only peephole optimization and byte code avoidance). I still think of this as happening this year.