An open letter to Guido van Rossum: Mr Rossum, tear down that GIL!

From: Juergen Brendel
Date: 27 Jun 2015

Dear Guido,

I really enjoy the Python programming language. Coming from a C/C++/Java world, I have found new levels of productivity with it. Executable pseudo-code, great library support, dynamic typing, an emphasis on well-structured and readable code – this is all very much to my liking. We use it here at SnapLogic to great effect, and many other organizations have similarly positive experiences with it. Thank you for giving us Python!

Sadly, there is one aspect of the language that is beginning to bother me more and more. And I know I’m not the first one to point it out. A strange design choice that leaves the Python interpreter – by design – crippled on modern hardware. You know what I’m talking about, Guido, don’t you? That’s right, I’m talking about the GIL, the Global Interpreter Lock. And as I am about to start rambling on here, please don’t take offense, Guido, because none is meant.

For those who are not familiar with the issue: The GIL is a single lock inside the Python interpreter, which effectively prevents multiple threads from executing Python bytecode in parallel, even on multi-core or multi-CPU systems! You can find more information here. But just to quote the essentials from that page:

In order to support multi-threaded Python programs, there’s a global lock that must be held by the current thread before it can safely access Python objects. … only the thread that has acquired the global interpreter lock may operate on Python objects or call Python/C API functions.

Effectively, this means that all access to Python objects is serialized, no matter how many threads you have in your program, and no matter how many CPUs or cores your hardware has! Python has a really easy-to-use threading API, which makes multi-threaded programming quite painless. Sadly, the Python interpreter itself makes it impossible for those threads to properly take advantage of the hardware that is common these days.
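Just to make this concrete, here is a minimal sketch of what I mean: a pure-Python, CPU-bound countdown, run once on a single thread and once split across two threads. On a multi-core box you might hope for roughly half the elapsed time, but because the GIL lets only one thread execute bytecode at a time, the threaded version takes about as long, or even a little longer.

```python
import threading
import time

def count_down(n):
    # Pure-Python, CPU-bound work: only the thread currently
    # holding the GIL can execute this loop.
    while n > 0:
        n -= 1

N = 10000000

# The whole job on a single thread.
start = time.time()
count_down(N)
print("one thread:  %.2f seconds" % (time.time() - start))

# The same job split across two threads. Under the GIL the two
# threads merely take turns, so this does not get faster on a
# multi-core machine.
start = time.time()
t1 = threading.Thread(target=count_down, args=(N // 2,))
t2 = threading.Thread(target=count_down, args=(N // 2,))
t1.start()
t2.start()
t1.join()
t2.join()
print("two threads: %.2f seconds" % (time.time() - start))
```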

Hardware vendors, such as Intel and AMD, have long recognized that the only way to keep improving CPU performance is increased parallelization, and have added more and more cores to their CPUs. You yourself acknowledge, Guido, that multiple cores are becoming common even in laptops. Here is what you wrote about all of this in your Python 3000 FAQ:

Q. Multi-core processors will be standard even on laptops in the near future. Is Python 3.0 going to get rid of the GIL (Global Interpreter Lock) in order to be able to benefit from this feature?

A. No. We’re not changing the CPython implementation much. Getting rid of the GIL would be a massive rewrite of the interpreter because all the internal data structures (and the reference counting operations) would have to be made thread-safe. This was tried once before (in the late ’90s by Greg Stein) and the resulting interpreter ran twice as slow. If you have multiple CPUs and you want to use them all, fork off as many processes as you have CPUs. (You write your web application to be easily scalable, don’t you? So if you can run several copies on different boxes it should be trivial to run several copies on the same box as well.) If you really want “true” multi-threading for Python, use Jython or IronPython; the JVM and the CLR do support multi-CPU threads. Of course, be prepared for deadlocks, live-locks, race conditions, and all the other nuisances that come with multi-threaded code.

That’s it? Guido, maybe your time at Google is influencing the way you see the world in strange ways. But I can assure you: There are plenty of programs being written that are not designed or intended to run on multiple boxes. Yes, we all want our applications to be scalable, but guess what? Today’s hardware supports that through the presence of multiple cores and CPUs in a single box! And there is a well-established paradigm to take advantage of this: multi-threading. Alas, not so with Python.

In another discussion about the same topic, you say:

…you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities.

Just because Java was once aimed at a set-top box OS that didn’t support multiple address spaces, and just because process creation in Windows used to be slow as a dog, doesn’t mean that multiple processes (with judicious use of IPC) aren’t a much better approach to writing apps for multi-CPU boxes than threads.

Just Say No to the combined evils of locking, deadlocks, lock granularity, livelocks, nondeterminism and race conditions.

Right, well, maybe there are some of us out here who have done a lot of multi-threaded programming before? Maybe there are people who are willing to take on those ‘evils’? Leaving aside the incompatibilities between operating systems that can complicate creating new processes on the fly, using IPC has its own problems:

What are we going to use? Pipes? That only works on a single system, and not across multiple boxes.
Sockets maybe? Then we have to maintain port numbers, establish clients and servers, and deal with the overhead of this, even if all our processes run on the same box. I know it’s optimized, but still.
We now need a ‘shared nothing’ architecture. I mean, it’s nice to have a shared nothing architecture. It has a bunch of advantages. Sadly, this is just not always feasible. Maybe at Google it magically is, but in the world outside of Google, it is not always the case. One reason is that message passing is really not always the most effective means of doing things…
If you can’t share data structures between threads/processes, you need to send messages to and fro, which describe what it is you want to do, and possibly also send copies of the data you want to work on along with them. Of course, we now need message handlers, which adds complexity to the code. Also, we have copy overhead: Data needs to be copied from the ‘real’ data structure into a message buffer, possibly needs to be marshaled somehow, sent across, unpacked, and so forth. Python is not very fast when it comes to object creation, yet here we are creating a number of objects just for that message. Plus the copying that needs to be done, which further adds to the overhead (a small sketch of this cost follows below).
Many years ago, I worked at nCUBE. We built massively parallel supercomputers. No shared memory, pure message passing between the thousands of CPUs. For some applications this worked very well indeed. For others, it didn’t. Some data structures were much more naturally shared. Sometimes you could partition them and have individual processors work on separate pieces. In the end, though, such an approach would still require unnatural contortions in many cases.
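To make that copy overhead concrete, here is a rough sketch: handing an object to another thread merely passes a reference, while getting it to another process means serializing it, copying it through some IPC channel, and rebuilding brand new objects on the far side. The pickle round trip below just stands in for whichever transport is actually used, and the payload is made up purely for illustration.

```python
import pickle
import time

# A moderately large work item we would like some worker to process.
payload = {"rows": [[float(i) for i in range(100)] for j in range(5000)]}

# Handing it to another thread: a queue would store a reference,
# so nothing is copied or marshaled.
start = time.time()
message = payload
print("hand over a reference: %.4f seconds" % (time.time() - start))

# Sending it to another process: the object must be serialized,
# pushed through a pipe or socket, and rebuilt on the other side.
# The pickle round trip below approximates just the marshaling
# part of that cost.
start = time.time()
wire = pickle.dumps(payload)      # marshal and copy into a byte string
rebuilt = pickle.loads(wire)      # unpack into brand new objects
print("pickle round trip:     %.4f seconds, %d bytes on the wire"
      % (time.time() - start, len(wire)))
```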

Also, I quite simply disagree with your statement:

[This] doesn’t mean that multiple processes (with judicious use of IPC) aren’t a much better approach to writing apps for multi-CPU boxes than threads.

No, I think having to resort to multi-processing – rather than multi-threading – is definitely the much worse approach. It’s nothing more than a nasty hack to be able to take advantage of multiple CPUs, and it simply shouldn’t be necessary in this day and age.

You see, Guido, if I really want to have a shared nothing system, I can certainly implement that. I could do it with threads, or with processes. But I would rather use the threading API. Why?

The threading API is the same across operating systems; the multi-processing ‘API’ (if you can call it that), however, can cause issues on different platforms (fork() on Windows?).
If I want to send messages between threads, I can simply use queues for communication. All the necessary locking is taken care of for me. Those queues come with Python (‘batteries included’) and work everywhere, without requiring data to be copied: My messages are simply added as objects into a queue, and thus can be as complex as I want them to be, and never have to be marshaled and unmarshaled (see the short sketch below).
So, even for a shared nothing system, I would much rather use the threading API. And if I really want to share data structures between threads, I would be able to do that as well, with the same simple API. This would give me choices and performance… if, well yes, if the dreaded GIL would just go away.
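Here is a minimal sketch of that queue-based style (the worker and the message format are made up purely for illustration): two plain queues carry arbitrarily complex Python objects between threads, the queues handle all the locking, and nothing is ever pickled or copied.

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

def worker(inbox, outbox):
    # Every message is an ordinary Python object taken off the
    # queue by reference; nothing was copied or marshaled.
    while True:
        item = inbox.get()
        if item is None:          # sentinel object: shut down
            break
        outbox.put({"note": item["note"], "size": len(item["data"])})

inbox = queue.Queue()
outbox = queue.Queue()

t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

# The message can be as complex as we like; the queue hands over
# a reference and takes care of all the locking for us.
inbox.put({"data": list(range(1000)), "note": "no pickling needed"})
print(outbox.get())

inbox.put(None)                   # tell the worker to exit
t.join()
```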

A word about Jython: Yes, I tried it. How wonderfully easy it is to just create two threads and see them actually run at the same time, taking advantage of my dual-core laptop. And somehow, the speed at which my programs run is comparable to, and often faster than, the ‘native’ Python 2.5. Strangely, they managed to solve the very problem which you said would require such a drastic redesign and lead to such a slowdown. I have to ask then: why not just base Python on the ubiquitous JVM? Just take a look at Ananth’s blog here, where he discusses this. The problem with Jython, of course, is that – by the project’s own admission – the code base is brittle. It is also still stuck at Python 2.2. But I certainly hope they make progress.

Funnily enough, I just noticed Ananth’s response to my comment on his blog, where he echoes something I just wrote here about how being at Google may influence someone’s view of the world. I guess I’m not the only one who thinks this.

In the end, Guido, what it boils down to is this: Please don’t make architectural choices for me. I can make my own. You see, sometimes (just sometimes) your architectural choices and preferences may simply not apply to the problem I am trying to solve. I would like to encourage you to look a little bit beyond the ever-expanding rim of the Google universe, and see how your wonderful Python language is used by the rest of us.

I sure hope to see many more wonderful things from Python, and I’m looking forward to Python 3000 when it comes out. And since you have already stated that you won’t change anything about the GIL, I will continue to be frustrated by this, and will be forced, at least from time to time, to look at other languages, just so that my programs can make use of today’s normal hardware. And because I like Python so much and care for it, this will continue to annoy me to no end.

Just as I feel strongly about this, I know you feel strongly about the GIL. Still, I hope that one day you will see the profound disconnect between this philosophy and the direction in which hardware is developing, and will move to address this severe shortcoming of an otherwise wonderful language.

Sincerely,

Juergen Brendel
