Donald Knuth: “I trust my family jewels only to Linux”

Andrew Binstock interviewed Donald Knuth recently, and one of the more amusing tidbits was this:

I currently use Ubuntu Linux, on a standalone laptop—it has no Internet connection. I occasionally carry flash memory drives between this machine and the Macs that I use for network surfing and graphics; but I trust my family jewels only to Linux.

More seriously, I found his comments about multi-core computers to be very interesting:

I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the “Itanium” approach that was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write.

Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX….

I know that important applications for parallelism exist—rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need to be changed substantially every few years.

This is a very interesting issue, because it raises the question of what next-generation CPUs need to do in order to be successful. Given that it is no longer possible to just double the clock frequency every 18 months, should CPU architects simply start doubling the number of cores every 18 months instead? Or should they try to concentrate much more computing power into an individual core, and optimize for a fast and dense interconnect between the CPUs? The latter is much more difficult; the advantage of the former is that it’s really easy for marketing types to use some cheesy benchmark such as SPECint to help sell the chip, only for people to discover later that it’s not very useful in real life.

Why? Because programmers have proven that they have a huge amount of trouble writing programs that take advantage of these very large multicore computers. Ultimately, I suspect that we will need a radically different way of programming in order to take advantage of these systems, and perhaps a totally new programming language before we will be able to use them.

Professor Knuth is highly dubious that the latter approach will work, and while I hope he’s wrong (since I suspect the hardware designers are starting to run out of ideas, so it’s time software engineers started doing some innovating), he’s a pretty smart guy, and he may well be right. Of course, another question is what we would do with all of that computing power. Whatever happened to the predictions that computers would be able to support voice or visual recognition? And of course, what about the power and cooling issues for these super-high-powered chips? All I can say is, the next couple of years are going to be interesting, as we try to sort out all of these issues.

27 thoughts on “Donald Knuth: “I trust my family jewels only to Linux””

  1. It’s tempting to retort “There is a worldwide demand for at most 5 computers” or “no one will need more than 640K.” Knuth is a very bright and innovative computer scientist, but I think here he suffers from lack of imagination. TeX can’t be parallelized? Oh really? I can think of one way off the bat: in assembling a book, why not start one core on each chapter? Yes, you’d still need to make multiple passes, but you’d get the advantage of multiple processors on each pass.

    We’ve just scratched the surface of parallel programming. Yes, it’s hard to conceptualize tasks as parallel and implement them that way, but there is no question it can dramatically improve throughput. Businesses of more than one person process their workflow in parallel. To the extent that computerized tasks represent what people previously did by hand, why can’t those tasks be parallelized in an analogous way? Web serving is so obviously parallel it’s scarcely worth mentioning. I won’t even get into scientific/engineering programming, where parallelization opportunities range from the obvious to the extremely subtle.
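    The chapter-per-core idea is a plain data-parallel map. A minimal sketch, assuming a hypothetical chapter format and a made-up pages-per-chapter rule (real typesetting would do far more per pass):

```python
from multiprocessing import Pool

def typeset_chapter(chapter):
    # Stand-in for one pass of per-chapter work; a real pass would
    # break paragraphs into lines, lay out pages, record labels, etc.
    pages = len(chapter["text"]) // 1800 + 1  # ~1800 chars/page (assumed)
    return {"chapter": chapter["title"], "pages": pages}

def typeset_pass(chapters):
    # Within a single pass each chapter is independent, so the pass
    # itself is an embarrassingly parallel map over the chapters.
    with Pool() as pool:
        return pool.map(typeset_chapter, chapters)

if __name__ == "__main__":
    book = [{"title": "ch%d" % i, "text": "x" * (1800 * i)} for i in range(1, 5)]
    for result in typeset_pass(book):
        print(result)
```

    As several commenters note below, the hard part is not the map itself but the cross-chapter state (page numbers, references) that forces the multiple passes.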

  2. The world changed in 2003, but few noticed.

    For about 25 years, shrinking chips every 18 months or so by a factor of two was easy. The scaling rules told you exactly how to re-adjust every parameter of the chip: feature size, voltage, gate oxide thickness, etc. The result was that chips just got smaller, faster, more efficient with every generation. Microsoft’s software got 2x more bloated every time the chips got 2x faster, but no matter, it was the golden age. Wasting time with parallel algorithms was pointless if it took you an extra year, because by then, serial processors got 1.5 to 2x faster.

    But around 2003, it all came to an end. The reason is that atoms don’t scale, and gates constructed according to the scaling rules were effectively dimmers. A gate only 3-5 atoms thick would leak because of tunneling. For a couple of years, the processor vendors tried all kinds of tricks, and Intel blew a few hundred million having to cancel some major chips. The only way out was to change the formula: you can still increase density, but power’s now the limitation. The clock cycle can no longer be sped up, so you need to add parallelism.

    The result is that yes, we’re going to see the number of processors go up exponentially. Computer science, which up to now has focused on serial, sequential algorithms, is going to have to be redone. Knuth, the god of serial algorithms, doesn’t want to play. That means that someone else will have to.

    Oh, and the answer isn’t threads, at least not long-term. Huge numbers of threads contending in one address space has a limited future. I think that it’s more likely to be separate processes and streams, that is, software design will look more like hardware design.

  3. I hate to accuse Knuth of lack of imagination. But, I’ve done things I hate before… =)

    TeX isn’t particularly representative of a real-world computing application. If I’m sitting at my desktop, I might have an audio stream playing while surfing the web, while running something like tracker to index my drive. We’ve got at least 3 processes there that could be working simultaneously.

    Now, maybe we’ll actually get decent compression if we can shard the decompression across multiple CPUs to have audio/video available when I want it. Or maybe the app that I’m running in Python or Java can have the hotspot compilation going on another CPU so that things can be faster. The “tracker” idea is so primitive because we haven’t got it right yet. What about one day when we actually have algorithms that can index the images on my hard drive with meaningful descriptions?

    I can think of so many things that could be spawned off on background threads that make up a whole desktop environment.

    Jeff Bailey

  4. I can think of at least one other way to spend extra transistors: increased hardware specialization. And in fact, current processors implement such specialization in addition to having more cores. Intel’s latest processors have dedicated instructions for both CRC32 and AES, as well as operations designed specifically to speed up current video codecs. Furthermore, Intel also announced that their processors will soon contain hundreds of graphics cores (AKA powerful vector processors), each as powerful as that in the 965G.

    Personally, I think that the future of processors will involve lower-level programmability: given all this power, why not allow software to invent new instructions as needed, and specialize a pile of transistors for that purpose? Imagine what this could do for emulating another architecture, or handling the next big multimedia codec, or performing the hash that ends up replacing SHA1. It makes very little sense to continue to provide dedicated hardware instructions for everything, particularly since every such instruction creates an ongoing support burden. If instruction sets can be loaded and unloaded as easily as software libraries, no such expectation of ongoing support needs to exist.

  5. “I can think of one way off the bat: in assembling a book, why not start one core on each chapter? Yes, you’d still need to make multiple passes, but you’d get the advantage of multiple processors on each pass.”

    I suppose that is what fork() is for, eh?

    That way you get all the benefits of multiple cores without having to modify a program in any way. On Linux, with its fast process spawning and COW forking, the penalties for using fork over some form of multithreading are pretty much nil. In fact, in many cases it would actually be faster, not to mention less buggy and vastly easier.

    The thing is that we are not going to stop at just 2 or 3 or 4 cores on a CPU. We are going to start seeing eight or more. We will have 16 or more cores on a single die. What are you going to do when the average computer ships with 16 or more CPU cores?

    The hardware folks are telling the programming folks that in order to take advantage of new hardware, you’re going to be forced to program in a way that would take advantage of 4 or more cores for what would today more appropriately be a single-threaded application.

    It’s far beyond just multitasking or setting up some sort of repetitive batch processing job. Are you sure you really want to see a multi-threaded OOo or Firefox? Programmers have more than a hard enough job keeping those things straight without the overhead of locking and message passing…

    For 90% of applications out there (and probably more), straightforward processing, simple algorithms, and high levels of code readability are a much higher priority than any sort of performance… as long as they run at an acceptable speed and don’t do anything foolish, that is all you really need. And so, except for specialized tasks like graphics processing, large numbers of cores are simply going to go to waste.

    That’s OK, though. In my personal opinion, the future of computing is going to be dominated mostly by devices that cost about the same as a new pair of shoes and can fit in your pocket comfortably.
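    The fork()-per-chapter approach from the start of this comment can be sketched directly. This is a toy (the chapter filenames are made up, and the child does no real work; a real driver would exec tex on each file):

```python
import os

chapters = ["ch1.tex", "ch2.tex", "ch3.tex"]
pids = []
for path in chapters:
    pid = os.fork()              # cheap on Linux thanks to copy-on-write
    if pid == 0:
        # Child: process exactly one chapter, then exit immediately,
        # skipping the parent's cleanup handlers. (A real driver
        # would exec tex here instead.)
        os._exit(0)
    pids.append(pid)

# Parent: reap every child and collect its exit status.
statuses = [os.waitpid(pid, 0)[1] for pid in pids]
print(statuses)                  # all zero if every chapter succeeded
```

    No locks, no shared mutable state: each child gets its own copy-on-write view of memory, which is exactly the "less buggy" property claimed above.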

  6. Nate,

    Well, you could start one core on each chapter, but things like the page numbers might change from one pass to another. Other things such as the references for the bibliography might change, especially for those bibliography styles which are ordered by the first time a reference is used in a book. Footnote numbers, figure numbers, etc. might also be problems. It’s soluble with a multipass algorithm, although you essentially end up typesetting the book at least twice; maybe three times in some horrible corner cases. Which of course means twice as much power, which may be an issue if you’re worried about green computing.

    I agree with you that in the future most people will not be using computers that are quite so massively powerful. So another interesting question is how many cores can be profitably used in a laptop, and what sort of power-saving tricks will be needed because of a laptop’s power and thermal requirements. In other words: I have a dual-core laptop; how many more cores will actually make sense? And if people don’t need to upgrade laptops every few years, what does that do to the economics of companies like Dell, Lenovo, etc.?
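    The multipass fix-up described above is essentially how TeX’s .aux mechanism works: each pass records label positions, and you re-run until the references stop changing. A minimal sketch with an invented document format (tuples standing in for markup):

```python
def typeset(source, labels):
    # One full pass: walk the document, resolving ref lookups from the
    # label table built on the previous pass (empty on the first pass).
    output, new_labels, page = [], {}, 1
    for item in source:
        if item[0] == "label":            # e.g. ("label", "sec:intro")
            new_labels[item[1]] = page
        elif item[0] == "ref":            # e.g. ("ref", "sec:intro")
            output.append(str(labels.get(item[1], "??")))
        elif item[0] == "newpage":
            page += 1
    return output, new_labels

def typeset_until_stable(source, max_passes=5):
    labels = {}
    for _ in range(max_passes):
        output, new_labels = typeset(source, labels)
        if new_labels == labels:          # fixed point: references stable
            return output
        labels = new_labels
    return output

doc = [("label", "a"), ("newpage",), ("label", "b"), ("ref", "b"), ("ref", "a")]
print(typeset_until_stable(doc))  # ['2', '1'] after the second pass
```

    The power cost mentioned above is visible here: every extra pass repeats the whole layout, whether run serially or with one core per chapter.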

  7. > It’s far beyond just multitasking or setting up some sort of repetitive batch processing job. Are you sure you really want to see a multi-threaded OOo or Firefox?

    I’m quite sure both are multi-threaded already.

    This kind of multi-threading only makes sense if your apps are CPU bound though. OOo and Firefox shouldn’t be (that) CPU bound, so in that case there’s no need to use multi-threading like this or to take advantage of multiple cores.

  8. Firefox (2 at least) is actually mostly single-threaded, and it’s a source of endless annoyance – bad javascript on one page can effectively freeze the whole browser – *extremely* irritating if you’ve got 16 tabs open. I wish they at least ran a separate thread, or even better a separate process, per window or even per tab (on X11 at least, different processes can control child subwindows of the same window), especially on Linux where processes are essentially as cheap as threads.

    plugins are also run in-process, so flash can freeze/crash firefox (the latest sun java plugin deliberately runs out-of-process off its own bat, wish adobe flash plugin did the same).

  9. the best thing i got out of my current dual core setup is that i can run AVG while doing other stuff.

    it’s great 😀

    (actually i’m lying, things got much better in other applications, compared to my previous single core machine)

  10. Why don’t hardware designers just make larger ICs? I do not know how large the actual processor is; from what I remember they looked to be about 1cm^2. So if the problem is that they can only fit 1 billion transistors into that space, why not make the space 1.5cm^2 or 2cm^2, and then put in more transistors?

  11. Jeff, the problem with your analysis is that generally, streaming audio and surfing the web don’t consume a whole CPU. A tracker might, but we’re now up to a CPU and change. What apps do you run concurrently that need two full CPUs, or four, or eight?

    There are some applications that can use it, but there are a lot (like typesetting a sequential work, or compressing a sequential audio stream) which just need a lot of time from one CPU. Throwing up to 80 cores (isn’t that the number they’re talking about now?) at a problem like that is going to leave 79 of them idle.
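    The idle-cores point is Amdahl’s law: if only a fraction p of a program can run in parallel on n cores, the best possible speedup is 1/((1-p) + p/n), which never exceeds 1/(1-p) no matter how many cores you add. A tiny sketch:

```python
def amdahl_speedup(p, n):
    """Best-case speedup on n cores when a fraction p of the
    work is parallelizable (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

# A half-serial program never reaches 2x, even on 80 cores:
for n in (2, 8, 80):
    print(n, round(amdahl_speedup(0.5, n), 2))
```

    For a fully sequential task like the ones named above, p is near zero and the extra cores buy essentially nothing.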

  12. The thing with Donald Knuth is that all of his programs are batch-oriented, TeX being the classic example. You feed it a TeX file, and it feeds you a PS or a PDF. Where highly concurrent software makes a difference is when you are interacting with a user, or better, lots of users.

    Concurrency is the future, and not just because CPU designers are “running out of ideas.”

  13. @atarijedi
    Well, increasing die size will drop the yield, which is something that you don’t really want.
    Also, the larger the die, the longer it takes signals to propagate across it, which has some rather nasty effects.

    Well, for starters it would seem Knuth isn’t really referring to Moore’s law, since it only makes a prediction about transistor density, not computational “power”.

    I am not sure I would really say that the hardware guys have run out of ideas, as there are plenty of neat ideas out there. I would say the hardware guys are running out of neat ideas when they are constrained by the architecture of the x86 family, which is imposed by the software guys and the general desktop computing market.

    It looks like this is set to change as the multicore business seems to be heading towards putting different cores onto the same piece of silicon. So maybe one day we will go buy a CPU, and it will have a couple of x86-64 cores, a few vector processors, an ARM core, a few Power cores, video decoders etc.

    I think I would love to see reconfigurable computing take off. Although personally I am more concerned about the bus and storage bandwidth, than the whole scaling the computing power.

    (It is also interesting that current generation of consoles are based around the Power ISA, rather than x86)

  14. I see that Oracle databases can be configured to parallelize queries.
    So individual programmers don’t have to think about this: the database engine is configured to split processing into threads that are assigned to separate processors.

  15. From the user point of view-

    I’m breaking in my first quad processor at work, and trying to decide whether it makes sense to get one for home. I’m disappointed that the most power-hungry engineering design app doesn’t use the multiple processors for all jobs. The types of engineering design and analysis that make the CPU a bottleneck in my design process are very different from the types the software company tried to optimize for multiple processors. But I don’t see how they could have anticipated the intimate details of our design process; that information is often confidential.

    However, the quad has a distinct advantage: there is always a processor quickly available for user interface functions like email and documents, even when two or three other processors are 100% occupied with the pressing numerical problem of the day. This greatly improves my ability to work in parallel with the computer. Given that wetware is much more expensive than computers, this is key…

  16. Multicore? Nice. But where is the real, stable 64-bit OS for the desktop? Multithreading, multiprocessing, parallelism? Most mainboards do not have the buses to make use of those features. We need parallel computer architectures: not just multicore, but multimachine! Without this, a multicore chip is only a multiplied pocket calculator; you can calculate your taxes in parallel 🙂 We need peripherals built for multiprocessing! A good example is video cards (in the early 90’s I encountered a video card that was about 100 times more powerful than the “main” box hosting it; the result was amazing for the time).

  17. I ported a program that runs approx 6 times faster on a dual quad-core Xeon. It runs nearly 1.5x faster on a Core Duo, and 3.5x or so on a straight quad core. It’s a straight number cruncher. I’d say he’s absolutely right about compilers (well, actually he’s right about everything, it’s Knuth…) but at any rate there is a small portion of programs that have embarrassingly parallel sections which work great multithreaded. POSIX thread implementations are all slightly different, like looking at support for any RFC, but with tricks you can get rather portable C. The fact of the matter is that most other “make-it-faster” concepts outside of multithreading are tied to a single company’s hardware and/or language/language extension. That’s always a bummer, because who wants to be stuck learning some tech that goes the way of the dinosaurs?

    NB this is for a really, really short scientific computing program. Shark is awesome on OS X for figuring out hotspots and free as in beer with OS X.

  18. >[…]the quad has a distinct advantage that is always a processor
    >quickly available for user interface functions[…]

    Well, you get the same kind of effect with a single-core CPU if time-consuming tasks are run at a low priority. With a low priority you’re able to use the machine as usual, and if you aren’t using much processing power elsewhere (e.g. just surfing around or reading emails), the process won’t take much longer.

    I think we’ll probably move away from purely homogeneous architectures over to multiple cores with a batch of vector processors. However, this of course raises the question of whether all that processing power is actually useful for the average user. Rendering or encoding aren’t really everyday tasks, for example.
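    The low-priority trick mentioned above is just the Unix nice value, and a process can apply it to itself. A quick POSIX-only sketch (an unprivileged process can lower, but never raise, its own priority):

```python
import os

# os.nice(increment) adds to the current nice value and returns the
# new one, so os.nice(0) simply reads the current value.
before = os.nice(0)
after = os.nice(10)   # make ourselves "nicer": lower scheduling priority
print(before, after)
```

    A batch job started this way mostly stays out of the way of interactive work on a single core, which is the effect the commenter describes.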

  19. What’s also worth noting is that software bloat is not an M$-only phenomenon: I can still remember those days when the heavily bloated Mozilla suite (now Seamonkey, I’ve been told) was faster than Firefox. To be sure, there are now plenty of standards in the new Firefox (think AJAX, CSS, (X)HTML, XML, XSLT, XUL, to name a few).
    And web site designers tend to assume that you have a dual-core computer to render all the fancy stuff they put on their web sites.
    There was a time when all we had was a single-core 400MHz processor, and it was enough to browse the web; now you see smartphones (no advertising please 😉 with 600MHz (single) cores which are barely sufficient to render a basic web page.
    As the saying goes, something, somewhere, went terribly wrong…

  20. >Well, you get the same kind of effect with a single-core CPU if time-consuming tasks are run at a low priority.

    In theory, but I never get very usable results in practice. This may be an OS issue. A hosed winbox takes forever to respond to keyboard input, regardless of the priority of the hosing process.

    It’s becoming more common for the “average” user to use photo editing, and video software, which can sink a fair bit of power. It’s also becoming more common for peripherals such as wireless cards to be software implemented. It’s obvious to even a fairly naive user that some of these apps are very wasteful of the computational power. From a financial and ecological point of view, buying more processor power so that software manufacturers can blatantly waste it is horrifying, but I will almost certainly break down and do it.

  21. Interesting points. I know for a fact that my desktop has not been cpu bound for years. And while I see use for dual and quad core for the desktop since we run a couple of background jobs anyway and a little bit less latency always is helpful, I’m not so sure about 80 cores on the desktop which is what Intel has in the lab.

    As always, the answer may lie in vastly different applications. Perhaps games will rescue us from the idle cycles again, utilizing it for some AI opponent.

    What’s interesting is what Intel and AMD so far do NOT do with the vastly increasing number of transistors when they ask whether we want more cores or increased clock speed. Perhaps we want cheaper circuits, more energy efficiency, or more functionality (all sorts of I/O and graphics spring to mind).

  22. Moore’s law does not say anything about speed, only that the number of transistors doubles about every two years. Whether this doubling will be converted into higher speed depends on different factors.

    The spectacular increase in CPU speed in the past was not achieved only through higher clock rates but also through better CPU design. Early Intel CPUs with a CISC architecture, which spent a few clocks per instruction, have been replaced by the modern superscalar design, which can perform a few instructions per cycle. Even for modern processors, we can see that a high clock speed does not guarantee superior performance. For instance, a 1.83 GHz Core 2 Duo is significantly faster than a 3 GHz Pentium 4.

    Have the hardware designers run out of ideas? If so, it happened a long time ago. All modern desktop CPUs use a superscalar architecture, and the first superscalar processor was built by Seymour Cray in 1965, i.e. 6 years before Intel released its first microprocessor, the Intel 4004. Since then only EPIC has been invented, and though EPIC was hailed as a way to achieve higher parallelism than is possible with the superscalar architecture, it has not lived up to its promise aside from some specialized tasks.

    Do we have an alternative? Seymour Cray always resisted massively parallel machines: “If you were plowing a field, which would you rather use: two strong oxen or 1024 chickens?” But in the mid-1990s, he started his own massively parallel design. Unfortunately, his sudden death made it impossible for us to know what direction his work would take, but the start-up company that he led now specializes in reconfigurable computing. It is too early to say whether that is the future of computing, but whatever it is, I think it will be much closer to the data-stream-based computing paradigm than to massive multiprocessing. Remember Knuth’s words: “Pipelines actually work for me, but threads don’t.”
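    The pipelines-over-threads point can be illustrated with Python generators: each stage consumes one stream and yields another, so the stages compose like Unix pipes with no shared mutable state and no locks. A minimal sketch:

```python
def numbers(limit):
    # Source stage: emit 0, 1, ..., limit-1.
    for i in range(limit):
        yield i

def squares(stream):
    # Transform stage: square each incoming value.
    for x in stream:
        yield x * x

def running_sum(stream):
    # Accumulator stage: emit the running total so far.
    total = 0
    for x in stream:
        total += x
        yield total

# Stages compose like Unix pipes: numbers | squares | running_sum.
pipeline = running_sum(squares(numbers(5)))
print(list(pipeline))  # [0, 1, 5, 14, 30]
```

    Each stage owns its own state (here, just `total`), which is why pipelines avoid the locking problems that threads sharing one address space run into.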

  23. Sorry girls and guys !

    Have you read Donald’s books? MMIX is supposedly fully parallel, but only at a power of “2 or 3” due to how Donald made MMIX-PIPE. Admittedly, Donald did make a few typos so that “3” and “2+epsilon” get mixed up, but I’m not perfect either. Is there anyone out there asking whether we need a larger number than 3? Please tell me and I will investigate. A similar question: does any one of you think you know what exponentiation actually means? If so, I recommend studying von Neumann algebras.

    “Language is the key to understanding the problem of language”

  24. He is not against multi-core computing in general … in the same interview he also said:

    I do grant that web browsing probably will get better with multicores. I’ve been talking about my technical work, however, not recreation.


    I also admit that I haven’t got many bright ideas about what I wish hardware designers would provide instead of multicores, now that they’ve begun to hit a wall with respect to sequential computation.

    and also

    The field of combinatorial algorithms is so vast that I’ll be lucky to pack its sequential aspects into three or four physical volumes, and I don’t think the sequential methods are ever going to be unimportant. Conversely, the half-life of parallel techniques is very short, because hardware changes rapidly and each new machine needs a somewhat different approach.

    and finally

    So I decided long ago to stick to what I know best. Other people understand parallel machines much better than I do; programmers should listen to them, not me, for guidance on how to deal with simultaneity.

    Regards, Roman

  25. >> I do grant that web browsing probably will get better with multicores.
    >> I’ve been talking about my technical work, however, not recreation.

    So I think that your (err, everyone’s) technical work will be moving into the browser though, in some sense. Not all of it. But collaborating with one or two people on a project is very nice. Collaborating in real time with 30 or so is amazing. Some of the real-time collaboration that we’ve only truly had broadly available (Google Docs, Wikis, etc.) has been made available in the web browser. I think that’s important for a reason.

    P.S. – I’m aware that Abiword supports useful collaboration, and some other desktop apps via XMPP. The browser, however, saw a very large jump recently (last 3 years) in the number of real-time collaborative apps available. The desktop did not see anything like that. Anyway, my customers don’t have Abiword installed, and they never will, sadly.

  26. I think it’s very difficult to find an everyday program that will work better on two, four, or even eight cores than on one, but if FPGAs have anything to teach us, it’s that making a program run faster with effectively hundreds or thousands of cores is easy. I think chip manufacturers have to start thinking about new RISC-based, or even smaller, architectures that can be duplicated many, many times.
