Thoughts by Ted

Musings about Open Source, Linux, and Life by Theodore Tso

Donald Knuth: “I trust my family jewels only to Linux”

Andrew Binstock interviewed Donald Knuth recently, and one of the more amusing tidbits was this:

I currently use Ubuntu Linux, on a standalone laptop—it has no Internet connection. I occasionally carry flash memory drives between this machine and the Macs that I use for network surfing and graphics; but I trust my family jewels only to Linux.

More seriously, I found his comments about multi-core computers to be very interesting:

I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the “Itanium” approach that was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write.

Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX….

I know that important applications for parallelism exist—rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need to be changed substantially every few years.

This is a very interesting issue, because it raises the question of what next-generation CPUs need to do in order to be successful. Given that it is no longer possible to just double the clock frequency every 18 months, should CPU architects just start doubling the number of cores every 18 months instead? Or should they try to concentrate a lot more computing power into an individual core, and optimize for a fast and dense interconnect between the CPUs? The latter is much more difficult, and the advantage of doing the former is that it’s really easy for marketing types to use some cheesy benchmark such as SPECint to help sell the chip, but then people find out that it’s not very useful in real life.

Why? Because programmers have proven that they have a huge amount of trouble writing programs that take advantage of these very large multicore computers. Ultimately, I suspect that we will need a radically different way of programming in order to take advantage of these systems, and perhaps a totally new programming language before we will be able to use them.

Professor Knuth is highly dubious that the latter approach will work, and while I hope he’s wrong (since I suspect the hardware designers are starting to run out of ideas, so it’s time software engineers started doing some innovating), he’s a pretty smart guy, and he may well be right. Of course, another question is what we would do with all of that computing power. Whatever happened to the predictions that computers would be able to support voice or visual recognition? And of course, what about the power and cooling issues for these super-high-powered chips? All I can say is, the next couple of years are going to be interesting, as we try to sort out all of these issues.

Organic vs. Non-Organic Open Source, Revisited

There’s been some controversy generated over my use of the terminology of “Organic” and “Non-Organic” Open Source. Asa Dotzler noted that it wasn’t Mozilla’s original intent to “make a distinction between how Mozilla does open source and how others do open source”. Nessance complained that he didn’t like the term “Non-Organic”, because it was “raw and vague - is it alien, poison, silicon-based?” and suggested instead the term “Synthetic Open Source”, referencing a paper by Siobhán O’Mahony, “What makes a project open source? Migrating from organic to synthetic communities”. Nessance also referenced a series of questions and answers by Stephen O’Grady from RedMonk, where he claimed the distinction between the two doesn’t matter. (Although given that Sun is a paying customer of RedMonk, Stephen admits that this might have influenced his thinking and so he might be “brainwashed” :-).

So let’s take some of these issues in reverse order. Does the distinction matter? After all, if the distinction doesn’t matter, then there’s no reason to create or define specialized terminology to describe the difference. Certainly, Brian Aker, a senior technologist from MySQL, thinks it does, as do folks like me and Amanda McPherson and Mike Dolan; but does it really? Are we just saying that because we want to take a cheap shot at Sun?

Well, to answer that, let’s go back and ask the question, “Why is Open Source a good thing in the first place?” It’s gotten to the point where people just assume that it’s a good thing, because everybody says it is. But if we go back to first principles, maybe it will become much clearer why this distinction is so important.

Consider the Apache web server; it was able to completely dominate the web server market, easily besting all of its proprietary competitors, including the super-deep-pocketed Microsoft. Why? It won because a large number of volunteers were able to collaborate to create a very fully featured product, using a “stone soup” model where each developer “scratched their own itch”. Many, if not most, of these volunteers were compensated by their employers for their work. Since their employers were not in the web server business, but instead needed a web server as a means (a critical means, to be sure) to pursue their business, there was no economic reason not to let their engineers contribute their improvements back to the Apache project. Indeed, it was cheaper to let their engineers work on Apache collaboratively than it was to purchase a product that would be less suited for their needs. In other words, it was a collective “build vs. buy” decision, with the twist that because a large number of companies were involved in the collaboration, it was far, far cheaper than the traditional “build” option. This is a powerful model, and the fact that Sun originally asked Roy Fielding from the Apache Foundation to assist in forming the Solaris community indicates that at least some people in Sun appreciated why this was so important.

There are other benefits of having code released under the Open Source license, such as the ability for others to see the implementation details of your operating system — but in truth, Sun had already made the Source Code for Solaris available for a nominal fee years before. And, of course, there are plenty of arguments over the exact licensing terms that should be used, such as GPLv2, GPLv3, CDDL, the CPL, MPL, etc., but sometimes those arguments can be a distraction from the central issue. While the legal issues that arise from the choice of license are important, at the end of the day, the most crucial issue is the development community. It is the strength and the diversity of the development community which is the best indicator for the health and the well-being of an Open Source project.

But what about end-users, I hear people cry? End users are important, to the extent that they provide ego-strokes to the developers, and to the extent that they provide testing and bug reports to the developers, and to the extent that they provide an economic justification to companies who employ open source developers to continue to do so. But ultimately, end-users affect an open source project only in a very indirect way.

Moreover, if you ask commercial end users what they value about Open Source, a survey by Computer Economics indicated that the number one reason why customers valued open source was “reduced dependence on software vendors”, which end users valued 2 to 1 over “lower total cost of ownership”. (Which is why Sun Salescritters who were sending around TCO analyses comparing 24×7 phone support from Red Hat with Support-by-email from Sun totally missed the point.) What’s important to commercial end users is that they be able to avoid the effects of vendor lock-in, which implies that if all of the developers are employed by one vendor, it doesn’t provide the value the end users were looking for.

This is why whether a project’s developers are dominated by employees from a single company is so important. The license under which the code is released is merely the outward trappings of an open source project. What’s really critical is the extent to which the development costs are shared across a vast global community of developers who have many different means of support. This saves costs to the companies who are using a product being developed in such a fashion; it gives choice to customers about whether they can get their support from company A or company B; programmers who don’t like the way things are going at one company have an easier time changing jobs while still working on the same project; it’s a win-win-win scenario.

In contrast, if a project decides to release its code under an open source license, but nearly all the developers remain employed by a single company, it doesn’t really change the dynamic compared to when the project was previously under a closed-source license. It is a necessary but not sufficient step towards attracting outside contributors, and eventually migrating towards having a true open source development community. But if those further steps are not taken, the hopes that users will think that some project is “cool” because it is under an open-source license will ultimately be in vain. The “Generation Y”/Millennial Generation in particular are very sensitive indeed to Astroturfing-style marketing tactics.

Ok, so this is why the distinction matters. Given that it does, what terms shall we use? I still like “Organic” vs “Non-organic”. While it may not have been intended by the Mozilla Foundation, the description in their web page, “only a small percentage of whom are actual employees [of the Mozilla Foundation]”, is very much what I and others have been trying to describe. And while I originally used the description “Projects which have an Open Source Development Community” vs “Projects with an Open Source License but which are dominated by employees from a single company”, I think we can all agree these are very awkward. We need a better shorthand.

Brian Aker from MySQL suggested “Organic” vs “Non-Organic” Open Source, and I think those terms work well. If some folks think that “Non-Organic” is somehow pejorative (hey, at least we didn’t say “genetically modified Open Source” :-), I suppose we could use Synthetic Open Source. I’m not really convinced that is any more appetizing, myself, however.

So what would be better terms to use? Please give me some suggestions, and maybe we can come up with a better set of words that everyone is happy with.

Links — 2008-04-25

The Open Source Commands
Really good ideas that companies should take to heart.
Open Source Commandments II: Passover Penguins
More really good ideas, especially for companies like Sun…
Did Canonical Just Get Punked by Red Hat and Novell?
Interesting thoughts about Linux desktop strategies
rPath to OEM SUSE Linux Enterprise Server from Novell for Appliances
I know a bunch of the folks at rPath, and I very much respect their technology; I think this is a very good thing for them.
Does Microsoft CEO Steve Ballmer need an intervention?
Does anyone think a Microsoft/Yahoo merger makes sense besides Mr. Ballmer?

Organic vs. Non-organic Open Source

Brian Aker dropped by and replied to my previous essay by making the following comment:

I believe you are hitting the nail on the “organic” vs “nonorganic” open source. I do not believe we have a model for going from one to the other. Linux and Apache both have very different models for contribution… but I don’t believe either are really optimized at this point.

Optimization to me would lead to a system of “less priests” and more inclusion.

I made an initial reply as a comment, and then decided it was so long that I should promote it to a top-level post.

I assume that when Brian talks about “organic open source” what he means is what I was calling an “open source development community”. Some googling turned up the following definition from Mozilla Firefox’s organic software page: “Our most well-known product, Firefox, is created by an international movement of thousands, only a small percentage of whom are actual employees.”

This puts it in contrast with “non-organic” software, where all or nearly all of the developers are employed by one company. (And anyone who proves talented at adding features to that source base soon gets a job offer from that one company. :-) By that definition we can certainly see projects like Wine, MySQL, Ghostscript (at one time), and others as fitting into that model, and being quite successful. There’s nothing really wrong with the non-organic software model, although many of them have struggled to make enough money when competing with pure proprietary software competitors, with MySQL perhaps being the exception which proves the rule.

In most of these cases, though, the project started more as an organic open source, and then transitioned into the non-organic model when there was a desire to monetize the project — and/or when the open source programmers decided that it would be nice if they could turn their avocation into a vocation, and let their hobby put food on the family table.

Solaris, of course, is doing something else quite different, though. They are trying to make the transition from a proprietary customer/supplier relationship to trying to develop an Open Source community — and what John’s candidate statement pointed out is that they weren’t really interested in creating an organic open source developer community at all, but they wanted the fruits of an open source community — with plenty of application developers, end-users, etc., all participating in that community.

We don’t have a lot of precedent for projects that try to go in this direction, but I suspect they are skipping a step when they try to go to the end step without bothering to try to make themselves open to outside developers. And by continuing to act like a corporation, they end up shooting themselves in the foot. For example, the OpenSolaris license still prohibits people from publishing benchmarks or comparisons with other operating systems. Such clauses are very common in closed-source operating systems and databases, but they discourage people from even trying to make things better, both within and outside of the Open Solaris core team. Instead, they respond to posts like David Miller’s with “Have you ever kissed a girl?”. (Thanks, Simon, for that quote; I had seen it before, but not for a while, and it pretty well sums up the sheer arrogance of the Open Solaris development team.)

So while Linux may not be completely optimized in terms of “less priests” and more inclusion, at least over 1200 developers contributed to 2.6.25 during its development cycle. Compared to that, Open Solaris is positively dominated by “high priests” and with a “you may not touch the holy-of-holies” attitude; heck, they won’t even allow you to compare them to other religions without branding you a heretic and suing you for licensing violations!

What Sun was trying to do with Open Solaris

I was recently checking to see what, if any, follow-up there had been to Sun’s ham-handed handling of the Open Solaris Trademark, and I ran across this very interesting comment from John Plocher’s Candidate Statement for the Open Solaris Governing Board:

“I also think there was a misunderstanding about what Sun desired when it launched the community (in part) to encourage developers to adopt and use Solaris. My take is that, while there *is* value in getting more kernel, driver and utility developers contributing to and porting the (open) Solaris operating system, there is significantly *more* value in having a whole undivided ecosystem based on a compatible set of distributions, where application developers, university students, custom distro builders and users are all able to take advantage of each other’s work.

Put these two things together, and you can see Sun’s predicament. Sun *wanted* a community that empowered application developers, but *got* a community aimed squarely at kernel hackers. Whether you see this as the “kernel.org -vs- Ubuntu” fight, or the “fully open -vs- MySQL model” argument, in my opinion, it all is simply a reflection of the above mismatched expectations.”

So that explains why it’s taken three long years to try to get basic open source development tools (such as putting Open Solaris source code in a distributed SCM located outside of the Sun firewall) for Open Solaris. It was never Sun’s intention to try to promote a kernel engineering community, or at least, it was certainly not a high priority for them to do so. This can be shown by the fact that as of this writing they still are using the incredibly clunky requester/sponsor system for getting patches into Solaris; setting up a git or mercurial server is not rocket science. This lack explains why Linus gets more contributions while brushing his teeth than Open Solaris gets in a week.

So if you run into a Sun salescritter or a Sun CEO claiming that OpenSolaris is just like Linux, it’s not. Fundamentally, Open Solaris has been released under an Open Source license, but it is not an Open Source development community. Maybe it will be someday, as some Sun executives have claimed, but it’s definitely not a priority for Sun; if it were, it would have been done before now. And why not? After all, they are getting all of the marketing benefit of claiming that Solaris is “just like Linux”, without having to deal with any of the messy costs of working with an outside community. As a tactical measure, astroturfing is certainly a valid marketing trick. But after three years, the excuse of “just you wait a little longer, we’re just trying to figure this open source community stuff out”, is starting to wear a little thin.

Furthermore, if (as John Plocher claims) this was about “empowering application programmers”, why was it that Sun’s first act was to trumpet how wonderful it was to release the Solaris source code under an Open Source license? This only seems to make sense if the Open Solaris initiative was really a cynical marketing tactic to try to save Solaris from being viewed as irrelevant. If that was Sun’s intention, I think it is fair to say that from a marketing point of view, the tactic has been at least partially successful — although as John has admitted, the goal of creating a full community with application developers, university students, and so on, hasn’t materialized for Open Solaris. Sun has the dream; the Linux community is living it.

However, from a business standpoint, I wonder if Sun will really be able to sustain their Solaris engineering team if they are really doing all of the work themselves, and outside contributions continue at the rate of 0.6 patches per day. After all, the margins when you are selling low-cost AMD servers are much lower than when you are selling über-expensive SPARC servers. With Linux, we have a major advantage in that kernel improvements are coming from multiple companies in the ecosystem, instead of being paid for by a single company. And given that 70-80% of Sun’s AMD servers are running Linux, not Solaris, it’s not clear how Sun justifies their Solaris engineering costs to their shareholders. Furthermore, if Solaris on x86_64 were to actually take off, there’s nothing to stop competitors from selling Solaris support — except the competitors won’t have to pay the engineering costs to maintain and improve Solaris, so they would be able to provide the support much more cheaply than Sun could. So while Sun’s marketing tactics have kept Solaris alive in some verticals, I have to question how successful Sun will be in the long-term.

Update: I’ve posted a reply to Brian Aker’s comments as a follow up that would probably be interesting to those folks who are interested in the ideas found in this essay.

Update^2: John Plocher’s name is spelled with an ‘h’, which I had omitted. My deep apologies to him for getting his name wrong. I’ve fixed it in the post.

AT&T: Customer support horrors

This morning, I just wasted two hours of my life trying to deal with a bill from AT&T. I am a work-at-home employee, and my company has a contract with AT&T so that when I dial 1-700-xxx-xxxx, I can reach the internal corporate phone network. In addition, long distance calls on my home office line are billed to the company at the pre-negotiated corporate rates. I also had a (long-dormant) AT&T long distance account, dating from before I started working at this company. Starting at the beginning of the year, that account grew an $18 monthly fee. When I tried to make it go away, the AT&T consumer side of the house said that according to Verizon, that was because I had my long distance service through AT&T. Which was true, in a sense — AT&T was providing my service, but through a corporate account. But the consumer side of AT&T didn’t understand that, and were, in my opinion, willfully ignorant.

Several phone calls later, I got the 1-800 number for the AT&T corporate side of the house, and those folks said they couldn’t look at consumer/personal AT&T accounts, and implied it was my fault that I had both accounts on the line. (This took over an hour while the support person tried multiple things, none of which were helpful at all.)

I finally googled for AT&T CEO’s office, and found a number on the consumerist.com web site that claimed to be the AT&T CEO’s office. It had long since been directed to a help center, but after I told my tale, I was quickly transferred to an executive customer support person, who was able to fix the problem in ten minutes.

The only questions remaining are:

1) Why can’t the different parts of AT&T talk to one another?
2) Will this problem really be solved, or will I see another bill in a month or two and have to spend more time dealing with this mess all over again?
3) When will VOIP services put AT&T long distance out of its misery? (Given the absolute frustration of this morning, this can’t happen soon enough.)
4) Will AT&T compensate me for two hours of frustration and of my life that I will never get back? (Not bloody likely.)

One thing is for certain: it will be a long, long, LONG time before I will ever voluntarily choose AT&T to provide service for just about anything. Even an iPhone wouldn’t be enough inducement….

Why I purchased the Sony PRS-505 Reader

Although a lot of people have been lauding the Kindle, I recently decided to go with the Sony PRS-505 instead. Yes, the Kindle has built-in EVDO access, and the ability to buy books without a computer, or even browse the web; and yes, Sony has once again demonstrated it can’t create a compelling 21st century computer application to save its life. However, the Sony had a few things that, at least for me, made it a better choice than the Kindle:

  1. The Sony is thinner — I want to be able to slip it into my laptop case and have it take the absolute minimum amount of space.
  2. The Sony simply looks much more elegant than the Kindle; steel with a leather cover simply looks a lot better than white, cheesy plastic.
  3. I’m not interested in buying a lot of DRM’ed ebooks; ergo, I won’t be buying many books from either Sony or Amazon’s web sites. It is highly likely that within 2 years I will be buying a more advanced eBook reader, possibly one with color, and I don’t want to be locked into a single format where I have to go and repurchase all of my books just because the latest and greatest eBook reader uses an incompatible DRM technology from whatever Sony or Amazon has used.
  4. The Sony is $100 cheaper. Given that something better will be available within 2-3 years at the very most, and possibly sooner, I’m just not interested in spending $400 on a first generation prototype.
  5. Perhaps most important, the Sony has really, really good open source support. Kovid Goyal’s libprs500 project supports the Sony PRS-500 and PRS-505, and has very good conversion tools, allowing people to convert eBooks previously stored in HTML, PDF, TXT, Microsoft Reader (.lit), IDPF/Open eBook (.epub) into Sony’s format. And with a little bit of work, it does a very, very good job with the conversion. Better yet, its ability to convert multiple HTML pages into a single eBook, with a credible table of contents, means that libprs500 can pull down the New York Times, the Economist, etc., automatically format it into a single eBook which you can save onto your Sony Reader, and then read it while you are on the airplane. No muss, no fuss. I can also take various books that are available on the web as HTML and also convert them into an eBook which can be used by the Sony Reader very easily.

This last point is I believe one of the best reasons why the Sony Reader will be able to compete very successfully with the Kindle. The libprs500 software is written as a python application, and it will work on Windows, Linux, and MacOS — and its GUI user interface is far better than the truly pathetic Sony Connect software. Score one for Open Source! In my opinion, Sony should send a very nice gift certificate to Kovid as a thank you; his open source project has added an immeasurable amount of value to their product.

The only thing that you can’t do using the libprs500 software is buy DRM’ed books which are locked to the Sony Reader — but that isn’t something that many people will be particularly interested in, I suspect. OK, I did buy Pillars of the Earth, which was available on the Sony site for $6 — hmm, cheaper than Amazon’s $9.99 — but that was an investment I was willing to flush down the toilet when the PRS-505 becomes obsolete, mainly so I could test what buying a DRM’ed book would be like from the Sony web site. But I probably won’t be buying many books with DRM that way. On the other hand, I am quite willing to spend quite a bit more money on non-DRM’ed books from publishers such as Baen Books.

Here’s to the hope that the publishing industry figures things out faster than RIAA’s member companies. In the meantime, I will be mostly pretending that both the Sony and Amazon eBook stores with their proprietary DRM’ed books don’t exist…

Does perfect code exist? (Abstractions, Part 1)

Bryan Cantrill recently wrote a blog entry, where among other things, he philosophized on the concept of “perfect code”. He compares software to math, arguing that Euclid’s greatest common divisor algorithm shows no sign of wearing out, and that when code achieves perfection (or gets close to perfection), “it sediments into the information infrastructure” and the abstractions defined by that code become “the bedrock that future generations may build upon”. Later, in the comments of his blog, when pressed to give some examples of such perfection, he cites a clever algorithm coded by his mentor to divide a high resolution timestamp by a billion extremely efficiently, and Solaris’s “cyclic subsystem”, a timer dispatch function.
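
For reference, Euclid's algorithm really is tiny. What follows is only a minimal Python sketch of the textbook remainder form; the function name gcd_euclid is made up for this illustration and is not taken from Bryan's post or from any Solaris code.

```python
def gcd_euclid(a: int, b: int) -> int:
    """Greatest common divisor via Euclid's algorithm (remainder form)."""
    a, b = abs(a), abs(b)
    while b:
        # gcd(a, b) == gcd(b, a mod b), and the second argument strictly
        # shrinks on every pass, so the loop must terminate.
        a, b = b, a % b
    return a
```

For instance, gcd_euclid(1071, 462) works through the remainders 147, 21, 0 and returns 21. Python's standard library already offers math.gcd for real use; the sketch is only here to show how little code is involved.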

Watching his talk at Google, including its introduction and sidebar book review of Scott Rosenberg’s “Dreaming in Code”, it’s clear that he very passionately believes that it is possible to write perfect code, and that one should strive for that at all times. Perhaps that’s because he mostly writes code for operating systems, where the requirements change slowly, and for one OS in particular, Solaris, which tries far harder than most software projects to keep published interfaces stable for as long as possible. In contrast, the OS I’ve spent a lot of time hacking around, Linux, quite proudly states that at least inside the kernel, interfaces can not and should not be stable. Greg Kroah-Hartman’s “Stable API Nonsense” is perhaps one of the strongest and most passionate expositions of that philosophy.

I can see both sides of the argument, and in their place, both have something to offer. To Bryan’s first point, it is absolutely true that interfaces can become “bedrock” upon which entire ecosystems are built. Perhaps one of the most enduring and impactful examples would be the Unix programming interface, which has since become enshrined by POSIX.2 and successor standards. I would argue, though, that it is the interface that is important, and not the code which initially implemented it. If the interface is powerful enough, and if it appears at the right time, and the initial implementation is good enough (not perfect!), then it can establish itself by virtue of the software which uses it becoming large enough that it assumes an importance all out of scale with its original intention.

Of course, sometimes such an interface is not perfect. There is an apocryphal story that when S. Feldman at AT&T labs first wrote the ‘make’ utility, he did so rather quickly, made it available for his fellow lab members to use, and then went home to sleep. In some versions of the story he had stayed up late and/or pulled an all-nighter to write it, so he slept a long time. When he came back to work, he had come up with a number of ways to improve the syntax of the Makefile. Unfortunately (so goes the story), too many teams were already using the ‘make’ utility, so he didn’t feel he could change the Makefile syntax. I have no evidence that this ever took place, and I suspect it is an urban myth that was invented to explain why Makefiles have a rather ugly and unfortunate syntax with features that many would call defects, including the use of syntactically significant tab characters which are indistinguishable from other forms of leading whitespace.

Another example, which is the bane of filesystem designers everywhere, is the set of Unix readdir(2), telldir(2), and seekdir(2) interfaces. These interfaces fundamentally assume that directories are stored in linear linked lists, and filesystems that wish to use more sophisticated data structures, such as b-trees, have to go to extraordinary lengths in order to support these interfaces. Very few programs use telldir(2) and seekdir(2), but some filesystems such as JFS maintain two b-trees instead of one just to cater to telldir/seekdir.
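
To see why a position cookie is awkward for anything but a linear directory, here is a deliberately toy Python model. It is a sketch under stated assumptions, not how JFS or any real filesystem works: LinearDir, SortedDir, and resume_after_insert are hypothetical names, a sorted list stands in for a b-tree, and the "cookie" is naively taken to be a position in the current ordering.

```python
import bisect


class LinearDir:
    """Toy directory stored as an append-only list; cookie = slot index."""

    def __init__(self):
        self.slots = []

    def create(self, name):
        self.slots.append(name)  # new entries never move existing ones

    def read_from(self, cookie):
        return self.slots[cookie:]


class SortedDir:
    """Toy stand-in for a b-tree: names kept sorted; cookie = rank (naive)."""

    def __init__(self):
        self.names = []

    def create(self, name):
        bisect.insort(self.names, name)  # may shift every later entry

    def read_from(self, cookie):
        return self.names[cookie:]


def resume_after_insert(directory):
    """Read two entries, remember the position, insert a file, then resume."""
    for name in ("b", "d", "f"):
        directory.create(name)
    seen = directory.read_from(0)[:2]  # the reader has consumed two entries
    cookie = 2                         # what a telldir-style call would hand back
    directory.create("a")              # someone else creates a file meanwhile
    return seen + directory.read_from(cookie)  # seekdir-style resume
```

With the linear layout the reader sees b, d, f, a, each exactly once. With the sorted layout the insert shifts everything right, so resuming at position 2 returns d a second time. Avoiding that kind of duplicate without a stable linear ordering is the sort of problem that pushes a filesystem toward keeping a second index.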

And yet, it is absolutely true that interfaces can be the bedrock for an entire industry. Certainly no matter what its warts, the Unix/Posix interface has stood the test of time, and it has been responsible for the success of many a company and many billions of dollars of market capitalization. But is this the same as perfect code? No, but if billions of dollars of user applications are going to be depending on that code, it’s best if code which implements such an interface be high quality, and should attempt to achieve perfection.

But what does it mean for code to be perfect? For a particular environment, if the requirements can be articulated clearly, I can accept that code can reach perfection, in that it becomes as fast as possible (for the given computer platform), and it handles all exception cases, etc., etc. Unfortunately, in the real world, the environment and the requirements inevitably change over time. Take, for example, Bryan’s cyclic subsystem, which he proudly touts as being if not perfect, almost so, and which executes at least 100 times a second on every Solaris system in the world. I haven’t looked at the cyclic subsystem in any detail, since I don’t want to get myself contaminated (the CDDL and GPLv2 licenses are intentionally incompatible, and given that companies, including Sun, have sued over IPR issues, well, one can’t be too careful), but waking up the CPU from its low-power state 100 times a second isn’t a good thing at all if you are worried about energy conservation in data centers, or in laptops.

On my laptop, for instance, it is possible to keep wakeups down to no more than 30-35 times a second, and it would be possible to do better but for an abstraction limitation. Suppose, for example, a process wants to sleep and then receive a wakeup 1 millisecond later, and so requests this via usleep(). At the same time, another application wants to sleep until some file descriptor activity takes place, or until 1.2 milliseconds have elapsed. 0.3 milliseconds later, a third process requests a sleep, this time for 0.8 milliseconds. Now, it could be that in all of the above cases, the applications don’t actually need exact timing; if they all get their wakeups plus or minus some fraction of a millisecond, they would be quite cool with that. Unfortunately, the standard timer interfaces have no way of expressing this, and so the OS can’t combine the three wakeups at T+1.0, T+1.1, and T+1.2 milliseconds into one wakeup at T+1.1ms.
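
To make the arithmetic concrete, here is a minimal Python sketch of what an OS could do if each timer request carried a tolerance. The `TimerRequest` type, its `slack_us` field, and the `coalesce()` function are hypothetical names invented for this illustration (the `slack_us` field is precisely the piece of information that usleep() cannot express), and the greedy strategy shown is just one simple way to batch wakeups, not a description of any real kernel's timer code. Times are in microseconds relative to T so that the comparisons are exact.

```python
from dataclasses import dataclass


@dataclass
class TimerRequest:
    deadline_us: int  # requested wakeup time, in microseconds after T
    slack_us: int     # hypothetical: acceptable error either side of the deadline


def coalesce(requests):
    """Group timer requests into as few wakeups as the slack allows.

    Returns a list of (wakeup_time_us, [requests served]) tuples.
    Greedy strategy: take the request whose window closes first, fire at
    the very end of that window, and serve every other request whose
    window has already opened by then.
    """
    # Sort by the latest time each request can tolerate being woken.
    pending = sorted(requests, key=lambda r: r.deadline_us + r.slack_us)
    wakeups = []
    while pending:
        fire_at = pending[0].deadline_us + pending[0].slack_us
        # Windows that opened at or before fire_at can share this wakeup;
        # because of the sort order, none of them has closed yet.
        batch = [r for r in pending if r.deadline_us - r.slack_us <= fire_at]
        pending = [r for r in pending if r.deadline_us - r.slack_us > fire_at]
        wakeups.append((fire_at, batch))
    return wakeups


# The three requests from the text, each assumed to tolerate +/- 0.1 ms.
tolerant = [TimerRequest(1000, 100), TimerRequest(1200, 100), TimerRequest(1100, 100)]
print([t for t, _ in coalesce(tolerant)])  # [1100]

# The same requests with no slack, which is all usleep() can express.
exact = [TimerRequest(1000, 0), TimerRequest(1200, 0), TimerRequest(1100, 0)]
print([t for t, _ in coalesce(exact)])  # [1000, 1100, 1200]
```

With a tenth of a millisecond of slack on each request, the three wakeups collapse into a single one at T+1.1ms; with no slack, the CPU has to wake up three times.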

So this is where Greg K-H’s “Stable API Nonsense” comes into play. We may not be able to solve this problem at the userspace level, but we darn well can solve this problem inside the kernel. Inside the kernel, we can change the timer abstraction to allow device drivers and kernel routines to provide a timer delta plus a notion of how much accuracy is required for a particular timer request. Doing so might change a timer structure that external device drivers had previously depended upon, but too bad; that’s why stable ABI/API’s are not supported for internal kernel interfaces. Is it possible that the interface could have been extended instead? Well, perhaps, and perhaps not; if an interface is well designed, it is possible it can be extended in an API and/or ABI compatible way. There usually is a performance cost to doing so, and sometimes it may make sense to pay that cost, and sometimes it may not. I’ll talk more about that in a future essay.

Yet note what happened to the timer implementation. We have a pretty sophisticated timer implementation inside Linux, one that uses heap data structures and buckets of timers for efficiency. So while I might not call it perfect, it is pretty good. But, oops! Thanks to this new requirement of energy efficiency, it will likely need to be changed to support variable levels of accuracy and the ability to fire multiple timers that are (more or less) coming due in a single CPU wakeup cycle. Does that make it no longer perfect, or no longer pretty good? Well, it just had a new requirement impact the code, and if the criterion for perfection is for the abstraction defined by the code to be “bedrock” and never-changing, with no need to make any changes in said code over multiple years, I would argue that little to no code, even OS code, can ever achieve perfection by that definition, unless that code is no longer being used.

(This is the first of a multi-part series of essays on abstractions that I have planned. The next essay which I plan to write will be entitled “Layer surfing, or jumping between layers”, and will be coming soon to a blog near you….)

I love it when things Just Work

I am currently in the Hilton Portland & Executive Tower hotel, and since I fly entirely too much, I got upgraded into a room which contains a printer. Thinking that I would try using it, I hooked it up to my laptop (running Ubuntu Gutsy), selected System->Administration->Printing on the desktop, and then clicked on New Printer. To my astonishment, when the dialog box came up, the system had already autodetected the fact that I had an HP OfficeJet KX60xi printer connected to the parallel port, had recommended which driver I should use, and a few “next” and “continue” clicks later, the printer was installed, and 15 seconds later I was able to print to it.

Users of MacOS systems are probably used to such things, but this was faster and easier than what Windows asks of users who want to install a new printer. Coming from a Unix background, I would have been quite pleased if I had been able to set up the printer after manually selecting the printer type and driver from a dialog box. Simply not having to su to root, edit some config files, and then restart some daemons would be a major advance. But this completely exceeded my expectations. Well done to everyone in the CUPS and GNOME community who worked to make this possible!

Tip o’ the hat, wag o’ the finger: Linux power savings for laptop users

It’s interesting to see how far we have come, and yet how much more work we need to do, on power management for Linux. I recently got a new laptop, a Lenovo Thinkpad X61s, and using the powertop tool, I was able to configure my system to the point where, in what I call “airplane mail reading mode” (mailbox preloaded into memory, USB disabled, wireless and ethernet disabled, backlight down to 30% brightness, and sloppily written power hogs like Firefox and Notes shut down; every single application writer should be forced to run powertop and explain why their program feels it necessary to constantly wake up the CPU), I can get my usage down to about 9.6 watts. Using the 8-cell extended battery, that’s 8 hours of battery life, although granted doing very little. On the flip side, if I’m doing a major kernel compile, I can drive usage up to almost 30 watts, which means less than 3 hours of battery life. So that’s definitely the good news; Linux can sharply reduce its power consumption to the point where it is highly competitive with Windows. (And probably better than Vista, just because that OS is so heavy and bloated.) So thanks and a tip of the hat to Intel and to Arjan van de Ven for making such a useful tool as powertop available.

So now for the bad news. Getting down to this level of power saving thriftiness, where the laptop is carefully sipping only the minimal amounts of power from the battery, is definitely a work in progress. First of all, you can only get this level of power savings by unloading a specific USB driver, uhci_hcd. This will disable low speed devices (including unfortunately the fingerprint reader and the EVDO WWAN device if you were silly enough to buy one that was built into the laptop as opposed to a stand-alone card that you can swap between laptops and lend out to friends as necessary). But how many users are going to open up a terminal window, su to root and type the command “rmmod uhci_hcd”? And know how to reload the driver using “modprobe uhci_hcd” when they need to use the USB devices again?

A similar problem exists for Network Manager; when the user disables the network by right-clicking on the applet, why doesn’t it automatically bring down the interface, instead of forcing the user to manually su to root and then type the command “ifconfig eth0 down; ifconfig wlan0 down”?

A more serious problem is the Intel Wireless driver for the 4965. Even with the wlan0 interface configured down, and with the RF kill switch enabled, keeping the iwl4965 driver loaded will still cost you an extra full watt of power. When you’re down to 9.6 watts, that means that keeping the iwl4965 driver loaded when you don’t need it will cost you a 10% reduction in your battery life! That’s just sloppy, and hopefully it will be fixed in a future update to the iwl4965 driver, but as long as you don’t mind manually removing and reloading it, you can work around this power-saving oversight.
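
The 10% figure can be sanity-checked with a few lines of Python. This is only a back-of-the-envelope sketch under stated assumptions: the battery is treated as an ideal store whose capacity is whatever gives roughly 8 hours at the 9.6-watt idle draw, the extra watt from the driver is assumed to sit on top of that 9.6 watts, and the function names are invented for the illustration.

```python
def battery_hours(capacity_wh: float, draw_watts: float) -> float:
    """Runtime in hours for an ideal battery at a constant power draw."""
    return capacity_wh / draw_watts


def fraction_of_life_lost(base_watts: float, extra_watts: float) -> float:
    """Fraction of battery life lost when extra_watts is added to base_watts.

    The capacity cancels out:
    1 - (C / (base + extra)) / (C / base) == extra / (base + extra)
    """
    return extra_watts / (base_watts + extra_watts)


IDLE_WATTS = 9.6   # idle draw quoted in the text
IDLE_HOURS = 8.0   # assumption: approximate runtime at that draw
CAPACITY_WH = IDLE_WATTS * IDLE_HOURS  # implied capacity, not a measured value

loss = fraction_of_life_lost(IDLE_WATTS, 1.0)
print(f"implied capacity:       {CAPACITY_WH:.1f} Wh")
print(f"idle, driver unloaded:  {battery_hours(CAPACITY_WH, IDLE_WATTS):.2f} h")
print(f"idle, driver loaded:    {battery_hours(CAPACITY_WH, IDLE_WATTS + 1.0):.2f} h")
print(f"kernel compile (30 W):  {battery_hours(CAPACITY_WH, 30.0):.2f} h")
print(f"battery life lost:      {loss:.1%}")
```

Under these assumptions the loss works out to a bit over 9%, which is close enough to the 10% quoted above, and the same implied capacity gives well under 3 hours at a 30-watt kernel-compile load.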

A bigger issue, though, one for which no workaround exists, is that unlike the ipw3945 drivers, which at least had private, non-standard iwpriv commands to engage the 802.11’s power-saving features, the iwl4965 driver has neither the non-standard Intel iwpriv interfaces, nor the standard iwconfig interfaces for enabling any kind of powersaving features, including changing the transmit power of the card. So while powertop deserves plenty of kudos, iwl4965 deserves a wag of the finger from a power saving viewpoint. No doubt Intel just needs to allocate more money to its Open Source Technology Center so it can get more of its crack developers to work on improving Linux support for their processors and chipsets.

Speaking of which, I’m still waiting for an Intel x.org 965GM driver that can support compiz/beryl and show video clips at the same time… And being able to turn off the 50 interrupts/second generated by the video card when they aren’t needed because 3-d graphics aren’t currently in use, without requiring a restart of the X server, would also be a nice touch. The bottom line is that Linux power savings and Linux support for laptops in general are much better than they were a year ago, and a lot of credit has to go to the efforts of Intel’s teams producing such good work as powertop, their wireless drivers, and their open X server drivers. We still have a lot of work left to be done, though!
