The Great Debate

"Will Compilers Ever Produce Code as Good as an Expert Assembly Language Programmer?"

(Randall Hyde)

Part IV: Fast Enough Isn't

The "Great Debate" is a very emotional exchange that has been running continuously since the late '70s. On some newsgroup somewhere, you will find a thread discussing this topic; although you will have the best luck looking at the Internet comp.lang.asm.x86, comp.lang.c, alt.assembly, or comp.lang.c++ newsgroups. Of course, almost anything you read in these newsgroups is rubbish, no matter what side of the argument the author is championing. Because this debate has been raging for (what seems like) forever, it is clear there is no easy answer; especially one that someone can make up off the top of their head (this describes about 99.9% of all postings to a usenet newsgroup). This page contains a series of essays that discuss the advances of compilers and machine architectures in an attempt to answer the above question.

Although I intend to write a large number of these essays, I encourage others, even those with opposing viewpoints, to contribute to this exchange. If you would like to contribute a "well thought out, non-emotional essay" to this series, please send your contribution (HTML is best, ASCII text is second best) to

debate@webster.ucr.edu


In this essay, I would like to spend some time talking about the speed of a program. This essay is a plea for higher-performance programs, not necessarily a plea that programmers write their programs in assembly language. It is quite possible to write slow programs in assembly language, and it is often possible to write faster programs in an HLL. Of course, people generally associate the use of assembly language with high-performance software, hence the inclusion of this essay in this series. In this essay I will discuss some of the reasons (excuses) programmers give for not writing fast programs, and then I will discuss why performance is still an issue today and will continue to be an issue in the future, even with machines 1,000 times faster than those we have today.

I was an undergraduate at UC Riverside in the middle 70s, just at the end of the "efficiency is everything" period of software engineering. In the late 60s and early 70s, it was still common to find large application programs written in assembly language because the cost of running software far exceeded the cost of writing the software. Since then, Moore's law has been in full swing. Machines have doubled in speed every three or so years and the prices have dropped dramatically. Since the 70s, of course, the cost of developing software has overtaken and far exceeded the cost of running the software on a given machine[1].

This de-emphasis on efficiency has produced an obvious side effect - since schools no longer teach their students to write efficient code, the students never get any optimization experience. Since they never get this experience, they are completely unable to optimize code when the need arises. Human nature is to ignore what you do not understand. Hence most programmers make excuses for why a program cannot be, or should not be, optimized. Here are some of the most common excuses:

- Future technology: computers (and compilers) will soon be fast enough that the program's speed won't matter.
- Optimization is too expensive: the market window and the time to market won't allow it.
- Perceived vs. actual speed: users only notice how fast the program feels, not how fast it actually is.
- Fast enough: the program already meets its performance requirements, so why spend more effort on it?

I'm not going to bother addressing every one of these excuses on a point-by-point basis. Most excuses are exactly that - an attempt to cover up the programmer's own inadequacies. Some of them, however, are worth a few comments.

Future technology: Someday computers will be fast enough (and compilers will be producing fast enough code) that today's dog software will run at a respectable rate. For example, today's computers are typically 1,000 times faster than the computers that were available 20 years ago. Programs that were too slow to run on those machines run just fine today (e.g., 3-D graphics and multimedia applications). If your program runs at about half the speed it should, just wait three years and computers will be fast enough (and compilers will be generating faster code) that your application will perform in a satisfactory manner.

To understand what's wrong with this picture, just take a look at your own personal machine. If it's a relatively state-of-the-art machine, figure out how much three- and four-year-old software you have running on it. Probably very little. You're probably running the latest version (or nearly the latest version) of every program you commonly use. Programmers who feel that all they have to do is wait a few years for hardware technology to catch up with their software forget that three years down the road they will be writing software that requires the machines to be faster still. That software will probably require two or three processors to run reasonably well. The end result is that end users inevitably wind up running the latest version of the software quite a bit slower than it really should run. Since most software is purchased by new machine owners, those buying the software rarely have the opportunity to "downgrade" to an earlier version (since they don't own the earlier version).

Optimization is too expensive: You will often hear programmers using phrases like "market window" and "time to market" as reasons for avoiding an optimization phase in their software. While these are all valid concerns, these same programmers think nothing of spending additional time adding new features to a product, even though those new features also increase the development cost, lose market opportunity, and increase the time to market. A programmer who eschews performance, something every user can appreciate, in favor of an obscure feature that almost no one will ever use (but that looks good on a product comparison matrix) is really fooling themselves.

Perceived vs. actual speed: From operating systems theory we learn that there are several different ways to measure the performance of a software system. Throughput is, essentially, a measurement of the amount of calculation a software system achieves within a given time interval[2]. Response time is a measure of how long a program takes to produce a result once the user supplies the necessary input(s) to a computation. Overhead is the amount of time the system requires to support a computation that is not directly related to producing the result of a computation.
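
To make the distinction concrete, here is a minimal C sketch (the task, the iteration counts, and the timing calls are purely illustrative, not taken from any real application) that times a batch of identical computations and reports both numbers: the response time of a single task and the overall throughput. It ignores overhead entirely.

    /* Hypothetical sketch: response time vs. throughput for a batch of
       identical tasks.  The "task" is an arbitrary bit of arithmetic. */
    #include <stdio.h>
    #include <time.h>

    #define TASKS 1000

    static double do_one_task(void)
    {
        double sum = 0.0;
        for (long i = 1; i <= 100000; i++)   /* stand-in for real work */
            sum += 1.0 / (double)i;
        return sum;
    }

    int main(void)
    {
        clock_t start = clock();
        double  sink  = 0.0;

        for (int i = 0; i < TASKS; i++)
            sink += do_one_task();

        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;

        /* Response time: seconds from request to result for one task.  */
        /* Throughput:    tasks completed per second.                   */
        printf("response time per task: %f seconds\n", elapsed / TASKS);
        printf("throughput:             %f tasks/second\n", TASKS / elapsed);
        printf("(checksum, ignore: %f)\n", sink);  /* keeps the work from being optimized away */
        return 0;
    }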

Overall throughput is an important measure. It describes how much time a user must spend running a program to produce some desired result[3]. If you increase throughput, you increase users' productivity, since they can finish using the program sooner and begin working on other tasks.

Response time and throughput, interestingly enough, are often at odds with one another. Programming techniques that improve response time often reduce throughput, and vice versa. However, poor response time gives the perception that a program is running slowly, regardless of the actual throughput. In most cases, response time is actually more important than actual throughput. The actual speed of an interactive program is less important than the user's perception of its speed. A lot of research into human response time indicates that users perceive quantum differences in performance rather than incremental improvements. Generally, users can perceive the following differences in response time:

- Instantaneous: the result appears as soon as the user requests it.
- Fast: the delay is noticeable but easily ignored.
- Delayed or sluggish: the delay is long enough to annoy and distract the user.
- Beyond the attention span (several seconds, approaching ten): the user loses his or her train of thought.
- Long enough to abandon the task (roughly ten seconds to a minute): the user goes looking for something else to do.
- Long enough to forget (beyond that): the user may never come back to the result at all.

Instantaneous response time is what every application should shoot for. As soon as the user hits the enter key or otherwise indicates that a computation may now take place, the program should be back with the result. Even fast response time, although noticeable, goes largely ignored by the typical user. However, once the response time of a program heads into the delayed or sluggish range, users tend to get annoyed with the software. This creates a distraction that affects them psychologically and lowers their productivity by more than the program's raw throughput would suggest.

Once a program's response time exceeds a few seconds and approaches 10 seconds or so, a very bad thing happens: the response time exceeds the user's attention span and the user loses his or her train of thought. Once the answer finally does appear, the user has to remember what they were doing, resulting in even less productivity.

Somewhere between 10 seconds and a minute, the user starts looking for a completely different task to work on. Once the user is involved with another task, the information provided by the current computation may go unused for quite a while as the user wraps up the other task.

The last phase associated with response time is loss of memory -- users simply forget that they are working on a given problem and, being involved in something else, may never think to look back to respond to the information provided. Let me give a really good example of this problem. I started a backup on my Win 95 machine; the backup takes about one to two hours, so I started working on this essay in the meantime. As I type this sentence, the backup has long since completed, but I had forgotten about that backup (and about the fact that I really should be working on a different problem than this essay on my Win 95 machine) because I had become involved with this essay.

There are a few important things to note about these response time categories. First, it generally doesn't help (or hurt) the perceived performance of a program if you change its response time and the new response time still falls within the same category. For example, if your program's response time improves from four seconds down to two, most users won't really notice a big improvement in the system's performance. Users do notice a difference when you move from one category to the next. The second thing to note is that "near future technological advancements" generally do not speed up your software to the point where it moves from one category to the next. That typically requires an order-of-magnitude improvement, the type of improvement that is possible only with a major algorithmic change or by using assembly language.
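
To illustrate what a "major algorithmic change" looks like, consider searching a sorted table. The sketch below is hypothetical (the table size and the lookup key are arbitrary), but it shows the kind of order-of-magnitude gap - roughly half a million comparisons on average for a linear scan of a million entries versus about twenty for a binary search - that no amount of constant-factor tuning, in any language, will close.

    /* Hypothetical sketch: the same lookup done two ways.  Tuning the
       linear search buys a constant factor; switching to binary search
       changes the algorithm (O(n) vs. O(log n)) and is the sort of change
       that can move a program from one response-time category to the next. */
    #include <stdio.h>

    static int linear_search(const int *a, int n, int key)
    {
        for (int i = 0; i < n; i++)
            if (a[i] == key)
                return i;
        return -1;
    }

    static int binary_search(const int *a, int n, int key)
    {
        int lo = 0, hi = n - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] == key)
                return mid;
            else if (a[mid] < key)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return -1;
    }

    int main(void)
    {
        enum { N = 1000000 };
        static int table[N];
        for (int i = 0; i < N; i++)
            table[i] = i * 2;                     /* sorted data */

        /* ~N/2 comparisons on average vs. ~log2(N) = 20 comparisons. */
        printf("linear: %d\n", linear_search(table, N, 1999998));
        printf("binary: %d\n", binary_search(table, N, 1999998));
        return 0;
    }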

If you cannot improve the response time of a program to the point where you switch from one category to the next, you may as well concentrate on improving the throughput of your system. Just make sure that improving the throughput doesn't impact response time to the point it falls into a slower category.

Fast enough isn't. Now for the real point of this essay. A large number of programmers simply feel that their programs are fast enough. Perhaps they've met the minimal performance requirements specified for the program. Perhaps it runs great on their high-end development platforms (which are often two to four times faster than a typical user's machine). Whatever the case, the developer is happy; why should s/he waste any time making that program run faster? Well, this essay wouldn't exist if the answer were no. After all, fast enough isn't.

Consider a typical application. If a software developer has written his or her software so that it runs just fast enough on a given platform, you'd better believe that software was tested on a machine with no other software running at the same time. Now imagine the poor end user running this software on a Macintosh, a Win95, a Win NT, or a UNIX machine, along with several other programs. Now that program that was fast enough is running dog slow. Look folks, a simple fact of life is that you can no longer assume your software has the machine all to itself. Those days died with MS-DOS.

On the other hand, if you make your software run twice as fast as it really needs to, then two such programs can run concurrently on a machine and still run fast enough. Likewise, if your program runs four times faster than it really needs to, four (or more) such programs could run concurrently.

Of course, a typical developer might claim that multiprocessor systems will solve this problem. Want to run more programs? No sweat, just add more processors. There are two problems with this theory. First, you have the future technology problem mentioned above. As users purchase machines that have multiple processors, they will also be purchasing software that winds up using all the power of those multiple processors. Second, there is a limit to the number of processors you can add to a typical system and expect performance to improve.
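
That second limit is usually explained with Amdahl's law: if only a fraction of a program's work can run in parallel, the serial remainder caps the speedup no matter how many processors you add. The short sketch below assumes, purely for illustration, that 90% of the work parallelizes perfectly.

    /* Sketch of Amdahl's law.  With fraction p of the work parallelizable,
       the best possible speedup on n processors is 1 / ((1 - p) + p/n).
       The value of p here is an assumption for illustration only. */
    #include <stdio.h>

    int main(void)
    {
        const double p = 0.9;                    /* assume 90% parallel work */

        for (int n = 1; n <= 64; n *= 2) {
            double speedup = 1.0 / ((1.0 - p) + p / (double)n);
            printf("%2d processors: %5.2fx speedup\n", n, speedup);
        }
        /* Even with unlimited processors the speedup never exceeds 1/(1-p) = 10x. */
        return 0;
    }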

Of course, one cannot generalize this argument to every piece of software in existence. Some programs do have all the resources of the underlying system to themselves (an embedded system, for example). Nevertheless, for the commercial applications one would expect to buy for a personal computer system, it shouldn't be the software developer's job to decide how to waste CPU cycles; that decision should be the user's.


[1] It has not necessarily exceeded the cost of using the software. See the essay on economic concerns for more details.
[2] Throughput is actually the inverse of this -- the number of tasks completed in a given time interval. This essay will ignore this difference since both views describe the same thing.
[3] I will ignore the amount of time the program spends waiting for user input in this discussion. If the user gets up and takes a coffee break in the middle of using a program, that shouldn't logically affect the throughput at all. Throughput describes what the program is capable of, not what actually happens.