The "Great Debate" is a very emotional exchange that has been running continuously since the late '70s. On some newsgroup somewhere, you will find a thread discussing this topic; although you will have the best luck looking at the Internet comp.lang.asm.x86, comp.lang.c, alt.assembly, or comp.lang.c++ newsgroups. Of course, almost anything you read in these newsgroups is rubbish, no matter what side of the argument the author is championing. Because this debate has been raging for (what seems like) forever, it is clear there is no easy answer; especially one that someone can make up off the top of their head (this describes about 99.9% of all postings to a usenet newsgroup). This page contains a series of essays that discuss the advances of compilers and machine architectures in an attempt to answer the above question.
Although I intend to write a large number of these essays, I encourage others, even those with opposing viewpoints, to contribute to this exchange. If you would like to contribute a "well thought out, non-emotional essay" to this series, please send your contribution (HTML is best, ASCII text is second best) to
debate@webster.ucr.edu
Often, someone will try to "prove" to me that compilers produce really good code by taking some HLL sequence of statements and showing me the outstanding assembly sequence the compiler produces. Folks, there is only one way to prove an "always" condition by example: enumerate all possibilities and show the condition holds for every one of them. Since there are (for all practical purposes) an infinite number of possible programs one can write, you will not be able to prove a compiler's worthiness by example.
Note that the general question is not "Can a compiler produce a code sequence that is as good as (or better than) a human would?" I consider myself an expert assembly language programmer. However, I am not ashamed to admit that I've learned some assembly language tricks by studying the output of various compilers. Just because I wrote some assembly sequence (without looking at a compiler's output) and you found a compiler that bests me doesn't make the compiler better than me. If you feed the same input to a compiler twice, you will always (assuming a deterministic program) get the same output. Give the same problem to an assembly language programmer twice and you're likely to get two different solutions, one probably better than the other. Have compilers ever beaten me? Yes. They do it all the time. But on other code sequences I beat the compiler every time, and if I really apply myself, I can beat it every time.
Another problem with the "Proof by Example" myth is the fact that pro-compiler types will often use the output of several different compilers to boost their arguments. That is, given three or four different algorithms/code sequences, they may run the code through several different compilers and pick the best output. The problem with this approach is that no single compiler implements everything in the best possible fashion. Some compilers will excel in one area and totally suck in another. If the output of three compilers is complementary (i.e., they each excel where the other two fail), it is possible to pick the best results from the different compilers and present them as though they were outputs from a single compiler. This tends to hide the fact that compilers often fail miserably at some things that a human would handle automatically.
The argument for this policy is simply "Well, if existing compilers can do all these good things separately, surely we can merge the best of these compilers into a single product and have something really great." This line of reasoning fails for three reasons:
(1) Some optimizations are mutually exclusive. That is, if you perform one type of optimization, you cannot perform some other type of optimization on the same code. If the "best" example from one compiler uses an optimization technique that is mutually exclusive with the "best" example from a different compiler on a different problem, it may not be possible to merge those two techniques into the same product (the sketch following this list gives a concrete instance of two optimizations in tension).
(2) Don't forget that most compilers are commercial products. The quality of the optimizer is often a trade secret and other vendors may not be able to directly clone an optimization technique.
(3) Even if two optimizations are not mutually exclusive, putting both of them into the same program could produce difficult-to-maintain code or severely impact the performance of the compiler itself.
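To make reason (1) concrete, here is a minimal sketch in C (the function names and the loop are mine, purely for illustration). Unrolling a loop reduces the branch overhead per element but multiplies the code size, so "unroll aggressively" and "keep the code small enough to stay in the instruction cache" pull a compiler in opposite directions; it cannot grant both wishes for the same loop.

    #include <stddef.h>

    /* Straightforward copy loop: minimal code size, friendly to a
       small instruction cache. */
    void copy_bytes(char *dst, const char *src, size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++)
            dst[i] = src[i];
    }

    /* The same loop unrolled four ways: one branch per four bytes
       instead of one per byte, but roughly four times the code. */
    void copy_bytes_unrolled(char *dst, const char *src, size_t n)
    {
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            dst[i]     = src[i];
            dst[i + 1] = src[i + 1];
            dst[i + 2] = src[i + 2];
            dst[i + 3] = src[i + 3];
        }
        for (; i < n; i++)          /* copy the leftover bytes */
            dst[i] = src[i];
    }

A compiler that produced the "best" output on a speed benchmark may have chosen the second shape, while one that produced the "best" output on a size benchmark chose the first; merging the two winners into one product doesn't make the conflict go away.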
Software engineers have been promising for 20 years now that compilers would merge all known techniques into a single product and that we'd have really great compilers someday soon. Compilers have gotten better, but they're still a long way from perfect.
Note that it is possible to disprove a theory with a single example. Therefore, if you want to claim that compilers can always produce better code than humans, all I've got to provide is one example to the contrary. Proving that compilers, on average, produce better code than an expert assembly language programmer is far more difficult.
Perhaps the best indication of how well compilers will operate in the future lies in the past. By looking at how much compilers have improved their code generation capabilities over the past several decades, we can anticipate how much better they will get over the next decade.
In the late 70's and early 80's there was a flurry of activity with respect to the production of optimizing compilers. The result was quite impressive. In ten years, the performance of code from compilers for a given language doubled, tripled, or improved by an even greater factor. Coincidentally, it was during this time that "Software Engineering" came of age and people began to move away from assembly language because compilers promised high performance with less work.
Unfortunately, compilers in the late 80's and early 90's failed to produce the dramatic improvements seen in the late 70's and early 80's. Indeed, most major performance improvements during this time period came from architectural improvements to the CPU rather than from any great advance in compiler technology. Whereas performance gains in the 100-500% range were common with the first wave of microprocessor compilers, the improvements dropped well below 100% in the second wave of products (late 80's and early 90's). Today, compiler writers are scratching in the dirt to get gains of 15-30%. Computer architects aren't doing much better. Compiler writing is a fairly mature science at this point. It is very unlikely (short of someone proving that P=NP) that we will ever again see impressive gains in compiler technology with respect to raw performance improvement.
Therefore, extrapolating from the past performance of compiler writers to predict how much faster compiler-produced code will run ten years from now is very dangerous. Unless there is a radical shift in computer architectures that favors HLLs at the expense of assembly language, it is unlikely the performance gap between good HLL programs and good assembly language programs will become much narrower. Indeed, the only real thing left to do is to consolidate as many optimizations as possible into a single compiler (we are a long way from this today). This will probably improve performance by another 50% on average.
As I mentioned in the previous section, most of the big performance gains over the past 20 years have been due to architectural improvements, not compiler improvements. The mere fact that we've gone from a 5MHz 8088 to a 200MHz Pentium Pro in a high-end PC in 15 years has a lot more to do with the speed of software today than the quality of compilers does. While certain technologies, such as RISC, have closed the gap between human-written machine code and compiler-generated machine code, the performance boost provided by compilers pales in comparison to that provided by the newer hardware.
Another problem with contributors to "The Great Debate" is the limited exposure many people have. If you get involved in a thread arguing the relative merits of assembly language vs. C, you will often find the pro-HLL types leading the charge are UNIX programmers. Now I don't want to pigeon-hole all UNIX programmers, but the types I've seen making the argument against assembly language have very little experience outside the UNIX (or mainframe) O/S arena. I think that one could make a very good case that assembly language is a bad thing to use under UNIX. Does that mean assembly language isn't useful elsewhere? Gee, some programmers wearing UNIX blinders sure seem to think so.
Before you start coming up with reasons why assembly language is not a practical tool, make sure you state the domain in which you operate. Claims like "Code doesn't really need to get any faster" or "We don't need to worry about saving memory" are fine when you're working on a 500 MHz DEC Alpha with 1 GByte of main memory installed. Are the claims you're making for your environment going to apply to the engineer trying to convince a Barbie doll that it should talk using a $0.50 microcomputer system? Keep in mind, it's the C/C++ (and other HLL) programmers arguing that you should never have to use assembly. The assembly programmers never (okay, rarely) argue that you should always use assembly[1]. It is very difficult to defend a term like "never". It is very easy to defend a term like "sometimes" or "occasionally". Just because you've never been forced to use assembly language in order to achieve some goal doesn't mean it is always possible to avoid assembly. Be careful about those blinders you're wearing when arguing against assembly.
Okay, it seems like a stupid question. Obviously any code written in assembly language is going to have a difficult time running on a different processor (it may not even run efficiently on another member of the processor family for which the original code was written). Worse still, you will have to learn several different assembly languages in order to move your code amongst processors. While learning a second or third assembly language is much easier than learning your first, learning all the idiosyncrasies you must know to write fast code still requires quite a bit of work. So it seems that porting code involving assembly language is not a brilliant idea.
On the other hand, software engineering researchers typically point out that coding represents only about 30% of the software development effort. Even if your program were written in 100% pure assembly language, one would expect it to require no more than 40% of the original effort to completely port the code to a new processor (the extra 10% provides some time to handle bugs introduced by typos, etc.).
Perhaps you're thinking 40% is pretty bad. Keep in mind, however, that porting C/C++ code doesn't take zero effort, particularly if you switch operating systems while porting your code. If you're the careful type who constantly reviews your code to ensure it's portable, you're simply paying this price during initial development rather than during a porting phase (and there is a cost to carefully writing portable code). I am not trying to say that porting assembly code is as easy as porting C/C++ code; I'm only saying that the difference isn't as great as it seems. This is especially true when porting code between operating systems that have different APIs (e.g., porting between flavors of UNIX is easy; now try UNIX -> Windows -> Macintosh -> OS/400 -> MVS -> etc.).
Is assembly language easier to read than HLL code? Being an expert assembly language programmer and a fairly accomplished C programmer, I find my own assembly language programs only slightly more difficult to read than my own C programs. On the other hand, I generally take great pains to structure my source code so that it is fairly easy to read (take a look at my code on this web site). I will say this - I've seen some assembly code out there that is absolutely unreadable. Of course, I've also seen my share of C/C++ code that looks like an explosion in an alphabet soup factory.
Of course, only the person doing the reading can really make this judgement call. Obviously, if you know assembly but don't know C/C++, you'll find assembly is easier to read. The reverse is also true. I happen to know both really well and I find a well-written C/C++ program a little easier to read than an assembly language program. Poorly written examples in both languages are so bad they are incomparable. Once a program is unreadable, it is difficult to determine how unreadable it is.
Quick quiz: What does the following C statement do and how long did it take you to figure this out?
*(++s) && *(++s) && *(++s) && *(++s);
Most people (who know 80x86 assembly) would find the corresponding 8086 code much more precise and readable:
        mov     bx, s           ; bx = s
        mov     al, 0           ; al holds the zero byte we test against
        inc     bx              ; first ++s
        cmp     al, [bx]        ; *s == 0?
        jz      Done
        inc     bx              ; second ++s
        cmp     al, [bx]
        jz      Done
        inc     bx              ; third ++s
        cmp     al, [bx]
        jz      Done
        inc     bx              ; fourth ++s (its value is never tested)
Done:   mov     s, bx           ; store the updated pointer back into s
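(For the record: assuming s points into a zero-terminated string, both versions advance s by up to four characters, stopping as soon as a newly referenced character is zero; the assembly simply makes that sequencing explicit.)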
This notion exists because people tend to save assembly language programming for the very time-critical (and often complex) components of their programs. Obviously, if you've spent a lot of time and effort arranging instructions in a particular sequence to ensure the pipeline never stalls, and then you discover that you need to modify the computation, the change will involve a lot of work, since you will have to reschedule the instructions all over again.
Of course, it never occurs to people that similar low-level optimizations in HLL programs are very difficult to maintain as well. Consider the well-written (from a performance point of view) Berkeley string routines. These routines need to be completely redone if you move from a 32-bit processor to a 16-bit or 64-bit processor.
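To see why, here is a sketch in C of the word-at-a-time trick such routines rely on (this is my own illustration of the technique, not the actual Berkeley source). The alignment mask, the stride, and the magic constants are all baked in for a 32-bit word, so a port must rewrite all of them.

    #include <stddef.h>
    #include <stdint.h>

    /* Word-at-a-time strlen in the spirit of the Berkeley string
       routines. Everything here assumes a 32-bit word: the alignment
       mask, the stride, and the constants in the zero-byte test. */
    size_t fast_strlen(const char *s)
    {
        const char *p = s;
        const uint32_t *w;

        /* Step byte-by-byte until p is aligned on a word boundary. */
        while ((uintptr_t)p & 3) {
            if (*p == '\0')
                return (size_t)(p - s);
            p++;
        }

        /* Scan one 32-bit word per iteration. The expression below is
           non-zero exactly when some byte in the word is zero. */
        w = (const uint32_t *)p;
        while (!((*w - 0x01010101u) & ~*w & 0x80808080u))
            w++;

        /* A zero byte is somewhere in this word; find it exactly. */
        p = (const char *)w;
        while (*p != '\0')
            p++;
        return (size_t)(p - s);
    }

On a 64-bit machine the constants, the mask, and the stride all change; on a 16-bit 8086 the trick barely pays for itself. That is exactly the kind of rewrite the paragraph above is talking about.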
As a general rule, any code that is optimized is difficult to maintain. This has led to the proverb "Premature optimization is the root of all evil." People perceive that it is difficult to maintain assembly code mainly because the assembly code they've had to deal with is generally optimized code.
What if we don't go in and pull every unnecessary cycle out of a section of assembly code? Will the code be easier to maintain? Sure. For the same reason non-optimal C code is easy (?) to maintain.
Of course, one of the primary reasons for using assembly language is to reduce the use of system resources (i.e., to optimize one's program). Therefore, when using assembly language in place of a HLL, you're typically going to be dealing with hard-to-maintain code. Don't forget one thing, however: had you chosen to continue using a HLL rather than dropping down into assembly language, the optimization that would have been necessary in the HLL would have produced hard-to-maintain HLL code. Keep in mind, optimization is the root of the problem, not simply the choice of assembly language.
Is assembly language easier to write than HLL code? Sometimes. There are certain algorithms that, believe it or not, are easier to understand and implement at a very low level. Bit manipulation is one area where this is true. Also see the section on floating point arithmetic later in this document for more details.
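As one small illustration of the bit-manipulation point (the function below is mine, purely for illustration): rotating a value is a single ROL instruction on the 80x86, while portable C has to express the same operation as two shifts and an OR, with extra masking to keep the shift counts defined.

    #include <stdint.h>

    /* Rotate x left by n bits. On the 80x86 this whole function is one
       ROL instruction; the masking with 31 keeps both shift counts in
       the 0..31 range so the C version is defined for every n. */
    uint32_t rotl32(uint32_t x, unsigned n)
    {
        n &= 31;
        return (x << n) | (x >> ((32 - n) & 31));
    }

Carry-flag arithmetic and double-register shifts are similar: an instruction or two in assembly, but contortions in a HLL.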
I personally don't know, never having really learned assembly language on a RISC chip. I have certainly heard of individuals who have written some butt-kicking assembly code on a RISC, but this is generally third-hand knowledge. I do know this, though: one of the design principles behind the original RISC efforts was to study the instructions a typical compiler would use and throw out all the other instructions found in a typical CISC instruction set. This suggests that an assembly language programmer has less to work with on RISC chips than on CISC machines. Nevertheless, I will not comment further on this subject since I don't have any first-hand experience. I invite those who have mastered RISC assembly to write a guest essay for this series.
That depends entirely on the algorithm. Generally, algorithms will fall into one of four categories:
1) Horrible solution in assembly, horrible solution in some HLL.
2) Horrible solution in assembly, elegant solution in some HLL.
3) Elegant solution in assembly, horrible solution in most HLLs.
4) Elegant solution in assembly, elegant solution in some HLL.
Show me your algorithm, and I'll tell you which category I think it belongs in.
Although the number of people who know assembly language increases daily (faster than assembly programmers are dying off or forgetting the language), the number of people who know a given HLL is generally increasing much faster. While this says something bad about assembly language's popularity, what it has to do with the question "Will Compilers Ever Produce Better Code than a Human?" is an interesting question in its own right.
Probably not. This is one big advantage compilers have. If you get a new compiler for a later chip in a CPU family, all you've got to do is recompile your code to take advantage of the new architecture. On the other hand, your hand-written assembly code will need some manual changes to take advantage of architectural changes in the CPU. This fact alone has driven many to condemn writing code in assembly. After all, today's super-fast program may run like a dog on tomorrow's architecture. This argument, however, depends upon two fallacies:
(1) That tomorrow's compilers will actually take advantage of these architectural features.
(2) That the assembly language program used architectural features of today's chips that cause performance losses on tomorrow's chips.
Historically, compilers for the x86 architecture have lagged architectural advances by one or two generations. For example, about the time the Pentium Pro arrived, we were starting to see true 80486 optimizations in compilers. True, many compilers claim to support "Pentium" optimizations; however, such optimizations do very little for real programs. Given this history of lagging support from compiler vendors, coupled with the trend toward handling the really tedious optimizations (e.g., instruction scheduling) directly in the hardware, I personally feel that worrying about a specific member of a CPU family will become a moot point.
Those claiming that hand-written assembly language is inferior because the next member of a CPU family will render the code obsolete are missing the whole point of assembly optimization. Except in extreme cases, assembly language programmers rarely optimize at the level of counting cycles or scheduling instructions (as the pro-compiler crowd points out, this is really too tedious a task for human beings). Assembly language programmers typically achieve their performance gains using "medium-level" optimizations that are CPU-family dependent but usually independent of any specific member of that family. This is such an important concept that I will devote a completely separate essay in this series to the subject.
Further essays in this series will address the question "Is there a true need to use assembly language?" The claim "compilers can generate code that is just as good as a human's" is one (albeit incorrect) negative answer to this question. In the following essays I will attempt to answer the question in the affirmative.