From: Ullrich von Bassewitz (uz_at_musoftware.de)
Date: 2003-05-18 04:44:00
On Sun, May 18, 2003 at 01:25:33AM +0200, Groepaz wrote: > now the question is what that actually proves.... a) cc65 doesnt have a > peephole optimizer or b) the peephole rules are not sufficient. It proves that c) you haven't read the docs:-) The coding rules state explicitly that you shouldn't use post increment (and I do also remember telling you that several times when talking about your programs). When writing the code the proper way x = 0; do { ++x; buf[x] = 0; } while (x); you will get the following assembler code: lda #$00 sta _x L000A: inc _x lda #<(_buf) ldx #>(_buf) clc adc _x bcc L000F inx L000F: sta sreg stx sreg+1 lda #$00 tay sta (sreg),y lda _x bne L000A rts This is almost identical to what you propose the compiler should generate. > and whatever > it is, i am very tempted to help with improving it :=) I would say that you overestimate the potentialities of a peephole optimizer. The problems with a peephole optimizer are: 1. It can only handle patterns it knows. 2. A small change in code generation may break lots of patterns. 3. There are lots of order dependencies. Some patterns do only work after others have been replaced, some more patterns must be replaced before others. 4. Checking for patterns, especially larger ones (like in your example) gets a mess rather quickly, because small variances in several places add up to dozens of different resulting patterns. It is easy for a human to see the common base in all these patterns, but not for a computer. Problem #1 means that you can *always* come up with a sequence that runs through a peephole optimizer without major changes - just use a pattern it doesn't know. This is also the answer to the following question: > the second question goes one (or more :=)) step further... is there a way to > make the compiler access arrays <=256 elements via indexed addressing mode > rather than indirect? The replacement you're mentioning here is indeed done for loads (your plasma demo shows this [or should - seems my current development version of the compiler breaks it]). So yes, it is possible. But because of problems 1-4, adding more patterns doesn't always make sense, I'm doing this usually for heavily used patterns. The cc65 optimizer has about 60 different rules. Not all, but most of them are what you would call "peephole optimizing", and many do handle more than just one pattern. If you define an environment variable named "CC65_OPTSTATS" with the name of a file, the compiler will generate a text file with this name that contains single and cumulated statistics about the steps applied, and those that were successful. The file is then updated with every compiler run. > (btw could you tell in > a few words what type of optimizations cc65 actually does?... and if there is > peephole optimization, could you point me to the file with the rules in it? > :)) The cc65 optimizer is much more complex, so there are no static rules. Code optimization is done on the 6502 level only (this is the reason why sometimes the generated code is really clever, especially for simple code, and sometimes absolutely ugly). The following files are concerned with optimization: codeinfo.c codeopt.c coptadd.c coptcmp.c coptneg.c coptsize.c coptstore.c copttest.c coptc02.c coptind.c coptpush.c coptstop.c coptsub.c You will also want to have a look at the following modules, they contain data structures and functions for the abstract representation of the 6502 code, the labels and so on: codeent.c codelab.c codeseg.c opcodes.c reginfo.c > however...the first thing sounds actually very doable to me (peephole > optimizing is nothing more than pattern matching anyway) while the second > appears to be the real cracknut here :o) See my comments above. Maybe I'm wrong, but it would say that you're underestimating the problems. Look at the generated code above, then assume we have some simple changes: 1. The programmer may write buf[++x] = 0; or buf[x++] = 0; instead of ++x; buf[x] = 0; 2. We may have a pointer that points to the buffer. Something like char x; char* buf; void main (void) { x = 0; do { ++x; buf[x] = 0; } while (x); } 3. Either the buffer or the index may be an auto variable on the stack. 4. The index may have some arithmetics applied like in buf[x+1] = 0; 5. The compiler/code generator may use some other zero page register instead of sreg. Just take these (rather simple) variations and calculate how many resulting patterns you get. If you don't find a better solution instead of "nothing more than pattern matching", you will end up with a dozens of patterns just for this case. And all of these patterns may have errors and may need changes when one single line in the code generator gets changed. > Nevertheless improving the peephole > stuff looks very promising to me For me it seems to be pretty maxed out. Just have a look at the list of files concerned with this sort of stuff... > (i've spotted rts following immediatly on > jsr a couple of times in compiled code aswell...) ... tjam whatever :) "jsr xxx/rts" should always get replaced by "jmp xxx". If you do *really* find it in optimized code, you can send me a bug report. Regards Uz -- Ullrich von Bassewitz uz_at_musoftware.de ---------------------------------------------------------------------- To unsubscribe from the list send mail to majordomo_at_musoftware.de with the string "unsubscribe cc65" in the body(!) of the mail.
This archive was generated by hypermail 2.1.3 : 2003-05-18 04:44:21 CEST