From: Ullrich von Bassewitz (uz_at_musoftware.de)
Date: 2003-05-18 04:44:00
On Sun, May 18, 2003 at 01:25:33AM +0200, Groepaz wrote:
> now the question is what that actually proves.... a) cc65 doesnt have a
> peephole optimizer or b) the peephole rules are not sufficient.
It proves that
c) you haven't read the docs:-)
The coding rules state explicitly that you shouldn't use post increment (and I
do also remember telling you that several times when talking about your
programs). When writing the code the proper way
x = 0;
do {
++x;
buf[x] = 0;
} while (x);
you will get the following assembler code:
lda #$00
sta _x
L000A: inc _x
lda #<(_buf)
ldx #>(_buf)
clc
adc _x
bcc L000F
inx
L000F: sta sreg
stx sreg+1
lda #$00
tay
sta (sreg),y
lda _x
bne L000A
rts
This is almost identical to what you propose the compiler should generate.
> and whatever
> it is, i am very tempted to help with improving it :=)
I would say that you overestimate the potentialities of a peephole optimizer.
The problems with a peephole optimizer are:
1. It can only handle patterns it knows.
2. A small change in code generation may break lots of patterns.
3. There are lots of order dependencies. Some patterns do only work after
others have been replaced, some more patterns must be replaced before
others.
4. Checking for patterns, especially larger ones (like in your example)
gets a mess rather quickly, because small variances in several places
add up to dozens of different resulting patterns. It is easy for a
human to see the common base in all these patterns, but not for a
computer.
Problem #1 means that you can *always* come up with a sequence that runs
through a peephole optimizer without major changes - just use a pattern it
doesn't know. This is also the answer to the following question:
> the second question goes one (or more :=)) step further... is there a way to
> make the compiler access arrays <=256 elements via indexed addressing mode
> rather than indirect?
The replacement you're mentioning here is indeed done for loads (your plasma
demo shows this [or should - seems my current development version of the
compiler breaks it]). So yes, it is possible. But because of problems 1-4,
adding more patterns doesn't always make sense, I'm doing this usually for
heavily used patterns.
The cc65 optimizer has about 60 different rules. Not all, but most of them are
what you would call "peephole optimizing", and many do handle more than just
one pattern. If you define an environment variable named "CC65_OPTSTATS" with
the name of a file, the compiler will generate a text file with this name that
contains single and cumulated statistics about the steps applied, and those
that were successful. The file is then updated with every compiler run.
> (btw could you tell in
> a few words what type of optimizations cc65 actually does?... and if there is
> peephole optimization, could you point me to the file with the rules in it?
> :))
The cc65 optimizer is much more complex, so there are no static rules. Code
optimization is done on the 6502 level only (this is the reason why sometimes
the generated code is really clever, especially for simple code, and sometimes
absolutely ugly). The following files are concerned with optimization:
codeinfo.c codeopt.c coptadd.c coptcmp.c coptneg.c coptsize.c
coptstore.c copttest.c coptc02.c coptind.c coptpush.c coptstop.c
coptsub.c
You will also want to have a look at the following modules, they contain data
structures and functions for the abstract representation of the 6502 code, the
labels and so on:
codeent.c codelab.c codeseg.c opcodes.c reginfo.c
> however...the first thing sounds actually very doable to me (peephole
> optimizing is nothing more than pattern matching anyway) while the second
> appears to be the real cracknut here :o)
See my comments above. Maybe I'm wrong, but it would say that you're
underestimating the problems. Look at the generated code above, then assume we
have some simple changes:
1. The programmer may write
buf[++x] = 0;
or
buf[x++] = 0;
instead of
++x;
buf[x] = 0;
2. We may have a pointer that points to the buffer. Something like
char x;
char* buf;
void main (void)
{
x = 0;
do {
++x;
buf[x] = 0;
} while (x);
}
3. Either the buffer or the index may be an auto variable on the stack.
4. The index may have some arithmetics applied like in
buf[x+1] = 0;
5. The compiler/code generator may use some other zero page register
instead of sreg.
Just take these (rather simple) variations and calculate how many resulting
patterns you get. If you don't find a better solution instead of "nothing more
than pattern matching", you will end up with a dozens of patterns just for
this case. And all of these patterns may have errors and may need changes when
one single line in the code generator gets changed.
> Nevertheless improving the peephole
> stuff looks very promising to me
For me it seems to be pretty maxed out. Just have a look at the list of files
concerned with this sort of stuff...
> (i've spotted rts following immediatly on
> jsr a couple of times in compiled code aswell...) ... tjam whatever :)
"jsr xxx/rts" should always get replaced by "jmp xxx". If you do *really* find
it in optimized code, you can send me a bug report.
Regards
Uz
--
Ullrich von Bassewitz uz_at_musoftware.de
----------------------------------------------------------------------
To unsubscribe from the list send mail to majordomo_at_musoftware.de with
the string "unsubscribe cc65" in the body(!) of the mail.
This archive was generated by hypermail 2.1.3 : 2003-05-18 04:44:21 CEST