[cc65] Cracknut...

From: Groepaz (groepaz_at_gmx.net)
Date: 2003-05-18 01:25:33


this one kind of results from a little discussion we had on the go64 
mailing list.... might be interesting to hear some comments :)

ok..the keywords are "efficiency of generated code" and "peephole optimizing" 
(we were discussing compilers in general)

i came up with a simple code snippet that rather drastically demonstrates how 
bloated compiled code can get... at least when the compiler is small-c based 
:=P

first in handcoded asm:

buf:	.res $100

main:
	ldx #0
	txa
lp:
	sta buf,x
	inx
	bne lp
	rts

what it does should be obvious: it clears the 256 bytes of buf.

now here is a snippet of C code doing the same thing:

char x;
char buf[0x100];

void main(void)
{
	x=0;
	do
	{
		buf[x++]=0;
	} while(x);
}

cc65 generates the following code (-Osir).... please note the comments marking 
the remaining peepholes (more on that later...)

_x:   .res    1,$00
_buf:   .res    256,$00

_main:
        lda     #$00
        sta     _x
L0006:
        ;-peephole start
        ; lda     _x
        ; pha
        ; clc
        ; adc     #$01
        ; sta     _x
        ; pla
        ;-peephole end
        inc     _x
        lda     _x
        ;-peephole optimization end
        ;-peephole start
        ; clc
        ; adc     #<(_buf)
        ; tay
        ; lda     #$00
        ; adc     #>(_buf)
        ; tax
        ; tya
        ;-peephole end
        clc
        ldx     #>(_buf)
        adc     #<(_buf)
        bcc     @s
        inx
@s:
        ;-peephole optimization end
        sta     sreg
        stx     sreg+1
        lda     #$00
        tay
        sta     (sreg),y
        lda     _x
        bne     L0006
        rts

after applying those 2 peephole optimizations we get:

_main:
        lda     #$00
        sta     _x
L0006:
        inc     _x
        lda     _x
        clc
        ldx     #>(_buf)
        adc     #<(_buf)
        bcc     @s
        inx
@s:
        sta     sreg
        stx     sreg+1
        lda     #$00
        tay
        sta     (sreg),y
        lda     _x
        bne     L0006
        rts
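
one thing worth checking about the first rewrite: the original sequence (lda/pha/.../pla) leaves the OLD value of x in A, while inc _x / lda _x leaves the NEW one, so the stores happen at buf[1]..buf[255] and finally buf[0]. a small C sketch of both loop shapes, to check that the whole buffer still gets cleared for this particular loop. the function names are made up for illustration, and this says nothing about whether the rewrite would be safe as a general rule:

```c
#include <stddef.h>

/* Loop as written in the C source: store at the old x, then increment. */
static void clear_post_increment(unsigned char buf[256])
{
    unsigned char x = 0;
    do {
        buf[x++] = 0;
    } while (x);
}

/* Loop as the rewritten asm behaves: increment first, then store at the
   new x. The 8-bit counter wraps from 255 to 0, so buf[0] is written last. */
static void clear_increment_first(unsigned char buf[256])
{
    unsigned char x = 0;
    do {
        ++x;            /* inc _x / lda _x */
        buf[x] = 0;     /* sta (sreg),y with sreg = _buf + x */
    } while (x);        /* lda _x / bne L0006 */
}

/* Returns 1 if every byte of buf is zero, 0 otherwise. */
static int all_zero(const unsigned char buf[256])
{
    size_t i;
    for (i = 0; i < 256; ++i) {
        if (buf[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

filling two 256-byte buffers with a non-zero value, running one function on each and calling all_zero() on both shows that only the order of the stores differs here. in a loop that used the value of buf[x++] differently, or exited early, the two shapes would not be interchangeable.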

now the question is what that actually proves.... either a) cc65 doesn't have 
a peephole optimizer, or b) its peephole rules are not sufficient. whichever 
it is, i am very tempted to help improve it :=) (btw, could you tell me in 
a few words what kinds of optimizations cc65 actually does?... and if there is 
peephole optimization, could you point me to the file with the rules in it? 
:))

the second question goes one (or more :=)) steps further... is there a way to 
make the compiler access arrays of <=256 elements via indexed addressing 
(sta buf,x) rather than indirect addressing (sta (sreg),y)? that could reduce 
the above loop even further, probably to something quite close to the 
handwritten code shown first. (i know it'll probably involve major changes to 
the compiler/code generator/whatever, but i'd like to hear your comments 
anyway... maybe there's a small chance or something :=P)

however... the first thing actually sounds very doable to me (peephole 
optimizing is nothing more than pattern matching anyway), while the second 
appears to be the real cracknut here :o) nevertheless, improving the peephole 
stuff looks very promising to me (i've also spotted an rts immediately 
following a jsr a couple of times in compiled code...) ... well, whatever :)
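
to make the "pattern matching" point concrete, here is a minimal sketch in C of one such rule: a jsr immediately followed by an rts becomes a single jmp. this is NOT how cc65 does it; the struct layout and all names (struct insn, is_target, peephole_tail_call) are hypothetical, and it assumes the code is already split into one mnemonic plus one operand per entry. the rts is left alone when it is a branch target, since other code may still jump to it:

```c
#include <stddef.h>
#include <string.h>

#define MAX_ARG 32

/* One instruction: mnemonic, operand text, and whether a label points at it.
   Hypothetical layout, for illustration only. */
struct insn {
    char op[4];
    char arg[MAX_ARG];
    int  is_target;
};

/* Rewrites every "jsr X" directly followed by an unlabeled "rts" into
   "jmp X", compacting the array in place. Returns the new instruction count. */
static size_t peephole_tail_call(struct insn *code, size_t n)
{
    size_t in = 0;
    size_t out = 0;

    while (in < n) {
        if (in + 1 < n
            && strcmp(code[in].op, "jsr") == 0
            && strcmp(code[in + 1].op, "rts") == 0
            && !code[in + 1].is_target) {
            struct insn merged = code[in];   /* keep operand and label flag */
            strcpy(merged.op, "jmp");
            code[out++] = merged;
            in += 2;                         /* consume both jsr and rts */
        } else {
            code[out++] = code[in++];
        }
    }
    return out;
}
```

a real optimizer would carry many such rules and would also have to track register and flag usage, which this sketch ignores completely.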

gpz