From: Groepaz (groepaz_at_gmx.net)
Date: 2003-09-10 05:26:50
i'm currently playing with optimizing that raycaster as much as i can without
writing too much external assembly (which cannot be completely avoided,
unfortunately) and stumbled over some things...
-
an expression like
register unsigned short x;
register unsigned char xx;
x+=(xx>>2);
translates to
lda regbank+1
lsr a
lsr a
sec
eor #$FF
adc regbank+4
sta regbank+4
lda #$00
eor #$FF
adc regbank+4+1
sta regbank+4+1
two things here... 1) i think the immediate eor right after an immediate lda
(lda #$00 / eor #$FF is just lda #$FF) should really be caught by the
optimizer :) 2) it may be a nice optimisation to use branch/increment type
code for the highbyte when adding an unsigned char to an unsigned short.
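to make point 2) concrete, here is a minimal sketch in portable C that models the branch/increment idea byte by byte. it is only an illustration of the intended code shape (clc / adc / bcc / inc), not what cc65 generates; the helper name add_u8_to_u16 is made up for this example.

```c
#include <stdint.h>

/* hypothetical helper (not part of cc65): adds an unsigned char to an
   unsigned short the way hand-written 6502 code would, i.e. an 8-bit add
   on the lowbyte followed by a conditional increment of the highbyte */
static uint16_t add_u8_to_u16(uint16_t x, uint8_t b)
{
    uint8_t lo  = (uint8_t)(x & 0xFF);
    uint8_t hi  = (uint8_t)(x >> 8);
    uint8_t sum = (uint8_t)(lo + b);   /* clc / adc: 8-bit add, wraps at 256 */

    if (sum < lo) {                    /* result wrapped, so carry was set */
        ++hi;                          /* inc highbyte (bcc skips this) */
    }
    return (uint16_t)(((uint16_t)hi << 8) | sum);
}
```

the highbyte is only touched when the lowbyte add overflows, so the common case costs one taken branch instead of a full lda/adc/sta sequence on the highbyte.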
-
to circumvent the above i tried the following macro
#define uaddsc(_a,_b) \
( \
__asm__ ("lda %v", _a), \
__asm__ ("clc"), \
__asm__ ("adc %v", _b), \
__asm__ ("sta %v", _a), \
__asm__ ("bcc @_l"), \
__asm__ ("inc %v+1", _a), \
__asm__ ("@_l:") \
)
mmmh....two problems here
1) for the life of me, i can't figure out how to make the branch work :=P
neither a local label, nor a "*+3", nor anything else i could think of would
do the trick...any ideas? :) (something to generate a unique local label could
be a solution)
2) if any of the arguments passed to the macro are register variables, the
resulting code will be all wrong - it generates references to bogus memory
locations instead of the register variables :=P
-
this macro
#define SINUS(_x) (unsigned char) \
(__AX__= (_x), \
__asm__ ("tay"), \
__asm__ ("lda %v,y", sinustablelow), \
__AX__)
unsigned char i;
register unsigned char xx;
xx=SINUS(i);
gives this:
lda L04C6
tay
lda _sinustablelow,y
sta regbank+1
two things that could come in handy here....
1) additional pseudo-variables for the X and Y registers, so that this kind of
macro can be written more efficiently
2) i am wondering why the optimizer doesn't change the lda/tay into ldy (it
_does_ remove the ldx anyway!) ... i've seen much more unnecessary use of
the y register than of the x register actually. (in fact it's pretty smart
about removing unneeded x-register loads if you arrange the code right)
-
additional questions...
- what impact does it have if i write inline macros and don't load parameters
using the __AX__ pseudo-variable, but directly like in the uaddsc macro above?
i think it shouldn't make a difference, except that using __AX__ will help the
optimizer to e.g. remove the lda/ldx altogether if the value is already in a/x
anyway....but maybe there is more :)
- i'd like to page-align some global arrays (since that saves some cycles for
free on 0x100 byte tables by avoiding page-boundary crossings) ... is that
possible in C somehow?
- how much do the c64 libraries depend on the system interrupt, i.e. $ea31 being
called frequently? will it for example affect the clock() call if i use my
own interrupt handler? (i'd like to skip the kernal stuff altogether and just
do some simple joystick stuff)...i need a working clock/time for syncing :)
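on the page-alignment question above: the saving comes from the 6502's indexed absolute loads taking one extra cycle whenever base+index lands in a different page than base. a small sketch in portable C of that check (the helper name crosses_page is made up for illustration, and it says nothing about how cc65 or the linker would place the table):

```c
#include <stdint.h>

/* hypothetical helper: returns 1 if an indexed access base,y (or base,x)
   leaves the page that base is in, which is the case that costs the
   extra cycle on a 6502 indexed absolute load */
static int crosses_page(uint16_t base, uint8_t index)
{
    /* only the lowbyte of the base matters: a carry out of the
       lowbyte add means the highbyte of the address changes */
    return ((base & 0xFF) + index) > 0xFF;
}
```

with a base address whose lowbyte is $00 the sum can never exceed $FF, so a page-aligned 0x100 byte table never pays the penalty, whatever the index.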
mmmmh...that's all for now i guess...back to coding :)
gpz