From: Groepaz (groepaz_at_gmx.net)
Date: 2003-09-10 05:26:50
i'm currently playing with optimizing that raycaster as much as i can without writing too much external assembly (which unfortunately can not be completely avoided) and stumbled across some things...

- an expression like

    register unsigned short x;
    register unsigned char xx;
    x+=(xx>>2);

  translates to

    lda regbank+1
    lsr a
    lsr a
    sec
    eor #$FF
    adc regbank+4
    sta regbank+4
    lda #$00
    eor #$FF
    adc regbank+4+1
    sta regbank+4+1

  two things here...

  1) i think the immediate eor right after an immediate lda should really be caught by the optimizer :)
  2) it may be a nice possible optimisation to use a branch/increment type of code for the highbyte when adding an unsigned char to an unsigned short.

- to circumvent the above i tried the following macro

    #define uaddsc(_a,_b) \
        ( \
        __asm__ ("lda %v", _a), \
        __asm__ ("clc"), \
        __asm__ ("adc %v", _b), \
        __asm__ ("sta %v", _a), \
        __asm__ ("bcc @_l"), \
        __asm__ ("inc %v+1", _a), \
        __asm__ ("@_l:") \
        )

  mmmh....two problems here

  1) for the heck of it, i can't find out how to make the branch work :=P neither a local label, nor a "*+3" or anything else i could think of would do the trick...any ideas? :) (something to generate a unique local label could be a solution)
  2) if any of the arguments passed to the macro are register variables, the resulting code will be all wrong - it generates references to bogus memory locations instead of the register variables :=P

- this macro

    #define SINUS(_x) (unsigned char) \
        (__AX__= (_x), \
        __asm__ ("tay"), \
        __asm__ ("lda %v,y", sinustablelow), \
        __AX__)

    unsigned char i;
    register unsigned char xx;
    xx=SINUS(i);

  gives this:

    lda L04C6
    tay
    lda _sinustablelow,y
    sta regbank+1

  two things that could come in handy here....

  1) additional pseudo-variables for the X and Y registers, so that this kind of macro can be written more efficiently
  2) i am wondering why the optimizer doesn't change the lda/tay into ldy (it _does_ remove the ldx anyway!) ... i've seen much more unnecessary use of the y register than of the x register, actually.
  (in fact it's pretty smart about removing unneeded x-register loads if you arrange the code right)

- additional questions...

  - what impact does it have if i write inline macros and don't load the parameters using the __AX__ pseudo-variable, but directly like in the uaddsc macro above? i think it shouldn't make a difference, except that using __AX__ will help the optimizer to eg remove the lda/ldx altogether if the value is currently in a/x anyway....but maybe there is more :)
  - i'd like to page-align some global arrays (since that saves some cycles for free on 0x100 byte tables, by avoiding crossing page boundaries) ... is that possible in C somehow?
  - how much do the c64 libraries depend on the system interrupt, ie $ea31 being called frequently? will it for example affect the clock() call if i use my own interrupt handler? (i'd like to skip the kernal stuff altogether and just do some simple joystick stuff)...i need a working clock/time for syncing :)

mmmmh...that's all for now i guess...back to coding :)

gpz
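The branch/increment idea for the highbyte (point 2 of the first item, and what the uaddsc macro is trying to do with bcc/inc) can be sketched in portable C. This is only a model of the 6502 sequence, not cc65 output: the helper name add_u8_to_u16 is made up for illustration, and the carry flag is emulated by checking whether the 8-bit sum wrapped around.

```c
#include <stdint.h>

/* Hypothetical helper (not part of cc65): adds an 8-bit value to a 16-bit
   value bytewise, mirroring the lda/clc/adc/sta/bcc/inc sequence. */
uint16_t add_u8_to_u16(uint16_t x, uint8_t b)
{
    uint8_t lo  = (uint8_t)(x & 0xFF);   /* lowbyte of the 16-bit value  */
    uint8_t hi  = (uint8_t)(x >> 8);     /* highbyte of the 16-bit value */
    uint8_t sum = (uint8_t)(lo + b);     /* clc / adc: 8-bit add, wraps at 256 */

    if (sum < lo) {                      /* sum wrapped, so carry would be set */
        ++hi;                            /* inc highbyte; bcc skips this otherwise */
    }
    return (uint16_t)(((uint16_t)hi << 8) | sum);
}
```

For example, add_u8_to_u16(0x00FF, 1) takes the increment path and yields 0x0100, while add_u8_to_u16(0x1234, 0x10) leaves the highbyte untouched and yields 0x1244. The point of the technique is that the highbyte needs no second adc, only a conditional inc.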