Re: [cc65] code generation


From: Groepaz (groepaz_at_gmx.net)
Date: 2003-09-15 01:05:01

Previous message: Ullrich von Bassewitz: "Re: [cc65] code generation"
In reply to: Ullrich von Bassewitz: "Re: [cc65] code generation"
Next in thread: Ullrich von Bassewitz: "Re: [cc65] code generation"
Next in thread: Maciej Witkowiak: "Re: [cc65] Optimizing C code"
Reply: Ullrich von Bassewitz: "Re: [cc65] code generation"

On Sunday 14 September 2003 13:27, Ullrich von Bassewitz wrote:

> I've had a look at the preprocessor: What you're suggesting is rather
> difficult to add to the current implementation[1]. But because of some
> other problems with the current implementation, the preprocessor may get a
> rewrite at some time, and then I may consider your proposal.

i see.... so remind me to remind you when it's time to do it :=P

> > that should allow labels in inline asm in both macros and functions with
> > the following restrictions:
>
> [...]
>
> Here is one more: The concept does not allow generation of local labels
> outside of macros, because it relies on a counter incremented when a macro
> is expanded.

oh... it should only be a problem with generating local labels outside a macro 
if these are generated directly after locals which in turn are inside a 
macro... just let the counter increment every time a function starts as well 
(or each time a .proc directive is emitted maybe)
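
A toy model of the counter idea discussed above, as a sketch only. The class name, the mangling format and the demo sequence are all made up for illustration and are not taken from cc65. It shows that bumping the counter on function start as well as on macro expansion gives every `@l1` its own name:

```python
class LabelCounter:
    """Toy counter-based local-label scheme (hypothetical, not cc65 code)."""

    def __init__(self):
        self.scope = 0

    def new_scope(self):
        # Called on every macro expansion and, as proposed in the mail,
        # also every time a function starts.
        self.scope += 1

    def mangle(self, label):
        # "@l1" seen in scope 3 becomes "L0003_l1" (format is an assumption).
        return f"L{self.scope:04d}_{label.lstrip('@')}"


def demo():
    c = LabelCounter()
    names = []
    c.new_scope()                  # function f() starts
    names.append(c.mangle("@l1"))
    c.new_scope()                  # a macro is expanded inside f()
    names.append(c.mangle("@l1"))
    c.new_scope()                  # function g() starts
    names.append(c.mangle("@l1"))
    return names
```

Note that a label used in f() after the macro would still land in the macro's scope in this model, which is the remaining case the paragraph above mentions.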

i found another interesting problem though (and a really clean generic solution 
could be really tricky i guess :=P) .... think of a macro that a) has an AX 
assignment in between a jump and the jump target label and b) gets another 
macro (which also has locals) as parameter ... urgs :=) (some kind of real 
nested lexical levels and all that crap is needed here to get it 1001% right 
i guess :=P)

(too lazy to write correct syntax now...)

#define test1(a,b) \
        AX = a \
        inc a \
        bcc @l1 \
        AX = b \
        @l1: \
        AX

#define test2(c) \
        AX = c \
        inx \
        bcc @l1 \
        inx \
        @l1: \
        AX

test1(2,test2(4));

.... the @l1 from test1() will be in the wrong "namespace" since the expansion 
of test2() had already incremented the counter further.... a real true 
recursive macro expansion can probably do the job though :)
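
The nesting problem can be reproduced in a small simulation. This is a sketch under assumptions: the macro table, the `$arg` placeholder and both expander functions are invented here and only loosely mirror test1/test2. `expand_flat` reads one global counter at emission time, `expand_scoped` lets each expansion keep its own id for its whole body:

```python
import itertools

# Hypothetical macro bodies; "$arg" marks where the parameter is substituted.
MACROS = {
    "test1": ["bcc @l1", "$arg", "@l1:"],
    "test2": ["bcc @l1", "inx", "@l1:"],
}


def expand_flat(name, arg, state):
    """Global counter, read whenever a label is emitted (the broken scheme)."""
    state["n"] += 1
    out = []
    for item in MACROS[name]:
        if item == "$arg":
            if arg is not None:
                out += expand_flat(arg, None, state)  # bumps the counter
        else:
            out.append(item.replace("@", f"L{state['n']}_"))
    return out


def expand_scoped(name, arg, ids):
    """Recursive expansion where each level remembers its own id."""
    my_id = next(ids)
    out = []
    for item in MACROS[name]:
        if item == "$arg":
            if arg is not None:
                out += expand_scoped(arg, None, ids)
        else:
            out.append(item.replace("@", f"L{my_id}_"))
    return out


flat = expand_flat("test1", "test2", {"n": 0})
scoped = expand_scoped("test1", "test2", itertools.count(1))
```

In `flat` the outer branch targets `L1_l1`, but the outer label is emitted as `L2_l1:` and collides with the inner one. In `scoped` the outer jump and its target agree again.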

> This and the load/transfer stuff is in the current snapshot, but I do

hope to be able to test this soon (couldn't check the register variable stuff 
yet either :/)

> somehow regret adding it, because it saves just a few bytes, at least when
> compiling the samples and libraries. For new peephole optimization
> proposals I would suggest a few statistics highlighting the
> advantages/disadvantages.

mmmh well... IMHO every optimization that both decreases code size and 
increases execution speed at the same time (as the immediate lda/ora etc. 
merging does) is worth the trouble. yes, it probably saves only a few 
bytes; admittedly it even saves only a few bytes in code where i exploit 
certain special cases. however, adding this and maybe a handful of additional 
things really has potential - it's amazing to see how smart some of the 
compiler generated code actually looks after opt65 removed a few hiccups 
(let alone the bunch of useless tax/txa instructions it removes). also, the 
difference between opt65 and the cc65 internal optimizer doesn't seem to be 
that large either - i.e., there's not much left that an improved opt65 could 
possibly do; any further stuff needs knowledge about the code generator 
(which opt65 does not have by design) or the program flow or whatever else is 
pretty hard to implement without a parse tree and that kind of thing :)
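
To make the two peephole cases concrete, here is a minimal sketch. It is not how opt65 or the cc65 optimizer is implemented; the function name and the string-based instruction format are assumptions, and it only handles straight-line code with no labels between the instructions:

```python
def peephole(lines):
    """One pass over 6502 instructions given as lowercase strings."""
    out = []
    for ins in lines:
        prev = out[-1] if out else None
        # tax;txa or txa;tax: after the first transfer A and X already hold
        # the same value, so the second transfer can be dropped.
        if prev in ("tax", "txa") and ins in ("tax", "txa") and ins != prev:
            continue
        # lda #$a ; ora #$b  ->  lda #$(a|b): one instruction instead of two,
        # with the same final A and the same N/Z flags.
        if prev and prev.startswith("lda #$") and ins.startswith("ora #$"):
            a = int(prev[6:], 16)
            b = int(ins[6:], 16)
            out[-1] = f"lda #${a | b:02x}"
            continue
        out.append(ins)
    return out
```

For example, `peephole(["lda #$01", "ora #$80", "tax", "txa", "sta ptr1"])` leaves three instructions, which is both shorter and faster than the input.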

well, and i find it kinda hard to put any figures on how "worthy" a 
certain optimization is :=P it really depends on your test program what kind 
of possible optimizations you will even spot - and even then the value of a 
removed instruction may rise and fall with that particular instruction 
being in the innermost loop of your program or not :)
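
One way to put a number on that is to weight each removed instruction by how often it would have run. The sketch below uses made-up figures purely for illustration; nothing here was measured:

```python
def weighted_savings(sites):
    """Total cycles saved; sites is a list of (cycles_saved, executions)."""
    return sum(saved * runs for saved, runs in sites)


# made-up numbers: one removed 2-cycle instruction in an inner loop that runs
# 10000 times, versus fifty such removals in init code that runs once
inner_loop = weighted_savings([(2, 10000)])
init_code = weighted_savings([(2, 1)] * 50)
```

With these invented numbers the single inner-loop removal is worth far more than the fifty removals elsewhere, which is why a plain count of removed instructions says little.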

in short, i have to agree that optimizations that trade speed for size or 
vice versa can be problematic for general use, but all the rest are worth 
being used - personally i can happily live with (much) increased compile 
time then too.

> I do really, really doubt this number, at least for compiler generated
> code. It may be true for inline asm or use of __AX__/__EAX__, but in this
> case it's a problem of the asm code. Within the samples and library
> sources, there is exactly ONE module that profits from the optimization -
> and the reason is that this module makes heavy use of the is... functions
> from ctype.h that are inline assembly.

i'm only testing with what the compiler makes from my raycaster code.... a 
handful of inline-asm macros there, but the rest is pretty much plain C (with a 
few additional routines in asm, but they aren't touched by opt65 or anything 
at all)

some of the things that happen here may be really specific, i don't know.... but 
actually i don't really think so - i am just copying some techniques that i am 
used to from assembler coding :) (and an amazing lot of the compiler generated 
code looks like what i would probably write when doing plain asm.... after a 
peephole run with opt65 that is :=P)

however, i will release the thing as an example of this kind of hacking 
(getting the most speed out of C with no respect to final code size :=)))
.... i want to add one or two more variants of the core algorithm (mostly to 
have a look myself at what works best in cc65) and by then i think it should 
contain quite a bunch of different variations of a handful of sub-problems 
which can then be compared.

mmmh.... maybe if i find the time i'll try building contiki with all the stuff 
piped through opt65.... that could be interesting :)

> BTW, your macro is a good example of code that will not work in certain
> cases, as pointed out by me in my last mail within this thread:
> > #define getblock(_x,_y) (unsigned char) \
> >         ( \
> >         __AX__= (_y), \
> >         __asm__ ("txa"), \
> >         __asm__ ("tay"), \
> >         __AX__= (_x), \
> >         __asm__ ("lda %v,x", worldptr_lo ), \
> >         __asm__ ("sta ptr1"), \
> >         __asm__ ("lda %v,x", worldptr_hi ), \
> >         __asm__ ("sta ptr1+1"), \
> >         __asm__ ("lda (ptr1),y"), \
> >         __asm__ ("ldx #%b", 0), \
> >         __AX__)
>
> The second assignment to __AX__ will destroy the Y register just loaded in
> all but the simplest cases (where _x is a global variable).

ouch! i wouldn't have expected that here really.... uhmz :=P the assignment 
somehow suggests that it doesn't touch Y at all (even if in reality it can't 
really avoid using it)

gpz


