Re: [cc65] Macros in inline assembler

From: Ullrich von Bassewitz <uz@musoftware.de> Date: 2012-01-19 18:17:36

On Thu, Jan 19, 2012 at 05:33:47PM +0100, "Andreas Rückert" wrote:
> We are talking about a modified version of this code:
>
> http://www.koders.com/c/fid0D6D481A7D85CEB963C3F4258F30CF903DA541F3.aspx
>
> or precisely about the unpacking starting in line 103.

Interesting. This code doesn't contain the undefined behaviour bug from your
first mail:-)

If you want to make that fast on the 6502, treat it as a block of bytes, not
longs. What the loop on line 103 does is reverse the byte order of each of the
first 16 32-bit words. This translates to the following asm code (untested):

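        ldy     #0              ; Y = byte offset into the source block
Loop:   tya
        tax                     ; X = offset of the current 32-bit word
        lda     (block),y       ; source byte 0 ...
        sta     W+3,x           ; ... goes to destination byte 3
        iny
        lda     (block),y       ; source byte 1 -> destination byte 2
        sta     W+2,x
        iny
        lda     (block),y       ; source byte 2 -> destination byte 1
        sta     W+1,x
        iny
        lda     (block),y       ; source byte 3 -> destination byte 0
        sta     W+0,x
        iny
        cpy     #64             ; 16 words * 4 bytes
        bne     Loop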

No longs involved and therefore reasonably fast.
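
For the platforms where asm is not an option, the same byte-wise approach can
be sketched in portable C. This is only an illustration: the function name and
signature are made up here and do not come from the linked code, and it is
assumed that the destination is a separate buffer that can be addressed as
bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions): copy nwords
 * 32-bit words from src to dst, reversing the byte order within each
 * word. Only byte accesses are used, so no 32-bit arithmetic is needed.
 */
void swap_words_bytewise(unsigned char *dst, const unsigned char *src,
                         size_t nwords)
{
    size_t i;

    assert(dst != src);         /* this sketch does not swap in place */

    for (i = 0; i < nwords * 4; i += 4) {
        dst[i + 3] = src[i + 0];
        dst[i + 2] = src[i + 1];
        dst[i + 1] = src[i + 2];
        dst[i + 0] = src[i + 3];
    }
}
```

Called with nwords = 16 this covers the same 64 bytes as the asm loop above.
Whether a given C compiler turns it into code as tight as the hand-written
loop is a separate question.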

> I want to run this code on several platforms, including PCs with
> optional GPUs, so rewriting everything in assembler is not really
> an option. Maybe some part with conditional compilation. But the
> later sha256 rounds are too complex to unloop them by hand. So I'll
> keep them as C for now.

The problem is that the 32-bit integers common on other platforms are
extraordinarily slow on the 6502. There is no chance of making this fast on
the 6502 without changing the actual implementation.

Regards

        Uz

-- 
Ullrich von Bassewitz                                  uz@musoftware.de