On Thu, Jan 19, 2012 at 05:33:47PM +0100, "Andreas Rückert" wrote:
> We are talking about a modified version of this code:
>
> http://www.koders.com/c/fid0D6D481A7D85CEB963C3F4258F30CF903DA541F3.aspx
>
> or precisely about the unpacking starting in line 103.

Interesting. This code doesn't contain the undefined behaviour bug from
your first mail :-)

If you want to make that fast on the 6502, treat it as a block of bytes,
not longs. What the loop at line 103 does is reverse the byte order of
each of the first 16 32-bit words. This translates to the following asm
code (untested):

        ldy     #0              ; Y = offset into the 64 input bytes
Loop:   tya
        tax                     ; X = start of the current 4-byte word
        lda     (block),y       ; block is a zero page pointer
        sta     W+3,x
        iny
        lda     (block),y
        sta     W+2,x
        iny
        lda     (block),y
        sta     W+1,x
        iny
        lda     (block),y
        sta     W+0,x
        iny
        cpy     #64             ; 16 words * 4 bytes
        bne     Loop

No longs involved and therefore reasonably fast.

> I want to run this code on several platforms, including PCs with
> optional GPUs, so rewriting everything in assembler is not really
> an option. Maybe some part with conditional compilation. But the
> later sha256 rounds are too complex to unloop them by hand. So I'll
> keep them as C for now.

The problem is that 32-bit integers, which are cheap on other platforms,
are extraordinarily slow on the 6502, an 8-bit CPU. There is no chance to
make this fast on the 6502 without changing the actual implementation.

Regards

        Uz

-- 
Ullrich von Bassewitz                                  uz@musoftware.de
----------------------------------------------------------------------
To unsubscribe from the list send mail to majordomo@musoftware.de with
the string "unsubscribe cc65" in the body(!) of the mail.

Received on Thu Jan 19 18:17:44 2012
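For readers who want to stay in C, here is a minimal sketch of the same idea, assuming the message schedule W holds sixteen 32-bit words and the target is little-endian like the 6502. The names unpack_shift, unpack_bytes and unpack_block are hypothetical and do not come from the linked code. The first function builds each word with 32-bit shifts (portable, but every shift and OR is a long operation under cc65); the second mirrors the asm loop byte by byte; the third picks one via the __CC65__ macro that cc65 predefines, along the lines of the conditional compilation Andreas mentions.

```c
#include <stdint.h>

/* Portable version: assemble each 32-bit word from four big-endian
   input bytes. Correct on any host, but under cc65 every shift and OR
   here is a 32-bit operation. */
void unpack_shift(const unsigned char *block, uint32_t *w)
{
    unsigned char i;
    for (i = 0; i < 16; ++i) {
        w[i] = ((uint32_t)block[4 * i] << 24)
             | ((uint32_t)block[4 * i + 1] << 16)
             | ((uint32_t)block[4 * i + 2] << 8)
             |  (uint32_t)block[4 * i + 3];
    }
}

/* Byte-wise version mirroring the asm loop: reverse each group of four
   bytes while copying. Only correct on a little-endian host. */
void unpack_bytes(const unsigned char *block, uint32_t *w)
{
    unsigned char *dst = (unsigned char *)w;  /* char may alias uint32_t */
    unsigned char y;
    for (y = 0; y < 64; y += 4) {
        dst[y + 3] = block[y];
        dst[y + 2] = block[y + 1];
        dst[y + 1] = block[y + 2];
        dst[y + 0] = block[y + 3];
    }
}

/* Select the fast path only when compiling with cc65 (6502 target). */
void unpack_block(const unsigned char *block, uint32_t *w)
{
#ifdef __CC65__
    unpack_bytes(block, w);
#else
    unpack_shift(block, w);
#endif
}
```

On a big-endian host the byte-wise version would produce wrong word values, which is why the sketch guards it instead of using it everywhere.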