Memotest 2017 - Z80 coding contest

Bill B · Post by **Bill B** » 14 Oct 2017 22:36

My entry came in at 41 bytes and 181 cycles.

It was tested in MEMU running CPM.

AndyKey · Post by **AndyKey** » 15 Oct 2017 19:17

My best submission is attached below.
It has

;
; Memofest 2017 - Z80 coding contest
; Submission #3 by Andy Key
; PRNG16 is 41 bytes and 185T excluding final RET, as before
; PRNG17 is 55 bytes and 348T excluding final RET, now < 2*T(PRNG16)
; PRNG17 is 66 bytes and 266T excluding final RET, also possible
;

thewiz · Post by **thewiz** » 16 Oct 2017 14:22

Here was my attempt at 52 bytes.

Code: Select all

CALC_PRNG

;* p := p XOR q XOR (p ROL a) XOR ((p XOR q) SHL b)
;*      1111111                     1111111

;* q := ((p XOR q) ROL c)
;*        1111111

		LD A, L			; BC = DE = (HL XOR DE)
		XOR E			; HL = old p
		LD C, A
		LD E, A
		LD A, H
		XOR D
		LD B, A
		LD D, A

; ????????????? SWAP BC and DE USAGE ???????????????

;* p := p XOR q XOR (p ROL a) XOR ((p XOR q) SHL b)
;*      1111111      2222222        1111111

		XOR A			; HL = p ROL a
		SRL H
		RR L
		RRA
		SRL H
		RR L
		RRA
		OR H
		;LD H, A

;* p := p XOR q XOR (p ROL a) XOR ((p XOR q) SHL b)
;*      1111111 333  2222222        1111111

		;LD A, H		; BC = (p XOR q) XOR (p XOR a)
		XOR B			; DE still equal p XOR q
		LD B, A			; HL = not needed any more
		LD A, L
		XOR C
		LD C, A

		LD H, D
		LD L, E			; HL = DE = p XOR q

;* q := ((p XOR q) ROL c)
;*        1111111  22222

		LD A, D			; LSBit into C flag
		RRA
		RR E			; ROT LSB with C flag
		LD A, D			; ROT MSB with C flag
		RRA
		LD D, E
		LD E, A

;* p := p XOR q XOR (p ROL a) XOR ((p XOR q) SHL b)
;*      1111111 333  2222222        1111111  44444

		ADD HL, HL
		ADD HL, HL

;* p := p XOR q XOR (p ROL a) XOR ((p XOR q) SHL b)
;*      1111111 333  2222222  555   1111111  44444

		LD A, H
		XOR B
		LD B, A
		LD A, L
		XOR C
		LD C, A

		; At this point
		;	BC = new p
		;	DE = new q

		LD H, B
		LD L, C
		ADD HL, DE

		;	BC = new p
		;	HL = PRNG

		LD A, B			; Swap BC & HL
		LD B, H
		LD H, A
		LD A, C
		LD C, L
		LD L, A
	
		RET

Tony Brewer · Post by **Tony Brewer** » 16 Oct 2017 23:31

Thanks again to everyone who took part in Memotest 2017. Nobody beat or matched my Basic or Advanced code, therefore I cannot declare a winner but I hope those who entered were able to enjoy it.

Code that creates high-quality pseudo-random numbers does have practical applications, if not on the Z80 then on more modern processors. The name of the PRNG algorithm is xoroshiro, short for xor/rotate/shift/rotate.

In the original code, the two states that are summed to give the PRN are called s0 and s1, which I changed to p and q to avoid confusion. My p and q equations expanded the three (p XOR q) terms to make things slightly more complicated.

Solving the problem

First examine the equations

p := (p ROL a) XOR (p SHL b) XOR (q SHL b) XOR p XOR q
q := (p ROL c) XOR (q ROL c)

which can be simplified to

p := p ROL a XOR (p XOR q) SHL b XOR (p XOR q)
q := (p XOR q) ROL c

Define x = p XOR q, then

p := p ROL a XOR x SHL b XOR x
q := x ROL c

p is a function of p and x, whereas q is a function of x only. q is no longer needed once x has been calculated and x can be stored in the registers that held q on entry.

Basic challenge

The best [a,b,c] triplet is [14,2,7] thus

p := p ROL 14 XOR x SHL 2 XOR x
q := x ROL 7

Do p ROL 14 with the quicker p ROR 2.
Do x ROL 7 with the quicker x ROL 8 ROR 1, where ROL 8 is a simple byte swap.
x SHL 2 is ADD x,x twice.

Z80 registers:
HL is the obvious register pair for the PRN, as it is the sum of two 16-bit values with the carry ignored. HL can hold other values until the end of the code when the sum is calculated. BC and DE are equally obvious choices for p and q, or q and p. Most people chose the latter but the former produces code of identical size and speed. A is required for the XOR operations and the flags will always be affected so that registers AF,BC,DE,HL should suffice.

Advanced challenge

The best [a,b,c] triplet is [7,8,12] thus

p := p ROL 7 XOR x SHL 8 XOR x
q := x ROL 12

Do p ROL 7 with the quicker p ROL 8 ROR 1.
Do x ROL 12 with the quicker x ROL 8 ROL 4, where again ROL 8 is a byte swap.
x SHL 8 is x shifted left by a whole byte.

Z80 registers:
BC and DE cannot hold all of p and q as they 17-bit. As the PRN output is the high 16 bits of p + q, it makes sense for BC and DE to hold the high bits. Another register is needed for the low bits of p and q and A was chosen with the successful aim of using only AF,BC,DE,HL again.

Code Optimisation

This was discussed during Memofest 2017, as reported by Andy at
http://www.nyangau.org/memotech/memofest2017.htm

There was general agreement about the types of optimisations that could be used to reduce the size of the solution, including :-

* rewriting the expressions for P and Q to factorise out common terms, ie: P XOR Q
* rotating in the opposite direction if it was quicker eg: ROL 7 as ROR 1 and swap bytes
* do two XORs in a single operation
* removing instructions that did something then the opposite immedately afterwards
* using alternate registers as save/restore
* picking the registers to hold P and Q on entry/exit so as to leave other registers free that can do other things eg: using P=DE, Q=BC, freeing up HL for the PRN, which is handy as you can ADD HL,HL
* rotating the 17 bit values through the carry

I found EX AF,AF' not to be helpful and did not use it, the problem being that while it might save A the carry flag is also switched out and vice-versa. Using the stack was banned mainly to save people from themselves - it was unnecessary and just made the code slower.

x SHL b is one of two things that are easier in the Advanced challenge. Eight is a great number for a left shift and it saves an 8-bit register when calculating p. The other thing that is easier is rotating, as a 17-bit rotate is simpler than 16-bit. At first sight the latter appears trivial but it cannot be done without involving a third register.

One trick not mentioned in the list above is to copy B or C (or D or E) to A when rotating BC (or DE) as A can be rotated in half the time and bytes. Quite often it is not necessary to copy from A back to the original register and it is better to treat BA or DA, say, as a register pair instead of BC or DE.

Byte swapping can also save both code and time. Provided the register contains the right value, an XOR C is just as good as an XOR B when dealing with the high byte of p. The carry flag is manipulated quite a lot in the Advanced code and the instructions must be ordered carefully to use the carry before it is cleared by an XOR.

My Advanced code adds p0 and q0 and uses the resultant carry as an input to an ADC instruction such that HL = BC + DE (as in the Basic challenge) + 1 when the least significant bits of p and q are both 1. To add p0 and q0, bit 7 of A is p0 and bit 7 of another register is q0. The low seven bits of each are all zero, making the result in A after addition identical to an XOR. Thus bit 7 of A is p0 XOR q0 = x0. Instead of A holding p0 and q0 on entry and exit, it is best for it to hold p0 and x0, as the latter has already been calculated for the next iteration. This is why I said on Facebook:

What makes the advanced challenge harder is that p and q are 17-bit, not 16-bit. You can represent the two extra state bits however you choose.

Best PRNG16 and PRNG17 code to follow ...

Tony Brewer · Post by **Tony Brewer** » 17 Oct 2017 22:44

Best PRNG16 and PRNG17 code in attached file.

Prng.zip: (1.72 KiB) Downloaded 1738 times

thewiz · Post by **thewiz** » 19 Oct 2017 14:20

I know the competition is over but I've been tinkering with my code I made and just by using BC and DE as p and q, one of the points made already, I've got my code down to 44 bytes. A saving of 8 bytes.

Regards

thewiz · Post by **thewiz** » 20 Oct 2017 14:44

I've still been tinkering with my code and seen how I can get it to 37 bytes. By making ED = p xor q and moving the ADD HL,HL instructions so that the next XOR can be combined got it down to 38 bytes. Which just left shaving a byte off the p ROL a bit.

Well done Tony, I can see my Z80 skills are rustier then I thought

Tony Brewer · Post by **Tony Brewer** » 22 Oct 2017 20:49

As mentioned above, the proper name for this pseudo-random number generator is xoroshiroN+ where N is the total number of state bits (in the two words s0 and s1 that I called p and q for the contest) and + means the two words should be added, to give a PRN of size N/2 bits.

Below are some [a,b,c] triplets that tests have shown produce the most random results. The first nine are 'unofficial' and the result of brute-force searching for full-period sequences of 2^N-1 different states before repeating, then randomness tests to choose the best triplet (many thanks to evanh). The last one is from the inventors of the algorithm Sebastiano Vigna & David Blackman and how this is proven to be full-period is beyond me.

;state s1,s0 must be seeded to be non-zero
s1 := s0 XOR s1
s0 := s0 ROL a XOR s1 SHL b XOR s1
s1 := s1 ROL c
PRN := s0 + s1

xoroshiro24+ [ 8, 2, 3]
xoroshiro26+ [ 9,12, 3]
xoroshiro28+ [11, 5, 2]
xoroshiro30+ [ 6,10, 8]
xoroshiro32+ [14, 2, 7]
xoroshiro34+ [ 7, 8,12]
xoroshiro36+ [ 8,12,13]
xoroshiro38+ [11, 5,16]
xoroshiro40+ [ 7, 9,12]
xoroshiro128+ [55,14,36]

Smallest Z80 routine to generate next PRN, excluding RET:

xoroshiro32+ = 37 bytes, 169 T-states
xoroshiro34+ = 51 bytes, 211 T-states (PRN = top 16 bits of sum)
xoroshiro40+ = 72 bytes, 285 T-states (PRN = top 16 bits of sum)

Can anyone reading this implement the xoroshiro+ algorithm on the Z80 in fewer bytes?

* M E M O R U M *

Memotest 2017 - Z80 coding contest

Re: Memotest 2017 - Z80 coding contest

Re: Memotest 2017 - Z80 coding contest

Re: Memotest 2017 - Z80 coding contest

Re: Memotest 2017 - Z80 coding contest

Re: Memotest 2017 - Z80 coding contest

Re: Memotest 2017 - Z80 coding contest

Re: Memotest 2017 - Z80 coding contest

Re: Memotest 2017 - Z80 coding contest