Memotest 2017  Z80 coding contest
Re: Memotest 2017  Z80 coding contest
My entry came in at 41 bytes and 181 cycles.
It was tested in MEMU running CPM.
It was tested in MEMU running CPM.
 Attachments

 prng16_bb.zip
 Zipped PRNG code
 (1023 Bytes) Downloaded 72 times
Re: Memotest 2017  Z80 coding contest
My best submission is attached below.
It has
;
; Memofest 2017  Z80 coding contest
; Submission #3 by Andy Key
; PRNG16 is 41 bytes and 185T excluding final RET, as before
; PRNG17 is 55 bytes and 348T excluding final RET, now < 2*T(PRNG16)
; PRNG17 is 66 bytes and 266T excluding final RET, also possible
;
It has
;
; Memofest 2017  Z80 coding contest
; Submission #3 by Andy Key
; PRNG16 is 41 bytes and 185T excluding final RET, as before
; PRNG17 is 55 bytes and 348T excluding final RET, now < 2*T(PRNG16)
; PRNG17 is 66 bytes and 266T excluding final RET, also possible
;
 Attachments

 contest.zip
 (25.71 KiB) Downloaded 71 times
{{{ Andy
Re: Memotest 2017  Z80 coding contest
Here was my attempt at 52 bytes.
Code: Select all
CALC_PRNG
;* p := p XOR q XOR (p ROL a) XOR ((p XOR q) SHL b)
;* 1111111 1111111
;* q := ((p XOR q) ROL c)
;* 1111111
LD A, L ; BC = DE = (HL XOR DE)
XOR E ; HL = old p
LD C, A
LD E, A
LD A, H
XOR D
LD B, A
LD D, A
; ????????????? SWAP BC and DE USAGE ???????????????
;* p := p XOR q XOR (p ROL a) XOR ((p XOR q) SHL b)
;* 1111111 2222222 1111111
XOR A ; HL = p ROL a
SRL H
RR L
RRA
SRL H
RR L
RRA
OR H
;LD H, A
;* p := p XOR q XOR (p ROL a) XOR ((p XOR q) SHL b)
;* 1111111 333 2222222 1111111
;LD A, H ; BC = (p XOR q) XOR (p XOR a)
XOR B ; DE still equal p XOR q
LD B, A ; HL = not needed any more
LD A, L
XOR C
LD C, A
LD H, D
LD L, E ; HL = DE = p XOR q
;* q := ((p XOR q) ROL c)
;* 1111111 22222
LD A, D ; LSBit into C flag
RRA
RR E ; ROT LSB with C flag
LD A, D ; ROT MSB with C flag
RRA
LD D, E
LD E, A
;* p := p XOR q XOR (p ROL a) XOR ((p XOR q) SHL b)
;* 1111111 333 2222222 1111111 44444
ADD HL, HL
ADD HL, HL
;* p := p XOR q XOR (p ROL a) XOR ((p XOR q) SHL b)
;* 1111111 333 2222222 555 1111111 44444
LD A, H
XOR B
LD B, A
LD A, L
XOR C
LD C, A
; At this point
; BC = new p
; DE = new q
LD H, B
LD L, C
ADD HL, DE
; BC = new p
; HL = PRNG
LD A, B ; Swap BC & HL
LD B, H
LD H, A
LD A, C
LD C, L
LD L, A
RET
 Attachments

 MAIN.zip
 Competition entry with code to output first 16 numbers.
 (404 Bytes) Downloaded 69 times
THIS is what Memotech is doing now.

 Posts: 62
 Joined: 08 Jan 2014 20:50
Re: Memotest 2017  Z80 coding contest
Thanks again to everyone who took part in Memotest 2017. Nobody beat or matched my Basic or Advanced code, therefore I cannot declare a winner but I hope those who entered were able to enjoy it.
Code that creates highquality pseudorandom numbers does have practical applications, if not on the Z80 then on more modern processors. The name of the PRNG algorithm is xoroshiro, short for xor/rotate/shift/rotate.
In the original code, the two states that are summed to give the PRN are called s0 and s1, which I changed to p and q to avoid confusion. My p and q equations expanded the three (p XOR q) terms to make things slightly more complicated.
Solving the problem
First examine the equations
p := (p ROL a) XOR (p SHL b) XOR (q SHL b) XOR p XOR q
q := (p ROL c) XOR (q ROL c)
which can be simplified to
p := p ROL a XOR (p XOR q) SHL b XOR (p XOR q)
q := (p XOR q) ROL c
Define x = p XOR q, then
p := p ROL a XOR x SHL b XOR x
q := x ROL c
p is a function of p and x, whereas q is a function of x only. q is no longer needed once x has been calculated and x can be stored in the registers that held q on entry.
Basic challenge
The best [a,b,c] triplet is [14,2,7] thus
p := p ROL 14 XOR x SHL 2 XOR x
q := x ROL 7
Do p ROL 14 with the quicker p ROR 2.
Do x ROL 7 with the quicker x ROL 8 ROR 1, where ROL 8 is a simple byte swap.
x SHL 2 is ADD x,x twice.
Z80 registers:
HL is the obvious register pair for the PRN, as it is the sum of two 16bit values with the carry ignored. HL can hold other values until the end of the code when the sum is calculated. BC and DE are equally obvious choices for p and q, or q and p. Most people chose the latter but the former produces code of identical size and speed. A is required for the XOR operations and the flags will always be affected so that registers AF,BC,DE,HL should suffice.
Advanced challenge
The best [a,b,c] triplet is [7,8,12] thus
p := p ROL 7 XOR x SHL 8 XOR x
q := x ROL 12
Do p ROL 7 with the quicker p ROL 8 ROR 1.
Do x ROL 12 with the quicker x ROL 8 ROL 4, where again ROL 8 is a byte swap.
x SHL 8 is x shifted left by a whole byte.
Z80 registers:
BC and DE cannot hold all of p and q as they 17bit. As the PRN output is the high 16 bits of p + q, it makes sense for BC and DE to hold the high bits. Another register is needed for the low bits of p and q and A was chosen with the successful aim of using only AF,BC,DE,HL again.
Code Optimisation
This was discussed during Memofest 2017, as reported by Andy at
http://www.nyangau.org/memotech/memofest2017.htm
x SHL b is one of two things that are easier in the Advanced challenge. Eight is a great number for a left shift and it saves an 8bit register when calculating p. The other thing that is easier is rotating, as a 17bit rotate is simpler than 16bit. At first sight the latter appears trivial but it cannot be done without involving a third register.
One trick not mentioned in the list above is to copy B or C (or D or E) to A when rotating BC (or DE) as A can be rotated in half the time and bytes. Quite often it is not necessary to copy from A back to the original register and it is better to treat BA or DA, say, as a register pair instead of BC or DE.
Byte swapping can also save both code and time. Provided the register contains the right value, an XOR C is just as good as an XOR B when dealing with the high byte of p. The carry flag is manipulated quite a lot in the Advanced code and the instructions must be ordered carefully to use the carry before it is cleared by an XOR.
My Advanced code adds p0 and q0 and uses the resultant carry as an input to an ADC instruction such that HL = BC + DE (as in the Basic challenge) + 1 when the least significant bits of p and q are both 1. To add p0 and q0, bit 7 of A is p0 and bit 7 of another register is q0. The low seven bits of each are all zero, making the result in A after addition identical to an XOR. Thus bit 7 of A is p0 XOR q0 = x0. Instead of A holding p0 and q0 on entry and exit, it is best for it to hold p0 and x0, as the latter has already been calculated for the next iteration. This is why I said on Facebook:
Code that creates highquality pseudorandom numbers does have practical applications, if not on the Z80 then on more modern processors. The name of the PRNG algorithm is xoroshiro, short for xor/rotate/shift/rotate.
In the original code, the two states that are summed to give the PRN are called s0 and s1, which I changed to p and q to avoid confusion. My p and q equations expanded the three (p XOR q) terms to make things slightly more complicated.
Solving the problem
First examine the equations
p := (p ROL a) XOR (p SHL b) XOR (q SHL b) XOR p XOR q
q := (p ROL c) XOR (q ROL c)
which can be simplified to
p := p ROL a XOR (p XOR q) SHL b XOR (p XOR q)
q := (p XOR q) ROL c
Define x = p XOR q, then
p := p ROL a XOR x SHL b XOR x
q := x ROL c
p is a function of p and x, whereas q is a function of x only. q is no longer needed once x has been calculated and x can be stored in the registers that held q on entry.
Basic challenge
The best [a,b,c] triplet is [14,2,7] thus
p := p ROL 14 XOR x SHL 2 XOR x
q := x ROL 7
Do p ROL 14 with the quicker p ROR 2.
Do x ROL 7 with the quicker x ROL 8 ROR 1, where ROL 8 is a simple byte swap.
x SHL 2 is ADD x,x twice.
Z80 registers:
HL is the obvious register pair for the PRN, as it is the sum of two 16bit values with the carry ignored. HL can hold other values until the end of the code when the sum is calculated. BC and DE are equally obvious choices for p and q, or q and p. Most people chose the latter but the former produces code of identical size and speed. A is required for the XOR operations and the flags will always be affected so that registers AF,BC,DE,HL should suffice.
Advanced challenge
The best [a,b,c] triplet is [7,8,12] thus
p := p ROL 7 XOR x SHL 8 XOR x
q := x ROL 12
Do p ROL 7 with the quicker p ROL 8 ROR 1.
Do x ROL 12 with the quicker x ROL 8 ROL 4, where again ROL 8 is a byte swap.
x SHL 8 is x shifted left by a whole byte.
Z80 registers:
BC and DE cannot hold all of p and q as they 17bit. As the PRN output is the high 16 bits of p + q, it makes sense for BC and DE to hold the high bits. Another register is needed for the low bits of p and q and A was chosen with the successful aim of using only AF,BC,DE,HL again.
Code Optimisation
This was discussed during Memofest 2017, as reported by Andy at
http://www.nyangau.org/memotech/memofest2017.htm
I found EX AF,AF' not to be helpful and did not use it, the problem being that while it might save A the carry flag is also switched out and viceversa. Using the stack was banned mainly to save people from themselves  it was unnecessary and just made the code slower.There was general agreement about the types of optimisations that could be used to reduce the size of the solution, including :
* rewriting the expressions for P and Q to factorise out common terms, ie: P XOR Q
* rotating in the opposite direction if it was quicker eg: ROL 7 as ROR 1 and swap bytes
* do two XORs in a single operation
* removing instructions that did something then the opposite immedately afterwards
* using alternate registers as save/restore
* picking the registers to hold P and Q on entry/exit so as to leave other registers free that can do other things eg: using P=DE, Q=BC, freeing up HL for the PRN, which is handy as you can ADD HL,HL
* rotating the 17 bit values through the carry
x SHL b is one of two things that are easier in the Advanced challenge. Eight is a great number for a left shift and it saves an 8bit register when calculating p. The other thing that is easier is rotating, as a 17bit rotate is simpler than 16bit. At first sight the latter appears trivial but it cannot be done without involving a third register.
One trick not mentioned in the list above is to copy B or C (or D or E) to A when rotating BC (or DE) as A can be rotated in half the time and bytes. Quite often it is not necessary to copy from A back to the original register and it is better to treat BA or DA, say, as a register pair instead of BC or DE.
Byte swapping can also save both code and time. Provided the register contains the right value, an XOR C is just as good as an XOR B when dealing with the high byte of p. The carry flag is manipulated quite a lot in the Advanced code and the instructions must be ordered carefully to use the carry before it is cleared by an XOR.
My Advanced code adds p0 and q0 and uses the resultant carry as an input to an ADC instruction such that HL = BC + DE (as in the Basic challenge) + 1 when the least significant bits of p and q are both 1. To add p0 and q0, bit 7 of A is p0 and bit 7 of another register is q0. The low seven bits of each are all zero, making the result in A after addition identical to an XOR. Thus bit 7 of A is p0 XOR q0 = x0. Instead of A holding p0 and q0 on entry and exit, it is best for it to hold p0 and x0, as the latter has already been calculated for the next iteration. This is why I said on Facebook:
Best PRNG16 and PRNG17 code to follow ...What makes the advanced challenge harder is that p and q are 17bit, not 16bit. You can represent the two extra state bits however you choose.
Last edited by Tony Brewer on 17 Oct 2017 23:59, edited 1 time in total.

 Posts: 62
 Joined: 08 Jan 2014 20:50
Re: Memotest 2017  Z80 coding contest
Best PRNG16 and PRNG17 code in attached file.
Re: Memotest 2017  Z80 coding contest
I know the competition is over but I've been tinkering with my code I made and just by using BC and DE as p and q, one of the points made already, I've got my code down to 44 bytes. A saving of 8 bytes.
Regards
Regards
THIS is what Memotech is doing now.
Re: Memotest 2017  Z80 coding contest
I've still been tinkering with my code and seen how I can get it to 37 bytes. By making ED = p xor q and moving the ADD HL,HL instructions so that the next XOR can be combined got it down to 38 bytes. Which just left shaving a byte off the p ROL a bit.
Well done Tony, I can see my Z80 skills are rustier then I thought
Well done Tony, I can see my Z80 skills are rustier then I thought
THIS is what Memotech is doing now.

 Posts: 62
 Joined: 08 Jan 2014 20:50
Re: Memotest 2017  Z80 coding contest
As mentioned above, the proper name for this pseudorandom number generator is xoroshiroN+ where N is the total number of state bits (in the two words s0 and s1 that I called p and q for the contest) and + means the two words should be added, to give a PRN of size N/2 bits.
Below are some [a,b,c] triplets that tests have shown produce the most random results. The first nine are 'unofficial' and the result of bruteforce searching for fullperiod sequences of 2^N1 different states before repeating, then randomness tests to choose the best triplet (many thanks to evanh). The last one is from the inventors of the algorithm Sebastiano Vigna & David Blackman and how this is proven to be fullperiod is beyond me.
;state s1,s0 must be seeded to be nonzero
s1 := s0 XOR s1
s0 := s0 ROL a XOR s1 SHL b XOR s1
s1 := s1 ROL c
PRN := s0 + s1
xoroshiro24+ [ 8, 2, 3]
xoroshiro26+ [ 9,12, 3]
xoroshiro28+ [11, 5, 2]
xoroshiro30+ [ 6,10, 8]
xoroshiro32+ [14, 2, 7]
xoroshiro34+ [ 7, 8,12]
xoroshiro36+ [ 8,12,13]
xoroshiro38+ [11, 5,16]
xoroshiro40+ [ 7, 9,12]
xoroshiro128+ [55,14,36]
Smallest Z80 routine to generate next PRN, excluding RET:
xoroshiro32+ = 37 bytes, 169 Tstates
xoroshiro34+ = 51 bytes, 211 Tstates (PRN = top 16 bits of sum)
xoroshiro40+ = 72 bytes, 285 Tstates (PRN = top 16 bits of sum)
Can anyone reading this implement the xoroshiro+ algorithm on the Z80 in fewer bytes?
Below are some [a,b,c] triplets that tests have shown produce the most random results. The first nine are 'unofficial' and the result of bruteforce searching for fullperiod sequences of 2^N1 different states before repeating, then randomness tests to choose the best triplet (many thanks to evanh). The last one is from the inventors of the algorithm Sebastiano Vigna & David Blackman and how this is proven to be fullperiod is beyond me.
;state s1,s0 must be seeded to be nonzero
s1 := s0 XOR s1
s0 := s0 ROL a XOR s1 SHL b XOR s1
s1 := s1 ROL c
PRN := s0 + s1
xoroshiro24+ [ 8, 2, 3]
xoroshiro26+ [ 9,12, 3]
xoroshiro28+ [11, 5, 2]
xoroshiro30+ [ 6,10, 8]
xoroshiro32+ [14, 2, 7]
xoroshiro34+ [ 7, 8,12]
xoroshiro36+ [ 8,12,13]
xoroshiro38+ [11, 5,16]
xoroshiro40+ [ 7, 9,12]
xoroshiro128+ [55,14,36]
Smallest Z80 routine to generate next PRN, excluding RET:
xoroshiro32+ = 37 bytes, 169 Tstates
xoroshiro34+ = 51 bytes, 211 Tstates (PRN = top 16 bits of sum)
xoroshiro40+ = 72 bytes, 285 Tstates (PRN = top 16 bits of sum)
Can anyone reading this implement the xoroshiro+ algorithm on the Z80 in fewer bytes?