SPX - Large capacity, read only, "disc" system

Post by Martin A »

If anyone remembers Memofest 2018, there was a strange 8 pin chip that found its way into the photo of the NFX network card.

I think I mentioned at the time that it was an 8 meg serial flash, and capable of taking an entire CPM partition.

I've now managed to get something built to test the idea of running CPM from an SPI flash.

Since 2018, the flash chip I was playing with has gone out of production. The modern replacement is actually twice the capacity, so that's now 16 meg of storage in a 6mm square package, enough for 2 partitions. There are 2 sockets on the board, so in theory there could be 4 partitions, but in practice there isn't enough software to warrant using a 2nd device. There are larger devices available that would fit 4 partitions or more on a tiny chip, but the data transfer protocol is different, and again there's no reason to make the change unless the current chip goes out of production.

The chip is not only serial, it's 3 volt CMOS. So as well as converting parallel to serial and back to parallel, the signals have to be level shifted between 3.3v and 5v.

[Image: The test board (Image1.jpg)]

The test board is double ended, allowing the early work to be done on the MTX plus. Once the I/O routines were running well, the ROM could be written, and testing moved to the MTX512.

Working from the bottom up, there is a CPM boot ROM and a 20 pin GAL to decode the I/O port. The board uses a single port (128, #80) for all transfers. Above the ROM are 3 LEDs: one for each chip select and one on the SPI clock as an activity indicator. There is a 3 volt regulator to power the flash and the other 3v devices, and some jumpers to account for the differences between the MTX and the MTX plus.

The 20 pin chip in the middle is a 74HC273 octal flip flop, which is the output port and runs at 5v. My test MTX has a CMOS CPU; for a stock MTX an HCT273 would be needed. The 273 has a clear input attached to reset, so it is in a known state at start up.

The top row of chips starts with a 74HC04 hex inverter. It's running from the 3v supply, so it has to be HC or some other 3v family device. It's providing the output for all 3 LEDs as well as the flash chip selects and the 3v data output port enable. The output port resets to zero, and the inverters ensure that in that state the flash chips are inactive (chip selects high), as is the write buffer.

In the middle is a 74LVC244. I picked the LVC version as it's a 3 volt device but has 5v tolerant inputs. It can take the 5v signals from the output port and convert them to 3v for the flash. In addition it has 2 separate 4 bit ports, each with its own enable. The 3 state output is something the 273 lacks. One 4 bit port is permanently active; it supplies the 2 chip selects and the clock to the inverter, and also the enable for the other 4 bit port. The 2nd port deals with the data transfer to the flash, and needs to be 3 stated when the flash is outputting data.

The final 20 pin chip is a 74HCT373 octal latch. It's run in transparent mode, with the IN (#80) decode attached to the output enable. 5 of the 8 bits are used, the other 3 read low. The flash chip is configured to supply data in 4 bit mode, however it can also use 1 bit mode. For that reason, while the lower 4 bits mirror the 4 bit output, the output that would be used in single bit mode is duplicated at bit 7 for ease of access.
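
To make the input port layout a bit more concrete, here is a small Python model of how a byte read from port #80 breaks down. It is only a sketch: the function name and the sample value are made up for illustration, and it assumes that bit 1 of the nibble is the line that doubles as the single bit mode output.

```python
def decode_input_port(value):
    """Split a byte read from the SPX input port into its fields.

    Bit positions follow the description above: bits 0-3 carry the
    4 bit data, bit 7 duplicates the single bit mode output.
    """
    nibble = value & 0x0F        # 4 bit mode data from the flash
    so_bit = (value >> 7) & 1    # copy of the 1 bit mode output
    return nibble, so_bit


# Hypothetical sample: the flash driving binary 1010 on its four data pins.
# Bit 1 is set, so under the assumption above bit 7 reads as 1 as well.
print(decode_input_port(0x8A))   # (10, 1)
```

In 4 bit mode the driver simply ignores bit 7, which is why no masking is needed before the nibble is stored.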

The 2 empty sockets at the top are where the decode GAL goes on the MTXplus and the unused device 2 socket. There are no shift registers or anything like that, the entire serial protocol is handled in software.

[Image: Mounted on the MTX (Image2.jpg)]

In place on the MTX it illustrates how impractical vertical add-ons are on the MTX. The case end panels have been removed and the keyboard section of the case offset to avoid touching anything on the board. I didn't have any spare right angle edge connectors so it had to be that way.

[Image: In use (Image3.jpg)]

The CPM rom is just the one I put together for the NFX board, with the "disc" driver software re-written. For "Day to Day" use, a custom loaded partition would be more use.

I've just dropped 2 partitions from my Memu install onto the flash for testing, one with the 59k system, the other with the 54k version.

Speed wise it's not as fast as CFX or CFX2 because it's transferring 4 bits at a time instead of 1 or 2 bytes. However, it's still capable of over 30k a second, so for the size of the usual MTX games load times aren't an issue. There is also some tweaking room in the driver and deblocking routines to speed things up a little.

Both SPI flash chips I tried have 4k sectors, which would seriously compromise write access: the whole 4k sector would have to be erased and rewritten each time CPM writes a 128 byte "sector", and 4k of RAM would have to be found to buffer it. That's unlike the CFX or Andy's REMEMOrizer (which is also SPI), where the sector size is a more reasonable 512 bytes. That is why it's read only; a writeable CPM system would be seriously slow.
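
As a rough worked example of that overhead, the Python below compares how much data gets rewritten per 128 byte CPM record for a 4k flash sector against a 512 byte sector. It assumes the worst case of no caching at all, so every record write rewrites its whole sector; the names are just for this sketch.

```python
FLASH_SECTOR = 4096   # sector size of the SPI flash chips, in bytes
SMALL_SECTOR = 512    # sector size quoted for the CFX
CPM_RECORD = 128      # size of one CPM "sector"


def write_amplification(sector_size, record_size=CPM_RECORD):
    """Bytes physically rewritten for each byte CPM asked to write."""
    return sector_size // record_size


print(write_amplification(FLASH_SECTOR))  # 32
print(write_amplification(SMALL_SECTOR))  # 4
```

So under that assumption the flash would be doing 8 times the work of a 512 byte sector device for every record, on top of needing the 4k RAM buffer.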

An alternative version of the boot rom with the SDX basic extensions instead of CPM would be pretty straightforward to produce, as only minor tweaks to the low level driver source would be needed.

The current source for the SPI driver:

; SPX specific low level routines
SPIport EQU &80
; Port setup - output
; B7 - clk
; B6 - CS1
; B5 - CS0
; B4 - Data write enable
; B3 - hold
; B2 - write protect
; B1 - SO
; B0 - SI

; port data - input
; B7 SO bit for bit mode transfers
; B0-B3 nibble data for 4 bit transfers

; CPM rom expects to see CFinit, CFread, CFwrite
; sector size is set up for 512 byte sectors
;
; Initialise CF card
; after:
;   if ok, Z, CF initialised
;   if not ok, NZ, CF not initialised
;
.CPMCFInit
XOR A                            ; A zero'd and ZF set to indicate all OK
RET



;
; Read block from SPI device
;   before:
;     SDLBA is the 32 bit number of the 512 byte block, HL is buffer
;   after:
;     if ok, Z
;     if not ok, NZ
;
.CPMCFREAD
push BC
ld a,&30            ; clock low, chip select 0 active, data buffer active
out (SPIport),A     ; alert the chip that a transfer is coming
call CPMsetRead     ; routine sets up a read command buffer
call CPMsendCommand ; send the command that was just set up
ld bc,02            ; reading 512 bytes to the internal buffer
.CPMCFread1
in a,(SPIport)      ; get high nibble
ld (hl),a           ; just store no need to mask or rotate
ld a,&A0            ; clock high, chip select 0, buffer off
out (SPIport),a
ld a,&20            ; clock low, chip select 0, buffer off
out (SPIport),a
in a,(SPIport)      ; get low nibble
rld                 ; 4 bit rotate into (HL) completing the byte
ld a,&A0            ; pulse the clock again
out (SPIport),a
ld a,&20 
out (SPIport),a
inc hl              ; increment the pointer
djnz CPMCFread1     ; repeat until all bytes done
dec c
jr nz,CPMCFread1
pop bc
xor A               ; exit with A zero'd and Z set 
out (SPIport),A     ; and turn off the SPI chip 
RET

;
;
; Write block
;   before:
;     SDLBA is the 32 bit number of the 512 byte block, HL is buffer
;   after:
;     if ok, Z
;     if not ok, NZ 

.CPMCFwrite
xor A
INC A                 ; no write support at present just return an error
RET

;
; set up the command buffer
; Input - SDLBA is count of 512byte sectors
; CF calculation routines

.CPMsetRead
push af
ld a,107
ld (SPIcommand),a    ; set up for read data in 4 bit mode 
xor A
ld (SPIlow),a        ; low byte must be zero as using 512b sectors
ld a,(CPMSDLBA)
rla                  ; rotate 1 bit, low bit is zero as XOR clears carry
ld (SPImid),a
LD a,(CPMSDLBA+1)    ; do the same for the top byte
rla
ld (SPIhigh),a
pop af
RET

; send a 4 byte + dummy command to the SPI chip
; on exit the clock is low, the first 4 bits of data will be clocked on to the
; output pins
.CPMsendCommand
push hl
push bc
ld hl,SPIcommand
LD C,5
.CPMsendByte
LD B,8
.CPMsendBit
LD A,(HL)
RLCA
LD (HL),A
LD A,0
ADC A,&3C           ;chip select 0, buffer enabled, WP high, hold high, data in bit 0
OUT (SPIport),A     ;put the data bit on the pin
OR &80
OUT (SPIport),A     ;flip the clock
LD A,&20
OUT (SPIport),A     ;clock low again, buffer off, chip select 0 still active
DJNZ CPMsendBit
INC HL
DEC C
JR NZ,CPMsendByte
pop bc
pop hl
RET




;   End of SPI low level routines

END
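
For anyone who would rather follow the logic without reading Z80, here is a Python model of what the listing does: it builds the 5 byte read command, turns it into the sequence of values written to port #80, and packs two nibbles into a byte the way LD (HL),A followed by RLD does. Treat it as a sketch. The function names are invented, the buffer order (command, high, mid, low, dummy) is assumed from the labels since the listing doesn't show the buffer itself, and the dummy byte value of zero is a guess that shouldn't matter to the flash.

```python
QUAD_READ = 107  # &6B, the command byte loaded by CPMsetRead


def build_read_command(lba):
    """Return the 5 bytes clocked out for one 512 byte block."""
    address = (lba & 0x7FFF) << 9      # block number * 512, kept to 24 bits
    return bytes([
        QUAD_READ,
        (address >> 16) & 0xFF,        # SPIhigh
        (address >> 8) & 0xFF,         # SPImid
        address & 0xFF,                # SPIlow, always zero
        0x00,                          # dummy byte (assumed value)
    ])


def port_writes(command):
    """Yield the values written to port #80: three per bit, MSB first."""
    for byte in command:
        for bit in range(7, -1, -1):
            data = 0x3C + ((byte >> bit) & 1)  # CS0, buffer on, WP/hold high
            yield data                 # data bit on B0, clock low
            yield data | 0x80          # clock high
            yield 0x20                 # clock low, buffer off, CS0 still on


def pack_nibbles(first, second):
    """Model LD (HL),A then RLD: the first read ends up in the top nibble."""
    return ((first & 0x0F) << 4) | (second & 0x0F)


cmd = build_read_command(1)
print(cmd.hex())                                   # 6b00020000
print([hex(v) for v in list(port_writes(cmd))[:6]])
# ['0x3c', '0xbc', '0x20', '0x3d', '0xbd', '0x20']
print(hex(pack_nibbles(0x8A, 0x05)))               # 0xa5
```

Note that the upper bits of the first read (including the bit 7 copy) are shifted out by the RLD, which is why the driver can store it without masking.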

Post by Bill B »

If the chip is supplying 4 bits of data, would it be better to have two chips in parallel to supply 8 bits? Would the speed improvement in not having to repack the data be worth the cost of the extra chip?

Post by Martin A »

Interesting idea. :D

The chips are pretty cheap, of the order of £1.20 each in 10's. Putting 2 on won't break the bank! (They are much, MUCH cheaper than parallel flash which costs about that for 1/2 a meg, and imagine the board size and decoding required for 32 of those!)

Adding the extra output port hardware would cost more than the flash would. Decoding 2 IO ports, one for 8 bit data and one for control, isn't a problem; the decode GAL has 5 spare pins at the moment.

But for a read only system that might not be needed. The only thing being sent to the flash is the read data command, and that's sent 1 bit at a time using the serial protocol. Keeping the output from the 2 serial in pins from interfering with each other is the only connection issue, and that can probably be sorted by re-jigging the output port buffer.

Because everything is clocked at around 1% of the maximum speed of the chip, any timing difference between devices shouldn't be an issue.

Splitting the Partition image(s) into 4 bit wide data shouldn't be too hard, a bit of Basic on the RiscPC would deal with that.
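
A sketch of that split, in Python rather than Basic purely for illustration. It assumes one chip (called chip A here) feeds data bits 4-7 and the other (chip B) bits 0-3, and that each chip sends the high nibble of a stored byte first, as the current read loop expects. Both the wiring and the names are assumptions, not a worked-out design.

```python
def split_image(image):
    """Split a partition image into two nibble-wide chip images."""
    if len(image) % 2:
        raise ValueError("image length must be even")
    chip_a = bytearray()   # holds the high nibbles of every byte
    chip_b = bytearray()   # holds the low nibbles of every byte
    for i in range(0, len(image), 2):
        first, second = image[i], image[i + 1]
        chip_a.append((first & 0xF0) | (second >> 4))
        chip_b.append(((first & 0x0F) << 4) | (second & 0x0F))
    return bytes(chip_a), bytes(chip_b)


def merge_images(chip_a, chip_b):
    """Rebuild the original, mimicking one 8 bit read per clock."""
    out = bytearray()
    for a, b in zip(chip_a, chip_b):
        out.append((a & 0xF0) | (b >> 4))            # first clock
        out.append(((a & 0x0F) << 4) | (b & 0x0F))   # second clock
    return bytes(out)


a, b = split_image(bytes([0x12, 0x34]))
print(a.hex(), b.hex())              # 13 24
print(merge_images(a, b).hex())      # 1234
```

Each chip image comes out at half the size of the original, so under these assumptions a 512 byte block would sit at block number * 256 in each chip rather than * 512.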

Having said that, the current system was designed with the Z80 RLD instruction in mind, so the repacking isn't a chore. What 8 bit transfer would save is the entire 2nd clocking sequence, and that's actually slower than the first half because RLD is much slower than LD (HL),A.
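
To put rough numbers on that, the Python below adds up the standard Z80 T-state counts for the inner read loop from the driver listing. It ignores any wait states the hardware adds and the occasional outer loop pass, and it assumes an 8 bit version would simply be the first half of the loop plus the pointer update, so take the result as an estimate only.

```python
# Standard Z80 instruction timings in T-states (DJNZ when the jump is taken).
T = {"IN A,(n)": 11, "OUT (n),A": 11, "LD A,n": 7,
     "LD (HL),A": 7, "RLD": 18, "INC HL": 6, "DJNZ": 13}

# One clock pulse is LD A,n / OUT (n),A twice: clock high, then clock low.
clock_pulse = 2 * (T["LD A,n"] + T["OUT (n),A"])

first_half = T["IN A,(n)"] + T["LD (HL),A"] + clock_pulse
second_half = T["IN A,(n)"] + T["RLD"] + clock_pulse
loop_overhead = T["INC HL"] + T["DJNZ"]

per_byte_4bit = first_half + second_half + loop_overhead
per_byte_8bit = first_half + loop_overhead    # assumed shape of an 8 bit loop

print(first_half, second_half)                    # 54 65
print(per_byte_4bit, per_byte_8bit)               # 138 73
print(round(per_byte_4bit / per_byte_8bit, 2))    # 1.89
```

On those figures the second half costs 65 T-states against 54 for the first, and dropping it would bring the loop close to twice the speed.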

I'm tempted to give it a try.