Perhaps the ST chip's SPI can be slowed down: SPI data rates are set by the master (in this case the 328?), since the master generates the clock.
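From the timing diagrams, the SPI timings all seem to be referenced to the CK signal (the serial clock, which the master drives), so a slower clock should simply stretch them. The chips' own clock is a separate matter: they can run from an internal oscillator, accept an external clock in (and provide a clock out), or use an external crystal or oscillator.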
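On the 328 side, slowing SPI down just means picking a bigger clock divider: the hardware can divide the system clock by 2, 4, 8, 16, 32, 64 or 128. Here is a minimal sketch of choosing one, assuming you plug in the slave's maximum SPI clock from the ST datasheet yourself (the 5 MHz used below is an assumption, not a quoted spec). pick_spi_divider is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Divide ratios the ATmega328 SPI hardware offers (SPR1:0 plus SPI2X). */
static const uint32_t SPI_DIVIDERS[] = {2, 4, 8, 16, 32, 64, 128};

/* Return the smallest divider whose SPI clock (f_cpu / divider) does not
 * exceed max_spi_hz, or 0 if even f_cpu / 128 is still too fast. */
uint32_t pick_spi_divider(uint32_t f_cpu_hz, uint32_t max_spi_hz)
{
    size_t n = sizeof SPI_DIVIDERS / sizeof SPI_DIVIDERS[0];
    for (size_t i = 0; i < n; i++) {
        /* Compare f_cpu <= max * div so integer division can't round
         * a too-fast clock down into looking acceptable. */
        if ((uint64_t)f_cpu_hz <= (uint64_t)max_spi_hz * SPI_DIVIDERS[i])
            return SPI_DIVIDERS[i];
    }
    return 0;
}
```

With a 16 MHz system clock and a 5 MHz limit this picks /4 (a 4 MHz SPI clock). In an Arduino sketch the result would map onto SPI.setClockDivider(SPI_CLOCK_DIV4), or you can pass the clock speed to SPISettings instead.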
One of the nice things about these chips is that you basically tell one where to go, it goes there, and it raises the !BUSY line when it's done. A big feature that made me drool while reading the datasheet was the Full Step mode: if you try to move it too fast in a microstepping mode, it will kick itself into full step mode to get where you want it to go, then drop back into microstepping, all while retaining its relative positioning. (Of course, you can limit its max speed, etc., especially while it's cutting.)
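To make "tell it where to go" concrete, here is a minimal sketch of the host side. It assumes the chip is ST's L6470 (dSPIN), whose GoTo command is, as far as I recall, opcode 0x60 followed by a 22-bit absolute position sent MSB first; check the command table in the datasheet before trusting the numbers. build_goto and wait_until_idle are hypothetical names, and busy_pin_level stands in for whatever reads the !BUSY pin on your board.

```c
#include <stddef.h>
#include <stdint.h>

#define CMD_GOTO 0x60u  /* assumed L6470 GoTo opcode: verify in datasheet */

/* Fill out[0..3] with a GoTo frame: the opcode, then the 22-bit target
 * position, most significant byte first. Negative targets are sent as
 * 22-bit two's complement. Returns the number of bytes written. */
size_t build_goto(int32_t target, uint8_t out[4])
{
    uint32_t pos = (uint32_t)target & 0x3FFFFFu;  /* keep the low 22 bits */
    out[0] = (uint8_t)CMD_GOTO;
    out[1] = (uint8_t)(pos >> 16);
    out[2] = (uint8_t)(pos >> 8);
    out[3] = (uint8_t)pos;
    return 4;
}

/* !BUSY is active low: the chip holds it low while moving and lets it go
 * high when the move is finished. busy_pin_level is a hypothetical hook
 * returning the pin level (0 or 1). Returns how many polls it took. */
unsigned long wait_until_idle(int (*busy_pin_level)(void))
{
    unsigned long polls = 0;
    while (busy_pin_level() == 0)
        polls++;
    return polls;
}
```

Once the frame is sent (on the L6470, as far as I know, each byte goes in its own !CS low/high cycle), the 328 is free to do other work until !BUSY goes high; polling is just the simplest option, and the pin could drive an interrupt instead.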
These chips also have the option of being strung together in a daisy chain: MOSI to the first one's SDI, its SDO to the next one's SDI, and so on, with the last one's SDO going to MISO (see pg. 37). In this case they all share one !CS line. That would certainly save I/O pins and could cut down on CPU time, though it is not without its downsides: every transfer has to clock a byte through every chip in the chain, so talking to just one of them means sending NOPs to the rest. This would be one reason why a protoboard-type shield would be nice: you could wire them up any which way and try it out first.
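The byte ordering in a daisy chain is easy to get backwards, so here is a minimal sketch of how the 328 might pack one command per chip into shared-!CS frames. It assumes 8 bits per chip per !CS cycle and a NOP opcode of 0x00 (L6470-style; check pg. 37 and the command table). build_chain_frames is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

#define CMD_NOP 0x00u  /* assumed no-op opcode, pads shorter commands */

/* Interleave one command per chip into daisy-chain frames.
 * Chip 0 is first in the chain (its SDI is on MOSI); chip n_dev-1 is last
 * (its SDO goes to MISO). Each frame is one !CS low/high cycle carrying
 * one byte per chip. Bytes shift through the chain, so the first byte
 * clocked out ends up in the LAST chip: each frame is ordered
 * last-chip-first. out must hold (longest command) * n_dev bytes.
 * Returns the number of frames written. */
size_t build_chain_frames(const uint8_t *const cmds[], const size_t lens[],
                          size_t n_dev, uint8_t *out)
{
    size_t n_frames = 0;
    for (size_t d = 0; d < n_dev; d++)
        if (lens[d] > n_frames)
            n_frames = lens[d];          /* longest command sets the count */

    for (size_t f = 0; f < n_frames; f++) {
        for (size_t j = 0; j < n_dev; j++) {
            size_t d = n_dev - 1 - j;    /* farthest chip's byte goes first */
            out[f * n_dev + j] = (f < lens[d]) ? cmds[d][f]
                                               : (uint8_t)CMD_NOP;
        }
    }
    return n_frames;
}
```

For example, if chip 0 gets a 4-byte GoTo and chip 1 a 1-byte command, you get four frames of two bytes each, and chip 1 sees NOPs after its first byte. Every frame costs one byte per chip even when only one chip has anything to do.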
