I think your approach would make a great example and addition to the concurrency docs. Being able to do input, output, and computation in parallel is a huge benefit. The locks seem effective at avoiding race conditions, and I get a decent speedup (~3x on 4 cores) for a few test cases. So while I don't think there is or ever will be a single "right" way to solve concurrency, this is certainly a very viable option.
The only potential issue I see is that the `compute` function is required to release the GIL to get the full benefit. Most numpy array methods do this, but if you've got additional calculations in pure Python or call non-GIL-aware C functions, there's a chance that `compute` will block. If blocking computation significantly outweighs I/O, you might not see much performance benefit, if any. Those users would be better served by some form of multiprocessing.
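To make the distinction concrete, here's a minimal sketch (assuming the pipeline hands `compute` a numpy array chunk; the function names are illustrative, not from your patch) contrasting a GIL-releasing compute with a pure-Python equivalent that would serialize the pipeline:

```python
import numpy as np

def compute(block):
    # numpy ufuncs like sqrt release the GIL internally while the C
    # loop runs, so the I/O threads can keep reading/writing in parallel
    return np.sqrt(block) * 2.0

def compute_pure(block):
    # equivalent pure-Python loop: holds the GIL for its entire duration,
    # so it contends with every other thread in the pipeline
    return [x ** 0.5 * 2.0 for x in block]
```

Both produce the same results; only the first overlaps with I/O on a standard CPython build.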
It's hard to know whether arbitrary Python code will hold the GIL, which makes the performance characteristics of this technique tricky to predict. I think a note in the docs re: the `compute` function and the GIL would be sufficient to warn users of the pitfalls.
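Since there's no reliable way to inspect this statically, users could at least measure it empirically. Here's a rough probe (my own sketch, not part of your patch; `compute` and `arg` stand in for the user's workload) that compares two sequential calls against two threaded ones:

```python
import threading
import time

def parallel_speedup(compute, arg):
    """Heuristic probe: a ratio near 2 (on 2+ cores) suggests compute
    releases the GIL and the calls overlap; near 1 means the GIL
    serialized them. Timing-based, so treat it as a rough indicator."""
    # two calls back to back on one thread
    t0 = time.perf_counter()
    compute(arg)
    compute(arg)
    sequential = time.perf_counter() - t0

    # the same two calls on two threads
    threads = [threading.Thread(target=compute, args=(arg,)) for _ in range(2)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    threaded = time.perf_counter() - t0
    return sequential / threaded
```

For example, `parallel_speedup(time.sleep, 0.2)` lands near 2 because `sleep` releases the GIL, while a pure-Python busy loop stays near 1.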