I like the idea in principle, but in practice I feel that it would limit RP rather than enhance it. It's difficult to portray characters' voices accurately, particularly when they belong to entirely different cultures, ages, and genders to their players.
My being restricted to only playing twenty-something characters with a twisted Australian accent might mangle a few eardrums, for instance.
As you said, there's nothing to stop the use of text to work around that, but - and I'm far from an expert on anything more advanced than a toaster - I suspect the additional load on the server from storing and delivering all those recordings would outweigh the benefit.