It seems very raw at this stage. I am not sure where they are headed. The worst thing that can happen with a Natural Language engine is for users to keep getting back messages saying that the engine doesn't understand. I used to work with NL parsers a long time ago, and they all eventually failed, because the user ended up spending way too much time learning how to phrase the question, and even then the user was never sure that the NL engine had understood the question correctly and returned a reliable answer. There is a big difference between what Google tries to do (search websites for phrases and return highly rated sites) and what Wolfram is doing, which is actually trying to return answers.
My guess is that they are going to have to work a lot on fine-tuning their objectives and user interface.
Rich