We need a natural language tokeniser, and I found a fairly comprehensive one in `polyglot`. It is GPLv3 licensed, so out of respect for the `polyglot` authors' wishes, we will switch this project's license to GPLv3 for compatibility.
For now, though, the project stays BSD licensed. We'll see where this takes us first.