|
60d0f530bf
|
wordstat: Handle "invalid" UTF-8.
`pycld` is fussy when it comes to UTF-8 (see
https://github.com/mikemccand/chromium-compact-language-detector/issues/22
and https://github.com/aboSamoor/polyglot/issues/71). This strips out
the characters that make `cld` choke.
Thanks to @andreoua for the suggested fix.
|
2018-12-07 21:02:39 +10:00 |
|
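The commit message above does not show which characters are stripped. As a rough illustration only, the sketch below assumes the offending characters are control characters and lone surrogates (Unicode categories `Cc` and `Cs`) and drops them before the text reaches the language detector. The function name `strip_unsafe_chars` and the choice to keep tabs and line breaks are assumptions for this example, not taken from the commit.

```python
import unicodedata

# Hypothetical helper: not the code from the commit, just one way to
# remove characters that a strict UTF-8 consumer may reject.
KEEP = {"\t", "\n", "\r"}  # assumption: ordinary whitespace is harmless


def strip_unsafe_chars(text):
    """Drop control characters (Cc) and lone surrogates (Cs) from text."""
    return "".join(
        ch
        for ch in text
        if ch in KEEP or unicodedata.category(ch) not in {"Cc", "Cs"}
    )


if __name__ == "__main__":
    dirty = "hello\x00 wor\ud800ld\n"
    # repr() avoids printing the raw string, which still holds a surrogate
    print(repr(strip_unsafe_chars(dirty)))
```

The cleaned string can then be passed to the detector in place of the raw input. Filtering by Unicode category with the standard library avoids a third-party regex dependency, at the cost of a per-character loop.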