PHP tokenization with UTF-8 is broken #106
Comments
Thanks @Savier, perhaps the
I tried to use php_dedupe_definitions_v2.pkl for my own project and found many functions with broken tokenization. For example, searching for functions that contain empty ('') tokens turns up more than 8000 of them. And if we look for all 1-letter tokens, we get tons of 1-letter UTF-8 tokens, which should be impossible.
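To make the check reproducible, here is a minimal sketch of how the two symptoms could be counted. It assumes the pickle holds a list of dicts whose token list lives under a `function_tokens` key; that layout, and the helper names `audit_tokens` and `load_definitions`, are assumptions for illustration, so adjust them to whatever the file actually contains.

```python
import pickle
from collections import Counter


def load_definitions(path):
    """Load the pickled definitions (assumed to be a list of dicts)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def audit_tokens(definitions, token_key="function_tokens"):
    """Count functions with empty tokens and tally 1-character non-ASCII tokens.

    `token_key` is an assumed field name, not confirmed by the dataset docs.
    """
    functions_with_empty = 0
    single_non_ascii = Counter()
    for record in definitions:
        tokens = record.get(token_key, [])
        # A function is flagged once, however many empty tokens it has.
        if any(tok == "" for tok in tokens):
            functions_with_empty += 1
        for tok in tokens:
            # Single-character tokens outside ASCII are the suspicious ones:
            # they suggest a word was split character by character.
            if len(tok) == 1 and not tok.isascii():
                single_non_ascii[tok] += 1
    return functions_with_empty, single_non_ascii


# Usage against the real file (path is whatever you downloaded it to):
#   definitions = load_definitions("php_dedupe_definitions_v2.pkl")
#   empty_count, singles = audit_tokens(definitions)
#   print(empty_count, singles.most_common(20))
```

Counting on an in-memory list keeps the check independent of the file, so the same function can be run on a small hand-made sample first.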