I never expected to revisit my Wiktionary parser, but enough people have sent me feedback that I decided to do just that.
Briefly, Wikparser is a small PHP application that extracts specific lexical information (part of speech, definitions, gender, synonyms, and hypernyms) from either Wiktionary’s API or a local MySQL copy of its database.
Much of the code has been rewritten, cleaned up, and improved. This new version of Wikparser (0.3) is available on GitHub and is, of course, open to anyone interested in modifying it further. For information on how to use it and how to add support for a language, have a look at this page, which has also been updated. Most of the changes are internal, but I have also added some new functionality, most notably:
- It should be slightly faster. Not dramatically so, mind you, but I’ve cleaned up some of the functions and improved the MySQL queries.
- It can now extract gender.
- It now supports Spanish and German out of the box. For the time being, support for these languages should be considered partial. If you’re a native speaker, don’t hesitate to contact me with suggestions for improving support for these or any other languages.
The parser still provides only basic output (no XML or JSON yet), but despite its limited functionality, it seems to have proven useful to some.