Last Friday (June 12, 2015 to be precise), Wikimedia announced that all of their sites would be switching to secure connections by default. Consequently, any tool that uses one of Wikimedia’s APIs must connect through HTTPS; otherwise, the connection will fail. The version of Wikparser currently available on GitHub has been updated with the fix (thanks to quuuit for […]
Browsing the archives for the Programming category
I never expected to revisit my Wiktionary parser, but I’ve received enough feedback from people to do just that. Briefly, Wikparser is a small app written in PHP able to extract specific lexical information (POS, definitions, gender, synonyms, hypernyms) from either Wiktionary’s API or a local MySQL copy of its database. Much of the code has been […]
After many years of related research, I have finally updated Polylexical.com with additional data and search parameters. Most importantly, all NN and N à N compounds (729 and 319 items respectively) are tagged with information such as headedness and semantic relations. Details regarding the features and parameters contained in the database can be found by clicking […]
I’ve migrated the Wikparser code to GitHub under a GPL licence. Any changes I make in the future will be pushed to the Wikparser GitHub repository. The parser currently supports both English and French versions of Wiktionary, but additional languages may be added with minimal effort. Instructions on how to do so can be found here […]
Although functionally limited, the small Wiktionary parser I developed a few years ago has served its purpose. It was meant to provide a fast and easy way to extract specific information for a given word, which would then be included in a separate database. Its output is crude; its abilities meagre. I may, in the […]
The Wiktionary parser (Wikparser), a small tool able to extract specific information from Wiktionary entries, has been updated to version 0.2c. It can now be used to extract hypernyms. Unfortunately, hypernyms are under-represented in Wiktionary. If you’re working on English, you’d be better served by WordNet.
Perhaps you need an easy and quick way to extract specific information from a Wiktionary entry, such as a word’s part of speech or definitions? While Wikimedia’s API allows you to retrieve text-only entries for a given word, it doesn’t allow you to target a specific subsection of the entries. This is where the Wikparser […]
Working with accented characters (or any Unicode, non-Latin character for that matter) often poses problems when trying to match them using regular expression functions such as preg_match or preg_replace in PHP. The \w expression is meant to match any word character, but it won’t match é or ï in a Unicode (UTF-8) encoded string. […]
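As a rough illustration of the matching problem described above, here is a minimal sketch that combines PCRE's `u` pattern modifier with the Unicode property `\p{L}` (any letter). The helper name `isUnicodeWord` is made up for this example, and the sketch assumes both the script file and its input are UTF-8 encoded. It is one possible workaround, not necessarily the one the full post settles on.

```php
<?php
// Hypothetical helper, not taken from the original post.
// Returns true when $text consists only of Unicode letters.
function isUnicodeWord(string $text): bool
{
    // The "u" modifier tells PCRE to treat pattern and subject as UTF-8.
    // \p{L} matches a letter in any script, accented or not.
    return preg_match('/^\p{L}+$/u', $text) === 1;
}
```

Calling `isUnicodeWord('école')` or `isUnicodeWord('naïve')` should succeed, while a string containing a space or a digit should not.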
Say you have a script/API that you want to host on your own server, but would like to set a minimum period of time between calls. If you don’t have access to your Apache settings, there are a few simple ways to do it with a bit of PHP code. First, you could use PHP’s sleep() […]
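To make the idea of a minimum delay between calls concrete, here is a small sketch under a few assumptions of my own: the time of the last call is kept in a plain file, and the function names `secondsToWait` and `throttle` and the file path are hypothetical. It uses usleep() rather than sleep() so that fractional seconds are possible. It does not guard against two requests arriving at exactly the same moment.

```php
<?php
// Hypothetical helper: seconds still to wait so that two calls
// are at least $minInterval seconds apart.
function secondsToWait(float $lastCall, float $now, float $minInterval): float
{
    return max(0.0, $minInterval - ($now - $lastCall));
}

// Hypothetical throttle: $stampFile is an assumed location for the
// timestamp of the previous call.
function throttle(string $stampFile, float $minInterval): void
{
    // Treat a missing file as "never called before".
    $last = is_file($stampFile) ? (float) file_get_contents($stampFile) : 0.0;

    $wait = secondsToWait($last, microtime(true), $minInterval);
    if ($wait > 0) {
        // usleep() takes microseconds.
        usleep((int) round($wait * 1000000));
    }

    // Record the time of this call for the next one.
    file_put_contents($stampFile, (string) microtime(true), LOCK_EX);
}
```

At the top of the hosted script, something like `throttle(sys_get_temp_dir() . '/api_last_call.txt', 2.0);` would then hold each request until two seconds have passed since the previous one.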