Browsing the archives for the Lexicography category

Updated

in Info, Lexicography, Programming

After many years of related research, I have finally updated the database with additional data and search parameters. Most importantly, all NN and N à N compounds (729 and 319 items, respectively) are now tagged with information such as headedness and semantic relations. Details regarding the features and parameters contained in the database can be found by clicking […]
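To illustrate what headedness tagging makes possible, here is a minimal Python sketch, assuming each record carries a pattern code, a headedness label and a semantic-relation label. The class `Compound`, the function `headedness_profile`, the codes `NN`/`NaN`, the labels and the sample annotations are all hypothetical; they are not the database's actual schema or data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """One compound entry; field names and label values are hypothetical."""
    form: str       # surface form, e.g. "timbre-poste"
    pattern: str    # hypothetical code: "NN" or "NaN" (N à N)
    head: str       # hypothetical headedness label: "left", "right" or "none"
    relation: str   # hypothetical semantic-relation label


def headedness_profile(compounds, pattern):
    """Count the headedness labels among compounds of the given pattern."""
    return Counter(c.head for c in compounds if c.pattern == pattern)


# Illustrative annotations only, not taken from the database.
sample = [
    Compound("timbre-poste", "NN", "left", "purpose"),
    Compound("porte-fenêtre", "NN", "none", "coordination"),
    Compound("moulin à vent", "NaN", "left", "means"),
]

print(headedness_profile(sample, "NN"))
```

Once every record carries these tags, questions such as "how many N à N compounds are left-headed?" reduce to a filter and a count.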

Wikparser now on GitHub

in Lexicography, Programming

I’ve migrated the Wikparser code to GitHub under a GPL licence. Any future changes will be pushed to the Wikparser GitHub repository. The parser currently supports both the English and French versions of Wiktionary, but additional languages can be added with minimal effort. Instructions on how to do so can be found here […]
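One way to keep per-language support cheap is to isolate each edition's heading markup in a small table, so that adding a language means adding an entry. The Python sketch below assumes that English Wiktionary opens a language section with `==English==` and French Wiktionary with `== {{langue|fr}} ==`; the `LANGUAGE_SECTION` table and the `language_section` function are hypothetical and are not Wikparser's actual code.

```python
import re

# Hypothetical per-language settings: the heading that opens the target
# language's section in each Wiktionary edition. Check against live entries.
LANGUAGE_SECTION = {
    "en": re.compile(r"^==\s*English\s*==\s*$", re.MULTILINE),
    "fr": re.compile(r"^==\s*\{\{langue\|fr\}\}\s*==\s*$", re.MULTILINE),
}

# Any level-2 heading ("==X==" but not "===X==="), which starts the next language.
LEVEL2_HEADING = re.compile(r"^==[^=].*==\s*$", re.MULTILINE)


def language_section(wikitext, lang):
    """Return the wikitext of the edition's own-language section, or None."""
    start = LANGUAGE_SECTION[lang].search(wikitext)
    if start is None:
        return None
    nxt = LEVEL2_HEADING.search(wikitext, start.end())
    end = nxt.start() if nxt else len(wikitext)
    return wikitext[start.end():end].strip()


# Abridged toy entries, not copied from Wiktionary.
en_entry = "==English==\n===Noun===\n# A toy definition.\n\n==French==\n===Noun===\n# Another.\n"
fr_entry = "== {{langue|fr}} ==\n=== {{S|nom|fr}} ===\n# Définition.\n\n== {{langue|en}} ==\n# Other.\n"

print(language_section(en_entry, "en"))
print(language_section(fr_entry, "fr"))
```

With this layout, supporting another edition would mean adding one compiled pattern to the table rather than touching the extraction logic.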

Wiktionary Parsers

in Lexicography, Programming

Although functionally limited, the small Wiktionary parser I developed a few years ago has served its purpose. It was meant to provide a fast and easy way to extract specific information for a given word, which would then be included in a separate database. Its output is crude; its abilities meagre. I may, in the […]

Wiktionary Parser Update

in Lexicography, Programming

The Wiktionary parser (Wikparser), a small tool that extracts specific information from Wiktionary entries, has been updated to version 0.2c. It can now be used to extract hypernyms. Unfortunately, hypernyms are under-represented in Wiktionary, so if you’re working on English, you’d be better served by WordNet.

– A Database of French Compounds

in Lexicography

My doctoral work is primarily on the semantics of compounding in French. To conduct my research, I needed access to a repository of compounds that allowed specific types of search queries (e.g. all plural N-N compounds), but none of the existing resources met my needs. Mathieu-Colas’s database of French compounds, probably the best-known such online […]
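Storing compounds in a relational table turns a query like "all plural N-N compounds" into a single `SELECT`. The Python sketch below uses an in-memory SQLite table as a stand-in; the table name `compounds`, its columns, the function `plural_nn_compounds` and the sample rows are hypothetical and do not reflect the actual database's schema.

```python
import sqlite3


def plural_nn_compounds(conn):
    """Return all plural N-N compounds, sorted by form (hypothetical schema)."""
    rows = conn.execute(
        "SELECT form FROM compounds WHERE pattern = ? AND number = ? ORDER BY form",
        ("NN", "plural"),
    )
    return [form for (form,) in rows]


# Hypothetical schema and toy rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compounds (form TEXT, pattern TEXT, number TEXT)")
conn.executemany(
    "INSERT INTO compounds (form, pattern, number) VALUES (?, ?, ?)",
    [
        ("wagon-lit", "NN", "singular"),
        ("wagons-lits", "NN", "plural"),
        ("timbres-poste", "NN", "plural"),
        ("moulins à vent", "NaN", "plural"),
    ],
)

print(plural_nn_compounds(conn))
```

The `?` placeholders keep the search values out of the SQL string itself, which matters once queries are built from user input.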

Wikparser – A Wiktionary Text Parser

in Lexicography, Programming

Perhaps you need a quick and easy way to extract specific information from a Wiktionary entry, such as a word’s part of speech or definitions? While Wikimedia’s API allows you to retrieve text-only entries for a given word, it doesn’t let you target a specific subsection of an entry. This is where the Wikparser […]
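Assuming you already have the wikitext of an entry's English section (retrieved through the API or from a dump), targeting a subsection comes down to tracking headings line by line. The Python sketch below is a hypothetical illustration, not the Wikparser's implementation; it assumes English-Wiktionary conventions: part-of-speech headings such as `===Noun===` and definition lines starting with `# `, so `#:` usage examples and `#*` quotations are skipped. The names `POS_HEADINGS` and `definitions_by_pos` are illustrative.

```python
import re

# Part-of-speech headings to recognise (English Wiktionary style; extend as needed).
POS_HEADINGS = {"Noun", "Verb", "Adjective", "Adverb", "Pronoun",
                "Preposition", "Conjunction", "Interjection"}

HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")
# [[target]] -> target, [[target|label]] -> label
WIKILINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def definitions_by_pos(section):
    """Map each part-of-speech heading to its numbered definitions ('# ' lines)."""
    result = {}
    current = None
    for line in section.splitlines():
        m = HEADING.match(line)
        if m:
            title = m.group(2)
            # Any non-POS heading (Synonyms, Translations...) ends the current POS.
            current = title if title in POS_HEADINGS else None
            if current:
                result.setdefault(current, [])
        elif current and line.startswith("# "):
            # Keep the definition text, replacing wikilinks with their labels.
            result[current].append(WIKILINK.sub(r"\1", line[2:]).strip())
    return result


# Abridged toy section, not copied from Wiktionary.
section = (
    "===Noun===\n# A small domesticated [[feline]].\n"
    "#: ''The cat sat on the mat.''\n\n====Synonyms====\n* [[moggy]]\n\n"
    "===Verb===\n# To [[hoist]] an anchor.\n"
)
print(definitions_by_pos(section))
```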

Installing a local copy of Wiktionary (MySQL)

in Lexicography

For my lexicographic research, I chose to use Wiktionary because it’s one of the very few online dictionaries that allow you to extract information from their databases. Better still, the Wikimedia Foundation regularly makes dumps of each of its projects available to the public free of charge. These dumps are available as […]
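For readers who want to work from a dump without running a database server, a streaming pass over the XML page dump is an alternative to the MySQL import this post describes. The Python sketch below assumes the MediaWiki XML export layout (`<page>` elements containing `<title>`, `<ns>` and `<revision><text>`); the functions `iter_pages` and `local_name` are hypothetical, and the namespace URI in the toy document is only an example, since the code ignores namespaces.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for main-namespace pages of a MediaWiki XML dump."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the <mediawiki> root element
    title = ns = text = None
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if ns == "0":  # namespace 0 holds the dictionary entries
                yield title, text
            title = ns = text = None
            root.clear()  # drop finished pages so memory use stays flat


# Toy dump for illustration only.
toy = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>chat</title><ns>0</ns>
    <revision><text>== {{langue|fr}} ==</text></revision></page>
  <page><title>Wiktionnaire:Accueil</title><ns>4</ns>
    <revision><text>...</text></revision></page>
</mediawiki>"""

for title, text in iter_pages(io.BytesIO(toy)):
    print(title, text)
```

For a real dump, pass the file object returned by `bz2.open(path, "rb")` instead of the toy bytes, so the compressed file is read without unpacking it first.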