Leopar is distributed under the CeCILL Licence.
Leopar is a parser for natural languages which is based on the formalism of Interaction Grammars (IG). The parsing principle (called “electrostatic parsing”) consists in neutralizing opposite polarities: a positive polarity corresponds to an available linguistic feature and a negative one to an expected feature. The structures used in IG are underspecified syntactic trees decorated with polarities and called “Polarized Tree Descriptions (PTDs)”.
During the parsing process, PTDs are combined by partial superposition guided by the aim of neutralizing polarities: two opposite polarities are neutralized by merging their support nodes. Parsing succeeds if the process ends with a minimal and neutral tree.
The figure below describes Leopar and the toolchain around it. In the figure, PTD stands for Polarized Tree Description and EPTD for Elementary Polarized Tree Description:
The first parsing step in Leopar is anchoring:
- Each lexicon entry is described with a hypertag (i.e. a feature structure which describes morpho-syntactic information on the lexical unit);
- In order to preserve tokenization ambiguities, tokenization is represented as a DAG (Directed Acyclic Graph) of hypertags;
- For each hypertag describing a lexical unit, the relevant EPTDs are built by instantiation of template EPTDs defined in the grammar.
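To illustrate how a DAG of hypertags keeps tokenization ambiguities, here is a minimal sketch in Python. It is not Leopar's code: the edge encoding, the `paths` function and the feature names (`form`, `cat`) are assumptions made for the example. Each path through the DAG is one possible tokenization of the input.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: DAG nodes are positions in the input,
# and each edge carries one hypertag (a small feature structure).
Hypertag = Dict[str, str]
Edge = Tuple[int, Hypertag, int]  # (start position, hypertag, end position)


def paths(edges: List[Edge], start: int, end: int) -> List[List[Hypertag]]:
    """Enumerate every path from start to end as a list of hypertags."""
    if start == end:
        return [[]]
    result = []
    for src, tag, dst in edges:
        if src == start:
            for rest in paths(edges, dst, end):
                result.append([tag] + rest)
    return result


# "pomme de terre" read either as three tokens or as one compound noun.
edges: List[Edge] = [
    (0, {"form": "pomme", "cat": "n"}, 1),
    (1, {"form": "de", "cat": "prep"}, 2),
    (2, {"form": "terre", "cat": "n"}, 3),
    (0, {"form": "pomme de terre", "cat": "n"}, 3),
]

tokenizations = paths(edges, 0, 3)
forms = [[tag["form"] for tag in path] for path in tokenizations]
```

With this toy input, `forms` holds the two tokenizations: the three-token reading and the single compound token. In the real parser, each hypertag would additionally be associated with the EPTDs instantiated from the grammar templates.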
Paths in the DAG produced by the anchoring represent the set of lexical selections that should be considered by the parsing process. In order to reduce the number of paths, and so speed up the deep parsing, the next step in Leopar is to filter out paths that are doomed to fail. Two kinds of filters are used:
- Polarity filters remove lexical selections for which the set of polarities is not well-balanced.
- Companion filters remove lexical selections for which it can be predicted (from knowledge on the template EPTDs) that some polarity will fail to find a dual polarity (called a companion) able to saturate it.
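The idea behind the polarity filter can be sketched with a simple counting check. This is a deliberately simplified illustration, not the algorithm implemented in Leopar: it assumes that every polarized feature has a single atomic value, that a positive polarity counts as +1 and a negative one as -1, and that the expectation of a sentence is modelled by a hypothetical `root` item. All names below are invented for the example.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical summary of an EPTD: the list of its polarized features,
# each given as (feature, charge) with charge +1 (available) or -1 (expected).
PolarizedFeature = Tuple[str, int]
EptdSummary = List[PolarizedFeature]


def is_balanced(selection: List[EptdSummary]) -> bool:
    """True if, for every feature, positive and negative polarities cancel out."""
    totals: Counter = Counter()
    for eptd in selection:
        for feature, charge in eptd:
            totals[feature] += charge
    return all(total == 0 for total in totals.values())


# Toy items for "Jean dort" (assumed, not taken from an actual grammar).
root = [("cat=s", -1)]                      # a sentence is expected
jean = [("cat=np", +1)]                     # provides a noun phrase
dort_intransitive = [("cat=np", -1), ("cat=s", +1)]
dort_transitive = [("cat=np", -1), ("cat=np", -1), ("cat=s", +1)]

candidates = [
    [root, jean, dort_intransitive],
    [root, jean, dort_transitive],
]
kept = [selection for selection in candidates if is_balanced(selection)]
```

Here only the selection with the intransitive entry survives, because the transitive one leaves an expected noun phrase without any available counterpart. A balanced count is only a necessary condition: a selection that passes the filter may still fail during deep parsing.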
The atomic operation used during deep parsing is the node merging operation. At each step:
- two dual polarities are chosen,
- the two nodes carrying these polarities are merged,
- the tree descriptions around the two nodes are superposed.
Of course, in case of deadlock, some backtracking is used to choose another pair of polarities.
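The choose, merge and backtrack loop described above can be sketched as a small recursive search. This is only an illustration under strong assumptions: the superposition of tree descriptions is not modelled at all and is replaced by a hypothetical `clashes` set listing pairs of nodes that cannot be merged. The function name `neutralize` and the tuple encoding of polarities are also invented for the example.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# Hypothetical encoding of a polarity: (node name, feature, charge),
# with charge +1 for a positive polarity and -1 for a negative one.
Polarity = Tuple[str, str, int]
Merge = Tuple[str, str]


def neutralize(
    pending: List[Polarity],
    clashes: Set[FrozenSet[str]],
    merged: Tuple[Merge, ...] = (),
) -> Optional[List[Merge]]:
    """Return the node merges that neutralize every polarity, or None."""
    if not pending:
        return list(merged)
    first, rest = pending[0], pending[1:]
    for i, other in enumerate(rest):
        dual = first[1] == other[1] and first[2] == -other[2]
        if dual and frozenset((first[0], other[0])) not in clashes:
            remaining = rest[:i] + rest[i + 1:]
            result = neutralize(remaining, clashes, merged + ((first[0], other[0]),))
            if result is not None:
                return result
            # Dead end: give up this pair and try another partner (backtracking).
    return None


pending = [("n1", "np", +1), ("n2", "np", -1), ("n3", "np", -1), ("n4", "np", +1)]
clashes = {frozenset(("n3", "n4"))}  # stand-in for a failed superposition
solution = neutralize(pending, clashes)
```

In this toy run, the search first merges `n1` with `n2`, finds that the remaining pair `n3` and `n4` clashes, backtracks, and ends with the merges `n1` with `n3` and `n2` with `n4`. When no complete neutralization exists, the function returns `None`.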
(E)PTDs are tree descriptions which describe constraints on the phrase-structure tree. The parsing process aims to build a phrase-structure tree which is a model of the EPTDs chosen for each lexical unit.
Dependency graphs are built from the phrase-structure tree, but also with information taken from the parsing process itself.
- More information is available on Leopar's dev page
The parser is developed in the Sémagramme Team (LORIA - INRIA).
- Bruno GUILLAUME is the main developer of the parser
- Guy PERRIER is the main developer of the linguistic resources
- Guillaume BONFANTE contributes to parsing algorithms and brings many other ideas
- Sylvain POGODALLA was one of the historical pioneers
- Paul MASSON has worked hard to make everything cleaner, installable and usable
Many other people have contributed to Leopar with ideas, code, resources: Joseph Le Roux, Jonathan Marchand, Mathieu Morey, Karën Fort, Valmi Dufour-Lussier, Shohreh Tabatabayi Seifi, Jennifer Planul, Hassen Ben-Zineb, Masood Ghayoomi, Philippe Schmucker.