(14) Ce document est un exemple de papyrus réemployé
Syntactic constraints (12) and (13), if formalized and available in the dictionary, rule out respectively <N:ms> from (9) and <V:W> from (10); on the other hand, constructions (11) and (14) respectively justify the choice of <V:W> in (9) and of <N:ms> in (10).
The pre- and post-nominal positions of adjectives are other cases where the total resolution of lexical ambiguity requires a much more detailed syntactic parsing than usual. There exist sentences for which the same task requires a thorough recognition of the whole syntactic structure.
Thus, a valid determination of the tags to be attached to words may depend on the recognition of global sentence structure. This is a circular dependency, since the recognition of syntactic structures is based on lexical information included in tags. This circular dependency is precisely one of the intrinsic difficulties of computer parsing.
The observation that exhaustive resolution of lexical ambiguity depends, in general, on global syntactic parsing radically affects the nature of the problem. Correct tagging becomes a by-product of parsing. Ambiguity resolution then no longer has an object and a solution as a distinct problem: it disappears as a specific problem.
However, an objective of partially resolving, or reducing, lexical ambiguity, when a thorough syntactic parsing is not required, is less ambitious and more realistic. This goal defines a distinct task: filtering the output of dictionary-based tagging, and removing invalid analyses before parsing or application-specific procedures, or before the intervention of linguists building a syntactic parser3. Before filtering, this output is a set of analyses or readings of the text or sentence, and each of these analyses is represented as a sequence of tags generated by dictionary lookup. This process of filtering or selection is meant to facilitate the parsing of the text, by limiting the number of alternative readings of a sentence and the complexity of the data transmitted to the parser, which will produce identical output more efficiently (R. Milne 1986). Taggers and syntactic parsers that work by discarding analyses are said to be reductionistic (A. Voutilainen & P. Tapanainen 1993).
Measures of performance of ambiguity reduction systems ought to be consistent with this goal. They should depend on the reduction of the number of alternative analyses, or on the complexity of the information handed on to the syntactic parser, in order to measure the interest of the method.
If we want to measure the interest of ambiguity reduction independently of the particular parser, or of whatever system to be run after disambiguation, the reduction of the number of alternative readings is obviously the most natural quantity to be used, provided that valid readings are not discarded in the process of filtering.
However, if we want to take into account more accurately the application-related context of ambiguity reduction, we have to measure how much this procedure speeds up further computer processing. Then, in the case of parsing, the method of measuring depends on the algorithmic content of the parser, and notably on the relation between its input and its execution time.
If the time spent in parsing is proportional to the number of analyses in the input, quickly filtering the input is likely to speed up the operation, since the number of analyses of a sentence grows exponentially with its length, the base of the exponential being the average number of tags per word, and therefore grows very quickly with lexical ambiguity.
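This growth can be illustrated with a back-of-the-envelope sketch; the per-word ambiguity figures below are invented for illustration:

```python
import math

# Invented per-word ambiguity for a six-word sentence: each entry is
# the number of tags the dictionary assigns to one word.
tags_per_word = [2, 1, 3, 2, 1, 2]

# One reading per combination of tags: the product of the counts.
readings = math.prod(tags_per_word)
print(readings)   # prints: 24
```

With an average ambiguity of a tags per word over n words, the number of readings is on the order of a to the power n, hence the rapid growth with lexical ambiguity.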
The execution time of modern parsing algorithms4 depends in a complex way, not only on the number of analyses in input, but also on the complexity of the data structure that represents these analyses. This structure can be a finite automaton. Now, when a finite automaton represents a set of sequences, several measures of the complexity of the automaton are known, but none of them is equivalent to the number of sequences. When there is only one sequence, the automaton is obviously bound to be small; but this result is not always within reach; and when some of the sequences are removed during a process of filtering, the complexity of the automaton may increase or decrease. This is why the complexity of the automaton cannot be used as a quantitative means of measuring the performance of the process. And ambiguity reduction can theoretically either speed up or slow down computer parsing.
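The claim that filtering can increase the complexity of the automaton can be checked on a toy case. The sketch below, with abstract letters standing in for tags, counts the distinct residual languages of a finite set of sequences; for a finite language, this number equals the number of states of the minimal deterministic automaton (dead state excluded):

```python
def residuals(language):
    """Collect the distinct non-empty residual languages of a finite
    set of strings; their number equals the number of states of the
    minimal deterministic automaton recognizing the set."""
    seen = set()
    frontier = [frozenset(language)]
    while frontier:
        lang = frontier.pop()
        if not lang or lang in seen:
            continue
        seen.add(lang)
        for a in {w[0] for w in lang if w}:
            frontier.append(frozenset(w[1:] for w in lang if w and w[0] == a))
    return seen

full = {"aa", "ab", "ba", "bb"}    # four two-letter sequences
filtered = full - {"ab"}           # one reading removed by filtering
print(len(residuals(full)), len(residuals(filtered)))   # prints: 3 4
```

Removing one sequence from the set makes the minimal automaton larger (4 states instead of 3), which confirms that filtering readings can either decrease or increase the complexity of the structure handed on to the parser.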
According to our experiments in French (É. Laporte & A. Monceaux 1999), ambiguity reduction is quick and generally brings about a dramatic decrease in the complexity of the automaton that represents the alternative analyses of a sentence. It is plausible, therefore, that applying good disambiguators should speed up parsing. However, this hypothesis will have to be empirically checked when satisfactory parsers are available and able to exploit the content of reasonably informative tags. Theoretically, inserting a filtering phase could indeed slow down the global process, and such an operation could become completely useless in the long run.
But such is not the case yet. On the contrary, it is to be expected that the availability of good disambiguators will facilitate the development of parsers.
Therefore, we will define ambiguity reduction as a procedure applied to analyses resulting from tagging, and aiming at rejecting the largest possible number of wrong analyses with the simplest and quickest possible means.
This definition has several important consequences.
First, ambiguity reduction makes sense only in combination with another task, such as syntactic parsing, and in this case one observes a strict coupling between ambiguity reduction and parsing: the former procedure cannot be automated without regard to how the latter is automated, and two computer systems that perform the respective procedures in a compatible way cannot be modified independently. For example, the two systems must use the same set of tags and be based on the same type of analyses. Since the result of ambiguity reduction is handed on to the parser, the reliability of the latter depends on the reliability of the former. We will come back to this issue in section 7.
Second, the definition of the objectives is vague, in the sense that they include a limitation on the means of achieving the task. This characterizes the problem as application-related, as opposed to more fundamental problems such as lexical description or computer parsing, which have definite objectives. Thus, measuring the performance of a disambiguator is not a theoretical but a purely empirical enterprise. The limited theoretical dimension of the problem may reduce the motivation for studying it, but its connections with applications are a compensation. In addition, the vagueness of the objectives does not imply at all that the task is easy or that a solution can result from rough-and-ready work. The paradox is only apparent: for successful interfacing with a system as complex as a parser, a disambiguator must produce output of excellent quality.
Third, as a consequence of the division of computer processing between an ambiguity reduction step and a syntactic parsing step, what is not achieved during the former must be achieved during the latter. For example, it falls to the parser to resolve all ambiguity remaining at the end of the first step. The motivation for this division is to limit the first step to simple and quick means. For the global organization of the system, this limit must first be specified in more detail. The general framework mentioned above implies that the ambiguity reduction step:
does not involve the systematic recognition of constituents,
does not involve the insertion or use of any boundary symbols, except sentence boundaries,
does not generate new readings in addition to those directly produced by dictionary-based tagging,
does not explicitly represent in output the syntactic transformations that have been applied,
though parsing is certain to resort, in one way or another, to each of these technical means. These limitations are imposed on ambiguity reduction in order to keep it simple and quick. They can also be stated by asserting that this procedure is a filtering operation and that the context analysed can only be local. For instance, consider the ambiguity of tours:
(15) Il y a deux types de tours médiévales, d'après mon expérience: les défensives et les décoratives
Several predicative senses ('going round', 'trip', 'turn', 'ballot'…) and a technical sense ('lathe') correspond to the masculine: <tour,N:mp>, and the architectural sense ('tower') to the feminine: <tour,N:fp>. A local agreement rule between adjacent noun and adjective can make use of the fact that médiévales is unambiguously in the feminine in order to correctly choose the tag <tour,N:fp>. If the adjective médiévales did not occur in the sentence:
Il y a deux types de tours, d'après mon expérience: les défensives et les décoratives
the only information that could resolve the ambiguity of tours would come from the incomplete noun phrases on the right of the colon. The recognition of the relation between tours and these noun groups would require, among other things, recognizing the complement d'après mon expérience, a task that goes beyond the scope of ambiguity reduction and belongs to syntactic parsing. One can say in this case that the context required to disambiguate tours is not local. (However, a local context is usually not restricted to a word on each side.)
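A local agreement rule of this kind can be sketched in Python as follows; the dictionary output and the tag strings are invented for illustration, and a real disambiguator would of course state the constraint declaratively rather than in code:

```python
# Toy dictionary output for "deux types de tours médiévales".
noun_tags = {"<tour,N:mp>", "<tour,N:fp>"}
adj_tags = {"<médiéval,A:fp>"}

def gender_number(tag):
    """Extract the gender-number code (e.g. 'fp') from a toy tag."""
    return tag.rsplit(":", 1)[1].rstrip(">")

def agree_noun_adjective(noun_tags, adj_tags):
    """Keep only the noun tags whose gender-number code agrees with
    some tag of the adjacent adjective; never discard all of them."""
    codes = {gender_number(t) for t in adj_tags}
    kept = {t for t in noun_tags if gender_number(t) in codes}
    return kept or noun_tags

print(agree_noun_adjective(noun_tags, adj_tags))   # {'<tour,N:fp>'}
```

The fallback to the full tag set when agreement would empty it reflects the first-priority objective of never discarding every analysis of a word.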
Fourth, the notions of recall (ability to keep valid analyses) and precision (ability to remove invalid analyses) must be distinguished. Reducing ambiguity requires these two aptitudes. Our objective, as defined above, is to increase precision as much as possible. We know that precision cannot always reach 100%, and that it normally falls to parsing to resolve all remaining ambiguity. Let us consider the consequences of rejecting a valid analysis. The output of ambiguity reduction is processed by another component, which may be a parser. We already noticed that the reliability of this component depends on the reliability of the disambiguator: the unavoidable consequence of discarding a valid analysis is the failure of the parsing of a whole sentence. Now, a parser is a program designed to be used as an essential component of translation systems, of speech synthesis from written text, or of other applications in which reliability of output is an important parameter. Generating reliable output at an acceptable speed is the main purpose of a parser, and the filtering step is introduced only to speed up the process. Maintaining recall at the level of 100% during the ambiguity reduction step is therefore the highest priority. Maintaining precision at the highest possible level is the second priority. Because of this order of priorities, a great deal of caution is advisable in the use of approximations in ambiguity reduction.
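The two measures can be made concrete on a toy filtering run. In the sketch below, precision is taken in the sense just defined, the proportion of invalid analyses removed, not in the usual information-retrieval sense, and the reading identifiers are invented:

```python
def recall_precision(valid, invalid, kept):
    """Recall: proportion of valid analyses kept by the filter.
    Precision, in the sense used here: proportion of invalid
    analyses removed by the filter."""
    return (len(valid & kept) / len(valid),
            len(invalid - kept) / len(invalid))

valid = {"r1", "r2"}           # readings a perfect filter keeps
invalid = {"r3", "r4", "r5"}   # readings a perfect filter removes
kept = {"r1", "r2", "r4"}      # this filter removed r3 and r5
r, p = recall_precision(valid, invalid, kept)
print(r, round(p, 2))   # prints: 1.0 0.67
```

This run meets the first-priority objective (recall of 100%) while leaving one invalid reading for the parser to eliminate.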
5. Data required for ambiguity reduction
Automatic ambiguity reduction involves analysing and recognizing the grammatical context, in order to check local constraints, which are distributional, grammatical, combinatorial constraints on sequences of words or of tags. For instance, the study of sentence (15) exemplifies how we can take advantage of a constraint on noun-adjective agreement. Such constraints, duly encoded, constitute the linguistic data of the system.
Much controversy surrounds the approaches to the construction or acquisition of these data. They are either elaborated by linguists, or obtained by machine learning.
The former approach is chronologically the first (A. Joshi & Ph. Hopely 1996; Z. Harris 1962; Sh. Klein & R. Simmons 1963; B. Greene & G. Rubin 1971). Formally describing grammatical constraints can remain a mere craft or get more or less industrialized, but it requires a non-trivial analytical reflection. Consider the example of pêcher, which is ambiguous between a noun and a verb:
(16) Il a l'impression de ne pêcher que des poissons-chats
In this sentence, a very local clue indicates that it is a verb: the presence of ne. This observation can be stated as:
(17) When it immediately follows ne, pêcher is a verb
This grammatical constraint correctly discards the nominal tag for pêcher from (16). The difficulty lies in determining the adequate level of generality. A constraint of little generality, like (17), applies rarely and resolves few instances of ambiguity. We can generalize it, a natural and intuitive operation, to (18):
(18) Any word that immediately follows ne is a verb
This second version still correctly applies to (16), but incorrectly rules out the tag <plus,ADV> for plus in:
Il a l'impression de ne plus pêcher que des poissons-chats
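The contrast between (17) and (18) can be simulated on toy data; the tag strings are invented, and the single function below parametrizes the two constraints only for the sake of the sketch:

```python
# (form, candidate tags) pairs for "ne pêcher" and "ne plus pêcher".
s16 = [("ne", {"<ne,ADV>"}),
       ("pêcher", {"<pêcher,V:W>", "<pêcher,N:ms>"})]
s_counter = [("ne", {"<ne,ADV>"}),
             ("plus", {"<plus,ADV>"}),
             ("pêcher", {"<pêcher,V:W>", "<pêcher,N:ms>"})]

def keep_verbs_after_ne(sent, only_form=None):
    """Apply (17) when only_form='pêcher', (18) when only_form=None:
    keep only verb tags for the word immediately following ne."""
    out = []
    for i, (form, tags) in enumerate(sent):
        if i > 0 and sent[i - 1][0] == "ne" \
                and (only_form is None or form == only_form):
            tags = {t for t in tags if ",V:" in t}
        out.append((form, tags))
    return out

print(keep_verbs_after_ne(s16, "pêcher")[1])  # (17) keeps the verb tag only
print(keep_verbs_after_ne(s_counter)[1])      # (18) empties the tags of plus
```

The overgeneral constraint (18) leaves plus with no tag at all, i.e. it discards a valid analysis, which is exactly the failure mode discussed below.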
An excessively general constraint applies to inadequate cases and can reject valid analyses. This undermines the reliability of the disambiguator and of the syntactic parser, if any, and diverts the system from its first-priority objective of maintaining recall at the level of 100%. In order to determine the acceptable level of generality, writers of formal descriptions of constraints have two methods at their disposal:
- searching texts for examples and counter-examples5, and
- directly constructing examples and especially counter-examples.
These two methods are complementary. The former, though partially automatable, cannot replace the latter. Consider the following example, borrowed from J. Senellart (1999):
(19) In a sequence of the form (a, as, avions) <V:K>, the first word is a form of the verb avoir
<V:K> stands for past participle. This constraint is designed to resolve the ambiguity of a, as, avions, three forms of the verb avoir which are ambiguous with nouns. It correctly rules out the noun tag from:
Nous avions également popularisé une technologie
and not a single wrong application of (19) was detected in one year of the newspaper Le Monde. Even so, obvious and natural counter-examples exist:
On ne pilote que des avions complètement révisés
in which (19) erroneously removes the noun tag.
The example of (17)-(18) is particularly simple, because it resorts to a very local context. In practice, it is frequently necessary, as in (19), to consider a slightly more extended context, on the left, on the right, or on both sides. The author of the description and formalization of the constraints has to imagine all possible contexts of use of a given word. We insist on the fact that, by definition, parsing tools like systematic recognition of noun phrases and other constituents are not available at that stage.
The second approach to acquiring the linguistic data of disambiguators is generally accepted as the standard technology by the computational-linguistic community. It is based on automatic generalization, or machine learning, from tagged or untagged texts (I. Marshall 1983; J. Benello et al. 1989; B. Merialdo 1994), and, sometimes, from other data like pre-defined rule schemes (E. Brill 1992). This solution is intrinsically approximate. It is oriented towards the processing of cases occurring the most frequently in texts.
The output of automatic generalization can take the form of readable rules, or of unreadable numerical data. In the latter case, the behaviour of the disambiguator is undefined, i.e. the output of the system for a given input can be known by testing the system, but cannot be predicted. In other words, nothing is guaranteed about the result, and in particular the highest-priority objective of reliably retaining all valid analyses is out of reach.
In addition, very informative tags and contexts encompassing more than one word are difficult to exploit through frequency-based methods. Automatic generalization can indeed be viewed as an exploration of an abstract space the volume of which depends, among other things, on the number of existing tags, on the extent of context taken into account, and on the size of the sample of texts to be explored. An increase in the first parameter may imply an increase in both others. Up to now, the feasibility of testing the approach with fine-grained tags has been limited by the computational complexity involved in exploring this space. We mentioned in section 3, for instance, that the separation of senses for adjectives is difficult to deal with in such a framework.
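This dependence can be sketched with a rough count, using illustrative figures only: if contexts are sequences of k tags drawn from a tagset of size t, a frequency-based learner must in principle gather statistics over t to the power k distinct contexts:

```python
# Illustrative tagset sizes: a coarse tagset vs a fine-grained one
# that separates senses and records rich morphological features.
for tagset_size in (50, 500):
    for k in (2, 3):   # bigram and trigram contexts
        print(tagset_size, k, tagset_size ** k)
# The fine-grained trigram space (500 ** 3 = 125 000 000 contexts)
# calls for a far larger text sample than the coarse bigram one.
```

The figures are hypothetical, but the shape of the growth is not: refining the tagset multiplies the number of contexts over which reliable frequencies must be estimated.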
From now on, we will focus on ambiguity reduction data obtained through direct formal description by linguists, and we will consider that a disambiguator is a set of programs but also of linguistic data, including an ambiguity reduction grammar.
6. Examples of grammatical constraints
Formal distributional or grammatical constraints are the main substance of a disambiguator. We will examine examples of issues faced during the elaboration of such data. Each grammatical constraint separately described is sometimes called a rule, and this term cannot always be avoided, though it generally refers to the framework of a network of rules and exceptions. When we use it, however, we will not assume the existence of a system of rule/exception relations. Problems of maintenance are typical of these systems (cf. section 9.1.).
Considering that reliability is an essential quality of a disambiguator, since its output goes on to be exploited by other systems, the definition of our objectives gives priority to ensuring that the process always preserves all valid analyses. This aim is very difficult to achieve. Consider the ambiguity of the words dément, verb or adjective; and visite, noun or verb:
(20) Elle dément qu'il s'agisse d'une visite à l'université
The grammatical words elle and une are indications that dément is a verb and visite a noun. We can state the following restriction on the use of these words:
(21) il, ils, elle, elles do not occur immediately before an adjective
(22) un, une do not occur immediately before a noun
These rules correctly reject the tags6 <dément,A:ms> and <visiter,V:P3s> from (20), but they have counter-examples:
Elle rend celui qui travaille avec elle dément ou génial
Ceux qui voulaient en avoir un affluent à l'entrée du parc
In these sentences, (21) and (22) wrongly rule out <dément,A:ms> and <affluer,V:P3p>. A counter-example suffices to disqualify a constraint that discards valid analyses. The difficulty in avoiding this type of mistake is inherent in the problem. On the one hand, writers of grammatical constraints must know all the observable grammatical contexts of the forms to be processed, and take them into account, which (21) and (22) do not. On the other hand, constraints must be stated so as to recognize a sufficient local context in the sentences, without any previous recognition of phrase boundaries, and although this context may be ambiguous. For instance, constraint (21) can be improved by taking into consideration a wider left context — which makes it less general. If an explicit sentence boundary or a subordinating conjunction occurs immediately before <il,PRO>, (21) applies better. The underlying syntactic fact is the presence of a clause boundary, but grammatical constraints cannot refer to clause boundaries, since analyses are filtered before the syntactic parsing that will, among other things, recognize those boundaries.
Describing grammatical constraints for ambiguity reduction has another unpleasant aspect: it is impossible to complete the description. This was to be expected, since the objective of a complete syntactic description goes beyond the scope of ambiguity reduction. However, it may be impossible to process two very similar cases with the same constraint. Consider again, for instance, constraint (21), that correctly applies immediately after an explicit sentence boundary, like in (20). Syntactically, an adjacency between a pronoun and a verb is a contingent detail of a structure, since a complement may be inserted between elle and dément without changing the basic sentence:
Elle, sans hésiter, habituée à assumer ses responsabilités, dément qu'il s'agisse d'une visite à l'université
However, constraint (21) does not apply any more, and cannot be adapted so that it applies, since this would mean recognizing completely the complement inserted.
Due to these intrinsic difficulties, formalizing and encoding grammatical constraints is a hard, sometimes frustrating task. One could even claim that the task is unfeasible, arguing that any ambiguity reduction constraint will necessarily have counter-examples; or that the (few) existing systems have already faced these difficulties in all the possible ways, and reached the best possible results. However, these opinions are not based on verifiable facts. On the contrary, we think that the intrinsic difficulties of the problem are well-known, but that solutions have not been investigated systematically. Such an investigation could follow two complementary approaches: on the one hand, elaborating linguistic analyses at the root of the process; on the other hand, building a formalism for stating and applying grammatical constraints. We will examine these two topics successively.
7. Underlying linguistic analyses
Ambiguity reduction always takes place as a step in a global process aiming at assigning a linguistic analysis to sentences of written texts, or several analyses in the case of ambiguous sentences. This goal is in turn a prerequisite for certain types of procedures on written texts. Linguistic analyses are formal descriptions ranging from representations of minimal elements of the text, to syntactic structures of sentences. The question of deciding which analyses are to be assigned to sentences is obviously a fundamental one, though it is little debated in the literature. As a matter of fact, the computer data and software components that participate in the process must refer to the same underlying linguistic analyses, which creates an interdependency between these components.
The assignment of a formal structural description to a sentence implies the following steps:
- the elementary units are identified in an electronic dictionary and the corresponding tags are assigned to words, including compound words; at this stage, what we call an analysis or reading of a sentence is a sequence of tags;
- these readings are filtered in order to reduce ambiguity quickly;
- when the selected readings are parsed, relevant constituents and transformations are recognized.
The three steps of the process are dictionary lookup, ambiguity reduction and syntactic parsing. The three modules that implement them rely on a set of linguistic data, respectively an electronic dictionary, an ambiguity reduction grammar, and a syntactic description of the language. Due to conceptual interrelations between these data sets, they must be based on the same analyses. It may happen that several analyses of a given sentence are conceivable: problems can arise when different analyses are chosen for the elaboration of the linguistic data, or when the chosen analysis is formalized in different ways.
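The three-step organization can be sketched as follows; the dictionary entries, tags, and the single filtering constraint are invented, and the parser is a stub:

```python
from itertools import product

# Invented dictionary entries for the toy sequence "la porte".
DICTIONARY = {
    "la": {"<le,DET:fs>", "<le,PRO:fs>"},
    "porte": {"<porte,N:fs>", "<porter,V:P3s>"},
}

def lookup(words):
    """Step 1: dictionary lookup; a reading is a sequence of tags."""
    return [list(r) for r in product(*(sorted(DICTIONARY[w]) for w in words))]

def reduce_ambiguity(readings):
    """Step 2: filtering, with one toy constraint: a determiner is
    not immediately followed by a finite verb."""
    def ok(reading):
        return not any("DET" in a and ",V:P" in b
                       for a, b in zip(reading, reading[1:]))
    return [r for r in readings if ok(r)]

def parse(readings):
    """Step 3: a stub standing in for the syntactic parser."""
    return [("S", r) for r in readings]

readings = lookup(["la", "porte"])
print(len(readings), len(reduce_ambiguity(readings)))   # prints: 4 3
```

The point of the sketch is the division of labour: the filter removes one of the four readings with a cheap local check, and every remaining decision is deferred to the parser.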
Consider for example the following sentence:
Les supporters sont souvent décidés, certains sont même violents
The subject of sont même violents can be analysed in two ways. The first solution is to consider that certains is a pronoun in this sentence; the electronic dictionary, therefore, must describe it as ambiguous between a determiner, <certains,DET:mp> and a pronoun, <certains,PRO:mp>, in addition to one or several adjectives <certain,A:mp>. The determiner entry is meant for sentences like: