<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Error Handling and Recovery</title>
</head>

<body bgcolor="#FFFFFF">

<h2><a name="_bb1"></a><a name="lexicalanalysis">Error
    Handling and Recovery</a></h2>
    <p>All syntactic and semantic errors cause parser exceptions to be thrown. In particular,
    the methods used to match tokens in the parser base class (<tt>match</tt> et al.) throw
    <tt>MismatchedTokenException</tt>. If the lookahead predicts no alternative of a production,
    a <tt>NoViableAltException</tt> is thrown in the parser and a <tt>NoViableAltForCharException</tt>
    in the lexer. The methods in the lexer base class used to match characters (<tt>match</tt>
    et al.) throw analogous exceptions.</p>
    <p>ANTLR will generate default error-handling code, or you may specify your own exception
    handlers. Either case results (where supported by the language) in the creation of a <tt>try/catch</tt>
    block. Such <tt>try{}</tt> blocks surround the generated code for the grammar element of
    interest (rule, alternative, token reference, or rule reference). If no exception handlers
    (default or otherwise) are specified, then the exception will propagate all the way out of
    the parser to the calling program. </p>
    <p>ANTLR's default exception handling is good enough to get something working, but you will
    have more control over error reporting and resynchronization if you write your own exception
    handlers. </p>
    <p>Note that the '@' exception specification of PCCTS 1.33 does not apply to ANTLR.</p>
    <h3><a name="ANTLR Exception Hierarchy">ANTLR Exception Hierarchy</a></h3>
    <p>ANTLR-generated parsers throw exceptions to signal recognition errors or other stream
    problems.&nbsp; All exceptions derive from <font face="Courier New">ANTLRException</font>.
    &nbsp; The following diagram shows the hierarchy:</p>
    <p><img src="ANTLRException.gif" width="646" height="263"
    alt="ANTLRException.gif (14504 bytes)"></p>
    <table border="0" width="100%">
      <tr>
        <th width="50%">Exception</th>
        <th width="50%">Description</th>
      </tr>
      <tr>
        <td width="50%" align="left" valign="top"><small><font face="Courier New">ANTLRException</font></small></td>
        <td width="50%">Root of the exception hierarchy.&nbsp; You can subclass this directly
        to define your own exceptions, unless they belong more properly under one of the
        specific exceptions below.</td>
      </tr>
      <tr>
        <td width="50%" align="left" valign="top"></td>
        <td width="50%"></td>
      </tr>
      <tr>
        <td width="50%" align="left" valign="top"><small><font face="Courier New">CharStreamException</font></small></td>
        <td width="50%">Something went wrong on the character input stream.&nbsp; Most of
        the time it will be an IO problem, but you could define an exception for input coming from
        a dialog box or some other source.</td>
      </tr>
      <tr>
        <td width="50%" align="left" valign="top"><small><font face="Courier New">CharStreamIOException</font></small></td>
        <td width="50%">The character input stream had an IO exception (e.g., <font
        face="Courier New">CharBuffer.fill()</font> can throw this).&nbsp; If <font
        face="Courier New">nextToken()</font> sees this, it will convert it to a <font
        face="Courier New">TokenStreamIOException</font>.</td>
      </tr>
      <tr>
        <td width="50%" align="left" valign="top"></td>
        <td width="50%"></td>
      </tr>
      <tr>
        <td width="50%" align="left" valign="top"><small><font face="Courier New">RecognitionException</font></small></td>
        <td width="50%">A generic recognition problem with the input.&nbsp; Use this as your
        &quot;catch all&quot; exception in your main() or other method that invokes a parser,
        lexer, or tree parser.&nbsp; All parser rules throw this exception.</td>
      </tr>
      <tr>
        <td width="50%" align="left" valign="top"><small><font face="Courier New">MismatchedCharException</font></small></td>
        <td width="50%">Thrown by CharScanner.match() when it is looking for a character, but
        finds a different one on the input stream.</td>
      </tr>
      <tr>
        <td width="50%" align="left" valign="top"><small><font face="Courier New">MismatchedTokenException</font></small></td>
        <td width="50%">Thrown by Parser.match() when it is looking for a token, but finds a
        different one on the input stream.</td>
      </tr>
      <tr>
        <td width="50%" align="left" valign="top"><small><font face="Courier New">NoViableAltException</font></small></td>
        <td width="50%">The parser finds an unexpected token; that is, it finds a token that does
        not begin any alternative in the current decision.</td>
      </tr>
      <tr>
        <td width="50%" align="left" valign="top"><small><font face="Courier New">NoViableAltForCharException</font></small></td>
        <td width="50%">The lexer finds an unexpected character; that is, it finds a character
        that does not begin any alternative in the current decision.</td>
      </tr>
      <tr>
        <td width="50%" align="left" valign="top"><small><font face="Courier New">SemanticException</font></small></td>
        <td width="50%">Indicates that syntactically valid, but nonsensical or otherwise bogus
        input was found on the input stream.&nbsp; This exception is thrown automatically by
        failed validating semantic predicates such as:<pre>a : A {false}? B ;</pre>
        <p>ANTLR generates:</p>
        <pre><small>match(A);
if (!(false)) throw new
  SemanticException(&quot;false&quot;);
match(B);</small></pre>
        <p>You can throw this exception yourself during the parse if one of your actions
        determines that the input is invalid.</p></td>
      </tr>
      <tr>
        <td width="50%" align="left" valign="top"></td>
        <td width="50%"></td>
      </tr>
      <tr>
        <td width="50%" align="left" valign="top"><small><font face="Courier New">TokenStreamException</font></small></td>
        <td width="50%">Indicates that something went wrong while generating a stream of tokens.</td>
      </tr>
      <tr>
        <td width="50%" align="left" valign="top"><small><font face="Courier New">TokenStreamIOException</font></small></td>
        <td width="50%">Wraps an IOException in a <font face="Courier New">TokenStreamException</font>.</td>
      </tr>
      <tr>
        <td width="50%" align="left" valign="top"><small><font face="Courier New">TokenStreamRecognitionException</font></small></td>
        <td width="50%">Wraps a <font face="Courier New">RecognitionException</font> in a <font
        face="Courier New">TokenStreamException</font> so you can pass it along on a stream.</td>
      </tr>
      <tr>
        <td width="50%" align="left" valign="top"><small><font face="Courier New">TokenStreamRetryException</font></small></td>
        <td width="50%">Signals that recognition of the current token was aborted and that the
        stream should try to get a token again. Used
        by <small><font face="Courier New">TokenStreamSelector.retry()</font></small> to force <font
        face="Courier New">nextToken()</font> of the stream to re-enter and retry.&nbsp; See the
        examples/java/includeFile directory.<p>This is a great way to handle nested include files
        and so on, or to try out multiple grammars to see which appears to fit the data.&nbsp; You
        can have something listen on a socket for multiple input types without knowing which type
        will show up when.</p></td>
      </tr>
    </table>
    <p><a name="_bb2"></a>A typical main method or other parser invoker wraps the
    invocation in a try/catch:</p>
    <pre>    try {
       ...
    }
    catch(TokenStreamException e) {
      System.err.println(&quot;problem with stream: &quot;+e);
    }
    catch(RecognitionException re) {
      System.err.println(&quot;bad input: &quot;+re);
    }</pre>
    <p>Lexer rules throw <font face="Courier New">RecognitionException</font>, <font
    face="Courier New">CharStreamException</font>, and <font face="Courier New">TokenStreamException</font>.</p>
    <p>Parser rules throw <font face="Courier New">RecognitionException</font> and <font
    face="Courier New">TokenStreamException</font>.</p>
    <h3><a name="Modifying Default Error Messages With Paraphrases">Modifying Default Error
    Messages With Paraphrases</a></h3>
    <p>The name or definition of a token in your lexer is rarely meaningful to the user of
    your recognizer or translator.&nbsp; For example, instead of seeing</p>
    <pre>T.java:1:9: expecting ID, found ';'</pre>
    <p>you can have the parser generate:</p>
    <pre>T.java:1:9: expecting an identifier, found ';'</pre>
    <p>ANTLR provides an easy way to specify a string to use in place of the token name.&nbsp;
    In the definition for ID, use the paraphrase option:</p>
    <pre>ID
options {
  paraphrase = &quot;an identifier&quot;;
}
  : ('a'..'z'|'A'..'Z'|'_')
    ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
  ;</pre>
    <p>Note that this paraphrase goes into the token types text file (ANTLR's persistence
    file).&nbsp; In other words, a grammar that uses this vocabulary will also use the
    paraphrase. </p>
    <h3><a name="ParserExceptionHandling">Parser Exception Handling</a></h3>
    <p>ANTLR generates recursive-descent recognizers. Since recursive-descent recognizers
    operate by recursively calling the rule-matching methods, this results in a call stack
    that is populated by the contexts of the recursive-descent methods. Parser exception
    handling for grammar rules is a lot like exception handling in a language like C++ or
    Java. Namely, when an exception is thrown, the normal thread of execution is stopped, and
    functions on the call stack are exited sequentially until one is encountered that wants to
    catch the exception. When an exception is caught, execution resumes at that point. </p>
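    <p>The call-stack behavior described above can be sketched in plain Java. This is a
    hand-written illustration, not ANTLR output: the grammar (<tt>stat : "int" decl SEMI ;</tt>
    and <tt>decl : ID ;</tt>), the class <tt>PropagationSketch</tt>, and the stand-in exception
    <tt>SketchRecognitionException</tt> are all assumptions made so that the example compiles
    without the ANTLR runtime.</p>

```java
// Hypothetical stand-in for antlr.RecognitionException, so the sketch
// compiles without the ANTLR runtime on the classpath.
class SketchRecognitionException extends Exception {
    SketchRecognitionException(String message) { super(message); }
}

// Hand-written recognizer for the hypothetical grammar
//   stat : "int" decl SEMI ;   (has an exception handler)
//   decl : ID ;                (no handler of its own)
class PropagationSketch {
    private final String[] tokens;   // token types, in input order
    private int pos = 0;
    private final StringBuilder trace = new StringBuilder();

    PropagationSketch(String... tokens) { this.tokens = tokens; }

    // Type of the current lookahead token, or "EOF" past the end.
    private String la() { return pos < tokens.length ? tokens[pos] : "EOF"; }

    private void match(String expected) throws SketchRecognitionException {
        if (!la().equals(expected)) {
            throw new SketchRecognitionException(
                "expecting " + expected + ", found " + la());
        }
        pos++;
    }

    void stat() {
        trace.append("enter stat;");
        try {
            match("int");
            decl();
            match("SEMI");
            trace.append("stat ok;");
        } catch (SketchRecognitionException ex) {
            // Execution resumes here, in the first method on the call
            // stack that wants to catch the exception.
            trace.append("caught in stat: ").append(ex.getMessage()).append(';');
        }
    }

    private void decl() throws SketchRecognitionException {
        trace.append("enter decl;");
        match("ID");                 // on mismatch, decl is exited immediately
        trace.append("exit decl;");
    }

    static String run(String... tokens) {
        PropagationSketch p = new PropagationSketch(tokens);
        p.stat();
        return p.trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(run("int", "ID", "SEMI"));
        System.out.println(run("int", "SEMI"));
    }
}
```

    <p>For the second input the trace never contains <tt>exit decl;</tt>: the mismatch inside
    <tt>decl</tt> unwinds straight to the handler in <tt>stat</tt>.</p>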
    <p>In ANTLR, parser exceptions are thrown when (a) there is a syntax error, (b) there
    is a failed validating semantic predicate, or (c) you throw a parser exception from an
    action. </p>
    <p>In all cases, the recursive-descent functions on the call stack are exited until an
    exception handler is encountered for that exception type or one of its base classes (in
    non-object-oriented languages, the hierarchy of exception types is not implemented by a
    class hierarchy). Exception handlers arise in one of two ways. First, if you do nothing,
    ANTLR will generate a default exception handler for every parser rule. The default
    exception handler will report an error, sync to the follow set of the rule, and return
    from that rule. Second, you may specify your own exception handlers in a variety of ways,
    as described later. </p>
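    <p>The three steps of the default handler (report, sync to the follow set, return) can be
    sketched as follows. This is a minimal hand-written approximation, not the code ANTLR
    emits: the grammar (<tt>prog : (stat)* EOF ;</tt> and <tt>stat : ID ASSIGN INT SEMI ;</tt>),
    the hand-computed follow set, and the names <tt>FollowSyncSketch</tt>,
    <tt>FollowSketchException</tt>, and <tt>consumeUntil</tt> as written here are assumptions
    of the sketch.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for antlr.RecognitionException.
class FollowSketchException extends Exception {
    FollowSketchException(String message) { super(message); }
}

// Hand-written recognizer for the hypothetical grammar
//   prog : (stat)* EOF ;
//   stat : ID ASSIGN INT SEMI ;
class FollowSyncSketch {
    // Follow set of stat, computed by hand for this grammar: after a
    // stat, either another stat starts (ID) or the input ends (EOF).
    private static final Set<String> FOLLOW_STAT =
        new HashSet<>(Arrays.asList("ID", "EOF"));

    private final String[] tokens;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();
    int goodStatements = 0;

    FollowSyncSketch(String... tokens) { this.tokens = tokens; }

    private String la() { return pos < tokens.length ? tokens[pos] : "EOF"; }

    private void match(String expected) throws FollowSketchException {
        if (!la().equals(expected)) {
            throw new FollowSketchException(
                "expecting " + expected + ", found " + la());
        }
        pos++;
    }

    // Skip tokens until the lookahead is in the given set (or input ends).
    private void consumeUntil(Set<String> set) {
        while (!la().equals("EOF") && !set.contains(la())) {
            pos++;
        }
    }

    void prog() {
        while (!la().equals("EOF")) {
            stat();
        }
    }

    void stat() {
        try {
            match("ID");
            match("ASSIGN");
            match("INT");
            match("SEMI");
            goodStatements++;
        } catch (FollowSketchException ex) {
            errors.add(ex.getMessage());   // 1. report the error
            consumeUntil(FOLLOW_STAT);     // 2. sync to the follow set of stat
        }                                  // 3. return from the rule
    }

    public static void main(String[] args) {
        FollowSyncSketch p = new FollowSyncSketch(
            "ID", "ASSIGN", "SEMI", "ID", "ASSIGN", "INT", "SEMI");
        p.prog();
        System.out.println(p.errors + ", good statements: " + p.goodStatements);
    }
}
```

    <p>Because the handler returns normally, the caller (<tt>prog</tt>) simply carries on with
    the next statement, so one bad statement does not prevent later ones from being recognized.</p>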
    <p>If you specify an exception handler for a rule, then the default exception handler is
    not generated for that rule. In addition, you may control the generation of default
    exception handlers with a <a href="options.html#defaultErrorHandler">per-grammar or
    per-rule option</a>. </p>
    <h3><a name="SpecifyingParserException-Handlers">Specifying Parser Exception-Handlers</a></h3>
    <p>You may attach exception handlers to a rule, an alternative, or a labeled element. The
    general form for specifying an exception handler is:</p>
    <pre><tt>
exception [label]
catch [exceptionType exceptionVariable]
  { action }
catch ...
catch ...
</tt></pre>
    <p>where the label is only used for attaching exceptions to labeled elements. The <tt>exceptionType</tt>
    is the exception (or class of exceptions) to catch, and the <tt>exceptionVariable</tt> is
    the variable name of the caught exception, so that the action can process the exception if
    desired. Here is an example that catches an exception for the rule, for an alternative, and
    for a labeled element: </p>
    <pre><tt>
rule:   a:A B C
    |   D E
        exception // for alternative
        catch [RecognitionException ex] {
            reportError(ex.toString());
        }
    ;
    exception // for rule
    catch [RecognitionException ex] {
       reportError(ex.toString());
    }
    exception[a] // for a:A
    catch [RecognitionException ex] {
       reportError(ex.toString());
    }
</tt>  </pre>
    <p>Note that exceptions attached to alternatives and labeled elements <b>do not</b> cause
    the rule to exit. Matching and control flow continue as if the error had not occurred.
    Because of this, when an exception is caught you must be careful not to use any variables
    that would have been set by a successful match. </p>
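    <p>The caution about variables can be illustrated with a small hand-written sketch of a
    rule <tt>rule : a:A B ;</tt> that has a handler attached to the labeled element
    <tt>a</tt>. The class <tt>ElementHandlerSketch</tt>, the stand-in
    <tt>ElementSketchException</tt>, and the choice to assign the label only after a successful
    match are assumptions of this sketch; code generated by ANTLR may assign the label
    differently.</p>

```java
// Hypothetical stand-in for antlr.RecognitionException.
class ElementSketchException extends Exception {
    ElementSketchException(String message) { super(message); }
}

// Hand-written recognizer for the hypothetical rule
//   rule : a:A B ;
//   exception[a] catch [...] { report the error }
class ElementHandlerSketch {
    private final String[] tokens;
    private int pos = 0;
    final StringBuilder errors = new StringBuilder();

    ElementHandlerSketch(String... tokens) { this.tokens = tokens; }

    private String la() { return pos < tokens.length ? tokens[pos] : "EOF"; }

    private void match(String expected) throws ElementSketchException {
        if (!la().equals(expected)) {
            throw new ElementSketchException(
                "expecting " + expected + ", found " + la());
        }
        pos++;
    }

    String rule() throws ElementSketchException {
        String a = null;             // in this sketch, set only by a successful match
        try {                        // try block around the labeled element only
            String candidate = la();
            match("A");
            a = candidate;
        } catch (ElementSketchException ex) {
            errors.append(ex.getMessage());
        }
        // The rule is NOT exited: matching continues with the next element.
        match("B");
        // An action placed here must not assume that 'a' was set.
        return a == null ? "<no A>" : a;
    }

    // Returns { result of rule, reported errors }.
    static String[] run(String... tokens) {
        ElementHandlerSketch p = new ElementHandlerSketch(tokens);
        try {
            return new String[] { p.rule(), p.errors.toString() };
        } catch (ElementSketchException ex) {
            return new String[] { "rule failed: " + ex.getMessage(), p.errors.toString() };
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", run("A", "B")));
        System.out.println(String.join(" | ", run("B")));
    }
}
```

    <p>With the input <tt>B</tt> alone, the element handler reports the missing <tt>A</tt>,
    the rule still goes on to match <tt>B</tt>, and the label holds no token.</p>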
    <h3><a name="Default Exception Handling in the Lexer">Default Exception Handling in the
    Lexer</a></h3>
    <p>Normally you want the lexer to keep trying to get a valid token upon a lexical error.
    &nbsp; That way, the parser doesn't have to deal with lexical errors and ask for another
    token.&nbsp; Sometimes you want exceptions to pop out of the lexer, usually when you want
    to abort the entire parsing process upon a syntax error.&nbsp; To get ANTLR to generate
    lexers that pass on <font face="Courier New">RecognitionException</font>s to the parser
    as <font face="Courier New">TokenStreamException</font>s, use the <font
    face="Courier New">defaultErrorHandler=false</font> grammar option.&nbsp; Note that IO
    exceptions are passed back as <font face="Courier New">TokenStreamIOException</font>s
    regardless of this option.</p>
    <p>Here is an example that uses a bogus semantic exception (which is a subclass of <font
    face="Courier New">RecognitionException</font>) to demonstrate blasting out of the lexer:</p>
    <pre>class P extends Parser;
{
public static void main(String[] args) {
        L lexer = new L(System.in);
        P parser = new P(lexer);
        try {
                parser.start();
        }
        catch (Exception e) {
                System.err.println(e);
        }
}
}

start : &quot;int&quot; ID (COMMA ID)* SEMI ;

class L extends Lexer;
options {
        defaultErrorHandler=false;
}

{int x=1;}

ID  : ('a'..'z')+ ;

SEMI: ';'
      {if ( <em>expr</em> )
       throw new
          SemanticException(&quot;test&quot;,
                            getFilename(),
                            getLine());} ;

COMMA:',' ;

WS  : (' '|'\n'{newline();})+
      {$setType(Token.SKIP);}
    ;</pre>
    <p>When you type in, say, &quot;<font face="Courier New">int b;</font>&quot; you get the
    following as output:</p>
    <pre>antlr.TokenStreamRecognitionException: test</pre>
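    <p>The effect of the option on the lexer's token-fetching loop can be approximated in
    plain Java. The sketch below is hand-written and is not ANTLR-generated code: the classes
    <tt>LexerSketch</tt>, <tt>LexRecognitionException</tt>, <tt>LexTokenStreamException</tt>, and
    <tt>LexTokenStreamRecognitionException</tt> are stand-ins for the ANTLR types, and recovering
    by skipping one character is an assumption about what &quot;keep trying&quot; means.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the ANTLR exception classes.
class LexRecognitionException extends Exception {
    LexRecognitionException(String message) { super(message); }
}

class LexTokenStreamException extends Exception {
    LexTokenStreamException(String message) { super(message); }
}

// Wraps a recognition problem so it can travel along the token stream.
class LexTokenStreamRecognitionException extends LexTokenStreamException {
    final LexRecognitionException recog;
    LexTokenStreamRecognitionException(LexRecognitionException recog) {
        super(recog.getMessage());
        this.recog = recog;
    }
}

// Tiny lexer whose only token is ID : ('a'..'z')+ ;
class LexerSketch {
    private final String input;
    private final boolean defaultErrorHandler;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    LexerSketch(String input, boolean defaultErrorHandler) {
        this.input = input;
        this.defaultErrorHandler = defaultErrorHandler;
    }

    // Returns the text of the next ID token, or null at end of input.
    String nextToken() throws LexTokenStreamException {
        while (pos < input.length()) {
            try {
                return mID();
            } catch (LexRecognitionException e) {
                if (!defaultErrorHandler) {
                    // Pass the problem on to the caller of nextToken().
                    throw new LexTokenStreamRecognitionException(e);
                }
                errors.add(e.getMessage());   // report
                pos++;                        // skip the bad character, then retry
            }
        }
        return null;
    }

    private String mID() throws LexRecognitionException {
        int start = pos;
        while (pos < input.length()
                && input.charAt(pos) >= 'a' && input.charAt(pos) <= 'z') {
            pos++;
        }
        if (pos == start) {
            throw new LexRecognitionException("unexpected char: " + input.charAt(pos));
        }
        return input.substring(start, pos);
    }

    static String run(String input, boolean defaultErrorHandler) {
        LexerSketch lexer = new LexerSketch(input, defaultErrorHandler);
        StringBuilder out = new StringBuilder();
        try {
            for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
                out.append(t).append(' ');
            }
            out.append("errors=").append(lexer.errors.size());
        } catch (LexTokenStreamException e) {
            out.append("aborted: ").append(e.getMessage());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run("ab#cd", true));
        System.out.println(run("ab#cd", false));
    }
}
```

    <p>With the handler enabled the sketch reports the bad character and still delivers both
    identifiers; with it disabled the first lexical error ends the token stream with a wrapped
    exception.</p>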
    <pre><font face="Arial" size="2">Version: $Id: //depot/code/org.antlr/release/antlr-2.7.7/doc/err.html#2 $</font></pre>
</body>
</html>