1 | <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> |
---|
2 | <html> |
---|
3 | |
---|
4 | <head> |
---|
5 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
---|
6 | <title>Error Handling and Recovery</title> |
---|
7 | </head> |
---|
8 | |
---|
9 | <body bgcolor="#FFFFFF"> |
---|
10 | |
---|
11 | <h2><a name="_bb1"></a><a name="lexicalanalysis">Error |
---|
12 | Handling and Recovery</a></h2> |
---|
13 | <p>All syntactic and semantic errors cause parser exceptions to be thrown. In particular, |
---|
14 | the methods used to match tokens in the parser base class (match et al.) throw |
---|
15 | MismatchedTokenException. If the lookahead predicts no alternative of a production in |
---|
16 | either the parser or lexer, then a NoViableAltException is thrown. The methods in the |
---|
17 | lexer base class used to match characters (match et al.) throw analogous exceptions.</p> |
---|
18 | <p>ANTLR will generate default error-handling code, or you may specify your own exception |
---|
19 | handlers. Either case results (where supported by the language) in the creation of a <tt>try/catch</tt> |
---|
20 | block. Such <tt>try{}</tt> blocks surround the generated code for the grammar element of |
---|
21 | interest (rule, alternate, token reference, or rule reference). If no exception handlers |
---|
22 | (default or otherwise) are specified, then the exception will propagate all the way out of |
---|
23 | the parser to the calling program. </p> |
---|
24 | <p>ANTLR's default exception handling is good enough to get something working, but you will have |
---|
25 | more control over error-reporting and resynchronization if you write your own exception |
---|
26 | handlers. </p> |
---|
27 | <p>Note that the '@' exception specification of PCCTS 1.33 does not apply to ANTLR.</p> |
---|
28 | <h3><a name="ANTLR Exception Hierarchy">ANTLR Exception Hierarchy</a></h3> |
---|
29 | <p>ANTLR-generated parsers throw exceptions to signal recognition errors or other stream |
---|
30 | problems. All exceptions derive from <font face="Courier New">ANTLRException</font>. |
---|
31 | The following diagram shows the hierarchy:</p> |
---|
32 | <p><img src="ANTLRException.gif" width="646" height="263" |
---|
33 | alt="ANTLRException.gif (14504 bytes)"></p> |
---|
34 | <table border="0" width="100%"> |
---|
35 | <tr> |
---|
36 | <th width="50%">Exception</th> |
---|
37 | <th width="50%">Description</th> |
---|
38 | </tr> |
---|
39 | <tr> |
---|
40 | <td width="50%" align="left" valign="top"><small><font face="Courier New">ANTLRException</font></small></td> |
---|
41 | <td width="50%">Root of the exception hierarchy. You can directly subclass this if |
---|
42 | you want to define your own exceptions unless they live more properly under one of the |
---|
43 | specific exceptions below.</td> |
---|
44 | </tr> |
---|
45 | <tr> |
---|
46 | <td width="50%" align="left" valign="top"></td> |
---|
47 | <td width="50%"></td> |
---|
48 | </tr> |
---|
49 | <tr> |
---|
50 | <td width="50%" align="left" valign="top"><small><font face="Courier New">CharStreamException</font></small></td> |
---|
51 | <td width="50%">Something bad that happens on the character input stream. Most of |
---|
52 | the time it will be an IO problem, but you could define an exception for input coming from |
---|
53 | a dialog box or whatever.</td> |
---|
54 | </tr> |
---|
55 | <tr> |
---|
56 | <td width="50%" align="left" valign="top"><small><font face="Courier New">CharStreamIOException</font></small></td> |
---|
57 | <td width="50%">The character input stream had an IO exception (e.g., <font |
---|
58 | face="Courier New">CharBuffer.fill()</font> can throw this). If <font |
---|
59 | face="Courier New">nextToken()</font> sees this, it will convert it to a <font |
---|
60 | face="Courier New">TokenStreamIOException</font>.</td> |
---|
61 | </tr> |
---|
62 | <tr> |
---|
63 | <td width="50%" align="left" valign="top"></td> |
---|
64 | <td width="50%"></td> |
---|
65 | </tr> |
---|
66 | <tr> |
---|
67 | <td width="50%" align="left" valign="top"><small><font face="Courier New">RecognitionException</font></small></td> |
---|
68 | <td width="50%">A generic recognition problem with the input. Use this as your |
---|
69 | "catch all" exception in your main() or other method that invokes a parser, |
---|
70 | lexer, or treeparser. All parser rules throw this exception.</td> |
---|
71 | </tr> |
---|
72 | <tr> |
---|
73 | <td width="50%" align="left" valign="top"><small><font face="Courier New">MismatchedCharException</font></small></td> |
---|
74 | <td width="50%">Thrown by CharScanner.match() when it is looking for a character, but |
---|
75 | finds a different one on the input stream.</td> |
---|
76 | </tr> |
---|
77 | <tr> |
---|
78 | <td width="50%" align="left" valign="top"><small><font face="Courier New">MismatchedTokenException</font></small></td> |
---|
79 | <td width="50%">Thrown by Parser.match() when it is looking for a token, but finds a |
---|
80 | different one on the input stream.</td> |
---|
81 | </tr> |
---|
82 | <tr> |
---|
83 | <td width="50%" align="left" valign="top"><small><font face="Courier New">NoViableAltException</font></small></td> |
---|
84 | <td width="50%">The parser finds an unexpected token; that is, it finds a token that does |
---|
85 | not begin any alternative in the current decision.</td> |
---|
86 | </tr> |
---|
87 | <tr> |
---|
88 | <td width="50%" align="left" valign="top"><small><font face="Courier New">NoViableAltForCharException</font></small></td> |
---|
89 | <td width="50%">The lexer finds an unexpected character; that is, it finds a character |
---|
90 | that does not begin any alternative in the current decision.</td> |
---|
91 | </tr> |
---|
92 | <tr> |
---|
93 | <td width="50%" align="left" valign="top"><small><font face="Courier New">SemanticException</font></small></td> |
---|
94 | <td width="50%">Used to indicate syntactically valid, but nonsensical or otherwise bogus |
---|
95 | input was found on the input stream. This exception is thrown automatically by |
---|
96 | failed, validating semantic predicates such as:<pre>a : A {false}? B ;</pre> |
---|
97 | <p>ANTLR generates:</p> |
---|
98 | <pre><small>match(A); |
---|
99 | if (!(false)) throw new |
---|
100 | SemanticException("false"); |
---|
101 | match(B);</small></pre> |
---|
102 | <p>You can throw this exception yourself during the parse if one of your actions |
---|
103 | determines that the input is invalid.</td> |
---|
104 | </tr> |
---|
105 | <tr> |
---|
106 | <td width="50%" align="left" valign="top"></td> |
---|
107 | <td width="50%"></td> |
---|
108 | </tr> |
---|
109 | <tr> |
---|
110 | <td width="50%" align="left" valign="top"><small><font face="Courier New">TokenStreamException</font></small></td> |
---|
111 | <td width="50%">Indicates that something went wrong while generating a stream of tokens.</td> |
---|
112 | </tr> |
---|
113 | <tr> |
---|
114 | <td width="50%" align="left" valign="top"><small><font face="Courier New">TokenStreamIOException</font></small></td> |
---|
115 | <td width="50%">Wraps an IOException in a <font face="Courier New">TokenStreamException</font></td> |
---|
116 | </tr> |
---|
117 | <tr> |
---|
118 | <td width="50%" align="left" valign="top"><small><font face="Courier New">TokenStreamRecognitionException</font></small></td> |
---|
119 | <td width="50%">Wraps a <font face="Courier New">RecognitionException</font> in a <font |
---|
120 | face="Courier New">TokenStreamException</font> so you can pass it along on a stream.</td> |
---|
121 | </tr> |
---|
122 | <tr> |
---|
123 | <td width="50%" align="left" valign="top"><small><font face="Courier New">TokenStreamRetryException</font></small></td> |
---|
124 | <td width="50%">Signals aborted recognition of current token. Try to get one again. Used |
---|
125 | by <small><font face="Courier New">TokenStreamSelector.retry()</font></small> to force <font |
---|
126 | face="Courier New">nextToken()</font> of stream to re-enter and retry. See the |
---|
127 | examples/java/includeFile directory.<p>This is a great way to handle nested include files and |
---|
128 | so on or to try out multiple grammars to see which appears to fit the data. You can |
---|
129 | have something listen on a socket for multiple input types without knowing which type will |
---|
130 | show up when.</td> |
---|
131 | </tr> |
---|
132 | </table> |
---|
133 | <p><a name="_bb2"></a>The typical main or parser invoker has a try/catch block around the |
---|
134 | invocation:</p> |
---|
135 | <pre> try { |
---|
136 | ... |
---|
137 | } |
---|
138 | catch(TokenStreamException e) { |
---|
139 | System.err.println("problem with stream: "+e); |
---|
140 | } |
---|
141 | catch(RecognitionException re) { |
---|
142 | System.err.println("bad input: "+re); |
---|
143 | }</pre> |
---|
144 | <p>Lexer rules throw <font face="Courier New">RecognitionException</font>, <font |
---|
145 | face="Courier New">CharStreamException</font>, and <font face="Courier New">TokenStreamException</font>.</p> |
---|
146 | <p>Parser rules throw <font face="Courier New">RecognitionException</font> and <font |
---|
147 | face="Courier New">TokenStreamException</font>.</p> |
---|
148 | <h3><a name="Modifying Default Error Messages With Paraphrases">Modifying Default Error |
---|
149 | Messages With Paraphrases</a></h3> |
---|
150 | <p>The name or definition of a token in your lexer is rarely meaningful to the user of |
---|
151 | your recognizer or translator. For example, instead of seeing</p> |
---|
152 | <pre>T.java:1:9: expecting ID, found ';'</pre> |
---|
153 | <p>you can have the parser generate:</p> |
---|
154 | <pre>T.java:1:9: expecting an identifier, found ';'</pre> |
---|
155 | <p>ANTLR provides an easy way to specify a string to use in place of the token name. |
---|
156 | In the definition for ID, use the paraphrase option:</p> |
---|
157 | <pre>ID |
---|
158 | options { |
---|
159 | paraphrase = "an identifier"; |
---|
160 | } |
---|
161 | : ('a'..'z'|'A'..'Z'|'_') |
---|
162 | ('a'..'z'|'A'..'Z'|'_'|'0'..'9')* |
---|
163 | ;</pre> |
---|
164 | <p>Note that this paraphrase goes into the token types text file (ANTLR's persistence |
---|
165 | file). In other words, a grammar that uses this vocabulary will also use the |
---|
166 | paraphrase. </p> |
---|
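<p>The substitution described above can be pictured with a small sketch. It does not use
the ANTLR runtime; the class <tt>ParaphraseDemo</tt> and its methods are hypothetical
stand-ins for the lookup a generated parser performs when it builds an error message, and
the token name <tt>SEMI</tt> is used only to show the fallback when no paraphrase exists.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the token-name lookup behind an error message;
// none of these names come from the ANTLR runtime.
class ParaphraseDemo {
    private static final Map<String, String> PARAPHRASES = new HashMap<>();
    static {
        // Mirrors: ID options { paraphrase = "an identifier"; }
        PARAPHRASES.put("ID", "an identifier");
    }

    // Return the paraphrase if one was declared, otherwise the raw token name.
    static String describe(String tokenName) {
        return PARAPHRASES.getOrDefault(tokenName, tokenName);
    }

    // Build a message in the file:line:column style shown in the text.
    static String message(String file, int line, int col,
                          String expected, String found) {
        return file + ":" + line + ":" + col + ": expecting "
                + describe(expected) + ", found '" + found + "'";
    }

    public static void main(String[] args) {
        System.out.println(message("T.java", 1, 9, "ID", ";"));
        System.out.println(message("T.java", 1, 9, "SEMI", "x"));
    }
}
```

<p>A token without a paraphrase simply falls back to its name, which is why only the
tokens a user is likely to see in messages need the option.</p>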
167 | <h3><a name="ParserExceptionHandling">Parser Exception Handling</a></h3> |
---|
168 | <p>ANTLR generates recursive-descent recognizers. Since recursive-descent recognizers |
---|
169 | operate by recursively calling the rule-matching methods, this results in a call stack |
---|
170 | that is populated by the contexts of the recursive-descent methods. Parser exception |
---|
171 | handling for grammar rules is a lot like exception handling in a language like C++ or |
---|
172 | Java. Namely, when an exception is thrown, the normal thread of execution is stopped, and |
---|
173 | functions on the call stack are exited sequentially until one is encountered that wants to |
---|
174 | catch the exception. When an exception is caught, execution resumes at that point. </p> |
---|
175 | <p>In ANTLR, parser exceptions are thrown when (a) there is a syntax error, (b) there |
---|
176 | is a failed validating semantic predicate, or (c) you throw a parser exception from an |
---|
177 | action. </p> |
---|
178 | <p>In all cases, the recursive-descent functions on the call stack are exited until an |
---|
179 | exception handler is encountered for that exception type or one of its base classes (in |
---|
180 | non-object-oriented languages, the hierarchy of exception types is not implemented by a |
---|
181 | class hierarchy). Exception handlers arise in one of two ways. First, if you do nothing, |
---|
182 | ANTLR will generate a default exception handler for every parser rule. The default |
---|
183 | exception handler will report an error, sync to the follow set of the rule, and return |
---|
184 | from that rule. Second, you may specify your own exception handlers in a variety of ways, |
---|
185 | as described later. </p> |
---|
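<p>The three steps of the default handler (report, sync to the follow set, return) can be
sketched with a hand-written recognizer. This is only an illustration under stated
assumptions: the grammar <tt>prog : (stat SEMI)* EOF ; stat : ID ASSIGN NUM ;</tt>, the
class <tt>FollowSyncDemo</tt>, and the <tt>SyntaxError</tt> exception are all hypothetical
and are not what ANTLR emits verbatim.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical miniature recognizer; it imitates the shape of a default
// handler but does not use the ANTLR runtime.
class FollowSyncDemo {
    // Stand-in for a recognition error (unchecked to keep the sketch short).
    static class SyntaxError extends RuntimeException {
        SyntaxError(String msg) { super(msg); }
    }

    private final List<String> tokens;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();
    int statementsSeen = 0;

    FollowSyncDemo(String... tokenTypes) {
        tokens = new ArrayList<>(Arrays.asList(tokenTypes));
        tokens.add("EOF");
    }

    private String la() { return tokens.get(pos); }

    // Never advance past the EOF marker.
    private void consume() { if (!la().equals("EOF")) pos++; }

    private void match(String type) {
        if (!la().equals(type)) {
            throw new SyntaxError("expecting " + type + ", found " + la());
        }
        consume();
    }

    // prog : (stat SEMI)* EOF ;   (no handler here: errors propagate to the caller)
    void prog() {
        while (!la().equals("EOF")) {
            stat();
            match("SEMI");
        }
    }

    // stat : ID ASSIGN NUM ;   FOLLOW(stat) = { SEMI }
    void stat() {
        try {
            match("ID");
            match("ASSIGN");
            match("NUM");
            statementsSeen++;
        } catch (SyntaxError ex) {
            errors.add(ex.getMessage());                  // 1. report the error
            Set<String> follow = Set.of("SEMI", "EOF");   // EOF added as a safety stop
            while (!follow.contains(la())) {              // 2. sync to the follow set
                consume();
            }
        }                                                 // 3. return from the rule
    }

    public static void main(String[] args) {
        // The second statement is missing its ASSIGN token.
        FollowSyncDemo p = new FollowSyncDemo(
                "ID", "ASSIGN", "NUM", "SEMI",
                "ID", "NUM", "SEMI",
                "ID", "ASSIGN", "NUM", "SEMI");
        p.prog();
        System.out.println(p.errors);
        System.out.println(p.statementsSeen);
    }
}
```

<p>Because <tt>stat</tt> returns normally after resynchronizing, the caller matches the
<tt>SEMI</tt> and goes on to recognize the third statement; one error is reported instead
of the parse being abandoned.</p>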
186 | <p>If you specify an exception handler for a rule, then the default exception handler is |
---|
187 | not generated for that rule. In addition, you may control the generation of default |
---|
188 | exception handlers with a <a href="options.html#defaultErrorHandler">per-grammar or |
---|
189 | per-rule option</a>. </p> |
---|
190 | <h3><a name="SpecifyingParserException-Handlers">Specifying Parser Exception-Handlers</a></h3> |
---|
191 | <p>You may attach exception handlers to a rule, an alternative, or a labeled element. The |
---|
192 | general form for specifying an exception handler is:</p> |
---|
193 | <pre><tt> |
---|
194 | exception [label] |
---|
195 | catch [exceptionType exceptionVariable] |
---|
196 | { action } |
---|
197 | catch ... |
---|
198 | catch ... |
---|
199 | </tt></pre> |
---|
200 | <p>where the label is only used for attaching exceptions to labeled elements. The <tt>exceptionType</tt> |
---|
201 | is the exception (or class of exceptions) to catch, and the <tt>exceptionVariable</tt> is |
---|
202 | the variable name of the caught exception, so that the action can process the exception if |
---|
203 | desired. Here is an example that catches an exception for the rule, for an alternate and |
---|
204 | for a labeled element: </p> |
---|
205 | <pre><tt> |
---|
206 | rule: a:A B C |
---|
207 | | D E |
---|
208 | exception // for alternate |
---|
209 | catch [RecognitionException ex] { |
---|
210 | reportError(ex.toString()); |
---|
211 | } |
---|
212 | ; |
---|
213 | exception // for rule |
---|
214 | catch [RecognitionException ex] { |
---|
215 | reportError(ex.toString()); |
---|
216 | } |
---|
217 | exception[a] // for a:A |
---|
218 | catch [RecognitionException ex] { |
---|
219 | reportError(ex.toString()); |
---|
220 | } |
---|
221 | </tt> </pre> |
---|
222 | <p>Note that exceptions attached to alternates and labeled elements <b>do not</b> cause |
---|
223 | the rule to exit. Matching and control flow continue as if the error had not occurred. |
---|
224 | Because of this, you must be careful not to use any variables that would have been set by |
---|
225 | a successful match when an exception is caught. </p> |
---|
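<p>The hazard with element-level handlers is easier to see in code. The following sketch
assumes a rule <tt>rule : a:A B C ;</tt> with a handler attached to <tt>a:A</tt>; the class
<tt>LabeledHandlerDemo</tt> and the <tt>SyntaxError</tt> exception are hypothetical
stand-ins rather than ANTLR output. The <tt>try</tt> block surrounds only the labeled
element, so after the handler runs, matching continues with <tt>B</tt> and the label
variable is still unset.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recognizer showing a handler attached to one labeled element.
class LabeledHandlerDemo {
    // Stand-in for a recognition error.
    static class SyntaxError extends RuntimeException {
        SyntaxError(String msg) { super(msg); }
    }

    private final String[] tokens;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    LabeledHandlerDemo(String... tokens) { this.tokens = tokens; }

    private String la() { return pos < tokens.length ? tokens[pos] : "EOF"; }

    // Consume and return the current token, or throw if it has the wrong type.
    private String match(String type) {
        if (!la().equals(type)) {
            throw new SyntaxError("expecting " + type + ", found " + la());
        }
        return tokens[pos++];
    }

    // rule : a:A B C ;   with   exception[a] catch [...] { report }
    // Returns the text matched for label a, or null if that match failed.
    String rule() {
        String a = null;
        try {
            a = match("A");               // only this element is guarded
        } catch (SyntaxError ex) {
            errors.add(ex.getMessage());  // report; the rule does NOT exit
        }
        match("B");                       // control flow continues here
        match("C");
        return a;                         // may still be null
    }

    public static void main(String[] args) {
        LabeledHandlerDemo p = new LabeledHandlerDemo("B", "C"); // A is missing
        String a = p.rule();
        System.out.println(p.errors);
        System.out.println(a == null ? "label a was never set" : "a = " + a);
    }
}
```

<p>Any action that reads the label after such a handler should therefore check that the
label was actually assigned.</p>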
226 | <h3><a name="Default Exception Handling in the Lexer">Default Exception Handling in the |
---|
227 | Lexer</a></h3> |
---|
228 | <p>Normally you want the lexer to keep trying to get a valid token upon lexical error. |
---|
229 | That way, the parser doesn't have to deal with lexical errors and ask for another |
---|
230 | token. Sometimes you want exceptions to pop out of the lexer--usually when you want |
---|
231 | to abort the entire parsing process upon syntax error. To get ANTLR to generate |
---|
232 | lexers that pass on <font face="Courier New">RecognitionException</font>s to the parser |
---|
233 | as <font face="Courier New">TokenStreamException</font>s, use the <font |
---|
234 | face="Courier New">defaultErrorHandler=false</font> grammar option. Note that IO |
---|
235 | exceptions are passed back as <font face="Courier New">TokenStreamIOException</font>s |
---|
236 | regardless of this option.</p> |
---|
237 | <p>Here is an example that uses a bogus semantic exception (which is a subclass of <font |
---|
238 | face="Courier New">RecognitionException</font>) to demonstrate blasting out of the lexer:</p> |
---|
239 | <pre>class P extends Parser; |
---|
240 | { |
---|
241 | public static void main(String[] args) { |
---|
242 | L lexer = new L(System.in); |
---|
243 | P parser = new P(lexer); |
---|
244 | try { |
---|
245 | parser.start(); |
---|
246 | } |
---|
247 | catch (Exception e) { |
---|
248 | System.err.println(e); |
---|
249 | } |
---|
250 | } |
---|
251 | } |
---|
252 | |
---|
253 | start : "int" ID (COMMA ID)* SEMI ; |
---|
254 | |
---|
255 | class L extends Lexer; |
---|
256 | options { |
---|
257 | defaultErrorHandler=false; |
---|
258 | } |
---|
259 | |
---|
260 | {int x=1;} |
---|
261 | |
---|
262 | ID : ('a'..'z')+ ; |
---|
263 | |
---|
264 | SEMI: ';' |
---|
265 | {if ( <em>expr</em> ) |
---|
266 | throw new |
---|
267 | SemanticException("test", |
---|
268 | getFilename(), |
---|
269 | getLine());} ; |
---|
270 | |
---|
271 | COMMA:',' ; |
---|
272 | |
---|
273 | WS : (' '|'\n'{newline();})+ |
---|
274 | {$setType(Token.SKIP);} |
---|
275 | ;</pre> |
---|
276 | <p>When you type in, say, "<font face="Courier New">int b;</font>" you get the |
---|
277 | following as output:</p> |
---|
278 | <pre>antlr.TokenStreamRecognitionException: test</pre> |
---|
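<p>The difference between the two lexer behaviors can be sketched without the ANTLR
runtime. In the hypothetical class below, <tt>LexError</tt> and <tt>StreamError</tt> stand
in for <tt>RecognitionException</tt> and <tt>TokenStreamRecognitionException</tt>, and the
boolean constructor argument plays the role of the <tt>defaultErrorHandler</tt> option. It
is an assumption of this sketch that "keep trying" means reporting the error, skipping one
character, and retrying.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lexer loop; all names here are stand-ins, not ANTLR classes.
class LexerPassThroughDemo {
    // Stand-in for RecognitionException.
    static class LexError extends Exception {
        LexError(String msg) { super(msg); }
    }
    // Stand-in for TokenStreamRecognitionException: wraps the lexical error.
    static class StreamError extends RuntimeException {
        StreamError(LexError cause) { super(cause.getMessage(), cause); }
    }

    private final String input;
    private final boolean defaultErrorHandler;
    private int pos = 0;
    final List<String> reported = new ArrayList<>();

    LexerPassThroughDemo(String input, boolean defaultErrorHandler) {
        this.input = input;
        this.defaultErrorHandler = defaultErrorHandler;
    }

    // Match one token: a run of lowercase letters; anything else is an error.
    private String matchToken() throws LexError {
        int start = pos;
        while (pos < input.length() && Character.isLowerCase(input.charAt(pos))) {
            pos++;
        }
        if (pos == start) {
            throw new LexError("unexpected char: " + input.charAt(pos));
        }
        return input.substring(start, pos);
    }

    String nextToken() {
        while (true) {
            if (pos >= input.length()) {
                return "<EOF>";
            }
            try {
                return matchToken();
            } catch (LexError e) {
                if (!defaultErrorHandler) {
                    throw new StreamError(e);  // pass the error on, wrapped
                }
                reported.add(e.getMessage());  // report the error,
                pos++;                         // skip the bad character, and retry
            }
        }
    }

    public static void main(String[] args) {
        LexerPassThroughDemo keepGoing = new LexerPassThroughDemo("ab#cd", true);
        System.out.println(keepGoing.nextToken() + " " + keepGoing.nextToken());

        LexerPassThroughDemo strict = new LexerPassThroughDemo("ab#cd", false);
        strict.nextToken();
        try {
            strict.nextToken();
        } catch (StreamError e) {
            System.out.println("stream error: " + e.getMessage());
        }
    }
}
```

<p>With the handler enabled, the caller never sees the bad character. With it disabled,
the caller receives the wrapped error and can abort the parse.</p>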
279 | <pre><font face="Arial" size="2">Version: $Id: //depot/code/org.antlr/release/antlr-2.7.7/doc/err.html#2 $</font></pre> |
---|
280 | </body> |
---|
281 | </html> |
---|