Context Navigation

← Previous Revision
Latest Revision
Next Revision →
Blame
Revision Log

glossary.html @ 1

Last change on this file since 1 was 1, checked in by lnalod, 15 years ago
Initial import of YAO sources
File size: 23.0 KB

Line
1	<html>
2	<head>
3	<title>ANTLR-centric Language Glossary</title>
4	</head>
5	<body bgcolor="#FFFFFF" text="#000000">
6
7	<font face=Arial>
8
9	<h1><a id="ANTLR-centric_Language_Glossary" name="ANTLR-centric_Language_Glossary">ANTLR-centric Language Glossary</a></h1>
10
11	<i>Terence Parr</i>
12	<br><br>
13	This glossary defines some of the more important terms used in the
14	ANTLR documentation. I have tried to be very informal and provide
15	references to other pages that are useful. For another great source
16	of information about formal computer languages, see <a
17	href="http://www.wikipedia.org/wiki/Context-free_grammar"><b>Wikipedia</b></a>.
18
19	<font size=2>
20	<!--index-->
21	<ul>
22	<li><a href="#Ambiguous">Ambiguous</a></li>
23	<li><a href="#ANTLR">ANTLR</a></li>
24	<li><a href="#AST">AST</a></li>
25	<li><a href="#Bit_set">Bit set</a></li>
26	<li><a href="#Child-sibling_Tree">Child-sibling Tree</a></li>
27	<li><a href="#Context-free_grammar">Context-free grammar</a></li>
28	<li><a href="#Context-sensitive">Context-sensitive</a></li>
29	<li><a href="#DFA">DFA</a></li>
30	<li><a href="#FIRST">FIRST</a></li>
31	<li><a href="#FOLLOW">FOLLOW</a></li>
32	<li><a href="#Grammar">Grammar</a></li>
33	<li><a href="#Hoisting">Hoisting</a></li>
34	<li><a href="#Inheritance,_grammar">Inheritance, grammar</a></li>
35	<li><a href="#LA(n)">LA(n)</a></li>
36	<li><a href="#Left-prefix,_left_factor">Left-prefix, left factor</a></li>
37	<li><a href="#Literal">Literal</a></li>
38	<li><a href="#Linear_approximate_lookahead">Linear approximate lookahead</a></li>
39	<li><a href="#LL(k)">LL(k)</a></li>
40	<li><a href="#LT(n)">LT(n)</a></li>
41	<li><a href="#Language">Language</a></li>
42	<li><a href="#Lexer">Lexer</a></li>
43	<li><a href="#Lookahead">Lookahead</a></li>
44	<li><a href="#nextToken">nextToken</a></li>
45	<li><a href="#NFA">NFA</a></li>
46	<li><a href="#Nondeterministic">Nondeterministic</a></li>
47	<li><a href="#Parser">Parser</a></li>
48	<li><a href="#Predicate,_semantic">Predicate, semantic</a></li>
49	<li><a href="#Predicate,_syntactic">Predicate, syntactic</a></li>
50	<li><a href="#Production">Production</a></li>
51	<li><a href="#Protected">Protected</a></li>
52	<li><a href="#Recursive-descent">Recursive-descent</a></li>
53	<li><a href="#Regular">Regular</a></li>
54	<li><a href="#Rule">Rule</a></li>
55	<li><a href="#Scanner">Scanner</a></li>
56	<li><a href="#Semantics">Semantics</a></li>
57	<li><a href="#Subrule">Subrule</a></li>
58	<li><a href="#Syntax">Syntax</a></li>
59	<li><a href="#Token">Token</a></li>
60	<li><a href="#Token_stream">Token stream</a></li>
61	<li><a href="#Tree">Tree</a></li>
62	<li><a href="#Tree_parser">Tree parser</a></li>
63	<li><a href="#Vocabulary">Vocabulary</a></li>
64	<li><a href="#Wow">Wow</a></li>
65	</ul>
66	<!--/index-->
67	</font>
68
69	<h3><a id="Ambiguous" name="Ambiguous">Ambiguous</a></h3>
70
71	A language is ambiguous if the same sentence or phrase can be
72	interpreted in more than a single way. For example, the following
73	sentence by Groucho Marx is easily interpreted in two ways: "I once
74	shot an elephant in my pajamas. How he got in my pajamas I'll never
75	know!" In the computer world, a typical language ambiguity is the
76	if-then-else ambiguity where the else-clause may be attached to either
77	the most recent if-then or an older one. Reference manuals for
78	computer languages resolve this ambiguity by stating that else-clauses
79	always match up with the most recent if-then.
80
81	<p>
82	A grammar is ambiguous if the same input sequence can be derived in
83	multiple ways. Ambiguous languages always yield ambiguous grammars
84	unless you can find a way to encode semantics (actions or predicates
85	etc...) that resolve the ambiguity. Most language tools like ANTLR
86	resolve the if-then-else ambiguity by simply choosing to match
87	greedily (i.e., as soon as possible). This matches the else with the
88	most recent if-then. See nondeterministic.
89
90	<h3><a id="ANTLR" name="ANTLR">ANTLR</a></h3>
91
92	<b>AN</b>other <b>T</b>ool for <b>L</b>anguage <b>R</b>ecognition, a
93	predicated-LL(k) parser generator that handles lexers, parsers, and
94	tree parsers. ANTLR has been available since 1990 and led to a
95	resurgence of recursive-descent parsing after decades dominated by LR
96	and other DFA-based strategies.
97
98	<h3><a id="AST" name="AST">AST</a></h3>
99
100	<b>A</b>bstract <b>S</b>yntax <b>T</b>ree. ASTs are used as internal
101	representations of an input stream, normally constructed during a
102	parsing phase. Because AST are two-dimensional trees they
103	can encode the structure (as determined by the parser) of the input as
104	well as the input symbols.
105
106	<p>
107	A homogeneous AST is in one in which the physical objects are all of
108	the same type; e.g., <tt>CommonAST</tt> in ANTLR. A heterogeneous
109	tree may have multiple types such as <tt>PlusNode</tt>,
110	<tt>MultNode</tt> etc...
111
112	<p> An AST is not a parse tree, which encodes the sequence of rules
113	used to match input symbols. See <a
114	href="http://www.jguru.com/faq/view.jsp?EID=814505"><b>What's the
115	difference between a parse tree and an abstract syntax tree (AST)? Why
116	doesn't ANTLR generate trees with nodes for grammar rules like JJTree
117	does?</b></a>.
118
119	<p>An AST for input <tt>3+4</tt> might be represented as
120	<pre>
121	+
122	/ \
123	3 4
124	</pre>
125
126	or more typically (ala ANTLR) in child-sibling form:
127
128	<pre>
129	+
130	\|
131	3--4
132	</pre>
133
134	Operators are usually subtree roots and operands are usually leaves.
135
136	<h3><a id="Bit_set" name="Bit_set">Bit set</a></h3>
137
138	Bit sets are an extremely efficient representation for dense integer
139	sets. You can easily encode sets of strings also by mapping unique
140	strings to unique integers. ANTLR uses bitsets for lookahead
141	prediction in parsers and lexers. Simple bit set implementations do
142	not work so well for sparse sets, particularly when the maximum
143	integer stored in the set is large.
144
145	<p> ANTLR's bit set represents membership with a bit for each possible
146	integer value. For a maximum value of <i>n</i>, a bit set needs
147	<i>n/64</i> long words or <i>n/8</i> bytes. For ASCII bit sets with a
148	maximum value of 127, you only need 16 bytes or 2 long words. UNICODE
149	has a max value of \uFFFF or 65535, requiring 8k bytes, and these sets
150	are typically sparse. Fortunately most lexers only need a few of
151	these space inefficient (but speedy) bitsets and so it's not really a
152	problem.
153
154	<h3><a id="Child-sibling_Tree" name="Child-sibling_Tree">Child-sibling Tree</a></h3>
155
156	A particularly efficient data structure for representing trees. See AST.
157
158	<h3><a id="Context-free_grammar" name="Context-free_grammar">Context-free grammar</a></h3>
159
160	A grammar where recognition of a particular construct does not depend
161	on whether it is in a particular syntactic context. A context-free
162	grammar has a set of rules like
163
164	<pre>
165	stat : IF expr THEN stat
166	\| ...
167	;
168	</pre>
169
170	where there is no restriction on when the <tt>IF</tt> alternative may
171	be applied--if you are in rule <tt>stat</tt>, you may apply the
172	alternative.
173
174	<h3><a id="Context-sensitive" name="Context-sensitive">Context-sensitive</a></h3>
175
176	A grammar where recognition of a particular construct may depend on a
177	syntactic context. You never see these grammars in practice because
178	they are impractical (Note, an Earley parser is O(n^3) worst-case for
179	context-<i>free</i> grammars). A context-free rule looks like:
180
181	<pre>
182	Α → γ
183	</pre>
184
185	but a context-<i>sensitive</i> rule may have context on the left-side:
186
187	<pre>
188	αΑβ → αγβ
189	</pre>
190
191	meaning that rule Α may only be applied (converted to γ)
192	in between α and β.
193
194	<p>
195	In an ANTLR sense, you can recognize context-sensitive constructs with
196	a semantic predicate. The action evaluates to true or false
197	indicating the validity of applying the alternative.
198
199	<p>
200	See <a
201	href="http://www.wikipedia.org/wiki/Context-sensitive"><b>Context-sensitive gramar</b></a>.
202
203	<h3><a id="DFA" name="DFA">DFA</a></h3>
204
205	<b>D</b>eterministic <b>F</b>inite <b>A</b>utomata. A state machine
206	used typically to formally describe lexical analyzers. <tt>lex</tt>
207	builds a DFA to recognize tokens whereas ANTLR builds a recursive
208	descent lexer similar to what you would build by hand. See <a
209	href="http://www.wikipedia.org/wiki/Finite_state_machine"><b>Finite
210	state machine</b></a> and ANTLR's lexer documentation.
211
212	<h3><a id="FIRST" name="FIRST">FIRST</a></h3>
213
214	The set of symbols that may be matched on the left-edge of a rule.
215	For example, the FIRST(decl) is set {ID, INT} for the following:
216
217	<pre>
218	decl : ID ID SEMICOLON
219	\| INT ID SEMICOLON
220	;
221	</pre>
222
223	The situation gets more complicated when you have optional
224	constructs. The FIRST(a) below is {A,B,C}
225
226	<pre>
227	a : (A)? B
228	\| C
229	;
230	</pre>
231
232	because the A is optional and the B may be seen on the left-edge.
233
234	<p>
235	Naturally k>1 lookahead symbols makes this even more complicated.
236	FIRST_k must track sets of k-sequences not just individual symbols.
237
238	<h3><a id="FOLLOW" name="FOLLOW">FOLLOW</a></h3>
239
240	The set of input symbols that may follow any reference to the
241	specified rule. For example, FOLLOW(decl) is {RPAREN, SEMICOLON):
242
243	<pre>
244	methodHead : ID LPAREN decl RPAREN ;
245	var : decl SEMICOLON ;
246	decl : TYPENAME ID ;
247	</pre>
248
249	because RPAREN and SEMICOLON both follow references to rule decl.
250	FIRST and FOLLOW computations are used to analyze grammars and
251	generate parsing decisions.
252
253	<p>
254	This grammar analysis all gets very complicated when k>1.
255
256	<h3><a id="Grammar" name="Grammar">Grammar</a></h3>
257
258	A finite means of formally describing the structure of a possibly
259	infinite language. Parser generators build parsers that recognize
260	sentences in the language described by a grammar. Most parser
261	generators allow you to add actions to be executed during the parse.
262
263	<h3><a id="Hoisting" name="Hoisting">Hoisting</a></h3>
264
265	Semantic predicates describe the semantic context in which a rule or
266	alternative applies. The predicate is hoisted into a prediction
267	expression. Hoisting typically refers to pulling a predicate out of
268	its enclosing rule and into the prediction expression of another
269	rule. For example,
270
271	<pre>
272	decl : typename ID SEMICOLON
273	\| ID ID SEMICOLON
274	;
275	typename : {isType(LT(1))}? ID
276	;
277	</pre>
278
279	The predicate is not needed in typename as there is no decision,
280	however, rule decl needs it to distinguish between its two
281	alternatives. The first alternative would look like:
282
283	<pre>
284	if ( LA(1)==ID && isType(LT(1)) ) {
285	typename();
286	match(ID);
287	match(SEMICOLON);
288	}
289	</pre>
290
291	PCCTS 1.33 did, but ANTLR currently does not hoist predicates into
292	other rules.
293
294	<h3><a id="Inheritance,_grammar" name="Inheritance,_grammar">Inheritance, grammar</a></h3>
295
296	The ability of ANTLR to define a new grammar as it differs from an
297	existing grammar. See the ANTLR documentation.
298
299	<h3><a id="LA(n)" name="LA(n)">LA(n)</a></h3>
300	The nth lookahead character, token type, or AST node type depending
301	on the grammar type (lexer, parser, or tree parser respectively).
302
303	<h3><a id="Left-prefix,_left_factor" name="Left-prefix,_left_factor">Left-prefix, left factor</a></h3>
304
305	A common sequence of symbols on the left-edge of a set of alternatives
306	such as:
307
308	<pre>
309	a : A B X
310	\| A B Y
311	;
312	</pre>
313
314	The left-prefix is A B, which you can remove by left-factoring:
315
316	<pre>
317	a : A B (X\|Y)
318	;
319	</pre>
320
321	Left-factoring is done to reduce lookahead requirements.
322
323	<h3><a id="Literal" name="Literal">Literal</a></h3>
324
325	Generally a literal refers to a fixed string such as <tt>begin</tt>
326	that you wish to match. When you reference a literal in an ANTLR
327	grammar via <tt>"begin"</tt>, ANTLR assigns it a token type like any
328	other token. If you have defined a lexer, ANTLR provides information
329	about the literal (type and text) to the lexer so it may detect
330	occurrences of the literal.
331
332	<h3><a id="Linear_approximate_lookahead" name="Linear_approximate_lookahead">Linear approximate lookahead</a></h3>
333
334	An approximation to full lookahead (that can be applied to both LL and LR
335	parsers) for k>1 that reduces the complexity of storing and testing
336	lookahead from O(n^k) to O(nk); exponential to linear reduction. When
337	linear approximate lookahead is insufficient (results in a
338	nondeterministic parser), you can use the approximate lookahead to
339	attenuate the cost of building the full decision.
340
341	<p>Here is a simple example illustrating the difference between full
342	and approximate lookahead:
343
344	<pre>
345	a : (A B \| C D)
346	\| A D
347	;
348	</pre>
349
350	This rule is LL(2) but not linear approximate LL(2). The real
351	FIRST_2(a) is {AB,CD} for alternative 1 and {AD} for alternative 2.
352	No intersection, so no problem. Linear approximate lookahead
353	collapses all symbols at depth i yielding k sets instead of a possibly
354	n^k k-sequences. The approximation (compressed) sets are
355	{AB,AD,CD,CB} and {AD}. Note the introduction of the spurious
356	k-sequences AD and CB. Unfortunately, this compression introduces a
357	conflict upon AD between the alternatives. PCCTS did full LL(k) and
358	ANTLR does linear approximate only as I found that linear approximate
359	lookahead works for the vast majority of parsing decisions and is
360	extremely fast. I find one or two problem spots in a large grammar
361	usually with ANTLR, which forces me to reorganize my grammar in a
362	slightly unnatural manner. Unfortunately, your brain does full LL(k)
363	and ANTLR does a slightly weaker linear approximate lookahead--a
364	source of many (invalid) bug reports ;)
365
366	<p>
367	This compression was the subject of <a href="http://www.antlr.org/papers/parr.phd.thesis.pdf"><b>my doctoral
368	dissertation</b></a> (PDF 477k) at Purdue.
369
370	<h3><a id="LL(k)" name="LL(k)">LL(k)</a></h3>
371
372	Formally, LL(k) represents a class of parsers and grammars that parse
373	symbols from left-to-right (beginning to end of input stream) using a
374	leftmost derivation and using k symbols of lookahead. A leftmost
375	derivation is one in which derivations (parses) proceed by attempting
376	to replace rule references from left-to-right within a production.
377	Given the following rule
378
379	<pre>
380	stat : IF expr THEN stat
381	\| ...
382	;
383	</pre>
384
385	an LL parser would match the IF then attempt to parse expr rather than
386	a rightmost derivation, which would attempt to parse stat first.
387
388	<p>
389	LL(k) is synonymous with a "top-down" parser because the
390	parser begins at the start symbol and works its way down the
391	derivation/parse tree (tree here means the stack of method activations
392	for recursive descent or symbol stack for a table-driven parser). A
393	recursive-descent parser is particular implementation of an LL parser
394	that uses functions or method calls to implement the parser rather
395	than a table.
396
397	<p>
398	ANTLR generates predicate-LL(k) parsers that support syntactic and
399	sematic predicates allowing you to specify many context-free and
400	context-sensitive grammars (with a bit of work).
401
402
403	<h3><a id="LT(n)" name="LT(n)">LT(n)</a></h3>
404	In a parser, this is the nth lookahead Token object.
405
406	<h3><a id="Language" name="Language">Language</a></h3> A possibly infinite set of valid sentences. The
407	vocabulary symbols may be characters, tokens, and tree nodes in an
408	ANTLR context.
409
410	<h3><a id="Lexer" name="Lexer">Lexer</a></h3>
411	A recognizer that breaks up a stream of characters into
412	vocabulary symbols for a parser. The parser pulls vocabulary symbols
413	from the lexer via a queue.
414
415	<h3><a id="Lookahead" name="Lookahead">Lookahead</a></h3>
416
417	When parsing a stream of input symbols, a parser has matched (and no
418	longer needs to consider) a portion of the stream to the left of its
419	read pointer. The next k symbols to the right of the read pointer are
420	considered the fixed lookahead. This information is used to direct
421	the parser to the next state. In an LL(k) parser this means to
422	predict which path to take from the current state using the next k
423	symbols of lookahead.
424
425	<p> ANTLR supports syntactic predicates, a manually-specified form of
426	backtracking that effectively gives you infinite lookahead. For
427	example, consider the following rule that distinguishes between sets
428	(comma-separated lists of words) and parallel assignments (one list
429	assigned to another):
430
431	<pre>
432	stat: ( list "=" )=> list "=" list
433	\| list
434	;
435	</pre>
436
437	If a list followed by an assignment operator is found on the input
438	stream, the first production is predicted. If not, the second
439	alternative production is attempted.
440
441	<h3><a id="nextToken" name="nextToken">nextToken</a></h3>
442
443	A lexer method automatically generated by ANTLR that figures out which
444	of the lexer rules to apply. For example, if you have two rules ID
445	and INT in your lexer, ANTLR will generate a lexer with methods for ID
446	and INT as well as a nextToken method that figures out which rule
447	method to attempt given k input characters.
448
449	<h3><a id="NFA" name="NFA">NFA</a></h3>
450
451	<b>N</b>ondeterministic <b>F</b>inite <b>A</b>utomata. See <a
452	href="http://www.wikipedia.org/wiki/Finite_state_machine"><b>Finite
453	state machine</b></a>.
454
455	<h3><a id="Nondeterministic" name="Nondeterministic">Nondeterministic</a></h3>
456
457	A parser is nondeterministic if there is at least one decision point
458	where the parser cannot resolve which path to take. Nondeterminisms
459	arise because of parsing strategy weaknesses.
460
461	<ul>
462
463	<li>If your strategy works only for unambiguous grammars, then
464	ambiguous grammars will yield nondeterministic parsers; this is true
465	of the basic LL, LR strategies. Even unambiguous grammars can yield
466	nondeterministic parsers though. Here is a nondeterministic LL(1)
467	grammar:
468
469	<pre>
470	decl : ID ID SEMICOLON
471	\| ID SEMICOLON
472	;
473	</pre>
474
475	Rule <tt>decl</tt> is, however, LL(2) because the second lookahead
476	symbol (either ID or SEMICOLON) uniquely determines which alternative
477	to predict. You could also left-factor the rule to reduce the
478	lookahead requirements.<br><br>
479
480	<li>
481	If you are willing to pay a
482	performance hit or simply need to handle ambiguous grammars, you can
483	use an Earley parser or a Tomita parser (LR-based) that match all
484	possible interpretations of the input, thus, avoiding the idea of
485	nondeterminism altogether. This does present problems when trying to
486	execute actions, however, because multiple parses are, in effect,
487	occurring in parallel.
488
489	</ul>
490
491	<p>
492	Note that a parser may have multiple decision points that are
493	nondeterministic.
494
495	<h3><a id="Parser" name="Parser">Parser</a></h3>
496	A recognizer that applies a grammatical structure to a stream
497	of vocabulary symbols called tokens.
498
499	<h3><a id="Predicate,_semantic" name="Predicate,_semantic">Predicate, semantic</a></h3>
500
501	A semantic predicate is a boolean expression used to alter the parse
502	based upon semantic information. This information is usually a
503	function of the constructs/input that have already been matched, but
504	can even be a flag that turns on and off subsets of the language (as
505	you might do for a grammar handling both K&R and ANSI C). One of the
506	most common semantic predicates uses a symbol table to help
507	distinguish between syntactically, but semantically different
508	productions. In FORTRAN, array references and function calls look the
509	same, but may be distinguished by checking what the type of the
510	identifier is.
511
512	<pre>
513	expr : {isVar(LT(1))}? ID LPAREN args RPAREN // array ref
514	\| {isFunction(LT(1))}? ID LPAREN args RPAREN // func call
515	;
516	</pre>
517
518	<h3><a id="Predicate,_syntactic" name="Predicate,_syntactic">Predicate, syntactic</a></h3>
519
520	A selective form of backtracking used to recognize language constructs
521	that cannot be distinguished without seeing all or most of the
522	construct. For example, in C++ some declarations look exactly like
523	expressions. You have to check to see if it is a declaration. If it
524	parses like a declaration, assume it is a declaration--reparse it with
525	"feeling" (execute your actions). If not, it must be an expression or
526	an error:
527
528	<pre>
529	stat : (declaration) => declaration
530	\| expression
531	;
532	</pre>
533
534	<h3><a id="Production" name="Production">Production</a></h3>
535
536	An alternative in a grammar rule.
537
538	<h3><a id="Protected" name="Protected">Protected</a></h3>
539
540	A protected lexer rule does not represent a complete token--it is a
541	helper rule referenced by another lexer rule. This overloading of the
542	access-visibility Java term occurs because if the rule is not visible,
543	it cannot be "seen" by the parser (yes, this nomeclature sucks).
544
545	<h3><a id="Recursive-descent" name="Recursive-descent">Recursive-descent</a></h3>
546
547	See LL(k).
548
549	<h3><a id="Regular" name="Regular">Regular</a></h3>
550
551	A regular language is one that can be described by a regular grammar
552	or regular expression or accepted by a DFA-based lexer such as those
553	generated by <tt>lex</tt>. Regular languages are normally used to
554	describe tokens.
555
556	<p> In practice you can pick out a regular grammar by noticing that
557	references to other rules are not allowed accept at the end of a
558	production. The following grammar is regular because reference to
559	<tt>B</tt> occurs at the right-edge of rule <tt>A</tt>.
560
561	<pre>
562	A : ('a')+ B ;
563	B : 'b' ;
564	</pre>
565
566	Another way to look at it is, "what can I recognize without a stack
567	(such as a method return address stack)?".
568
569	<p>
570	Regular grammars cannot describe context-free languages, hence, LL or
571	LR based grammars are used to describe programming languages. ANTLR
572	is not restricted to regular languages for tokens because it generates
573	recursive-descent lexers. This makes it handy to recognize HTML tags
574	and so on all in the lexer.
575
576	<h3><a id="Rule" name="Rule">Rule</a></h3>
577
578	A rule describes a partial sentence in a language such as a statement
579	or expression in a programming language. Rules may have one or more
580	alternative productions.
581
582	<h3><a id="Scanner" name="Scanner">Scanner</a></h3>
583
584	See Lexer.
585
586	<h3><a id="Semantics" name="Semantics">Semantics</a></h3> See <a href="http://www.jguru.com/faq/view.jsp?EID=81"><b>What
587	do "syntax" and "semantics" mean and how are they different?</b></a>.
588
589	<h3><a id="Subrule" name="Subrule">Subrule</a></h3>
590
591	Essentially a rule that has been expanded inline. Subrules are
592	enclosed in parenthesis and may have suffixes like star, plus, and
593	question mark that indicate zero-or-more, one-or-more, or optional.
594	The following rule has 3 subrules:
595
596	<pre>
597	a : (A\|B)+ (C)* (D)?
598	;
599	</pre>
600
601	<h3><a id="Syntax" name="Syntax">Syntax</a></h3>
602	See <a href="http://www.jguru.com/faq/view.jsp?EID=81"><b>What
603	do "syntax" and "semantics" mean and how are they different?</b></a>.
604
605	<h3><a id="Token" name="Token">Token</a></h3>
606
607	A vocabulary symbol for a language. This term typically refers to the
608	vocabulary symbols of a parser. A token may represent a constant
609	symbol such as a keyword like <tt>begin</tt> or a "class" of input
610	symbols like <tt>ID</tt> or <tt>INTEGER_LITERAL</tt>.
611
612	<h3><a id="Token_stream" name="Token_stream">Token stream</a></h3>
613
614	See <a href="http://www.antlr.org/doc/streams.html"><b>Token
615	Streams</b></a> in the ANTLR documentation.
616
617	<h3><a id="Tree" name="Tree">Tree</a></h3>
618	See AST and <a
619	href="http://www.jguru.com/faq/view.jsp?EID=814505"><b>What's the
620	difference between a parse tree and an abstract syntax tree (AST)? Why
621	doesn't ANTLR generate trees with nodes for grammar rules like JJTree
622	does?</b></a>.
623
624	<h3><a id="Tree_parser" name="Tree_parser">Tree parser</a></h3> A recognizer that applies a grammatical structure to a
625	two-dimensional input tree. Grammatical rules are like an "executable
626	comment" that describe the tree structure. These parsers are useful
627	during translation to (i) annotate trees with, for example, symbol
628	table information, (2) perform tree rewrites, and (3) generate output.
629
630	<h3><a id="Vocabulary" name="Vocabulary">Vocabulary</a></h3>
631
632	The set of symbols used to construct sentences in a language. These
633	symbols are usually called tokens or token types. For lexers, the
634	vocabulary is a set of characters.
635
636	<h3><a id="Wow" name="Wow">Wow</a></h3> See ANTLR.
637
638	</font>
639	</body>
640	</html>

Note: See TracBrowser for help on using the repository browser.

Context Navigation

source: trunk/yao/share/antlr-2.7.7/doc/glossary.html @ 1

Download in other formats: