Chapter 2: Context-Free Languages

In this chapter we present context-free grammars, a more powerful method of describing languages. Such grammars can describe certain features that have a recursive structure.
The collection of languages associated with context-free grammars is called the context-free languages. It includes all the regular languages and many additional languages.
We also introduce pushdown automata, a class of machines recognizing the context-free languages. Pushdown automata are useful because they allow us to gain additional insight into the power of context-free grammars.


2.1 Context-Free Grammars

Definition

A context-free grammar is a 4-tuple \( (V,\Sigma,R,S)\), where

  1. V is a finite set called the variables,

  2. \( \Sigma\) is a finite set, disjoint from V, called the terminals,

  3. R is a finite set of rules, with each rule being a variable and a string of variables and terminals, and

  4. \( S \in V\) is the start variable.

A grammar consists of a collection of substitution rules, also called productions. Each rule comprises a variable and a string separated by an arrow.
If u, v, and w are strings, and \(A \rightarrow w\) is a rule of the grammar, we say that uAv yields uwv, written \(uAv \Rightarrow uwv\). Say that u derives v, written \(u \overset{*}{\Rightarrow} v\), if u = v or if a sequence \(u \Rightarrow u_1 \Rightarrow u_2 \Rightarrow \dots \Rightarrow u_k \Rightarrow v\) exists for \(k \ge 0\).
The language of the grammar is \(\{ w \in \Sigma^* \mid S \overset{*}{\Rightarrow} w \}\).

Example:
The language of the grammar \(S_1 \rightarrow 0S_11 \mid \varepsilon\) is \(\{ 0^n1^n \mid n \ge 0 \}\).
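To see this concretely, here is a small sketch (my own illustration, writing S for \(S_1\)) that enumerates the strings generated by the grammar up to a length bound, always expanding the leftmost variable:

```python
# Hedged sketch: enumerate strings generated by S -> 0S1 | eps, keeping
# only sentential forms whose terminal content fits within max_len.
def generate(max_len):
    results, forms = set(), {"S"}
    while forms:
        nxt = set()
        for f in forms:
            i = f.find("S")
            if i == -1:
                results.add(f)            # no variables left: a generated string
            else:
                for rhs in ("0S1", ""):   # the two rules of the grammar
                    g = f[:i] + rhs + f[i + 1:]
                    if len(g.replace("S", "")) <= max_len:
                        nxt.add(g)
        forms = nxt
    return results
```

For instance, `generate(4)` yields `{"", "01", "0011"}`, i.e., \(0^n1^n\) for \(n \le 2\).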

Ambiguity

A derivation of a string w in a grammar G is a leftmost derivation if at every step the leftmost remaining variable is the one replaced.
A string w is derived ambiguously in context-free grammar G if it has two or more different leftmost derivations.
Grammar G is ambiguous if it generates some string ambiguously.
(We use leftmost derivations to define ambiguity because two derivations may differ merely in the order in which they replace variables yet not in their overall structure.)

Examples:

The string a+a*a has two parse trees in the grammar:
\(\langle \text{EXPR} \rangle \rightarrow \langle \text{EXPR} \rangle + \langle \text{EXPR} \rangle\ |\ \langle \text{EXPR} \rangle * \langle \text{EXPR} \rangle\ |\ ( \langle \text{EXPR} \rangle )\ |\ a\)
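We can confirm this count mechanically. The sketch below is my own illustration (abbreviating ⟨EXPR⟩ to E): it counts leftmost derivations of a+a*a by always expanding the leftmost variable, pruning sentential forms that are too long or whose terminal prefix no longer matches:

```python
# Hedged sketch: count leftmost derivations of TARGET in the grammar
# E -> E+E | E*E | (E) | a   (E abbreviates <EXPR>).
RULES = ["E+E", "E*E", "(E)", "a"]
TARGET = "a+a*a"

def count(form):
    i = form.find("E")                    # position of the leftmost variable
    if i == -1:
        return 1 if form == TARGET else 0
    # Prune: every grammar symbol yields at least one terminal, and the
    # terminal prefix before the leftmost variable is already fixed.
    if len(form) > len(TARGET) or form[:i] != TARGET[:i]:
        return 0
    return sum(count(form[:i] + rhs + form[i + 1:]) for rhs in RULES)
```

Here `count("E")` evaluates to 2, one leftmost derivation per parse tree, confirming the ambiguity.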

Chomsky normal form

A context-free grammar is in Chomsky normal form if every rule is of the form
\(A \rightarrow BC\)
\(A \rightarrow a\)
where a is any terminal and A, B, and C are any variables, except that B and C may not be the start variable.
In addition, we permit the rule \(S \rightarrow \varepsilon\), where S is the start variable.

Any context-free language is generated by a context-free grammar in Chomsky normal form.
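As an aside, Chomsky normal form is exactly what the classic CYK membership algorithm needs: since every rule is \(A \rightarrow BC\) or \(A \rightarrow a\), a table over substrings decides membership. A sketch, using a CNF grammar I built for \(\{0^n1^n \mid n \ge 0\}\) (the variable names are my own):

```python
# Hedged sketch: CYK membership test. Bodies are pairs of variables or
# single terminals; S0 is the start variable, with S0 -> eps permitted.
GRAMMAR = {
    "S0": [("Z", "B"), ("Z", "O")],
    "S":  [("Z", "B"), ("Z", "O")],
    "B":  [("S", "O")],
    "Z":  [("0",)],
    "O":  [("1",)],
}
START = "S0"

def cyk(w):
    if w == "":
        return True                       # via the permitted rule S0 -> eps
    n = len(w)
    # table[i][l]: set of variables deriving the substring w[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, c in enumerate(w):
        for A, bodies in GRAMMAR.items():
            if (c,) in bodies:
                table[i][1].add(A)
    for l in range(2, n + 1):             # substring length
        for i in range(n - l + 1):        # start position
            for k in range(1, l):         # split point
                for A, bodies in GRAMMAR.items():
                    for body in bodies:
                        if (len(body) == 2 and body[0] in table[i][k]
                                and body[1] in table[i + k][l - k]):
                            table[i][l].add(A)
    return START in table[0][n]
```

The running time is cubic in |w|, which is one practical payoff of converting a grammar to Chomsky normal form.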


2.2 Pushdown Automata

Deterministic and nondeterministic pushdown automata are not equivalent in power.
In this section, PDA refers to nondeterministic pushdown automata, because they are the ones equivalent in power to context-free grammars.

A PDA is like a nondeterministic finite automaton, but it has an extra component: a stack.
The unbounded nature of the stack allows the PDA to store an unlimited amount of information, such as counts of arbitrary size, and hence to recognize some nonregular languages.

Definition

A pushdown automaton is a 6-tuple \( (Q,\Sigma,\Gamma,\delta,q_0,F)\), where \( Q,\Sigma,\Gamma\), and F are all finite sets, and

  1. Q is the set of states,

  2. \( \Sigma\) is the input alphabet,

  3. \( \Gamma\) is the stack alphabet,

  4. \( \delta: Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \longrightarrow P(Q \times \Gamma_\varepsilon)\) is the transition function,

  5. \( q_0 \in Q\) is the start state, and

  6. \( F \subseteq Q\) is the set of accept states.

Computation

The computation is defined just as for an NFA, except that we now additionally require \((r_{i+1}, b) \in \delta(r_i, w_{i+1}, a)\), where \(a\) is the string popped from the stack (possibly \(\varepsilon\)) and \(b\) is the string pushed onto the stack (possibly \(\varepsilon\)).

A PDA can test for an empty stack by initially placing a special symbol $ on the stack. Then if it ever sees the $ again, it knows that the stack is effectively empty.
A PDA can test for the end of the input because acceptance takes effect only when the machine is at the end of the input.

Theorem 2.20

A language is context free if and only if some pushdown automaton recognizes it.

Lemma 2.21

If a language is context free, then some pushdown automaton recognizes it.

We design a PDA P to determine whether some series of substitutions using the rules of the grammar G can lead from the start variable to the input string w.
(In other words, P uses its stack to simulate a leftmost derivation of the input string by the CFG G.)
1. Place the marker symbol $ and the start variable on the stack.
2. Repeat the following steps forever:

  • If the top of stack is a variable symbol A,
    nondeterministically select one of the rules for A and substitute A by the string on the right-hand side of the rule.

  • If the top of stack is a terminal symbol a,
    read the next symbol from the input and compare it to a.
    If they match, repeat; else, reject on this branch of the nondeterminism.

  • If the top of stack is the symbol $,
    enter the accept state.
    Doing so accepts the input if it has all been read.
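The three steps above can be simulated directly. This sketch (my own) explores all branches of the nondeterminism with an explicit worklist, using the grammar \(S \rightarrow 0S1 \mid \varepsilon\) as an example; the pruning bound is an addition of mine that keeps the search finite for this grammar:

```python
GRAMMAR = {"S": ["0S1", ""]}              # example grammar: S -> 0S1 | eps

def pda_accepts(w, grammar, start="S"):
    # Configurations are (input position, stack), top of stack leftmost;
    # '$' marks the effective bottom of the stack, as in step 1 above.
    seen, frontier = set(), [(0, start + "$")]
    while frontier:
        pos, stack = frontier.pop()
        if (pos, stack) in seen:
            continue
        seen.add((pos, stack))
        top, rest = stack[0], stack[1:]
        if top == "$":                    # marker on top: stack is empty
            if pos == len(w):
                return True               # all input read: accept
        elif top in grammar:              # variable: try every rule for it
            for rhs in grammar[top]:
                new = rhs + rest
                # Prune branches whose pending terminals exceed the input left.
                if sum(c not in grammar for c in new[:-1]) <= len(w) - pos:
                    frontier.append((pos, new))
        elif pos < len(w) and w[pos] == top:   # terminal: match and advance
            frontier.append((pos + 1, rest))
    return False
```

So `pda_accepts("0011", GRAMMAR)` succeeds while `pda_accepts("011", GRAMMAR)` fails, mirroring how some branch of P's nondeterminism guesses the right derivation.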

Lemma 2.27

If a pushdown automaton recognizes some language, then it is context free.

First, we simplify our task by modifying P slightly to give it the following three features:
1. It has a single accept state, \(q_{accept}\).
2. It empties its stack before accepting.
3. Each transition either pushes a symbol onto the stack (a push move) or pops one off the stack (a pop move), but it does not do both or neither at the same time.

Second, for each pair of states p and q, introduce a variable \( A_{pq}\) that generates all the strings that can take P from state p with an empty stack to state q with an empty stack.

Third, construct G from P:
The variables of G are \( \{A_{pq}|p,q \in Q\}\).
The start variable is \( A_{q_0,q_{accept}}\).

The rules of G consist of three parts:

  • For each \( p,q,r,s \in Q,u \in \Gamma\), and \( a,b \in \Sigma_\varepsilon\),
    if \( \delta(p,a,\varepsilon)\) contains \((r,u)\) and \( \delta(s,b,u)\) contains \((q,\varepsilon)\),
    put the rule \( A_{pq} \longrightarrow aA_{rs}b\) in G.

  • For each \( p,q,r \in Q\),
    put the rule \( A_{pq} \longrightarrow A_{pr}A_{rq}\) in G.

  • Finally, for each \( p \in Q\),
    put the rule \( A_{pp} \longrightarrow \varepsilon\) in G.

Consider some variable \( A_{pq}\).
There are two cases, according to whether the symbol u pushed by P's first move and the symbol popped by its last move are the same:
Same symbol: the u pushed by the first move is popped only at the very end, so throughout the intermediate computation u sits untouched at the bottom of the stack. The intermediate computation therefore also starts and ends with an (effectively) empty stack, giving \( A_{pq} \rightarrow aA_{rs}b\) (where a is the first input symbol read and b is the last).
Different symbols: then u was popped at some intermediate point; if u is popped upon reaching some state s, we get \( A_{pq} \rightarrow A_{ps}A_{sq}\).

Relationship of regular and context-free languages

With Theorem 2.20, we know that every regular language is also a context-free language, because every finite automaton is automatically a pushdown automaton that simply ignores its stack.
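Another standard way to see this inclusion (not spelled out in the text) is to convert a DFA directly into a right-linear, hence context-free, grammar: each transition \(\delta(q,a)=r\) becomes a rule \(Q \rightarrow aR\), and each accept state gets a rule \(Q \rightarrow \varepsilon\). A sketch with a hypothetical DFA for binary strings containing an even number of 1s:

```python
# Hedged sketch: DFA -> right-linear CFG. Rule bodies are either a pair
# (terminal, variable) or the empty tuple () standing for eps.
def dfa_to_cfg(delta, start, accepts):
    rules = {}
    for (q, a), r in delta.items():
        rules.setdefault(q, []).append((a, r))
    for q in accepts:
        rules.setdefault(q, []).append(())      # eps rule for accept states
    return start, rules

def generates(rules, q, w):
    # Membership via derivation: expand the single variable left to right.
    if w == "":
        return () in rules[q]
    return any(body and body[0] == w[0] and generates(rules, body[1], w[1:])
               for body in rules[q])

# Hypothetical DFA: states E (even # of 1s, accepting) and O (odd # of 1s).
delta = {("E", "0"): "E", ("E", "1"): "O", ("O", "0"): "O", ("O", "1"): "E"}
start, rules = dfa_to_cfg(delta, "E", {"E"})
```

Derivations in the resulting grammar track the DFA's runs symbol by symbol, so membership agrees with the DFA's acceptance.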


2.3 Non-Context-Free Languages

Just as in Section 1.4, where we introduced the pumping lemma for regular languages, we present a similar pumping lemma for context-free languages, which we use to prove that certain languages are not context free.

Pumping Lemma

If A is a context-free language, then there is a number p (the pumping length) where, if s is any string in A of length at least p, then s may be divided into five pieces s = uvxyz satisfying the conditions:
1. for each \( i \ge 0,\ uv^ixy^iz \in A\),
2. \( |vy| > 0\), and
3. \( |vxy| \le p\).

Proof:
Let b be the maximum number of symbols on the right-hand side of a rule (assume \(b \ge 2\)), and let |V| be the number of variables in G. We set the pumping length p to \(b^{|V|+1}\).
Now if s is a string in A whose length is p or more, its parse tree must be at least \(|V| + 1\) high, because a tree of height h has at most \(b^h\) leaves.
If s has several parse trees, choose \(\tau\) to be a parse tree that has the smallest number of nodes.
In \(\tau\), take a longest path from the root to a leaf; it contains at least \(|V| + 1\) variables, so we may select a variable R that repeats among the lowest \(|V| + 1\) variables on this path.

The upper occurrence of R has a larger subtree and generates \(vxy\), whereas the lower occurrence generates just x with a smaller subtree. Both subtrees are generated by the same variable, so we may substitute one for the other and still obtain a valid parse tree: replacing the smaller subtree by the larger (repeatedly) yields parse trees for \(uv^ixy^iz\) with \(i \ge 1\), and replacing the larger by the smaller yields \(i = 0\). That establishes condition 1 of the lemma.

To get condition 2, we must be sure that v and y are not both \(\varepsilon\). If they were, the parse tree obtained by substituting the smaller subtree for the larger would have fewer nodes than \(\tau\) does and would still generate s. This result isn’t possible because we had already chosen \(\tau\) to be a parse tree for s with the smallest number of nodes.

In order to get condition 3, we choose R so that both occurrences fall within the bottom |V| + 1 variables on the path, and we choose the longest path in the parse tree, so the subtree where R generates \(vxy\) is at most |V| + 1 high. A tree of this height can generate a string of length at most \(b^{|V| + 1} = p\).

Example:
Let B be the language \(\{a^nb^nc^n \mid n \ge 0\}\).
Assume to contrary that B is context free.
Let p be the pumping length given by the pumping lemma.
Choose s to be the string \(a^pb^pc^p\).
Since \(|vxy| \le p\), the strings v and y together contain at most two of the three symbols a, b, and c, so pumping them either unbalances the symbol counts or disturbs their order; hence s can't be pumped, which is a contradiction.
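The contradiction can also be checked exhaustively for a small p. This sketch (my own; it tests only the pumps i = 0 and i = 2, which is already enough to refute condition 1) searches for a split satisfying conditions 2 and 3:

```python
def in_A(t):  # A = { 0^n 1^n }, pumpable for comparison
    n = len(t) // 2
    return t == "0" * n + "1" * n

def in_B(t):  # B = { a^n b^n c^n }, the non-context-free language above
    n = len(t) // 3
    return t == "a" * n + "b" * n + "c" * n

def pumpable(s, p, member, pumps=(0, 2)):
    # Look for s = uvxyz with |vxy| <= p (loop bounds) and |vy| > 0
    # such that u v^i x y^i z stays in the language for each tested i.
    n = len(s)
    for i in range(n + 1):
        for j in range(i, min(i + p, n) + 1):
            for k in range(j, min(i + p, n) + 1):
                for l in range(k, min(i + p, n) + 1):
                    if (j - i) + (l - k) == 0:
                        continue                  # condition 2: |vy| > 0
                    u, v, x, y, z = s[:i], s[i:j], s[j:k], s[k:l], s[l:]
                    if all(member(u + v * m + x + y * m + z) for m in pumps):
                        return True
    return False
```

With p = 2, no split of "aabbcc" survives pumping, whereas the context-free string "0011" from \(\{0^n1^n\}\) pumps easily.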


2.4.1 Deterministic Context-Free Languages: DPDA

In contrast to finite automata, nondeterministic pushdown automata are more powerful than their deterministic counterparts.
The languages recognizable by DPDAs (deterministic pushdown automata) are called DCFLs (deterministic context-free languages).

Definition

A deterministic pushdown automaton is a 6-tuple \( (Q,\Sigma,\Gamma,\delta,q_0,F)\), where \( Q,\Sigma,\Gamma\), and F are all finite sets, and

  1. Q is the set of states,

  2. \( \Sigma\) is the input alphabet,

  3. \( \Gamma\) is the stack alphabet,

  4. \( \delta: Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \longrightarrow (Q \times \Gamma_\varepsilon) \cup \{\emptyset\}\) is the transition function,

  5. \( q_0 \in Q\) is the start state, and

  6. \( F \subseteq Q\) is the set of accept states.

The transition function must satisfy the following condition.
For every \( q \in Q, a \in \Sigma,\text{ and }x \in \Gamma\), exactly one of the values
\( \delta(q,a,x),\delta(q,a,\varepsilon),\delta(q,\varepsilon,x),\text{ and }\delta(q,\varepsilon,\varepsilon)\)
is not \( \emptyset\).
The condition enforces deterministic behavior by preventing the DPDA from taking two different actions in the same situation.
The ε moves allow DPDA to read an input symbol without popping a stack symbol, and vice versa.

Reading the entire input string

A DPDA may reject an input simply by failing to read the entire input string, which introduces messy cases.

Lemma 2.41
Every DPDA has an equivalent DPDA that always reads the entire input string.
For simplicity, we will assume henceforth that DPDAs read their input to the end.

Proof
A DPDA may fail to read the entire input if the DPDA tries to pop an empty stack (hanging) or if the DPDA makes an endless sequence of ε-input moves (looping).
We solve the hanging problem by initializing the stack with a special symbol. If that symbol is later popped before the end of the input, the DPDA reads to the end of the input and rejects.
We solve the looping problem by identifying the looping situations and reprogramming the DPDA so that it reads and rejects the input instead of looping.

Property

Theorem 2.42

The class of DCFLs is closed under complementation.

Swapping the accept and non-accept states of a DFA yields a new DFA that recognizes the complementary language, thereby proving that the class of regular languages is closed under complementation.
The same approach works for DPDAs, except for one problem. The DPDA may accept its input by entering both accept and non-accept states in a sequence of ε-input moves at the end of the input string. Interchanging accept and non-accept states would still accept the input in this case.
Therefore, we modify the DPDA so that it can't take ε-input moves in accept states, by the following approach:
only reading states, i.e., states that always read an input symbol without popping the stack (\(\delta (q, a, \varepsilon ) \ne \emptyset\)), may be accept states.
Thus, if \(\delta (q, a, x) = (r, y)\) for \(a \in \Sigma\) and \(x \in \Gamma\), add a new state \(q_x\) and modify δ to divide that step into two, a pop and then a read: \(\delta(q, \varepsilon, x) = (q_x, \varepsilon)\) and \(\delta(q_x, a, \varepsilon) = (r, y)\). Designate \(q_x\) a reading state, and make \(q_x\) an accept state if \(q \in F\). Finally, remove the accepting designation from any state that isn't a reading state.
Then, by swapping acceptance and non-acceptance among these reading states only, we invert the output of the DPDA.

This theorem implies that some CFLs are not DCFLs. Any CFL whose complement isn’t a CFL isn’t a DCFL.

Theorem 2.43

To simplify arguments, we will append the endmarker symbol \(\dashv\) to the input string.
Designing DPDAs for endmarked inputs is often easier because we can take advantage of knowing when the input string ends.
We define the endmarked language A\(\dashv\) to be the collection of strings w\(\dashv\) where \(w \in A\).
A is a DCFL if and only if A\(\dashv\) is a DCFL.

Forward direction:
Say DPDA P recognizes A. Then DPDA P’ recognizes A\(\dashv\) by simulating P until P’ reads \(\dashv\). At that point, P’ accepts if P had entered an accept state after reading the previous symbol. P’ doesn’t read any symbols after \(\dashv\).

Reverse direction:
Let DPDA P recognize A\(\dashv\); we construct a DPDA P’ that recognizes A.
First, modify P, by introducing new states, so that each of its moves does exactly one of the following operations: read, pop, or push.
Prior to reading each input symbol, P’ determines whether P would accept if that symbol were the end of the input (i.e., if it were \(\dashv\)). If so, P’ enters an accept state. However, P may operate on the stack after it reads \(\dashv\), so whether it accepts after reading \(\dashv\) may depend on the stack contents.
(We cannot pop out and inspect the entire stack before reading each symbol and then decide; instead, we persistently record on the stack, alongside the stack contents, the set of states from which P can reach an accept state using the current stack without reading any further input.)
P’ simulates P, while maintaining a copy of stack contents (of P) interleaved with additional information on the stack.
Initially, P’ pushes the set \(R_0\) on the stack, where \(R_0\) contains every state q such that when P is started in q with empty stack, it eventually accepts without reading any input symbols.
To simulate a pop move, P’ first pops the set of states and discards it, then it pops again to obtain the symbol that P would have popped at this point, and uses it to determine the next move of P.
To simulate a push move \(\delta(q, \varepsilon, \varepsilon) = (r, x)\), P’ examines the set of states R on the top of its stack, then pushes x and then the set S, where a state q belongs to S if \(q \in F\), or if \(\delta(q, \varepsilon, x) = (r, \varepsilon)\) with \(r \in R\).
To simulate a read move \(\delta(q, a, \varepsilon) = (r, \varepsilon)\), P’ examines the set R on the top of its stack, then enters an accept state \(r_x\) (a copy of r) if \(r \in R\). If P’ is at the end of the input when it enters this state, it accepts the input. If it is not at the end of the input, it continues simulating P, so \(r_x\) must be a copy of r with acceptance designated.


2.4.2 Deterministic Context-Free Languages: DCFG

This section defines DCFG (Deterministic Context-Free Grammars), the counterpart to DPDA.
However, the endmarkers are necessary in the equivalence of DCFG and DPDA.

Definition

Derivations in a CFG begin with the start variable and proceed “top down” with a series of substitutions according to the grammar’s rules, until the derivation obtains a string of terminals.
For defining DCFG, we take a “bottom up” approach, by starting with a string of terminals and processing the derivation in reverse, employing a series of reduce steps until reaching the start variable.
More formally, if u and v are strings, we write \(u \rightarrowtail v\) to mean the same as \(v \Rightarrow u\); that is, v can be obtained from u by a reduce step. We say that u is reducible to v, written \(u \overset{*}{\rightarrowtail} v\), whenever \(v \overset{*}{\Rightarrow} u\).
In a leftmost reduction, each reducing string is reduced only after all other reducing strings that lie entirely to its left, which is a rightmost derivation in reverse.
A handle of a string u that appears in a leftmost reduction of some \(w \in L(G)\) is the occurrence in u of the reducing string for that step, together with the reducing rule used.
A string that appears in a leftmost reduction of some string in L(G) is called a valid string. We define handles only for valid strings.
If the grammar is ambiguous, a valid string may have several handles. Unambiguous grammars generate each string via only one parse tree, and therefore, in leftmost reductions, handles are unique.
Observe that the portion of u following a handle is always a string of terminals, because the reduction is leftmost.
A unique handle alone does not make a grammar deterministic, because choosing the unique handle may require examining the entire input string, whereas a DPDA has not read the entire input when it must select the handle.
We say that a handle h of a valid string \(v = xhy\) is a forced handle if h is the unique handle in every valid string \(xh\hat{y}\) where \(\hat{y} \in \Sigma^*\).

A Deterministic Context-Free Grammar is a context-free grammar such that every valid string has a forced handle.

For simplicity, we’ll assume that the start variable doesn’t appear on the right-hand side of any rule and that every variable appears in a reduction of some string in the grammar’s language.

DK-test

DK

Specifically, the DFA DK accepts its input z if z is a prefix of some valid string \(v = zy\), and z ends with a handle of v.
Moreover, each accept state of DK indicates the associated reducing rule(s).
In a general CFG, multiple reducing rules may apply, depending on which valid v extends z.
But in a DCFG, each accept state corresponds to exactly one reducing rule and has no outgoing path that leads to an accept state by reading a string in \(\Sigma^*\). Otherwise, the handle of \(zy\) would not be unique, or it would depend on y.

To construct DFA DK, we’ll construct an equivalent NFA K and then convert K to DK.
In the NFA K, the start state has an ε-move to \(S_1 \rightarrow .u\) for every rule involving the start variable \(S_1\).
On each branch of its computation, K matches a potential reducing rule with a substring of the input.
If that rule’s right-hand side contains a variable, K may nondeterministically switch to some rule that expands that variable.

Lemma 2.48
K may enter state \(T \rightarrow u.v\) on reading input z iff \(z = xu\) and xuvy is a valid string with handle uv and reducing rule \(T \rightarrow uv\), for some \(y \in \Sigma^*\).
(Because K is nondeterministic, we say that it “may” enter a state to mean that K does enter that state on some branch of its nondeterminism.)
Therefore, K may enter state \(T \rightarrow h.\) on input z iff \(z = xh\) and h is a handle of some valid string xhy with reducing rule \(T \rightarrow h\).

Finally, we convert NFA K to DFA DK by the subset construction introduced in Chapter 1, and then remove all states that are unreachable from the start state.
Each of DK’s states thus contains one or more dotted rules.
Each accept state contains at least one completed rule.

(However, when building the DFA DK in practice, a direct construction may be faster than first constructing the NFA K.)

test

In the DK-test, starting with a CFG G, we construct the associated DFA DK, and then determine whether G is deterministic by examining DK’s accept states.
The DK-test stipulates that every accept state contains exactly one completed rule and no dotted rule in which a terminal symbol immediately follows the dot (\(B \rightarrow u.av\) for \(a \in \Sigma\)).

(In the DK-test, only a terminal symbol may not immediately follow the dot, because in the definition of a forced handle, \(\hat{y} \in \Sigma^*\).)
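A minimal sketch of the whole pipeline, assuming the tiny endmarked grammar \(S_0 \rightarrow S\$\), \(S \rightarrow 0S1 \mid 01\) (my own example; '$' plays the endmarker): it builds DK directly by the usual closure/goto item-set construction and then runs the DK-test on its states:

```python
# Hedged sketch: dotted rules are (head, rhs, dot-position) triples.
RULES = [("S0", "S$"), ("S", "0S1"), ("S", "01")]
VARS = {h for h, _ in RULES}

def closure(items):
    # Add T -> .w for every variable T appearing just after a dot.
    items, changed = set(items), True
    while changed:
        changed = False
        for head, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in VARS:
                for h, r in RULES:
                    if h == rhs[dot] and (h, r, 0) not in items:
                        items.add((h, r, 0))
                        changed = True
    return frozenset(items)

def goto(state, sym):
    moved = {(h, r, d + 1) for h, r, d in state if d < len(r) and r[d] == sym}
    return closure(moved) if moved else None

def build_dk():
    start = closure({(h, r, 0) for h, r in RULES if h == "S0"})
    states, todo = {start}, [start]
    while todo:
        st = todo.pop()
        for s in {r[d] for h, r, d in st if d < len(r)}:
            nxt = goto(st, s)
            if nxt and nxt not in states:
                states.add(nxt)
                todo.append(nxt)
    return states

def dk_test(states):
    for st in states:
        completed = [it for h, r, d in st if d == len(r) for it in [(h, r)]]
        if not completed:
            continue                     # not an accept state
        if len(completed) > 1:
            return False                 # two completed rules
        if any(d < len(r) and r[d] not in VARS for h, r, d in st):
            return False                 # a terminal immediately follows a dot
    return True
```

For this grammar, every accept state contains a single completed rule and no terminal after a dot, so `dk_test(build_dk())` succeeds, indicating a DCFG.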

Theorem 2.52
G passes the DK-test iff G is a DCFG.
Equivalently, the test fails iff some handle isn’t forced.
If the DK-test fails because an accept state has two completed rules, extend the associated string to two valid strings with differing handles at that point.
Similarly, if it has a completed rule and a dotted rule with a terminal following the dot, employ Lemma 2.48 to get two valid extensions with differing handles at that point.

Relationship of DPDA and DCFG

We restrict the equivalence to endmarked languages because, although endmarkers don't affect the class of languages that DPDAs recognize, they do affect the class of languages that DCFGs generate.
Without endmarkers, DCFGs generate only a subclass of the DCFLs: those that are prefix-free.
(Note that every endmarked language is prefix-free.)

Theorem 2.57
An endmarked language is generated by a deterministic context-free grammar iff it is deterministic context free.

Lemma 2.58

Every DCFG has an equivalent DPDA.

We will convert a DCFG G to an equivalent DPDA P.
P simulates the DK of G on the symbols it reads from the input until DK accepts, i.e., until the portion read so far ends with a handle. Because G is deterministic, P can use this handle to identify the first reduce step for its input string, even though it has read only part of its input at this point.
To identify the second and subsequent reduce steps, P stores the states of DK on the stack.
(P can't store the input string itself on the stack, because the modified input would not remain available for later steps.)
(In other words, the stack persistently records DK's states.)
Every time P reads an input symbol and simulates a move in DK, it records DK’s state by pushing it on the stack.
When it performs a reduce step using reducing rule \(T \rightarrow u\), it pops |u| (the length of u) states off the stack, revealing the state DK was in prior to reading u. It resets DK to that state, then simulates it on input T and pushes the resulting state on the stack.
When P pushes the start variable on the stack, it has found a reduction of its input to the start variable, so it enters an accept state.
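For a grammar this simple, P's behavior amounts to ordinary shift-reduce parsing. The sketch below (my own illustration) keeps grammar symbols on the stack rather than DK states, reducing whenever a right-hand side appears on top; that shortcut suffices here precisely because the grammar's handles are forced:

```python
# Hedged sketch: shift-reduce recognition for S -> 0S1 | 01,
# with '$' playing the endmarker read at the end of the input.
RULES = [("S", "0S1"), ("S", "01")]

def parse(w):
    stack = []
    for c in w + "$":
        while True:                       # reduce while a handle is on top
            for head, rhs in RULES:
                if "".join(stack[-len(rhs):]) == rhs:
                    del stack[-len(rhs):]
                    stack.append(head)    # replace the handle by its head
                    break
            else:
                break                     # no rule matched: stop reducing
        if c == "$":
            return stack == ["S"]         # input reduced to the start variable?
        stack.append(c)                   # shift the next input symbol
```

The general construction replaces the matched right-hand side with DK-state bookkeeping, but the pop-|u|-then-push-the-head rhythm is the same.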

Lemma 2.59

Every DPDA that recognizes an endmarked language has an equivalent DCFG.

The proof is a modification of the construction in Lemma 2.27 that describes the conversion of a PDA P to an equivalent CFG G. Here P and G are deterministic.
In the proof of Lemma 2.27, we altered P to empty its stack before entering a specific accept state; there, P could use its nondeterminism to guess that it was at the end of its input. A DPDA cannot guess, so here we instead use the assumption that L(P) is endmarked.
Next, we apply a similar grammar construction to obtain G, then show its determinism by the DK-test.

Parsing and LR(k) Grammars

Definition

Let h be a handle of a valid string \(v = xhy\). Say that h is forced by lookahead k if h is the unique handle of every valid string \(xh\hat{y}\) where \(\hat{y} \in \Sigma^*\) and where \(y\) and \(\hat{y}\) agree on their first k symbols. (If either string is shorter than k, the strings must agree up to the length of the shorter one.)
An LR(k) grammar is a context-free grammar such that the handle of every valid string is forced by lookahead k.

Many algorithms for membership and parsing are based on DPDAs, and therefore are efficient.
However, the requirement that all handles be forced is often an obstacle to designing intuitive DCFGs.
Fortunately, LR(k) grammars, which are close enough to DCFGs to allow direct conversion into DPDAs (and are therefore equivalent in power to DCFGs, for every k), are expressive enough for many applications.
In a DCFG, all handles are forced (independent of the terminal symbols that follow the handle).
In an LR(k) grammar, a handle may also depend on symbols that follow the handle, but only on the first k of these.
The acronym LR(k) stands for: Left to right input processing, Rightmost derivations (or equivalently, leftmost reductions), and k symbols of lookahead.

LR(1) grammars

LR(0) grammars are the same as DCFGs.
We will show by example that LR(1) grammars are more convenient than DCFGs for specifying certain languages.

The \(DK_1\text{-test}\): first construct the DFA \(DK_1\), and then stipulate that no accept state contains two consistent dotted rules.
Let \(R_1\) be a completed rule with lookahead symbol \(a_1\), and let \(R_2\) be another dotted rule with lookahead symbol \(a_2\).
We say that \(R_1\) and \(R_2\) are consistent if \(R_2\) is completed and \(a_1 = a_2\), or if \(R_2\) is not completed and \(a_1\) immediately follows its dot.

Theorem 2.63
G passes the \(DK_1\text{-test}\) iff G is an LR(1) grammar.

Theorem 2.66
An endmarked language is generated by an LR(1) grammar iff it is a DCFL.

