Posted onEdited onInComputer ScienceViews: Symbols count in article: 7.7kReading time ≈7 mins.
Finite Automata and Regular
Sets
Deterministic Finite Automata (DFA)
Definition: a DFA is a directed graph, where vertices are called
states, that has the following properties:
One state is designated as the start state
There is a set of final states(which may include the start
state)
Edges are labelled by characters drawn from a finite alphabet
A DFA accepts a string: if by starting at the start state, and by
processing the entire string from left to right one character ata time
following edges of the DFA, finally a final state is reached
5 things needed to specify a DFA \(M\)
The set of states: \(Q = \{q_0, ...,
q_n\}\)
The set of alphabet: \(\Sigma\)
The start state: \(q_0\)
The set of final states: \(F = \{q_i, ...,
q_j\}\), denoted using double circle in the graph
The transition function: \(\delta: Q *
\Sigma \rightarrow Q\)
\(L(M)\) = the languages accepted
by DFA \(M\) = the set of all strings
accepted by DFA \(M\)
Some applications of finite automata
Pattern matching
Lexical analysis
Control structures
Circuit design
An abstract formulation of what can be done with any computer that
uses \(O(1)\) space
Language recognition and accepting a set: to study basic computation
we use model of language recognition where an input of finite-length
string is received and the program returns a yes or no
Non-deterministic Finite Automata(NFA)
Definition: a NFA is like a DFA except that for any given character
c and any given state q, there may be more than one transition from q on
character c
A string s is defined to be accepted by a NFA m if there exists
choices that cause M to accept s
Theorem: every NFA can be converted to a DFA that accepts the same
set
5 things to specify a NFA \(M\)
The set of states: \(Q = \{q_0, ...,
q_n\}\)
The set of alphabet: \(\Sigma\)
The start state: \(q_0\)
The set of final states: \(F = \{q_i, ...,
q_j\}\), denoted using double circle in the graph
The transition function: \(\delta: Q *
\Sigma \rightarrow 2^Q\)
NFA with Epsilon moves
Idea: generalize a NFA to have \(\epsilon\)-moves that allow the machine to
change states without reading any input
\(\epsilon\)-closure
Definition: the \(\epsilon\)-closure \(EC(q)\) is all states reachable from state
q by a sequence of \(\epsilon\)-moves(
q itself is in EC(q))
Theorem: for every NFA \(M = (Q, \Sigma,
\delta, q_0, F)\) withe \(\epsilon\)-moves, there is a NFA \(M'\) without \(\epsilon\)-moves that accepts the same
language
\(\delta^{'}(q, a) =
\bigcup\limits_{r\in EC(q)}\delta(q,a)\)
Idea: regular expressions are a formal syntax for specifying a
regular set, and can be useful for describing patterns in string
matching applications
The empty set: \(\epsilon\) denotes
the empty string, a string of length 0. It is different than the empty
set, \(\{\epsilon\}\) is a non-empty
set containing the string of length 0
Regular expressions: a regular expression describes a set of strings
over an alphabet \(\Sigma\):
\(\Phi\) denotes the empty set
\(\epsilon\) denotes \(\{\epsilon\}\)
For a character c in \(\Sigma\),
\(c\) denotes \(\{c\}\)
For two regular expressions r and s that denote sets X and Y,
respectively:
\(r + s = X \bigcup Y\)
\(r*s = \{xy: \text{s is a string in X
},\text{y is a string in Y }\}\)
\(r^{*} =
\bigcup\limits_{i=0}^{\infty}r^{i}\), where \[
\begin{equation}
r^{*} =\begin{cases}
\epsilon & \text{if i = 0} \\
rr^{i-1} & \text{if i > 0}
\end{cases}
\end{equation}
\]
Some regular expression examples
\(ac + bc = (a+b)c = \{ac,
bc\}\)
\((a+b)(\epsilon+c)d = \{ad, bd, acd,
bcd\}\)
\((a+b)^{*} = \text{all possible strings
of a's and b's(including } \epsilon)\)
\((aa+bbb)^{*} = \text{strings of a's
and b's with a's in pairs and b's in triples}\)
McNaughton-Yamada Algorithm: to convert a regular expression to a
NFA with epsilon moves
Pumping lemma
Definition: let M be a DFA with n states, if M accepts a string s of
n or more characters, then it must be that s can be written as s = uvw
where:
uv is at most n characters long
v is a least one character long(the string is turning around from a
state and back to it in v)
For any \(i \ge 0, uv^{i}w\) is
accepted by M
Corollary: the pumping lemma applies to any substring s of n
characters of a string that is accepted by M
Context free languages
A context free language is a language generated by some context free
grammar
The set of CFL is identical to the set of languages accepted by
Pushdown Automata
CFG is defined by \(G = \{V, \Sigma, S,
P\}\) where
\(V\) is the set of variables or
non-terminal symbols
\(\Sigma\) is the set of terminal
symbols
\(S\) is the start symbol
\(P\) is the production rule: \(A \rightarrow \alpha\), wher \(a \in \{V \cup \Sigma\}\) and \(A \in V\)
Derivation tree
A derivation tree or parse tree is an ordered rooted tree that
graphically represents the semantic information of strings derived from
a context free grammar
The root is the start symbol
Vertex: non-terminal symbol
Leaves: terminal symbol or \(\epsilon\)
Left derivation tree: apply production to the leftmost variable in
each step
Right derivation tree: apply production to the rightmost variable in
each step
Ambiguous grammar
A grammar is said to be ambiguous if there exists two or more
derivation tree for a string w (that means two or more left derivation
trees)
Simplification of CFG
Some production rules are not needed for the derivation of
strings
There may also be \(\epsilon\)
productions and unit productions
Simplification consists of the following steps
Reduction of CFG
Removal of unit productions
Removal of \(\epsilon\)
productions
Reduction of CFG
Phase 1: derivation of an equivalent grammar \(G'\) from CFG, G, suh that each
variable derives some terminal string
Decide what are the terminal symbols
\(W_1\) is the set of variables
that derive a terminal symbol
\(W_{i + 1}\) is the set variables
that derive a symbol in \(W_i\), all
variables in \(W_i\) are in \(W_{i+1}\)
Repeat until \(W_{i+1} = W_i\)
Include all production rules that result in a terminal symbol or a
variable in \(W_i\) in \(G'\)
Phase 2: derivation of an equivalent grammar \(G''\) from CFG \(G'\), such that each symbol appears in
a sentential form
\(Y_1 = \{S\}\)
Include all symbols \(Y_{i + 1}\)
that can be derived from \(Y_{i}\)
Include all production rules that include symbols in \(Y_{i}\)
Removal of unit productions
Unit production: any production rule of the form \(A \rightarrow B\) where A, B are
non-terminals is called unit production
Removal procedure
To remove \(A \rightarrow B\), add
\(A \rightarrow x\) where \(B \rightarrow x\) occurs in the
grammar
Delete \(A \rightarrow B\) from the
grammar
Repeat until all unit productions are removed
If there are unreachable symbol, remove them
Removal of \(\epsilon\) productions
\(\epsilon\) production: a
non-terminal symbol A is a nullable variable is there is a production
rule \(A \rightarrow \epsilon\) or
there is a derivation that starts at A and leads to \(\epsilon\)
Removal procedure
To remove \(A \rightarrow
\epsilon\), look for all dependencies whose right side contains
A
Replace each occurrence of A in each of these productions with \(\epsilon\)
Add the resultant productions to the grammar
Convert CFG to Chomsky Normal Form(CNF)
In CNF, we have a restriction on the length of RHS: elements should
be two variables or a terminal
A CFG is in CNF if the productions are in the following forms: \(A \rightarrow a\), \(A \rightarrow BC\)
Convert CFG to CNF
If start symbol occurs on some right side, create S' and add \(S' \rightarrow S\) to the grammar
Remove \(\epsilon\)
productions
Remove unit productions
Replace \(A \rightarrow
B_{1}...B_{n}\) where \(n >
2\) with \(A \rightarrow
B_{1}C\) where \(C \rightarrow
B_{2}...B_{n}\). Repeat for all productions having two or more
symbols on the right side
For the production in the form \(A
\rightarrow aB\), replace it by \(A
\rightarrow XB\) and \(X \rightarrow
a\). Repeat for all productions in the form of \(A \rightarrow aB\)
Turing machines And
Undecidability
Introduction
A tape, with tape alphabets, and the special symbol empty to build
the infinite tape
Tape head: read, write, move left, move right
Control portion of a turing machine: a deterministic program,
similar to FSM or PDA
Decidability and Undecidability
Recursive language: A language L is said to be recursive if there
exists a TM which accepts all the strings in L and reject all the
strings not in L. The TM will halt every time and give an answer for
each string input in L
Recursive enumerable language: A language L is said to be recursive
enumerable if there exists a TM which accepts all the strings in L but
may or may not halt for all input strings which are not in L
Decidable language: a language L is decidable if it is a recursive
language
Partially decidable language: a language L is partially decidable if
L is a recursively enumerable language
Undecidable language: a language L is undecidable if not decidable.
An undecidable language may sometimes be partially decidable, if a
language is not even partially decidable, then there exists no TM for
that language