Finite Automata and Regular Sets

Deterministic Finite Automata (DFA)
- Definition: a DFA is a directed graph, where vertices are called states, that has the following properties:
  - One state is designated as the start state
  - There is a set of final states(which may include the start state)
  - Edges are labelled by characters drawn from a finite alphabet
- A DFA accepts a string: if by starting at the start state, and by processing the entire string from left to right one character ata time following edges of the DFA, finally a final state is reached
- 5 things needed to specify a DFA $M$
  - The set of states: $Q = {q_{0}, . . ., q_{n}}$
  - The set of alphabet: $Σ$
  - The start state: $q_{0}$
  - The set of final states: $F = {q_{i}, . . ., q_{j}}$ , denoted using double circle in the graph
  - The transition function: $δ : Q * Σ \to Q$
- $L (M)$ = the languages accepted by DFA $M$ = the set of all strings accepted by DFA $M$
- Some applications of finite automata
  - Pattern matching
  - Lexical analysis
  - Control structures
  - Circuit design
  - An abstract formulation of what can be done with any computer that uses $O (1)$ space
- Language recognition and accepting a set: to study basic computation we use model of language recognition where an input of finite-length string is received and the program returns a yes or no
Non-deterministic Finite Automata(NFA)
- Definition: a NFA is like a DFA except that for any given character c and any given state q, there may be more than one transition from q on character c
- A string s is defined to be accepted by a NFA m if there exists choices that cause M to accept s
- Theorem: every NFA can be converted to a DFA that accepts the same set
- 5 things to specify a NFA $M$
  - The set of states: $Q = {q_{0}, . . ., q_{n}}$
  - The set of alphabet: $Σ$
  - The start state: $q_{0}$
  - The set of final states: $F = {q_{i}, . . ., q_{j}}$ , denoted using double circle in the graph
  - The transition function: $δ : Q * Σ \to 2^{Q}$
- NFA with Epsilon moves
  - Idea: generalize a NFA to have $ϵ$ -moves that allow the machine to change states without reading any input
  - $ϵ$ -closure
    - Definition: the $ϵ$ -closure $E C (q)$ is all states reachable from state q by a sequence of $ϵ$ -moves( q itself is in EC(q))
    - Theorem: for every NFA $M = (Q, Σ, δ, q_{0}, F)$ withe $ϵ$ -moves, there is a NFA $M^{'}$ without $ϵ$ -moves that accepts the same language
      - $δ^{^{'}} (q, a) = ⋃_{r \in E C (q)} δ (q, a)$
      - $F = F ⋃ {q : E C (q) c o n t a i n s a f i n a l s t a t e o f F}$
Regular expression and pattern matching
- Idea: regular expressions are a formal syntax for specifying a regular set, and can be useful for describing patterns in string matching applications
- The empty set: $ϵ$ denotes the empty string, a string of length 0. It is different than the empty set, ${ϵ}$ is a non-empty set containing the string of length 0
- Regular expressions: a regular expression describes a set of strings over an alphabet $Σ$ :
  - $Φ$ denotes the empty set
  - $ϵ$ denotes ${ϵ}$
  - For a character c in $Σ$ , $c$ denotes ${c}$
  - For two regular expressions r and s that denote sets X and Y, respectively:
    - $r + s = X ⋃ Y$
    - $r * s = {x y : s is a string in X, y is a string in Y}$
    - $r^{*} = ⋃_{i = 0}^{\infty} r^{i}$ , where $\begin{matrix} (1) & r^{*} = {\begin{cases} ϵ & if i = 0 \\ r r^{i - 1} & if i > 0 \end{cases} \end{matrix}$
- Some regular expression examples
  - $a c + b c = (a + b) c = {a c, b c}$
  - $(a + b) (ϵ + c) d = {a d, b d, a c d, b c d}$
  - $(a + b)^{*} = all possible stringsof a's and b's(including ϵ)$
  - $(a a + b b b)^{*} = strings of a'sand b's with a's in pairs and b's in triples$
- McNaughton-Yamada Algorithm: to convert a regular expression to a NFA with epsilon moves
Pumping lemma
- Definition: let M be a DFA with n states, if M accepts a string s of n or more characters, then it must be that s can be written as s = uvw where:
  - uv is at most n characters long
  - v is a least one character long(the string is turning around from a state and back to it in v)
  - For any $i \geq 0, u v^{i} w$ is accepted by M
- Corollary: the pumping lemma applies to any substring s of n characters of a string that is accepted by M

Context free languages

A context free language is a language generated by some context free grammar
The set of CFL is identical to the set of languages accepted by Pushdown Automata
CFG is defined by $G = {V, Σ, S, P}$ where
- $V$ is the set of variables or non-terminal symbols
- $Σ$ is the set of terminal symbols
- $S$ is the start symbol
- $P$ is the production rule: $A \to α$ , wher $a \in {V \cup Σ}$ and $A \in V$
Derivation tree
- A derivation tree or parse tree is an ordered rooted tree that graphically represents the semantic information of strings derived from a context free grammar
- The root is the start symbol
- Vertex: non-terminal symbol
- Leaves: terminal symbol or $ϵ$
- Left derivation tree: apply production to the leftmost variable in each step
- Right derivation tree: apply production to the rightmost variable in each step
Ambiguous grammar
- A grammar is said to be ambiguous if there exists two or more derivation tree for a string w (that means two or more left derivation trees)
Simplification of CFG
- Some production rules are not needed for the derivation of strings
- There may also be $ϵ$ productions and unit productions
- Simplification consists of the following steps
  - Reduction of CFG
  - Removal of unit productions
  - Removal of $ϵ$ productions
- Reduction of CFG
  - Phase 1: derivation of an equivalent grammar $G^{'}$ from CFG, G, suh that each variable derives some terminal string
    - Decide what are the terminal symbols
    - $W_{1}$ is the set of variables that derive a terminal symbol
    - $W_{i + 1}$ is the set variables that derive a symbol in $W_{i}$ , all variables in $W_{i}$ are in $W_{i + 1}$
    - Repeat until $W_{i + 1} = W_{i}$
    - Include all production rules that result in a terminal symbol or a variable in $W_{i}$ in $G^{'}$
  - Phase 2: derivation of an equivalent grammar $G^{″}$ from CFG $G^{'}$ , such that each symbol appears in a sentential form
    - $Y_{1} = {S}$
    - Include all symbols $Y_{i + 1}$ that can be derived from $Y_{i}$
    - Include all production rules that include symbols in $Y_{i}$
Removal of unit productions
- Unit production: any production rule of the form $A \to B$ where A, B are non-terminals is called unit production
- Removal procedure
  - To remove $A \to B$ , add $A \to x$ where $B \to x$ occurs in the grammar
  - Delete $A \to B$ from the grammar
  - Repeat until all unit productions are removed
  - If there are unreachable symbol, remove them
Removal of $ϵ$ productions
- $ϵ$ production: a non-terminal symbol A is a nullable variable is there is a production rule $A \to ϵ$ or there is a derivation that starts at A and leads to $ϵ$
- Removal procedure
  - To remove $A \to ϵ$ , look for all dependencies whose right side contains A
  - Replace each occurrence of A in each of these productions with $ϵ$
  - Add the resultant productions to the grammar
Convert CFG to Chomsky Normal Form(CNF)
- In CNF, we have a restriction on the length of RHS: elements should be two variables or a terminal
- A CFG is in CNF if the productions are in the following forms: $A \to a$ , $A \to B C$
- Convert CFG to CNF
  - If start symbol occurs on some right side, create S' and add $S^{'} \to S$ to the grammar
  - Remove $ϵ$ productions
  - Remove unit productions
  - Replace $A \to B_{1} . . . B_{n}$ where $n > 2$ with $A \to B_{1} C$ where $C \to B_{2} . . . B_{n}$ . Repeat for all productions having two or more symbols on the right side
  - For the production in the form $A \to a B$ , replace it by $A \to X B$ and $X \to a$ . Repeat for all productions in the form of $A \to a B$

Turing machines And Undecidability

Introduction
- A tape, with tape alphabets, and the special symbol empty to build the infinite tape
- Tape head: read, write, move left, move right
- Control portion of a turing machine: a deterministic program, similar to FSM or PDA
Decidability and Undecidability
- Recursive language: A language L is said to be recursive if there exists a TM which accepts all the strings in L and reject all the strings not in L. The TM will halt every time and give an answer for each string input in L
- Recursive enumerable language: A language L is said to be recursive enumerable if there exists a TM which accepts all the strings in L but may or may not halt for all input strings which are not in L
- Decidable language: a language L is decidable if it is a recursive language
- Partially decidable language: a language L is partially decidable if L is a recursively enumerable language
- Undecidable language: a language L is undecidable if not decidable. An undecidable language may sometimes be partially decidable, if a language is not even partially decidable, then there exists no TM for that language
The universal TM

My Blog

COSI 130A Theory of Computation

Finite Automata and Regular Sets

Context free languages

Turing machines And Undecidability