0%

COSI 130A Theory of Computation

Finite Automata and Regular Sets

  1. Deterministic Finite Automata (DFA)
    • Definition: a DFA is a directed graph, where vertices are called states, that has the following properties:
      • One state is designated as the start state
      • There is a set of final states(which may include the start state)
      • Edges are labelled by characters drawn from a finite alphabet
    • A DFA accepts a string: if by starting at the start state, and by processing the entire string from left to right one character ata time following edges of the DFA, finally a final state is reached
    • 5 things needed to specify a DFA M
      • The set of states: Q={q0,...,qn}
      • The set of alphabet: Σ
      • The start state: q0
      • The set of final states: F={qi,...,qj}, denoted using double circle in the graph
      • The transition function: δ:QΣQ
    • L(M) = the languages accepted by DFA M = the set of all strings accepted by DFA M
    • Some applications of finite automata
      • Pattern matching
      • Lexical analysis
      • Control structures
      • Circuit design
      • An abstract formulation of what can be done with any computer that uses O(1) space
    • Language recognition and accepting a set: to study basic computation we use model of language recognition where an input of finite-length string is received and the program returns a yes or no
  2. Non-deterministic Finite Automata(NFA)
    • Definition: a NFA is like a DFA except that for any given character c and any given state q, there may be more than one transition from q on character c
    • A string s is defined to be accepted by a NFA m if there exists choices that cause M to accept s
    • Theorem: every NFA can be converted to a DFA that accepts the same set
    • 5 things to specify a NFA M
      • The set of states: Q={q0,...,qn}
      • The set of alphabet: Σ
      • The start state: q0
      • The set of final states: F={qi,...,qj}, denoted using double circle in the graph
      • The transition function: δ:QΣ2Q
    • NFA with Epsilon moves
      • Idea: generalize a NFA to have ϵ-moves that allow the machine to change states without reading any input
      • ϵ-closure
        • Definition: the ϵ-closure EC(q) is all states reachable from state q by a sequence of ϵ-moves( q itself is in EC(q))
        • Theorem: for every NFA M=(Q,Σ,δ,q0,F) withe ϵ-moves, there is a NFA M without ϵ-moves that accepts the same language
          • δ(q,a)=rEC(q)δ(q,a)
          • F=F{q:EC(q)containsafinalstateofF}
  3. Regular expression and pattern matching
    • Idea: regular expressions are a formal syntax for specifying a regular set, and can be useful for describing patterns in string matching applications
    • The empty set: ϵ denotes the empty string, a string of length 0. It is different than the empty set, {ϵ} is a non-empty set containing the string of length 0
    • Regular expressions: a regular expression describes a set of strings over an alphabet Σ:
      • Φ denotes the empty set
      • ϵ denotes {ϵ}
      • For a character c in Σ, c denotes {c}
      • For two regular expressions r and s that denote sets X and Y, respectively:
        • r+s=XY
        • rs={xy:s is a string in X ,y is a string in Y }
        • r=i=0ri, where (1)r={ϵif i = 0rri1if i > 0
    • Some regular expression examples
      • ac+bc=(a+b)c={ac,bc}
      • (a+b)(ϵ+c)d={ad,bd,acd,bcd}
      • (a+b)=all possible stringsof a's and b's(including ϵ)
      • (aa+bbb)=strings of a'sand b's with a's in pairs and b's in triples
    • McNaughton-Yamada Algorithm: to convert a regular expression to a NFA with epsilon moves
  4. Pumping lemma
    • Definition: let M be a DFA with n states, if M accepts a string s of n or more characters, then it must be that s can be written as s = uvw where:
      • uv is at most n characters long
      • v is a least one character long(the string is turning around from a state and back to it in v)
      • For any i0,uviw is accepted by M
    • Corollary: the pumping lemma applies to any substring s of n characters of a string that is accepted by M

Context free languages

  1. A context free language is a language generated by some context free grammar
  2. The set of CFL is identical to the set of languages accepted by Pushdown Automata
  3. CFG is defined by G={V,Σ,S,P} where
    • V is the set of variables or non-terminal symbols
    • Σ is the set of terminal symbols
    • S is the start symbol
    • P is the production rule: Aα, wher a{VΣ} and AV
  4. Derivation tree
    • A derivation tree or parse tree is an ordered rooted tree that graphically represents the semantic information of strings derived from a context free grammar
    • The root is the start symbol
    • Vertex: non-terminal symbol
    • Leaves: terminal symbol or ϵ
    • Left derivation tree: apply production to the leftmost variable in each step
    • Right derivation tree: apply production to the rightmost variable in each step
  5. Ambiguous grammar
    • A grammar is said to be ambiguous if there exists two or more derivation tree for a string w (that means two or more left derivation trees)
  6. Simplification of CFG
    • Some production rules are not needed for the derivation of strings
    • There may also be ϵ productions and unit productions
    • Simplification consists of the following steps
      • Reduction of CFG
      • Removal of unit productions
      • Removal of ϵ productions
    • Reduction of CFG
      • Phase 1: derivation of an equivalent grammar G from CFG, G, suh that each variable derives some terminal string
        • Decide what are the terminal symbols
        • W1 is the set of variables that derive a terminal symbol
        • Wi+1 is the set variables that derive a symbol in Wi, all variables in Wi are in Wi+1
        • Repeat until Wi+1=Wi
        • Include all production rules that result in a terminal symbol or a variable in Wi in G
      • Phase 2: derivation of an equivalent grammar G from CFG G, such that each symbol appears in a sentential form
        • Y1={S}
        • Include all symbols Yi+1 that can be derived from Yi
        • Include all production rules that include symbols in Yi
  7. Removal of unit productions
    • Unit production: any production rule of the form AB where A, B are non-terminals is called unit production
    • Removal procedure
      • To remove AB, add Ax where Bx occurs in the grammar
      • Delete AB from the grammar
      • Repeat until all unit productions are removed
      • If there are unreachable symbol, remove them
  8. Removal of ϵ productions
    • ϵ production: a non-terminal symbol A is a nullable variable is there is a production rule Aϵ or there is a derivation that starts at A and leads to ϵ
    • Removal procedure
      • To remove Aϵ, look for all dependencies whose right side contains A
      • Replace each occurrence of A in each of these productions with ϵ
      • Add the resultant productions to the grammar
  9. Convert CFG to Chomsky Normal Form(CNF)
    • In CNF, we have a restriction on the length of RHS: elements should be two variables or a terminal
    • A CFG is in CNF if the productions are in the following forms: Aa, ABC
    • Convert CFG to CNF
      • If start symbol occurs on some right side, create S' and add SS to the grammar
      • Remove ϵ productions
      • Remove unit productions
      • Replace AB1...Bn where n>2 with AB1C where CB2...Bn. Repeat for all productions having two or more symbols on the right side
      • For the production in the form AaB, replace it by AXB and Xa. Repeat for all productions in the form of AaB

Turing machines And Undecidability

  1. Introduction
  • A tape, with tape alphabets, and the special symbol empty to build the infinite tape
  • Tape head: read, write, move left, move right
  • Control portion of a turing machine: a deterministic program, similar to FSM or PDA
  1. Decidability and Undecidability
  • Recursive language: A language L is said to be recursive if there exists a TM which accepts all the strings in L and reject all the strings not in L. The TM will halt every time and give an answer for each string input in L
  • Recursive enumerable language: A language L is said to be recursive enumerable if there exists a TM which accepts all the strings in L but may or may not halt for all input strings which are not in L
  • Decidable language: a language L is decidable if it is a recursive language
  • Partially decidable language: a language L is partially decidable if L is a recursively enumerable language
  • Undecidable language: a language L is undecidable if not decidable. An undecidable language may sometimes be partially decidable, if a language is not even partially decidable, then there exists no TM for that language
  1. The universal TM