Posted onEdited onInComputer ScienceViews: 32 Symbols count in article: 7.7kReading time ≈7 mins.
Finite Automata and Regular
Sets
Deterministic Finite Automata (DFA)
Definition: a DFA is a directed graph, where vertices are called
states, that has the following properties:
One state is designated as the start state
There is a set of final states(which may include the start
state)
Edges are labelled by characters drawn from a finite alphabet
A DFA accepts a string: if by starting at the start state, and by
processing the entire string from left to right one character ata time
following edges of the DFA, finally a final state is reached
5 things needed to specify a DFA
The set of states:
The set of alphabet:
The start state:
The set of final states: , denoted using double circle in the graph
The transition function:
= the languages accepted
by DFA = the set of all strings
accepted by DFA
Some applications of finite automata
Pattern matching
Lexical analysis
Control structures
Circuit design
An abstract formulation of what can be done with any computer that
uses space
Language recognition and accepting a set: to study basic computation
we use model of language recognition where an input of finite-length
string is received and the program returns a yes or no
Non-deterministic Finite Automata(NFA)
Definition: a NFA is like a DFA except that for any given character
c and any given state q, there may be more than one transition from q on
character c
A string s is defined to be accepted by a NFA m if there exists
choices that cause M to accept s
Theorem: every NFA can be converted to a DFA that accepts the same
set
5 things to specify a NFA
The set of states:
The set of alphabet:
The start state:
The set of final states: , denoted using double circle in the graph
The transition function:
NFA with Epsilon moves
Idea: generalize a NFA to have -moves that allow the machine to
change states without reading any input
-closure
Definition: the -closure is all states reachable from state
q by a sequence of -moves(
q itself is in EC(q))
Theorem: for every NFA withe -moves, there is a NFA without -moves that accepts the same
language
Regular expression and pattern matching
Idea: regular expressions are a formal syntax for specifying a
regular set, and can be useful for describing patterns in string
matching applications
The empty set: denotes
the empty string, a string of length 0. It is different than the empty
set, is a non-empty
set containing the string of length 0
Regular expressions: a regular expression describes a set of strings
over an alphabet :
denotes the empty set
denotes
For a character c in ,
denotes
For two regular expressions r and s that denote sets X and Y,
respectively:
, where
Some regular expression examples
McNaughton-Yamada Algorithm: to convert a regular expression to a
NFA with epsilon moves
Pumping lemma
Definition: let M be a DFA with n states, if M accepts a string s of
n or more characters, then it must be that s can be written as s = uvw
where:
uv is at most n characters long
v is a least one character long(the string is turning around from a
state and back to it in v)
For any is
accepted by M
Corollary: the pumping lemma applies to any substring s of n
characters of a string that is accepted by M
Context free languages
A context free language is a language generated by some context free
grammar
The set of CFL is identical to the set of languages accepted by
Pushdown Automata
CFG is defined by where
is the set of variables or
non-terminal symbols
is the set of terminal
symbols
is the start symbol
is the production rule: , wher and
Derivation tree
A derivation tree or parse tree is an ordered rooted tree that
graphically represents the semantic information of strings derived from
a context free grammar
The root is the start symbol
Vertex: non-terminal symbol
Leaves: terminal symbol or
Left derivation tree: apply production to the leftmost variable in
each step
Right derivation tree: apply production to the rightmost variable in
each step
Ambiguous grammar
A grammar is said to be ambiguous if there exists two or more
derivation tree for a string w (that means two or more left derivation
trees)
Simplification of CFG
Some production rules are not needed for the derivation of
strings
There may also be
productions and unit productions
Simplification consists of the following steps
Reduction of CFG
Removal of unit productions
Removal of
productions
Reduction of CFG
Phase 1: derivation of an equivalent grammar from CFG, G, suh that each
variable derives some terminal string
Decide what are the terminal symbols
is the set of variables
that derive a terminal symbol
is the set variables
that derive a symbol in , all
variables in are in
Repeat until
Include all production rules that result in a terminal symbol or a
variable in in
Phase 2: derivation of an equivalent grammar from CFG , such that each symbol appears in
a sentential form
Include all symbols
that can be derived from
Include all production rules that include symbols in
Removal of unit productions
Unit production: any production rule of the form where A, B are
non-terminals is called unit production
Removal procedure
To remove , add
where occurs in the
grammar
Delete from the
grammar
Repeat until all unit productions are removed
If there are unreachable symbol, remove them
Removal of productions
production: a
non-terminal symbol A is a nullable variable is there is a production
rule or
there is a derivation that starts at A and leads to
Removal procedure
To remove , look for all dependencies whose right side contains
A
Replace each occurrence of A in each of these productions with
Add the resultant productions to the grammar
Convert CFG to Chomsky Normal Form(CNF)
In CNF, we have a restriction on the length of RHS: elements should
be two variables or a terminal
A CFG is in CNF if the productions are in the following forms: ,
Convert CFG to CNF
If start symbol occurs on some right side, create S' and add to the grammar
Remove
productions
Remove unit productions
Replace where with where . Repeat for all productions having two or more
symbols on the right side
For the production in the form , replace it by and . Repeat for all productions in the form of
Turing machines And
Undecidability
Introduction
A tape, with tape alphabets, and the special symbol empty to build
the infinite tape
Tape head: read, write, move left, move right
Control portion of a turing machine: a deterministic program,
similar to FSM or PDA
Decidability and Undecidability
Recursive language: A language L is said to be recursive if there
exists a TM which accepts all the strings in L and reject all the
strings not in L. The TM will halt every time and give an answer for
each string input in L
Recursive enumerable language: A language L is said to be recursive
enumerable if there exists a TM which accepts all the strings in L but
may or may not halt for all input strings which are not in L
Decidable language: a language L is decidable if it is a recursive
language
Partially decidable language: a language L is partially decidable if
L is a recursively enumerable language
Undecidable language: a language L is undecidable if not decidable.
An undecidable language may sometimes be partially decidable, if a
language is not even partially decidable, then there exists no TM for
that language