Ramblings by Benjamin Kovach
Theme adapted from minimal by orderedlist.
Notes on Demand More of Your Automata from the Monad Reader 16.
Intro: We want to match on more than just strings, and in a way that actually makes sense instead of stuff like
(?P<area>\d{3})[- .](?P<n1>\d{3})[-.](?P<n2>\d{4})
Regexes might “match” a string.
Three operations:
data RegularExpression
= Empty
| Singleton Char
| Kleene RegularExpression
| Catenation RegularExpression RegularExpression
| Alternation RegularExpression RegularExpression
for (ab)*c|d
, we have syntax mapping:
Singleton 'a'
\(\mapsto\) a
Kleene (...)
\(\mapsto\) (...)*
Catenation (Singleton 'a') (Singleton 'b')
\(\mapsto\) ab
Alteration (Singleton 'a') (Singleton 'b')
\(\mapsto\) a|b
True regexes guarantee that everything can happen in one pass of the input. Regex matchers usually implemented in NDAs or DFAs (only powerful enough to match regular languages).
Overview: DFA, NFA in text.
Haskell representations:
-- Assume "type State = Int" or something like that.
-- transition returns Nothing if in "error state" (w/ no transitions)
data DFA
= DFA { transition :: State -> Char -> Maybe State
, start :: State
, accepting :: State -> Bool
}
data NFA
= NFA { transition :: State -> Maybe Char -> Set State
, start :: State
, accepting :: State -> Bool
}
In NFAs, \(\epsilon\) transitions are represented by Nothing
.
Why just [Char]
? We can do better!
data RegularExpression a
= Empty
| Singleton a
| Kleene (RegularExpression a)
| Catenation (RegularExpression a) (RegularExpression a)
| Alternation (RegularExpression a) (RegularExpression a)
deriving (Show, Functor)
A way to make NFAs directly:
type StateLabel = Int
type MapNFATransitionMap a
= Map.Map (StateLabel, Maybe a) (Set.Set StateLabel)
data MapNFA a = MapNFA { transitionMap :: MapNFATransitionMap a
, startKey :: StateLabel
, acceptKeys :: Set.Set StateLabel
} deriving (Show
empty :: Ord a => MapNFA a
singleton :: Ord a => a -> MapNFA a
catenate :: Ord a => MapNFA a -> MapNFA a -> MapNFA a
alternate :: Ord a => MapNFA a -> MapNFA a -> MapNFA a
loop :: Ord a => MapNFA a -> MapNFA a
We implement loop
here instead of kleene
, which can be represented as kleene = optional . loop where
optional = alternate empty`.
The rest of the article is about HOFs for abstracting away regex design, visualizing with GraphViz, compiling to LLVM.
I skimmed through the rest of it, because it’s the Friday of my first week back at school and I’m tired. It looks like the final implementation of the Automata
library isn’t on hackage or github, and the link is broken. :(
Revisit this one later, it’s pretty cool. One of the non-string matching regexes was about matching lists of user actions (clicking, typing, following a link). Neat.