For Programmers: Free Programming Magazines  


Home > Archive > Compilers > October 2007 > Intersection of regular expression languages









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Intersection of regular expression languages
Hans Aberg

2007-10-21, 7:18 pm

Can the intersection of two regular expression languages be
constructed as a regular expression language?

Hans Aberg

Neelakantan Krishnaswami

2007-10-22, 4:39 am

Hans Aberg <haberg@math.su.se> wrote:
> Can the intersection of two regular expression languages be
> constructed as a regular expression language?


Yes.

Here's an Ocaml program that tests whether a string is recognized by a
regular expression language including intersection, which uses
Brzozowski's method of derivatives.

This code is not terribly efficient, which is not a problem with the
method of derivatives, but rather with my toy implementation.

(* The datatype of regular expressions. Intersection and union are
And and Or respectively. *)

type re =
| Char of char
| Epsilon
| Seq of re * re
| Star of re
| False
| Or of re * re
| True
| And of re * re

(* Check whether a regular expression matches the empty string. *)

let rec nullable re =
match re with
| Char _ -> false
| Epsilon -> true
| Seq(r1, r2) -> (nullable r1) && (nullable r2)
| Star r -> true
| False -> false
| Or(r1, r2) -> (nullable r1) || (nullable r2)
| True -> true
| And(r1, r2) -> (nullable r1) && (nullable r2)

(* deriv c r computes the c-th "derivative" of r. What this
means is that we compute a regular expression r' from r, such
that if (c ^ s) is recognized by r, then r' recognizes s. *)

let rec deriv c re =
match re with
| Char c' -> if c = c' then Epsilon else False
| Epsilon -> False
| Seq(r1, r2) ->
let r1' = deriv c r1 in
if nullable r1 then
Or(Seq(r1', r2), deriv c r2)
else
Seq(r1', r2)
| Star r -> Seq(deriv c r, (Star r))
| False -> False
| Or(r1, r2) -> Or(deriv c r1, deriv c r2)
| True -> True
| And(r1, r2) -> And(deriv c r1, deriv c r2)

(* check uses derivatives to match the substring of s starting at
position at i. If the i-th character of s is c, then we compute
the c'th derivative and look at position i+1. *)

let rec check s re i =
if i = String.length s then
if nullable re then Some i else None
else
check s (deriv s.[i] re) (i+1)

--
Neel R. Krishnaswami
neelk@cs.cmu.edu
Chris F Clark

2007-10-22, 4:39 am

haberg@math.su.se (Hans Aberg) writes:

> Can the intersection of two regular expression languages be
> constructed as a regular expression language?


I'm going to say something which I hope isn't stupid in response to
this. The class of regular languages forms a Boolean algebra with
respect to the union and intersection operators, which I believe means
that they also form a field with those operators, which means they are
closed with respect to both of them. I believe one can even use the
class of regular languages as a topology.

So, in case, this is a homework problem (since it is a commonly asked
hw problem in automata theory). Identify what the identity languages
are for the two operators. Show that idempotence holds. Find the
union and intersection inverses of a given language. Is there
something interesting about the two inverses? Does DeMorgan's theorem
hold?

Hope this helps,
-Chris

****************************************
*************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
Gene

2007-10-22, 4:39 am

On Oct 19, 3:39 pm, haberg@math.su.se (Hans Aberg) wrote:
> Can the intersection of two regular expression languages be
> constructed as a regular expression language?


Sure, but either your terminology is off a little or I'm going to
answer the wrong question. There are regular languages and there are
regular expressions, which are a language to describe regular
languages. So I think you are asking, given two regexes, can you find
a third regex that represents the intersection of the languages
described by the first two?

For this, see just about any textbook on finite automata. Thompson's
construction gets you from regex to NFA. Subset construction gets you
to DFA. There is a standard algorithm for finding a DFA that
recognizes the intersection of languages recognized by two others A
and B. Roughly speaking, you "simulate" A and B running in parallel
with a new DFA C having states labeled by pairs of original states
(something like the subset construction), one from A and one from B.
The accepting states of C are those labeled by two accepting states.
Then there is another standard algorithm to get from a DFA to a
regex. This is often presented as part of the proof of Kleene's
theorem. In practice you'd probably want to minimize the intersection
machine, which is another standard algorithm.

Some of the above steps are given in many compiler textbooks, but not
all.
SM Ryan

2007-10-22, 4:39 am

haberg@math.su.se (Hans Aberg) wrote:
# Can the intersection of two regular expression languages be
# constructed as a regular expression language?

Type 3 languages are closed under intersection, so it is possible.
I don't recall offhand, but I think it might just be the intersection
of the state graphs. Convert the REs to DFAs, work with the graphs,
and then you can convert DFAs back to REs.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
Haven't you ever heard the customer is always right?
Hans Aberg

2007-10-22, 7:15 pm

Gene <gene.ressler@gmail.com> wrote:

>
> Sure, but either your terminology is off a little or I'm going to
> answer the wrong question. There are regular languages and there are
> regular expressions, which are a language to describe regular
> languages.


There are regular or Chomsky hierarchy type 3 grammars, and the
languages they produce, I gather are called regular languages.

But a regular expression algebra is just a language expressing some of
the language set and monoid operations:

If C is a character set, let C* denote the free monoid (set of finite
strings), then a regular expression language has symbols for elements
in C, the union, monoid unit and multiplication (concatenation), and
the submonoid generated by a set (Kleene closure).

I was not sure how that related to regular languages.

> So I think you are asking, given two regexes, can you find a third
> regex that represents the intersection of the languages described by
> the first two?


Yes. As the language complement can be constructed on the DFA level,
de Morgan shows the intersection can be constructed as well. I
wondered if there was a more direct construction.

> There is a standard algorithm for finding a DFA that recognizes the
> intersection of languages recognized by two others A and B. Roughly
> speaking, you "simulate" A and B running in parallel with a new DFA
> C having states labeled by pairs of original states (something like
> the subset construction), one from A and one from B.


So that would do it. Thanks.

Hans Aberg
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com