are module
Library for defining and operating on abstract regular expressions that work with any symbol type, with an emphasis on supporting scenarios in which it is necessary to work with regular expressions as abstract mathematical objects.
- class are.are.are(iterable=(), /)
Bases: tuple
Base class for abstract regular expression instances (and the individual nodes found within an abstract syntax tree instance). Abstract regular expressions can contain symbols of any immutable type and can be built up using common operators such as concatenation, alternation, and repetition.
    >>> a = con(lit(1), con(lit(2), lit(3)))
    >>> a([1, 2, 3])
    3
This class is derived from the built-in tuple type. Each instance of this class acts as a node within the abstract syntax tree representing an abstract regular expression. The elements inside the instance are the child nodes of the node represented by the instance and can be accessed in the usual manner supported by the tuple type.
    >>> a = con(lit(1), con(lit(2), lit(3)))
    >>> a[1]
    con(lit(2), lit(3))
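The tuple-based node design can be illustrated with a small stand-alone sketch. The classes Node, Lit, and Con and the helper symbols below are hypothetical stand-ins written for this example, not the library's own definitions. They only show how a tuple subclass can double as a tree node whose children are reachable by indexing and iteration.

```python
# Hypothetical stand-in classes; these are NOT the library's definitions.
class Node(tuple):
    """A tree node whose tuple elements are its children."""

    def __repr__(self):
        return type(self).__name__ + "(" + ", ".join(map(repr, self)) + ")"


class Lit(Node):
    """Leaf node holding exactly one symbol."""

    def __new__(cls, symbol):
        return super().__new__(cls, (symbol,))


class Con(Node):
    """Internal node holding two child nodes."""

    def __new__(cls, left, right):
        return super().__new__(cls, (left, right))


def symbols(node):
    """Collect leaf symbols left to right by walking the children."""
    if isinstance(node, Lit):
        return [node[0]]
    return [s for child in node for s in symbols(child)]


a = Con(Lit(1), Con(Lit(2), Lit(3)))
print(a[1])        # Con(Lit(2), Lit(3))
print(len(a))      # 2
print(symbols(a))  # [1, 2, 3]
```

Because the nodes are ordinary tuples, they are immutable and hashable, and any recursive traversal can be written with plain iteration over a node.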
- to_nfa() → nfa.nfa.nfa
Convert this abstract regular expression instance into a nondeterministic finite automaton (NFA) that accepts the set of iterables that satisfy this instance.
    >>> a = con(lit(1), con(lit(2), lit(3)))
    >>> a.to_nfa()
    nfa({1: nfa({2: nfa({3: nfa()})})})
- compile() → are.are.are
Convert this instance into an equivalent NFA and store it internally as an attribute (to enable more efficient matching). Return the original abstract regular expression instance.
    >>> a = alt(lit('x'), rep(con(lit('y'), lit('z'))))
    >>> a = a.compile()
    >>> a(['x'])
    1
    >>> a(['y', 'z', 'y', 'z'])
    4
- to_re() → str
If this instance contains only string symbols (and no symbols of any other type), convert it to an equivalent regular expression string that is compatible with the built-in re module.
    >>> rep(alt(con(lit('a'), lit('b')), emp())).to_re()
    '((((a)(b))|)*)'
    >>> rep(alt(con(lit('a'), con(lit('b'), nul())), emp())).to_re()
    '((((a)((b)[^\\w\\W]))|)*)'
Any attempt to convert an instance that has non-string symbols raises an exception.
    >>> rep(alt(con(lit(123), lit(456)), emp())).to_re()
    Traceback (most recent call last):
      ...
    TypeError: all symbols must be strings
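As a rough cross-check, the two strings shown in the examples above can be handed to Python's built-in re module. This sketch uses only re.fullmatch and the literal strings from those examples; it does not call the library itself, and the variable names are chosen for illustration.

```python
import re

# Strings copied from the to_re() examples above.
pattern = '((((a)(b))|)*)'
pattern_with_nul = '((((a)((b)[^\\w\\W]))|)*)'

# The first pattern accepts zero or more repetitions of "ab".
print(re.fullmatch(pattern, '') is not None)      # True
print(re.fullmatch(pattern, 'abab') is not None)  # True
print(re.fullmatch(pattern, 'aba') is None)       # True

# [^\w\W] can match no character, so only the empty alternative remains.
print(re.fullmatch(pattern_with_nul, '') is not None)  # True
print(re.fullmatch(pattern_with_nul, 'ab') is None)    # True
```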
- __call__(string: Iterable, full: bool = True) → Optional[int]
Determine whether an iterable of symbols (i.e., an abstract string in the formal sense associated with the mathematical definition of a regular expression) is in the formal language represented by this instance. By default, the length of the abstract string is returned if the abstract string satisfies this instance.
    >>> a = rep(con(lit(1), lit(2)))
    >>> a([1, 2, 1, 2, 1, 2])
    6
    >>> a = alt(rep(lit(2)), rep(lit(3)))
    >>> a([2, 2, 2, 2, 2])
    5
    >>> a([3, 3, 3, 3])
    4
If the supplied abstract string does not satisfy this instance, then None is returned.
    >>> a([1, 1, 1]) is None
    True
If the optional parameter full is set to False, then the length of the longest prefix of the abstract string that satisfies this instance is returned. If no prefix satisfies this instance, then None is returned.
    >>> a = con(lit(1), con(lit(2), lit(3)))
    >>> a([1, 2, 3, 4, 5], full=False)
    3
    >>> a = con(lit(1), con(lit(2), lit(3)))
    >>> a([4, 4, 4], full=False) is None
    True
If an instance is satisfied by the empty abstract string and full is set to False, then the empty prefix of any abstract string satisfies the instance. In such cases the result is at least 0 and is never None.
    >>> a = alt(lit(2), emp()) # Satisfied by the empty abstract string.
    >>> a([1, 1, 1], full=False) # Empty string is a prefix of ``[1, 1, 1]``.
    0
Any attempt to apply an abstract regular expression instance to a non-iterable raises an exception.
    >>> nul()(123)
    Traceback (most recent call last):
      ...
    ValueError: input must be an iterable
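The return-value rules described for this method (the full length on a complete match, the longest satisfying prefix when full is False, and None otherwise) can be sketched with a small reference matcher. Everything here is an assumption made for illustration: the plain-tuple node encoding such as ('lit', s) and ('con', l, r), and the helper names ends and match, are hypothetical and say nothing about how the library actually performs matching. Unlike the library, this sketch does not turn a non-iterable input into a ValueError, and it makes no attempt to be efficient.

```python
def ends(node, symbols, start):
    """Return every index at which a match of node beginning at start can end."""
    kind = node[0]
    if kind == 'nul':
        return set()
    if kind == 'emp':
        return {start}
    if kind == 'lit':
        hit = start < len(symbols) and symbols[start] == node[1]
        return {start + 1} if hit else set()
    if kind == 'con':
        return {e for m in ends(node[1], symbols, start)
                for e in ends(node[2], symbols, m)}
    if kind == 'alt':
        return ends(node[1], symbols, start) | ends(node[2], symbols, start)
    if kind == 'rep':
        # Keep extending from newly reached positions until nothing new appears.
        reached, frontier = {start}, {start}
        while frontier:
            step = {e for m in frontier for e in ends(node[1], symbols, m)}
            frontier = step - reached
            reached |= frontier
        return reached
    raise ValueError('unknown node kind')


def match(node, string, full=True):
    """Mirror the documented results: a length on success, None on failure."""
    symbols = list(string)  # Also handles one-shot iterators such as iter('ab').
    found = ends(node, symbols, 0)
    if full:
        return len(symbols) if len(symbols) in found else None
    return max(found) if found else None


seq = ('con', ('lit', 1), ('con', ('lit', 2), ('lit', 3)))
print(match(('rep', ('con', ('lit', 1), ('lit', 2))), [1, 2, 1, 2, 1, 2]))  # 6
print(match(seq, [1, 2, 3, 4, 5], full=False))                              # 3
print(match(seq, [4, 4, 4], full=False))                                    # None
print(match(('alt', ('lit', 2), ('emp',)), [1, 1, 1], full=False))          # 0
```

Tracking the set of reachable end positions, rather than a single position, is what lets the sketch report the longest satisfying prefix regardless of the order in which alternatives are written.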
- __str__() → str
Return the string representation of this instance.
    >>> a = rep(con(lit(1), alt(lit(2), lit(3))))
    >>> str(a)
    'rep(con(lit(1), alt(lit(2), lit(3))))'
Assuming that this module has been imported in such a way that the are subclasses are bound to the same names that they have in this module (e.g., con is bound to the variable con in the relevant scope), the strings returned by this method can be evaluated to reconstruct the instance.
    >>> a = rep(con(lit(1), alt(lit(2), lit(3))))
    >>> eval(str(a))
    rep(con(lit(1), alt(lit(2), lit(3))))
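The dependence on names being in scope can be seen in a stand-alone sketch. The classes Node, Sym, and Seq are hypothetical and exist only for this example; the point is that eval can rebuild a tree from its string form only when the class names in that string resolve in the scope it is given.

```python
class Node(tuple):
    """Hypothetical tuple-based node; not the library's class."""

    def __new__(cls, *children):
        return super().__new__(cls, children)

    def __repr__(self):
        return type(self).__name__ + '(' + ', '.join(map(repr, self)) + ')'


class Sym(Node):
    pass


class Seq(Node):
    pass


original = Seq(Sym('x'), Seq(Sym(2), Sym(3)))
text = str(original)
print(text)  # Seq(Sym('x'), Seq(Sym(2), Sym(3)))

# eval() resolves the names Sym and Seq in the scope passed to it.
rebuilt = eval(text, {'Sym': Sym, 'Seq': Seq})
print(rebuilt == original)  # True

# Without those names in scope, evaluation fails.
try:
    eval(text, {})
except NameError as error:
    print(type(error).__name__)  # NameError
```

As with any use of eval, this round trip is only appropriate for strings from a trusted source.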
- class are.are.nul
Bases: are.are.are
Singleton class containing an object that corresponds to the sole abstract regular expression instance that cannot be satisfied by any iterable (i.e., that cannot be satisfied by any abstract string).
    >>> (nul()(iter('ab')), nul()(iter('abc'), full=False))
    (None, None)
    >>> r = nul()
    >>> (r(''), r('abc'), r('', full=False), r('abc', full=False))
    (None, None, None, None)
More usage examples involving compilation of are instances that contain instances of this class are presented below.
    >>> r = r.compile()
    >>> (r(''), r('abc'), r('', full=False), r('abc', full=False))
    (None, None, None, None)
    >>> ((con(nul(), lit('a')))('a'), (con(nul(), lit('a'))).compile()('a'))
    (None, None)
    >>> ((con(lit('a'), nul()))('a'), (con(lit('a'), nul())).compile()('a'))
    (None, None)
    >>> ((alt(nul(), lit('a')))('a'), (alt(nul(), lit('a'))).compile()('a'))
    (1, 1)
    >>> ((alt(lit('a'), nul()))('a'), (alt(lit('a'), nul())).compile()('a'))
    (1, 1)
    >>> ((alt(nul(), nul()))('a'), (alt(nul(), nul())).compile()('a'))
    (None, None)
    >>> (con(rep(nul()), lit('a')).compile())('a')
    1
Any attempt to apply an abstract regular expression instance to a non-iterable raises an exception.
    >>> nul()(123)
    Traceback (most recent call last):
      ...
    ValueError: input must be an iterable
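The way an unsatisfiable expression combines with the other operators can be mirrored with Python's built-in re module, using the character class [^\w\W] (the form that appears in the to_re() output for nul()) as a pattern that matches no character. This is only an analogy built on re.fullmatch; the constant name NUL is chosen for this sketch and the library is not called.

```python
import re

NUL = r'[^\w\W]'  # Matches no character at all.

# Concatenation with an unsatisfiable pattern can never be satisfied.
print(re.fullmatch(NUL + 'a', 'a'))  # None
print(re.fullmatch('a' + NUL, 'a'))  # None

# Alternation with it behaves like the other branch alone.
print(re.fullmatch('(?:' + NUL + '|a)', 'a').end())  # 1

# Zero repetitions still match the empty string, so the repetition is satisfiable.
print(re.fullmatch('(?:' + NUL + ')*a', 'a').end())  # 1
```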
- class are.are.emp
Bases: are.are.are
Singleton class containing an object that corresponds to the sole abstract regular expression instance that is satisfied only by an empty iterable (i.e., an abstract string with a length of zero).
    >>> (emp()(''), emp()('ab'))
    (0, None)
    >>> emp()(iter('ab')) is None
    True
    >>> emp()('abc', full=False)
    0
    >>> emp()(iter('abc'), full=False)
    0
    >>> r = emp().compile()
    >>> (r(''), r('abc'))
    (0, None)
Any attempt to apply an abstract regular expression instance to a non-iterable raises an exception.
    >>> emp()(123)
    Traceback (most recent call last):
      ...
    ValueError: input must be an iterable
- class are.are.lit(argument)
Bases: are.are.are
Abstract regular expression instances that are satisfied by exactly one symbol. Instances of this class also serve as the leaf nodes (i.e., base cases) corresponding to abstract string literals (in the formal sense associated with the mathematical definition of a regular expression).
    >>> (lit('a')(''), lit('a')('a'), lit('a')('ab'))
    (None, 1, None)
    >>> (lit('a')('', full=False), lit('a')('ab', full=False))
    (None, 1)
    >>> lit('a')(iter('ab'), full=False)
    1
    >>> r = lit('a').compile()
    >>> (r('a'), r(''))
    (1, None)
Any attempt to apply an abstract regular expression instance to a non-iterable raises an exception.
    >>> lit('a')(123)
    Traceback (most recent call last):
      ...
    ValueError: input must be an iterable
- class are.are.con(*arguments)
Bases: are.are.are
Concatenation operation for two are instances. Instances of this class also serve as the internal nodes of the tree data structure representing an abstract regular expression.
    >>> r = con(lit('a'), lit('b'))
    >>> (r('ab'), r('a'), r('abc'), r('cd'))
    (2, None, None, None)
    >>> (r(iter('ab')), r(iter('a')), r(iter('abc')), r(iter('cd')))
    (2, None, None, None)
    >>> (r('a', full=False), r('abc', full=False), r('cd', full=False))
    (None, 2, None)
    >>> (r(iter('a'), full=False), r(iter('abc'), full=False), r(iter('cd'), full=False))
    (None, 2, None)
    >>> r = con(lit('a'), con(lit('b'), lit('c')))
    >>> (r('abc'), r('abcd', full=False), r('ab'))
    (3, 3, None)
    >>> (r(iter('abc')), r(iter('abcd'), full=False), r(iter('ab')))
    (3, 3, None)
    >>> r = con(con(lit('a'), lit('b')), lit('c'))
    >>> r('abc')
    3
    >>> r(iter('abc'))
    3
    >>> r = con(lit('a'), lit('b')).compile()
    >>> (r('ab'), r('a'), r('abc'), r('cd'))
    (2, None, None, None)
Any attempt to apply an abstract regular expression instance to a non-iterable raises an exception.
    >>> r(123)
    Traceback (most recent call last):
      ...
    ValueError: input must be an iterable
- class are.are.alt(*arguments)
Bases: are.are.are
Alternation operation for two are instances. Instances of this class also serve as the internal nodes of the tree data structure representing an abstract regular expression.
    >>> r = alt(con(lit('a'), lit('a')), lit('a'))
    >>> r('aa')
    2
    >>> r = alt(lit('b'), con(lit('a'), lit('a')))
    >>> r('aa')
    2
    >>> r = con(alt(lit('a'), lit('b')), alt(lit('c'), lit('d')))
    >>> (r('ac'), r('ad'), r('bc'), r('bd'))
    (2, 2, 2, 2)
    >>> r = con(alt(lit('a'), lit('b')), lit('c'))
    >>> (r('ac'), r('bc'), r('c'), r('a'), r('b'))
    (2, 2, None, None, None)
    >>> r = alt(con(lit('a'), lit('b')), lit('a'))
    >>> r('abc', full=False)
    2
    >>> r = alt(lit('a'), con(lit('a'), lit('a')))
    >>> r('aaa', full=False)
    2
    >>> r = alt(lit('a'), con(lit('a'), lit('a')))
    >>> r('aaa') is None
    True
    >>> r = alt(con(lit('a'), lit('a')), lit('a'))
    >>> r('aa', full=False)
    2
    >>> r = alt(lit('a'), lit('a'))
    >>> r('a')
    1
    >>> r = alt(lit('a'), lit('b'))
    >>> r('ac') is None
    True
    >>> (r('a'), r('b'), r('c'))
    (1, 1, None)
    >>> r('ac', full=False)
    1
    >>> r0 = alt(lit('a'), alt(lit('b'), lit('c')))
    >>> r1 = con(r0, r0)
    >>> {r1(x + y) for x in 'abc' for y in 'abc'}
    {2}
    >>> r = alt(lit('b'), con(lit('c'), lit('a')))
    >>> r('aab') is None
    True
    >>> r(iter('aab')) is None
    True
    >>> r = alt(con(lit('a'), lit('a')), con(lit('a'), con(lit('a'), lit('a'))))
    >>> (r('aaa'), r('aa'))
    (3, 2)
    >>> r = alt(con(lit('a'), lit('a')), con(lit('a'), con(lit('a'), lit('a'))))
    >>> (r('aaa', full=False), r('aa', full=False))
    (3, 2)
    >>> (r(iter('aaa'), full=False), r(iter('aa'), full=False))
    (3, 2)
    >>> r = alt(con(lit('a'), lit('a')), con(lit('a'), con(lit('a'), lit('a'))))
    >>> r = r.compile()
    >>> (r('aaa'), r('aa'), r('a'))
    (3, 2, None)
Any attempt to apply an abstract regular expression instance to a non-iterable raises an exception.
    >>> r(123)
    Traceback (most recent call last):
      ...
    ValueError: input must be an iterable
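The examples for this class show that, with full set to False, the longest satisfying prefix is reported no matter which alternative is written first. Python's built-in re module behaves differently: it tries alternatives from left to right and keeps the first one that succeeds. The sketch below contrasts the two behaviors using only re.match; the helper longest_prefix is hypothetical, handles only a flat list of alternative patterns, and is not part of either library.

```python
import re

# re keeps the first alternative that matches, so the order matters.
print(re.match('a|aa', 'aaa').end())  # 1
print(re.match('aa|a', 'aaa').end())  # 2


def longest_prefix(patterns, text):
    """Return the longest prefix length matched by any pattern, or None."""
    lengths = []
    for pattern in patterns:
        found = re.match(pattern, text)
        if found is not None:
            lengths.append(found.end())
    return max(lengths) if lengths else None


# Taking the maximum over all branches is independent of their order.
print(longest_prefix(['a', 'aa'], 'aaa'))  # 2
print(longest_prefix(['aa', 'a'], 'aaa'))  # 2
print(longest_prefix(['ab', 'a'], 'abc'))  # 2
print(longest_prefix(['b', 'ca'], 'aab'))  # None
```

This difference is worth keeping in mind when comparing prefix-match results against a pattern produced for the re module.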
- class are.are.rep(argument)
Bases: are.are.are
Repetition operation (zero or more times) for an are instance. Instances of this class also serve as the internal nodes of the tree data structure representing an abstract regular expression.
    >>> r = rep(lit('a'))
    >>> all([r('a'*i) == i for i in range(100)])
    True
    >>> {r('a'*i + 'b') for i in range(10)}
    {None}
    >>> {r(iter('a'*i + 'b')) for i in range(10)}
    {None}
    >>> r = con(lit('a'), rep(lit('b')))
    >>> r('a' + 'b'*10)
    11
    >>> r(iter('a' + 'b'*10))
    11
    >>> r = con(rep(lit('a')), lit('b'))
    >>> r('aaab')
    4
    >>> r(iter('aaab'))
    4
    >>> r = con(rep(lit('a')), lit('b')).compile()
    >>> r('aaab')
    4
    >>> r = rep(lit('a')).compile()
    >>> all([r('a'*i) == i for i in range(100)])
    True
Note that the empty abstract string satisfies any instance of this class.
    >>> r('')
    0
    >>> r('bbbb', full=False)
    0
    >>> all([r('a'*i + 'b', full=False) == i for i in range(100)])
    True
    >>> all([r(iter('a'*i + 'b'), full=False) == i for i in range(100)])
    True
Any attempt to apply an abstract regular expression instance to a non-iterable raises an exception.
    >>> r(123)
    Traceback (most recent call last):
      ...
    ValueError: input must be an iterable