Easy Extend



                               


                       
                           Author: Kay Schluehr
                           Date: 2008 - 03 - 01
                           Version: For EasyExtend 3.0
                                                                          
                                                             

                                    

Intro

Gallery is a set of samples for EasyExtend.


1. repeat - statement

    The inevitable repeat statement. Maybe do-while would have been the better choice?
    But anyway here is the "Hello World" of EasyExtend.

2. extended if - statement

    The extended if-statement provides some shortcuts. This is very Python specific since arbitrary values can be conditionals not
    just booleans.

3.  Chainlets and the switch - statement

    The switch statement is another classic but the semantics of Gallerys switch might be surprising because
    it is not imperative but supports OO. A new lightweight object type the so called chainlet is introduced and it's
    usage is compared to more conventional OO solutions.

4.  Thunk statements

    This opens some new perspectives. Instead of bringing down the expression / statement distinction which would give a Lispy flavor
    to Python we can also enrich expressions and small statements by thunking them. We want to consider Python properties as an
    obvious use case for thunking assignments.

5.  IP address literals

     The Trail based tokenizer makes it easy to introduce new literals. Here we show how to define literals for IPv4 addresses.
     This simple example also shows the unobtrusive character of defining tiny DSLs using EasyExtend.
    


Used Gallery Grammar for Python2.4

1. repeat - statement

An implementation of the classical repeat-statement known e.g. from Pascal. The repeat-statement bears not much surprises.
Opposed to a while statement the conditional expression follows the suite. The body/suite of the repeat-statement is evaluated
until the test of the until-clause returns true.

Syntax

repeat_stmt
::= 'repeat' ':' suite 'until' ':' (NEWLINE INDENT test NEWLINE DEDENT | test NEWLINE )

Translation

The translation to Python is straightforward:


repeat:
    SUITE
until:
    TEST
 

    ==>
while 1:
    SUITE
    if TEST:
        break

Examples

gal> x = 0
gal> repeat:
....     x+=1
.... until:
....     x==10
....
gal> x
10

The until-clause is allowed to be inlined :

gal> x = 0
gal> repeat:
....     x+=1
.... until: x==10
....
gal> x
10



2. Extended If - statement

How often does it happen that one types the following boilerplate?


var = dictionary.get(key)
if var:
    SUITE_1
else:
    SUITE_2


Gallery suggests using a with-statement style assignment instead:



if dictionary.get(key) as var:
    SUITE_1
else:
    SUITE_2


Note that the assignment produces a side-effect and var will always be accessible outside the on-statements scope.

Syntax

on_stmt
::= 'on' NAME '=' test ':' suite ['else' ':' suite]

Translation

The on-statement transforms like:



on NAME = test:
    SUITE1
else:
    SUITE2


==>   

NAME = test
if NAME:  
    SUITE
else:
    SUITE

Example

gal> on n0 = find_node(tree, symbol.stmt):
....     on n1 = find_node(n0, symbol.flow_stmt):
....         print n1


Using the semantics of on- and repeat statements it is easy to see how they can be combined and how they do fit with other Python statements.

d = {2:5, 3:9}
x = 0
repeat:
    on m = d.get(x):
        x = m
        break
    x+=1
until: x == 3

assert x == 5



3. Chainlets and the switch-statement

The introduction of a switch-statement was motivated by the search for a programming language construct that supports chainlets naturally. Both the repeat-statement and the on-statement introduced in the previous chapters were trivial excercises. Chainlets on the other hand do not only have a real use and show benefit in certain programming tasks but their implementation also demonstrates a very common and important theme in EasyExtend: chainlets are implemented in a separated Python module that can be imported independently. EasyExtend just implements a syntactical wrapper and passes chainlets into the builtins such as they were a primitive type. Well, they actually are - mostly.

3.1 Chainlets

                                                                


One of the most basic concepts in programming is that of imperative control expressed by if- or switch-statements. They seem to be so simple and natural that it needed quite some time to notice their flaws. The most severe drawback is that they refer to the identity of an object under control. If you write:


if language == Python_2_4:
   BLOCK

with a numerical constant Python_2_4 you see immediately that the condition is evaluated to False whenever you or your customer uses a different Python release. In C programs you often set integer defines like:

                  #define Python_2_3      1
                  #define Python_2_4      2
                  #define Python_2_4_1    4
                       ....

and combine them to extend the condition:


if language & (Python_2_4_2|Python_2_5):
   BLOCK

But you actually have to extend this condition wherever you want to deal with the new release in the existing program instead of adapting the program in just those places where the new release makes a difference. What we need here is an ordering scheme. When release numbers are totally ordered and backwards compatibility among releases is strictly preserved you might rewrite your condition


if language <= Python_2_4_2:
   BLOCK

and everything works fine. But often you can't do this and each version could be a bifurkation point where a new development branch takes off. Then you get a tree or an even more complicated structure when some features vary independently and run through an own release sequence.

Now OO suggested to abandon those handy control structures in favour for more robust but also more heavyweight solutions. Everything is organized in classes and instead of writing a conditional as above you might write a dispatch_on_language() method or alike that is called with the correct language class. A class representing Python_2_5 is derived from a Python_2_4_2 class and if you really have to change the behaviour in Python_2_5 compared to Python_2_4_2 you overwrite the dispatch_on_language() method in your Python_2_5 class. Finally you have just to instantiate the object of the right class and let client objects operate on an abstract representation of your object. Those clients aren't interested in the particular language version. All they have to know is how and when to call the dispatch_on_language method. This OO approach while elegant is sometimes to heavyweight and programmers unlikely change their architecture ( if they created any ;-) for very tiny changes. So a mixture of OO and imperative code is the consequence.

From a technical point of view a method call on an object calls either the method implemented by the objects class or it propagates along the chain of parent classes to the root class ( in a single inheritance simplification ) and calls the first method of the same name and signature it finds on the way up. This is the basic ordering scheme we are looking for. We have generalized our #defines to classes and encapsulated the BLOCKs within methods. But after getting the structural idea why not stripping down the whole class chain and doing a restart with what we already had
 

if language <= Python_2_4_2:
   BLOCK

but now giving the conditional expression consisting of the variable language, the constant Python_2_4_2 and the comparison '<='  a different interpretation and representation than we did in the naive, though runtime efficient C-ish approach?

3.1  Definition

A chainlet is a constant object without internal structure that is organized in hierarchies. A chainlet hierarchy mimicks a class-hierarchy on an object level. Therefore chainlet hierarchies can be implemented by any OO language even if it lacks classes as first order types like C++/Java/C# etc. While a chainlet has no own properties it can be combined to built new chainlets using the operators | ( alternative ) and * ( product ).

Example:

gal> A = Chainlet()
gal> B = A()
gal> B<A            # A is parent of B
True
gal> C = A()
gal> C<A           
# A is parent of C
True
gal> C<=B          
# The pair (B,C) is not element of the parental order relationship
False
gal> B<=C
False


The parental relationship is expressed using the "<" or "<=" operator respectively.
__lt__( chainlet)
Returns True if the calling chainlet is derived from the chainlet passed into __lt__ otherwise it returns False. The parental relationship expressed by__lt__ is irreflexive, antisymmetric and transitive. In particular we have: if A < B and B < C then A < C. The __lt__ relation establishes a partial order on the set of chainlets.
__le__( chainlet)
Like __lt__ but being reflexive: A <=A is always True.

A chainlet has a select method used to create a selector from a chainlet. In languages with restricted polymorphism of switch like C++ the selector must have a different type than a chainlet. It has to be either an integer type, a float type or a char type. Values of other types cannot be passed into a switch. In Python we don't need to restrict the switch arguments but can use chainlets directly.

select( *chainlets)
Seeks for the best approximation of the calling chainlet in the passed tuple of chainlets and returns the approximation. The approximation is evaluated according to the partial order on chainlets. If S and T are chainlets with S<=T and we have a chainlet S' with S<=S'<=T than S' is a better approximation to S than T. If no approximation could be found for the calling chainlet the select method returns None.


Example:

gal> PY = Chainlet()
gal> PY_1X = PY()
gal> PY_24 = PY()
gal> PY_25 = PY_24()
gal> PY_25.select(PY, PY_1X)
PY
gal> PY_25.select(PY,PY_24)
PY_24
gal> PY_24.select(PY_25)  # returns None
gal>


3.2  Chainlet combinations 

A chainlet implements the methods __or__ and __mul__. While the first method can be used to create alternatives the second operator is used to create conjunctions.

__or__( chainlet)
Associative and commutative operator | on chainlets ( "alternative" ). For chainlets A, B and C following rules hold:

           C < A|B  <=>  C < A or C < B
                A|B < C  <=>  A < C or B < C

          The approximation quality of a chainlet alternative is the best approximation of one of it's parts.

__mul__( chainlet)
Associative and non-commutative operator * on chainlets ("product"). The product is used to express an independent specialization - a combination of different features. If some product A*B should be compared with a chainlet C without product structure one assumes that C = C*Top | Top*C where Top is the root chainlet in a chainlet hierarchy. We state following rules:
                                
          A*B < C    <=>  A < C or B < C
                C < A*B  <=>  C = C1*C2 and C1 < A and C2 < B               
         A|B < C*D  <=>  A < C*D or B < C*D

3.3  Why chainlets - revisiting a refactoring example

Chainlets support a kind of "neo-structured" programming without dismissing the value of change. Instead of replacing switches by whole class diagrams you generalize conditional expressions using chainlets. We want to revisit an example given Martin Fowler in his famous refactoring book ( Refactoring, Addison-Wesley 1st edition (June 28, 1999)). The section we revisit here is called "Replacing condition with polymorphism" which is related  to our topic and provides following initial example, translated into Python:


def get_charge(days_rented):
    result = 0
    price_code = get_price_code()
    if price_code == MOVIE.REGULAR:
        result+=2
        if days_rented>2:
            result+= (days_rented-2)*1.5
    elif price_code == MOVIE.NEW_RELEASE:
        result+= days_rented*3
    elif
price_code == MOVIE.CHILDRENS:
        result+=1.5
        if days_rented>3:
            result+= (days_rented-3)*1.5
    return result
       

Instead of refactoring this function into 4 classes ( a Price class and derived classes ChildrenPrice, NewReleasePrice, and RegularPrice ) as suggest by Fowler et al we start with the observation of the common pattern in the charge-algorithm:


def get_charge(days_rented, start, barrier, factor = 1.5):
    result = start      
    if days_rented>barrier:
        result+=(days_rented-barrier)*factor
    return result
     

This algorithm could be slightly extended to incorporate the case handled by a NEW_RELEASE:


def get_charge(days_rented, start, barrier, factor = 1.5):
    result = start      
    if barrier:
        if days_rented>barrier:
            result+=(days_rented-barrier)*factor
    else:
        result+=2*factor
    return result
     

Instead of numerical price codes we are using chainlets instead. Now think about a new price code category for students. Students are most likely not children but
in case of price charging they might be treated as such. We can either adapt the current chainlet selections yet in each place where they are used in the application code but this is not superiour to the classical imperative solution:

  
code = price_code.select(MOVIE.REGULAR, MOVIE.CHILDRENS, MOVIE.NEW_RELEASE)     

                        ====>

code = price_code.select(MOVIE.REGULAR, MOVIE.CHILDRENS|MOVIE.STUDENTS, MOVIE.NEW_RELEASE)     

or we introduce a STUDENTS chainlet derived from a CHILDRENS chainlet where no transformation is needed at all. The select() method has to be nowhere adapted:


STUDENTS = CHILDREN()     

An even better solution would be a reclassification.


REDUCTION   = PRICE_CODE()
STUDENTS    = REDUCTION()
CHILDREN    = REDUCTION()
PUPILS      = REDUCTION()
PENSIONEERS = REDUCTION()

Now you can use all kinds of combinations and special cases:


code = price_code.select(MOVIE.REDUCED)     
code = price_code.select(MOVIE.REDUCED, MOVIE.PUPILS|MOVIE.STUDENTS)     
code = price_code.select(MOVIE.REDUCED, MOVIE.CHILDREN, MOVIE.PENSIONEERS)     
...

A switch over the code chainlet is robust against the introduction of a new chainlet that is derived from any of the chainlets we have introduced before. We just have to adapt those cases where the new chainlet really makes a difference.

3.4   switch

We now define a chainlet aware switch. If the selector expression that is passed into a switch is a chainlet the behaviour of the switch is adapted so that the select method of the chainlet is executed with the selectors of the case branches as arguments. Finally the correct case branch ( if any ) is selected and evaluated. Instead of a default case branch the switch statement contains an optional else branch which is executed if no chainlet matches. If the switch selector is no chainlet but e.g. an int the selector will be compared to the case selectors directly using the comparison operator '=='.

Syntax:  

switch_stmt
::= 'switch' expr ':' NEWLINE INDENT case_stmt DEDENT ['else' ':' suite]
case_stmt ::= 'case' expr ':' suite ('case' expr ':' suite)*

Translation
:
switch EXPR:
    case EXPR1:
          SUITE1
    case EXPR2:
          SUITE2
else:
    SUITE3



==>   
if isChainlet(EXPR):
    SELECT = EXPR.select(EXPR1,EXPR2)
else:
    SELECT = EXPR
if SELECT == EXPR1:
    SUITE1
elif SELECT == EXPR2:
    SUITE2
else:
    SUITE3

Note that the switch statement uses an expr instead of a test grammar node. It's usefull with respect to direct value comparisons in a switch and causes the parser raising a syntax error, if the switch is defined like this:


switch x:
    case x in range(100):
        SUITE
    ....


Chainlets are not particular fast. Each switch needs linear effort in the number of chainlets that have to be compared in the select method. Personally I've never used them in tight loops but mostly for configuration purposes in application programming. The execution model introduced here might be a starting point to find and implement them more efficiently.

Examples for switch:

Using integers as switch selectors. The switch does not perform a select( ):

gal> x = 0; y = 5
gal> switch x:
....    case y-4:
....        print 1
....    case y-5:
....        print 2
.... else:
....    print 3
2
gal>


Using chainlets as switch selectors:

switch code:
    case MOVIE.REGULAR:
       
charge = get_charge(days_rented, 2, 2)
    case MOVIE.REDUCED:
       
charge = get_charge(days_rented, 1.5, 3)
    case MOVIE.NEW_RELEASE:
       
charge = get_charge(days_rented, 0, 0, 3)
else:
    raise ValueError, "Unknown price code %s"%code



4. Thunk statements

The basic idea of thunk statements is to optionally turn each small_stmt into a compound statement adding a suite.

Syntax

thunk_stmt
::= small_stmt ':' suite                                   

Thunk statements are challenges for Pythons traditional LL(1) parsing scheme because the kind of statement can only be
selected when the colon is entered.

Small statements are statements like assert, return or raise but also assignments. Expressions are small statements as
long as they span an entire logical line and being terminated by a newline character.

Applications

1. property syntax

Using thunk statements suggests following property syntax:

class A(object):
    x = property:
        def fget(self): return self._x
        def fset(self, val): self._x = val

2. Expression formatters

Another idea is to use so called expression formatters or expression interpolations. The analogy with with string formatting
shall be obvious:

def foo(x):
    print x
    return 1 + %fn():
        if x<0:
            return x**2 -1
        else:
            return x**2 +1

The thunked block is called as an anonymous closure of zero arguments where the placeholder %fn is used in the return
statement. In this form it works only for single anonymous closures thunked after the return.

Implementation

In Gallery for EasyExtend 3.0 we give just a very basic semantics for property syntax as stated above. The grammar rule
accordingly is

thunk_stmtGallery_3.0
::= NAME '=' NAME ':' suite                                   
This is not the production rule stated in Grammar.ext. So the parser will accept any small_stmt preceeding the colon but the
thunk_stmt transformer will only accept this particular form of an assignment and raises a syntax error otherwise.

Translation



NAME_1 = NAME_2:
    SUITE                   

   ==>
def tunk():  
    SUITE
    return locals()

NAME_1 = NAME_2(**thunk())

del thunk


This captures precisely the usecase for properties we've given above. The accessor functions will never be exposed to the namespace
containing the property.


5. IP address literals

Address literals for IPv4 addresses can simply be typed as some form of a dotted sequence of numbers:

gal> 127.0.0.1
<IPv4:127.0.0.1>
gal> 127.0.0.1[0]
127
gal> 127.0.0.1[2:]
(0,1)


Unlike previous features presented in this document IPv4 Address literals are defined in Token.ext.

Syntax: 

IPv4Address
Triple
::=
::=
Triple '.' Triple '.' Triple '.' Triple                         
A_DIGIT [A_DIGIT] [A_DIGIT]

Translation:

The IPv4 address literal translates into an IPv4 object that is derived from tuple. So they will always be 4-tuples. These are defined in the ip.py
module of the Gallery langlet.

The definition of IPv4 addresses demonstrates both advantages and disadvantages of the current Trail based lexers over the former regular
expression based ones. Obviously the syntax description above lacks some conciseness. It is not possible to simply state

    IPv4Address: Triple ('.' Triple){3}
        Triple: A_DIGIT [A_DIGIT]{1,2}

because precise numbers of repititions of patterns are not implemented. The advantage is that we don't have to rely on ordered choices. You can simply
drop in the grammar rule and it will not be shadowed by floating point literal definitions. In EasyExtend 2.0 one had to care for the location of the pattern.
It wasn't ever executed when ordered behind the one of other numbers.