Easy Extend



                  



                  Author: Kay Schluehr
                  Date: 08-08-08
                  Vers: 1.0




                            
      

1. Introduction 

Bytelets are networks of binary data encoded within a P4D data structure. They originate from a flexible representation of BER-TLVs that lets users easily create, modify and parse TLV (Tag-Length-Value) objects. Bytelets are more general than TLVs: an L field can refer to arbitrary subfields of V, which reflects many practical cases found in technical specifications. L fields are represented by functions that respond to changes in the values of the fields they are bound to. In this way a simple dataflow network emerges. This network can be extended to support more advanced causal relationships between Bytelet fields.

All Bytelets can be serialized as binary data. Such binary data can in turn be parsed into Bytelets, using Bytelets as structural descriptions. A Bytelet used as such a description is called a Bytelet Schema or just Schema. Often a Bytelet Schema and the Bytelets created by parsing hexcode with it are isomorphic, i.e. they have the same structural properties and differ only in field values. This is similar to C structs, which can be cast onto chunks of memory to create new instances of those structs.

On the other hand, Bytelets are somewhat more low level than C structs or ASN.1 based encodings because they are untyped. There are no strings, ints, chars, arrays, pointers etc., and there is no metalevel description that translates type information into sizes of memory chunks.

2. Using Bytelets

Bytelets are part of P4D and you need to install P4D to run the following code.

Bytelets are represented by ordinary P4D elements. Using the bl prefix of the namespace

  prefix: bl
  uri: http://fiber-space.de/byteletns

informs the P4D langlet to create a Bytelet object instead of a P4D object.

The first example shows how to create a simple Bytelet.

p4d> elm bl:A:
....    Tag: 0x01
....    Len: &LEN
....    Val: 0x77AA
....
p4d> A
Bytelet<bl:A>
p4d> A.hex()
0x01 0x02 0x77 0xAA

The Bytelet class is derived from P4D. This means all Bytelets are also P4D objects. The hex() method is new and specific to Bytelets. It serializes a Bytelet as a sequence of hexadecimal numbers. The LEN object we assigned to the Len field as a Python object is one of two builtin Flow objects. The other one is called VAL. We will treat them in detail in section 2.2.

2.1 Bytelets and Hex objects

The content of P4D objects is always a string. We assign either strings or numbers, and numbers are converted into strings. Bytelets offer somewhat more in this respect. We can also assign Flow objects like LEN, as seen in the example above, and hexadecimal literals, which have a special status.

In P4D 1.2 we have broken Python's semantics for hexadecimal numbers. When you type a hexadecimal number in a Python console, it is automatically displayed as a decimal number:

>>> 0x0E
14

So hexadecimal numbers have no status of their own in Python: only the integer value is kept and the base is lost. In P4D 1.2 hex literals are not mapped onto Python ints but onto Hex objects, which are defined in EasyExtend.util.hexobject. Hex objects are used to store, manipulate and represent hexcode.

p4d> 0x0E 0xA0
0x0E 0xA0
p4d> type(0x0E 0xA0)
<class 'EasyExtend.util.hexobject.Hex'>

Hex objects have plenty of conversion functions and representations:

p4d> Hex("89 07 0A")
0x89 0x07 0x0A

p4d> Hex.format = Hex.F_STD        # changing the representation format for Hex objects

p4d> Hex("89 07 0A")
89 07 0A

p4d> Hex.format = Hex.F_0x         # changing the representation format for Hex objects again

p4d> Hex(0x89070A)                 # another constructor call
0x89 0x07 0x0A

p4d> Hex("89 {Hello Hex} 00")      # curly braces are used to inline ASCII - strings
0x89 0x48 0x65 0x6C 0x6C 0x6F 0x20 0x48 0x65 0x78 0x00

p4d> Hex("89 {escape\{ and \}}")   # inlined curlies
0x89 0x65 0x73 0x63 0x61 0x70 0x65 0x7B 0x20 0x61 0x6E 0x64 0x20 0x7D

p4d> Hex.from_decimal("16")        # string conversion from decimals obviously differs from ...
0x10

p4d> Hex("16")                     # ... interpretation as Hex
0x16

p4d> Hex("{Hex Hex}0").binary()    # sort of 0-terminated string
'Hex Hex\x00'

Of course there are plenty of operations that can be applied directly to Hex object literals:

p4d> h = 0x0E 0xA0
p4d> h // 0x00                                       # concatenation
0x0E 0xA0 0x00
p4d> 0x0E + 0x03                                     # addition
0x11
p4d> 0x0E 0x89 0x89 & 0x78 0x75 0x56                 # bitwise and
0x08 0x01 0x00
p4d> 0x0E 0x89 0x89 ^ 0x78 0x75 0x56                 # xor
0x76 0xFC 0xDF

For a more comprehensive overview, see the API description of the Hex object.
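
For readers without P4D at hand, the inline ASCII notation and the byte-wise operators shown above can be approximated with Python's builtin bytes type. This is only a rough sketch and not the Hex implementation: parse_hex, band and bxor are hypothetical helper names, the escape handling is inferred from the examples, and 0x prefixes are not supported.

```python
def parse_hex(text):
    """Parse text such as '89 {Hello} 00' into bytes (hypothetical helper)."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == "{":
            # Inline ASCII run: copy characters up to the unescaped '}'.
            i += 1
            while text[i] != "}":
                if text[i] == "\\":       # '\{' and '\}' stand for literal braces
                    i += 1
                out.append(ord(text[i]))
                i += 1
            i += 1                         # skip the closing '}'
        else:
            # Collect a run of hex digits; an odd run is left-padded with 0.
            j = i
            while j < len(text) and text[j] in "0123456789abcdefABCDEF":
                j += 1
            if j == i:
                raise ValueError("unexpected character %r" % ch)
            digits = text[i:j]
            if len(digits) % 2:
                digits = "0" + digits
            out += bytes.fromhex(digits)
            i = j
    return bytes(out)


def band(a, b):
    """Byte-wise AND of two equally long byte strings."""
    return bytes(x & y for x, y in zip(a, b))


def bxor(a, b):
    """Byte-wise XOR of two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


h = parse_hex("0E A0") + parse_hex("00")   # concatenation is plain '+' on bytes
print(h.hex(" "))
print(band(parse_hex("0E 89 89"), parse_hex("78 75 56")).hex(" "))
```

Note that in this sketch + means concatenation, as usual for bytes, whereas Hex objects use // for concatenation and + for addition.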

2.2 Flow objects

2.2.1 Anatomy of a Flow object

The Flow object is the key feature of Bytelets. One can understand a Flow object roughly by considering three properties. A Flow object has

1) an internal state which is expressed by the _value attribute. The _value attribute can be accessed using the flow_value() 
    method. The type of _value is Hex. On initialization _value is None.

2) a reference to the containing Bytelet. A Flow object is usually the CONTENT of a Bytelet having the following simple
    structure  [TAG, {}, [], CONTENT].  We aren't so much interested in this containing Bytelet though but in its parent:
    [TAG, {}, [[TAG, {}, [], CONTENT]...], '']. The containing Bytelet and the parent of the containing Bytelet are
    represented by the Flow attributes _node and _parent.

3) an update() method. The update() method is called either when the _value attribute is requested while _value is None,
    or when a Bytelet changes, e.g. when a field gets updated using __setattr__. The update() method implements the
    behaviour of the Flow object. It is the distinctive property of Flow objects and shall be overridden in subclasses of Flow.

Coming back to our initial example.

p4d> elm bl:A:
....    Tag:0x01
....    Len:&LEN
....    Val:0x77

The LEN object is a Flow object assigned to Len as a Python object. LEN is actually of type FlowLen, and it is a global (builtin) Flow object. On assignment to Len it is copied, so you can use it in arbitrarily many places without the danger of sharing its value. Initially the attributes _value, _node and _parent are all None.

p4d> import pprint
p4d> pprint.pprint(A._tree)
['bl:A',
 {},
 [['Tag', {}, [], 0x01],
  ['Len',
   {},
   [],
   <EasyExtend.langlets.p4d.bytelet.FlowLen object at 0x014E40B0>],
  ['Val', {}, [], 0x77]],
 '']
p4d>
p4d> flow = A._tree[2][1][-1]
p4d> flow
<EasyExtend.langlets.p4d.bytelet.FlowLen object at 0x014E40B0>
p4d> assert flow._value is None
p4d> assert flow._node is None
p4d> assert flow._parent is None

This changes when we try to compute the hex value of Len:

p4d> A.Len.hex()
0x01
p4d> flow._value
0x01
p4d> flow._node
Bytelet<Len>
p4d> flow._parent
Bytelet<bl:A>
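
The lazy behaviour seen in this session can be imitated in a few lines of plain Python. The following is a minimal sketch under our own assumptions, not the P4D code: Field, Record and CountFollowing are hypothetical names, the length is stored as a single byte, and a change simply resets the cached value instead of calling update() right away.

```python
class Flow:
    """Base class: cached value plus references to its field and record."""
    def __init__(self):
        self._value = None
        self._node = None
        self._parent = None

    def flow_value(self):
        if self._value is None:      # compute only when the value is requested
            self.update()
        return self._value

    def update(self):
        raise NotImplementedError    # subclasses define the behaviour


class Field:
    """A named field holding either raw bytes or a Flow object."""
    def __init__(self, name, content):
        self.name, self.content = name, content


class Record:
    """Stand-in for a Bytelet: an ordered list of fields."""
    def __init__(self, *fields):
        self.fields = list(fields)

    def field_bytes(self, field):
        content = field.content
        if isinstance(content, Flow):
            # Bind the flow to its position when it is first needed.
            content._node, content._parent = field, self
            return content.flow_value()
        return content

    def hex(self):
        return b"".join(self.field_bytes(f) for f in self.fields)

    def set(self, name, content):
        for f in self.fields:
            if f.name == name:
                f.content = content
        for f in self.fields:        # a change invalidates cached flow values
            if isinstance(f.content, Flow):
                f.content._value = None


class CountFollowing(Flow):
    """Counts the bytes of all fields that follow its own field."""
    def update(self):
        fields = self._parent.fields
        rest = fields[fields.index(self._node) + 1:]
        n = sum(len(self._parent.field_bytes(f)) for f in rest)
        self._value = bytes([n])     # one byte only, no BER handling here


flow = CountFollowing()
rec = Record(Field("Tag", b"\x01"), Field("Len", flow), Field("Val", b"\x77"))
print(flow._value, rec.hex(), flow._value)
```

Before rec.hex() is called all three attributes of flow are None; afterwards they are bound, which mirrors the console session above.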


2.2.2 How LEN works

The update() method of LEN locates _node among the children of _parent and computes the sum of the lengths of all subsequent fields. So it is roughly:
      children = _parent.children()
      sum(len(field.hex()) for field in children[children.index(_node)+1:])

Notice that LEN._value is BER encoded:

            If len(_value) >= 0x80 then the first byte of _value encodes the number of subsequent bytes used to store the length value.
            The most significant bit of this first byte is always set in this case (the most significant bit is the leftmost bit when
            the byte is written in binary).

Examples:

len(_value)    BER encoding
0x78           0x78
0xA2           0x81 0xA2
0x12F          0x82 0x01 0x2F

With this scheme one can encode byte streams up to a length of 2^120. This should be sufficient for most practical purposes.
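
The encoding rule can be written down as a short Python function. This is a sketch of the general BER definite-length rule and not the FlowLen code; ber_length and ber_length_decode are hypothetical names, and the cap of 126 length bytes is the one from BER itself, not necessarily the one used by Bytelets.

```python
def ber_length(n):
    """Encode a non-negative length n as BER definite-length bytes."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n < 0x80:
        return bytes([n])                        # short form: a single byte
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if len(body) > 126:
        raise ValueError("length too large for the BER long form")
    return bytes([0x80 | len(body)]) + body      # count byte, then length bytes


def ber_length_decode(data):
    """Return (length, number of bytes consumed) for a BER length prefix."""
    first = data[0]
    if first < 0x80:
        return first, 1
    count = first & 0x7F                         # low 7 bits: number of length bytes
    return int.from_bytes(data[1:1 + count], "big"), 1 + count


for n in (0x78, 0xA2, 0x12F):
    print(hex(n), ber_length(n).hex(" "))
```

The three values printed correspond to the rows of the table above.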

2.2.3 LEN and RAWLEN

Sometimes a BER encoded length field is inappropriate and we just want to count the raw number of bytes within fixed bounds. In those cases we use RAWLEN.

Example: for a length field with a fixed length of 2 bytes we get the following encodings:

len(_value)    LEN          RAWLEN
0x78           0x78         0x00 0x78
0xA2           0x81 0xA2    0x00 0xA2
0x12F          -            0x01 0x2F
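
In plain Python the raw encoding is just a big-endian integer padded to the field width. A minimal sketch, assuming a width given in bytes (raw_length is a hypothetical name, not part of P4D):

```python
def raw_length(n, width):
    """Encode n as a big-endian integer that occupies exactly `width` bytes."""
    # int.to_bytes raises OverflowError when n does not fit into `width` bytes.
    return n.to_bytes(width, "big")


for n in (0x78, 0xA2, 0x12F):
    print(hex(n), raw_length(n, 2).hex(" "))
```

The BER form of 0x12F needs three bytes (0x82 0x01 0x2F) and therefore does not fit into a two byte field, which is why the LEN column has no entry in that row.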


2.2.4  LEN Expressions

Computing the length of all fields subsequent to the Bytelet containing LEN is the most common application of Flow objects in Bytelets. But sometimes one might want to compute just the length of a particular field, or the length of two adjacent fields, one with a fixed size and one with a variable size.

In all those cases one can derive new Flow objects from LEN by applying simple arithmetic operations or subscripting.

p4d> elm bl:A:
....    Tag:0x01
....    Len:&LEN
....    DCS:0x04
....    Text: "{some bytelet text}"
p4d>
p4d> A.Len.hex()
0x12

This is what we already know. But we can extend it:

p4d> A.Len = LEN["Text"]
p4d> A.Len.hex()
0x11

or

p4d> A.Len = LEN["Text"]+1
p4d> A.Len.hex()
0x12


Supported operations:

operator         operand type
+ | - | * | /    int
[]               string

The same operations are valid for RAWLEN.
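
One way to picture such expressions is an object that records a field name, a factor and an offset, and returns a new object for every operation. The sketch below is our own illustration and not how FlowLen is implemented; the class Len and its evaluate() method are hypothetical, and division is left out for brevity.

```python
class Len:
    """Length expression: optional field name plus factor and offset."""
    def __init__(self, name=None, scale=1, offset=0):
        self.name, self.scale, self.offset = name, scale, offset

    def __getitem__(self, name):          # LEN["Text"]
        return Len(name, self.scale, self.offset)

    def __add__(self, k):                 # LEN + k
        return Len(self.name, self.scale, self.offset + k)

    def __sub__(self, k):                 # LEN - k
        return self + (-k)

    def __mul__(self, k):                 # LEN * k
        return Len(self.name, self.scale * k, self.offset * k)

    def evaluate(self, fields, own_name):
        """fields: ordered (name, bytes) pairs; own_name: the field holding this Len."""
        if self.name is None:
            # Default: total size of all fields after the length field.
            names = [n for n, _ in fields]
            start = names.index(own_name) + 1
            base = sum(len(v) for _, v in fields[start:])
        else:
            base = len(dict(fields)[self.name])
        return base * self.scale + self.offset


LEN = Len()
fields = [("Tag", b"\x01"), ("Len", b""), ("DCS", b"\x04"),
          ("Text", b"some bytelet text")]
print(hex(LEN.evaluate(fields, "Len")))
print(hex(LEN["Text"].evaluate(fields, "Len")))
print(hex((LEN["Text"] + 1).evaluate(fields, "Len")))
```

Because every operation returns a new object, the global LEN itself is never modified, which matches the copy-on-assignment behaviour described in section 2.2.1.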

2.3  Bytelet Schemas

Bytelet Schemas are Bytelets used to parse hexcode into other Bytelets. Often Bytelet Schemas are isomorphic with the target Bytelets, i.e. they have the same structure but different values. This is not very different from casting an array to a structure type in C in order to obtain a new C struct. There are different types of Schemas, but two of them are the most relevant. The first is called the Standard Schema and the second one the TLVList Schema. Both are treated in more detail below.

2.3.1  The Standard Schema

A Schema for the Bytelet

elm bl:A:
   Tag:0x01
   Len:&LEN
   DCS:0x04
   Text: "{some bytelet text}"

looks like this

elm bl-schema:AS:
   Tag: 1
   Len: &LEN
   DCS: 1
   Text: &VAL -1

Now we can use the AS Schema and parse a Hex object.

p4d> R = AS.parse( A.hex() )
p4d> R
Bytelet<bl:AS>
p4d> R.DCS.hex()
0x04
p4d> R.Text.hex().ascii()
"some bytelet text"


The Standard Schema uses a new namespace defined by

  prefix: bl-schema
  uri: http://fiber-space.de/byteletschemans

It corresponds with a new class ByteletSchema which is derived from Bytelet and implements the single public method

2.3.1.1  How VAL works

The VAL object is a FlowFieldVal instance. It is complementary to LEN and is used in Schemas only. It works as follows: VAL refers to a LEN field, and when the _value of the LEN field is k, k bytes are read from the hexcode stream and assigned to the _value of the field holding VAL (Text in the example above).

In the last example the value read from A.hex() for LEN was 0x12. Therefore VAL-1 was 0x11, and that many bytes were read from A.hex(). Just as LEN can refer to a particular field F via LEN["F"], a reference can be used with VAL: VAL["F"] reads as many bytes as given by F.hex().
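
The parsing step of a Standard Schema can be sketched in plain Python as a walk over the schema fields that consumes the input from left to right. This is an illustration under simplifying assumptions and not the ByteletSchema parser: parse_std is a hypothetical function, the schema is a list of pairs, and the length field is a single byte in short form.

```python
def parse_std(schema, data):
    """Split `data` according to `schema`, a list of (name, spec) pairs.

    spec is an int (fixed number of bytes), the string "LEN" (one length
    byte) or a tuple ("VAL", delta): read last length value + delta bytes.
    """
    result, pos, last_len = {}, 0, None
    for name, spec in schema:
        if spec == "LEN":
            size = 1
        elif isinstance(spec, int):
            size = spec
        else:
            _, delta = spec
            size = last_len + delta
        chunk = data[pos:pos + size]
        if len(chunk) != size:
            raise ValueError("input too short for field %r" % name)
        if spec == "LEN":
            last_len = chunk[0]           # remember the length for later VAL fields
        result[name] = chunk
        pos += size
    return result


schema = [("Tag", 1), ("Len", "LEN"), ("DCS", 1), ("Text", ("VAL", -1))]
data = bytes([0x01, 0x12, 0x04]) + b"some bytelet text"
parsed = parse_std(schema, data)
print(parsed["DCS"].hex(), parsed["Text"].decode("ascii"))
```

The tuple ("VAL", -1) plays the role of &VAL -1 in the AS Schema: the length byte 0x12 covers DCS and Text, so Text receives 0x11 bytes.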

2.3.2  The tlvlist Schema

A ByteletSchema element accepts an attribute named use. If use is omitted then the value std is implied. Another value is tlvlist. A tlvlist value changes the way a hexcode object is parsed.

When a tlvlist Schema is used, the parser assumes that a TLV object has a tree structure whose nodes are sub-TLVs identified by their tags.

Example 1:

elm bl-schema:A(use="tlvlist"):
   TLV(tag = 0x00):
       SubTLV1(tag = 0x01)
       SubTLV2(tag = 0x02):
           SubSubTLV1(tag = 0x00)
           SubSubTLV2(tag = 0x01)
       SubTLV3(tag = 0x03)

There are some default assumptions being made about tlvlist Schemas when parsing hexcode. Those are

  1. A TLV object is uniquely specified by its tag within a TLV structure. So the order of occurrence in the tlvlist Schema is arbitrary. The schema specified above is the same as
 
elm bl-schema:A(use="tlvlist"):
   TLV(tag = 0x00):
       SubTLV2(tag = 0x02):
           SubSubTLV1(tag = 0x00)
           SubSubTLV2(tag = 0x01)
       SubTLV1(tag = 0x01)
       SubTLV3(tag = 0x03)

  2. A TLV object may occur arbitrarily many times within a hexcode object. You can interpret the presence of a sub-TLV specifier as: zero or more occurrences in the target.
  3. If an unknown tag is found it is not ignored; instead a Bytelet with the tag UNDEF is created.

You see that the parsing rules are rather liberal. The parser checks little beyond consistency with a nested TLV structure. This keeps the Schema simple and delegates more responsibilities to Python libraries.
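
These rules translate quite directly into a recursive parser. The sketch below is an illustration in plain Python and not the P4D parser: parse_tlvlist and read_length are hypothetical names, tags are assumed to be a single byte, and the schema is modelled as a dict that maps a tag to a name and an optional sub-schema.

```python
def read_length(data, pos):
    """Read a BER length starting at data[pos]; return (length, next position)."""
    first = data[pos]
    if first < 0x80:
        return first, pos + 1
    count = first & 0x7F
    end = pos + 1 + count
    return int.from_bytes(data[pos + 1:end], "big"), end


def parse_tlvlist(schema, data):
    """Parse concatenated TLVs; schema maps tag -> (name, sub-schema or None)."""
    items, pos = [], 0
    while pos < len(data):
        tag = data[pos]                       # single-byte tags only
        length, pos = read_length(data, pos + 1)
        value = data[pos:pos + length]
        if len(value) != length:
            raise ValueError("truncated TLV with tag 0x%02X" % tag)
        pos += length
        # Unknown tags are kept and labelled UNDEF instead of being dropped.
        name, sub = schema.get(tag, ("UNDEF", None))
        # TLVs that have a sub-schema are parsed recursively.
        items.append((name, parse_tlvlist(sub, value) if sub else value))
    return items


schema = {0x00: ("TLV", {
    0x01: ("SubTLV1", None),
    0x02: ("SubTLV2", {0x00: ("SubSubTLV1", None), 0x01: ("SubSubTLV2", None)}),
    0x03: ("SubTLV3", None),
})}
data = bytes.fromhex("00 0E 01 01 AA 01 01 BB 09 01 CC 02 03 00 01 DD")
print(parse_tlvlist(schema, data))
```

Using a dict keyed by tag makes the order of the specifiers irrelevant, and the loop accepts any number of occurrences of each tag, including none.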

Notice further that definitions where a tag-name occurs more than once are permitted:

elm bl-schema:B(use="tlvlist"):
   TLV(tag = 0x00):
       Sub(tag = 0x01)
       Sub(tag = 0x02)

2.3.2.1  Mixing the Schema modes

You can flip the schema mode as shown below:

elm bl-schema:C(use="tlvlist"):
   TLV(tag = 0x00):
       SubTLV1(tag = 0x01)
       SubTLV2(tag = 0x02 use = "std"):
           Tag: 0x02
           Len: &LEN
           DCS: 1
           TextLen: 1
           Text: &VAL
       SubTLV3(tag = 0x03)

It is also possible to refer to another Schema this way and assemble a Schema:

elm bl-schema:TextSchema:
    Tag: 0x02
    Len: &LEN
    DCS: 1
    TextLen: 1
    Text: &VAL

elm bl-schema:D(use="tlvlist"):
   TLV(tag = 0x00):
       SubTLV1(tag = 0x01)
       SubTLV2(tag = 0x02 use = "std"):
           &TextSchema
       SubTLV3(tag = 0x03)


3. Playing with Bits

In section 2 we have seen how Bytelets can be used to specify structures and TLV objects on byte arrays. Now we move to individual bits and bit arrays. Specifying bitarrays requires a little more notation.

3.1 Bin objects

Unlike Hex objects, which encapsulate operations on arrays of bytes, a Bin object is just a subclass of int with some notational conventions that are used to specify bit arrays.

>>> 0b89    # this creates a Bin object
0b89

With Bin objects we can specify ByteletSchemas on a finer level:

elm bl-schema:OneByte:
    F1: 0b1
    F2: 0b3
    F3: 0b4

With the OneByte Schema we parse a single byte:

>>> btl = OneByte.parse(0x85)
>>> btl.F1.hex()
0x01
>>> btl.F2.hex()
0x00
>>> btl.F3.hex()
0x05
>>> btl.hex()
0x85

In order to combine the three field values 0x01, 0x00 and 0x05 correctly in the hex() method of btl (i.e. as 0x85 rather than 0x01 0x00 0x05), the Bytelet object has to store the bit array characteristics in the fields. This is done using the w(idth) attribute:

>>> btl.F1.@w
0b1
>>> btl.F2.@w
0b3
>>> btl.F3.@w
0b4
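
The arithmetic behind this split is ordinary shifting and masking. A minimal sketch in plain Python, assuming that fields are taken from the most significant bit downwards as the OneByte example suggests (split_bits and join_bits are hypothetical names):

```python
def split_bits(data, widths):
    """Split `data` (bytes) into integers of the given bit widths, MSB first."""
    total = len(data) * 8
    if sum(widths) != total:
        raise ValueError("widths must add up to %d bits" % total)
    number = int.from_bytes(data, "big")
    fields, remaining = [], total
    for w in widths:
        remaining -= w
        # Shift the field down to bit 0 and mask off everything above it.
        fields.append((number >> remaining) & ((1 << w) - 1))
    return fields


def join_bits(values, widths):
    """Inverse of split_bits: pack values of the given widths into bytes."""
    number = 0
    for v, w in zip(values, widths):
        number = (number << w) | v
    return number.to_bytes(sum(widths) // 8, "big")


print(split_bits(b"\x85", [1, 3, 4]))
print(join_bits([1, 0, 5], [1, 3, 4]).hex())
```

0x85 is 1000 0101 in binary, so the widths 1, 3 and 4 yield the values 1, 0 and 5, and joining them with the same widths gives back 0x85.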

Of course one can start with a Bytelet with predefined widths:

elm bl:btl:
    Tag(w=0b4): 0x07
    Len(w=0b16): &RAWLEN
    DCS(w=0b4): 0x04
    Text: "{Some Text}"

We get

>>> btl.hex()
0x70 0x00 0xA4 0x53 0x6F 0x6D 0x65 0x20 0x54 0x65 0x78 0x74

Note that using RAWLEN instead of LEN makes no difference in this example. It is just the recommended way to assign length fields when they have a fixed size.

3.2 Sort of runtime type checks

Currently there are no checks when you assign values to fields and P4D won't complain when you set:

...   Tag(w=0b4): 0x100

However, evaluation of hex() checks whether the assigned Hex object values are within the bounds of the specified bit arrays. Otherwise a TypeError exception is raised:

>>> btl.Tag = 0x100
>>> btl.hex()
Traceback (most recent call last):
...
TypeError: Hex object 0x100 does not fit into a 4 bit array.
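
A check of this kind fits naturally into the packing step. The following sketch shows the idea in plain Python and is not the Bytelet implementation: pack_fields is a hypothetical function, and the length value 0x000A is simply copied from the btl.hex() output in section 3.1 rather than computed.

```python
def pack_fields(fields):
    """Pack (value, bit width) pairs MSB first into bytes, checking bounds."""
    number, total = 0, 0
    for value, width in fields:
        # A value fits if no bits remain after shifting out `width` bits.
        if value < 0 or value >> width:
            raise TypeError(
                "value 0x%X does not fit into a %d bit array" % (value, width))
        number = (number << width) | value
        total += width
    if total % 8:
        raise ValueError("total width is not a whole number of bytes")
    return number.to_bytes(total // 8, "big")


text = b"Some Text"
packed = pack_fields([
    (0x7, 4),                                       # Tag
    (0x000A, 16),                                   # Len, taken from the output above
    (0x4, 4),                                       # DCS
    (int.from_bytes(text, "big"), 8 * len(text)),   # Text
])
print(packed.hex(" "))

try:
    pack_fields([(0x100, 4), (0x0, 4)])
except TypeError as exc:
    print("TypeError:", exc)
```

As in P4D, nothing is checked when the values are collected; the error only appears when the fields are actually packed.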


4. History

Date Version Change
2008-08-08 1.2
  • One can now set attribute values and content easily without referring to the internal data structure.
  • Introduction of Bytelets and the bl namespace which is supported on language level.
  • Introduction of ByteletSchema objects and the bl-schema namespace which is supported on language level.
  • Adding a Hex object context.
  • elm foo:bar: ... works like an assignment bar = foo:bar:..., which is valid for all names that are neither keywords nor hyphenated.
2008-07-11 1.1.1
  • Fix: remove py4as3 import from xmlutils and p4dbase
2008-07-11 1.1
  • Fix XML formatAttributes() in xmlutils.
  • Make locals() available for mapeval.
  • Add flexutils module
  • Add elm keyword
  • Add P4D comments
  • Fix double-colon access for namespaces with p4d prefix.
  • Add element lists or tuples.
  • Add P4D.eval()
  • Add CDATA handling
  • Change filter semantics
2008-06-26 1.0
  • Initial release