Control Flow in PINQ

I was a bit bored so I whipped up this little diagram in OmniGraffle of the control flow within the PINQ framework and an application. Most things will seem familiar to programmers who regularly use MVC frameworks. It's actually slightly more complex than what is going on in there; however, I was limited to using 20 object with the demo version of OmniGraffle and so I was rather limited on what I could show.

The one interesting bit in here is the yield resource exception. As I've written in previous posts, it changes control from one controller to another.

Pythonesque PHP: Self-Documenting Code

One amazing feature about Python is that it is self-documenting. All classes and functions can have documentation strings and they all have access to them using the __doc__ property. Most Python doc strings follow the conventions in PEP 257 and so when printed out they are usually quite nice.

I decided that I wanted a feature like Python's help() function in PINQ and so I made one using PHP's Reflection API. If you remember (and read my articles) I wrote an earlier blog post about manually rebuilding classes using the Reflector classes.

First, here's an example of the output from my help() function on the Dictionary class in PINQ:

class Dictionary
 |  An array-like class for mapping keys to values.
 |  
 |  Author: Peter Goodman
 |  
 |  Constructor:
 |  
 |      Dictionary([array $default_values]) <==> dict($default_values)
 |  
 |  Public Methods:
 |  
 |      offsetExists(...)
 |          $d->offsetExists($key) <==> isset($d[$key])
 |          
 |          Check if an entry in this dictionary exists for a given key.
 |  
 |      offsetGet(...)
 |          $d->offsetGet($key) <==> $d[$key]
 |          
 |          Get the value (entry) for a specific key in the dictionary. If the key
 |          does not exist in the dictionary then this function will return NULL.
 |  
 |      offsetSet(...)
 |          $d->offsetSet($key, $val) <==> $d[$key] = $val
 |          $d->offsetSet(NULL, array $val) <==> $d[] = $val
 |          
 |          Map a key in the dictionary to a specific value.
 |          
 |          Note: When specifying the key as NULL, val must be an array.
 |  
 |      offsetUnset(...)
 |          $d->offsetUnset($key) <==> unset($d[$key])
 |          
 |          Remove a (key,value) entry in the dictionary.
 |  
 |      toArray(...)
 |          $d->toArray(void) -> array
 |          
 |          Return the (key,value) pairs in the dictionary as an array mapping keys
 |          to values.
 |  
 |  Class Descendants:
 |  
 |      InnerRecord
 |      Loader
 |          ConfigLoader
 |          PackageLoader
 |      ModelDictionary
 |      PinqRouteParser
 |      ReadOnlyDictionary
 |      StackOfDictionaries
 |      View
 |          PinqView

As you can see, it gets the PHP Doc Block comments for classes and methods and displays them nicely. It also shows the class constants. I decided not to show class properties because I usually don't do doc-block style commenting on them and so I didn't need them. Another thing this class does is function the extending/implementing classes for an interface/class and shows it as a tree of class descendants. This follows nicely from the idea of self-documenting code as it allows the programmer to discover new and important classes.

I'm currently in the process of re-documenting PINQ's source code to be more Pythonesque as I rather like the format and it suits the help() function well. This, of course, does not mean that my help() function (and its associated functions) do not work for non-PINQ classes because they do.

You can check out/download the code from PINQ's Google Code project repository at http://code.google.com/p/pinq-php/source/browse/trunk/system/functions/reflection.php. The help function works for classes, objects, functions, and class/object methods (using two arguments instead of one).

Yielding Control to a Different Controller

Recently I've implemented a very nice feature in PINQ: the ability to yield control to a different controller. The way it works is by throwing a specific exception and the catch for that exception is within a do-while loop. Inside that loop is the code that parses a route, instantiate a controller and dispatch to an action. The catch changes the route to parse and tells the loop to iterate one more time, thus allowing the programmer to change the controller/action being executed.

This feature's main use case is error handling. With a proper error controller set up, such as the one below:

<?php

class ErrorController extends PinqController {    
    
    public $layout_file = 'error';
    
    public function ANY_403() { set_http_status(403); }
    public function ANY_404() { set_http_status(404); }
    public function ANY_405() { set_http_status(405); }
    public function ANY_500() { set_http_status(500); }
}

I can then do: yield('/error/404') for example, or more easily do yield(ERROR_404);. This feature allows me to move the presentation logic for errors into controllers that are under the control of the programmer (ie: in the application directory) and also not make common errors not something that needs to be handled separately of the rest of the program.

Another use case is extending the class that causes this change in control flow. For example, my DatabaseException class now causes an error 500 when thrown as usual. Here is the code to do that:

<?php

class DatabaseException extends YieldControlException {
    
    public function __construct($message) {
        parent::__construct(ERROR_500);
        $this->message = $message;
    }
}

I think this is an extremely sweet solution to the problem of gracefully handling certain kinds of errors.

2½ Cool Hacks in PHP

As I've been testing out lot's of things for my framework, I've come across a few neat little hacks. The first 1.5 hacks involves PHP's list() construct, and possible ArrayAccess as well.

<?php
list($foo->bar, $foo) = array(new Bar, new Foo);

Look closely at the order of variables in the list(). It seems that list actually sets the variables from right to left. It also indexes into the array numerically, which means that if instead of putting an array as the rvalue of list, you can put an object that implements ArrayAccess.

The second fun hack uses PHP's ArrayAccess interface (as before) but to overload PHP's array push syntaxt. Consider this valid piece of code that in PINQ merges an array into a dictionary:

<?php
$dict = new Dictionary;
$dict[] = array('foo' => 'bar');

This hack works using ArrayAccess::offsetSet, and recognizing when a NULL key is passed and handling it accordingly.

Selecting from more than one database table: How to deal with column name conflicts in PINQ

Obligatory Plug

The short answer is that you don't need to (in PINQ, using PQL). You can if you want to but PINQ will automatically do it for you.

The Problem

Alright, that's enough plugging of PINQ. I'm very excited to say that I just solved this problem and I'm going to explain how I did it. But first, I will set up an example if you're not quite clear of the problem I'm talking about. Consider the following two simplified SQL table defininitions:

CREATE TABLE posts (
    id      INT,
    user_id INT,
    name    VARCHAR,
    body    TEXT
)

CREATE TABLE users (
    id      INT,
    name    VARCHAR
)

If I do a database query and select from both the posts and users tables at the same time (linking the two, of course), then their 'id' and 'name' columns will conflict. PHP's mysql_fetch_assoc will return roughly the following array:

array(
    [id] => 1,
    [user_id] => 1,
    [name] => 'user name',
    [body] => 'the body text of the post'
)

Those are terrible results! We've ended up losing lots of valuable post information because user information has overwritten it. We could solve the problem by returning a numerically indexed array, but then we would lose the convenience of indexing into the result by column names (or aliases for that matter). Alright, so how can we solve this problem?

Solving the Problem

The Normal Way

The obvious and normal way of doing this is manually aliasing the conflicting columns of one of the tables in the SELECT statement. This is a fine solution and isn't too bad. But, it takes the convenience out of doing a simple SELECT posts.*, users.* FROM ... which is honestly, super convenient.

How PINQ does it, and how it benefits you

The PINQ solution is to actually do roughly the same thing. First, PINQ figures out what the conflicting columns are going to be. This can be done with a series of calls to php's array_intersect_key and array_merge, that is, assuming we know all of the columns for a given table (which we assume and which we do in the models). The conflicts array ends up being this:

array(
    [id] => 1,
    [name] => 1,
)

We start by expanding any SELECT *'s into all the columns that need to be selected from and how they are going to be aliased. We then go over all of the column name to alias mappings and check if any of the aliases exist in the conflicts array using a simple isset(). If they do, we prefix the aliases with the model name. Here's what we end up with:

array(
    [posts_id] => 5,
    [user_id] => 1,
    [posts_name] => 'my first post!!!1!',
    [body] => 'the body text of the post'
    
    [users_id] => 1,
    [users_name] => 'user name',
)

That's all good and fine, but it's not a complete solution. If we were to stop here then we would need to know when columns conflict and index them with their new prefixes. That is undesirable behavior. So lets look at the result and see what we notice. Immediately we can see that the results come out in the order that we selected them (remember posts.* then users.*). If we assume that when we select the columns that we group them by their tables then we can count on there being this nice order to the result array (a fair assumption using PQL in PINQ). We've observerd that there is this order to the results, but that still doesn't get us any closer to getting rid of those prefixes we added to those columns.

What if we injected some dummy columns into the query to tell us where the columns for one table end and another begin? Further, we need to know when NOT to do this whole separating process. For example, if in PINQ we use straight SQL in a query then none of our assumptions will hold water and so we might end up mangling the results in an unfortunate way. Remember, the goal is to lose no data!

The solution is clear: we need sentinel columns in there to tell us:

  • When to perform the splitting up and processing of a query
  • Where to split the row data into different models
  • How to identify the prefixes on the unambiguous columns so that we can remove them
We can do step 1 with one sentinel, and steps 2 and 3 at the same time by including the model name (which is the prefix on ambiguous columns) in the sentinel to show where a table begins/ends. Here is our desired result array:

array(
    [__pql__] = 1
    
    [__posts] = 1
    [posts_id] => 5,
    [user_id] => 1,
    [posts_name] => 'my first post!!!1!',
    [body] => 'the body text of the post'

    [__users] = 1
    [users_id] => 1,
    [users_name] => 'user name',
)

Hey, we can actually do that in SQL in a very simple way.

SELECT 1 AS __pql__, 1 AS __posts, posts.*, 1 AS __users, users.* FROM posts, users WHERE posts.user_id=users.id

That is in essence what is done in PQL (except that the posts.* and users.* are expanded to be unambiguous and prefixed). Now that we have the desired row data, all that is left is to separate it into the different contained models and remove any unwanted prefixes.

Why and what are the benefits?

This seems like a feat of over-engineering. We've solved a minor problem with complicated solution that requires extra processing of records from the database. What are the advantages? First, lets look at how the traditional ORM works. We would first get a row from the posts model, and then call a relationship on it, such as $post->users. Calling that relationship on each record means one extra query per record. One of the goals of PQL is to shift the work of calling these relationships on objects (and thus doing queries) to constructing the relationships in the database queries then sorting out the data when its needed (note: my intention is to also support the traditional ORM style of calling on relationships). Still, the question "why?" remains. I was prompted to this idea by seeing these relationships being called in large loops. The number of database queries piles up like crazy and then things start slowing down.

An Example of PQL

So, here is an example of a pql query that does all of the above (from within a controller method):

$db = $this->import('db.my_db');

$posts = $db->findAll(
    from('posts')->select(ALL)->
    from('users)->select(ALL)->link('posts', 'users')
);

foreach($posts as $post) {
    echo $post['id']; // this is the ambiguous id which resolves to the user id
    echo $post->users['id']; // this is the user id from within the columns
    echo $post->posts['id']; // this is the post id
}

As you can see here, my earlier assumption that the select columns would be in order comes true with PQL. The reason is because PQL removes ambiguity from queries by having the programmer select columns from a specific model.


EDIT: I have already come up with a simpler method that I didn't recognize earlier. Instead of calculating the conflicts I prefix every column and then chop off the prefixes later. Regardless, most of the above article still holds and I feel as though others might find the process interesting.

PINQ - PHP5 Framework

So, originally PINQ stood for [P]HP [IN]tegrated [Q]uery; however, I liked the name so much that I've named my new framework PINQ and now the LINQ-like stuff in is called PQL. So, here are the basic ideas I'm pushing in the (in development) framework. First, a non-traditional ORM. Everything ORM-ish is actually based around PQL. The ORM and PQL are also not limited to databases! Also, multiple database connections are *stupidly easy* to manage. In fact there is just so many things that I'm excited about that you really need to look at the code. I've been commenting the hell out of what I'm doing. If anything, I suggest any onlookers of the code take a look at the following directories/files:

  • /system/model/
  • /system/pql/*
  • /system/packages/route-parser.php
  • /system/other/ (the linear-state files, this is just a temporary directory btw)

The assumption that this framework is MVC (based on some of the class names and whatnot) is more apparent than real. I am not trying to adhere to any MVC rules, I'm just writing it the way it comes to mind and makes sense. In fact, I'm thinking of renaming my Controller class to something that's actually representative of what it is (as it doesn't really control anything).

Anyhow, the license I'm probably going with is MIT, and you can find the code (as I work on it!) at its google code project page: http://code.google.com/p/pinq-php/. If you do look into it, I suggest you look into it as a curiosity, not as something to deploy with (at least now). It's in heavy development and I sometimes introduce funny bugs. Regardless, there are some very neat pieces of code in it. Enjoy!

What have I been working on?

In my free time I've been playing around with a few neat ideas. The main one is LINQ inspired querying for PHP. Suffice to say, I have made some very solid progress at this domain-specific language and can now make SQL queries that have complex joins in them. Here an example with some foreshadowing to where I'm going with all of this: http://codepad.org/OEVsQmwt.

Update: My Current progress on PHP LINQ: http://codepad.org/qr9SvC2Y.

Controlling PHP's Superglobals

It's summer again. Summer means coding PHP at work and having time to my own things. Speaking of which, here's a fun little hack I just came up with: http://codepad.org/aNW13yC9.

<?php

error_reporting(E_ALL | E_STRICT);

/**
 * Class that becomes a super global variable.
 * @author Peter Goodman
 */
final class SuperGlobal implements ArrayAccess {
    public $globals = array();

    public function __construct(&$globals) {
        $this->globals = &$globals;
    }

    public function offsetGet($key) {
        return isset($this->globals[$key]) ? $this->globals[$key] : NULL;
    }
    public function offsetSet($key, $val) {
        echo 'hiii';
        $this->globals[$key] = $val;
    }
    public function offsetUnset($key) {
        unset($this->globals[$key]);
    }
    public function offsetExists($key) {
        return isset($this->globals[$key]);
    }
}

// create a new $GLOBALS array, populated with our new superglobals
$super_globals = array('_SERVER' => new SuperGlobal($_SERVER),
                       '_GET' => new SuperGlobal($_GET),
                       '_POST' => new SuperGlobal($_POST),
                       '_REQUEST' => new SuperGlobal($_REQUEST),
                       // ...
                       );

// overwrite the $GLOBALS array, then extract the new super globals by
// reference into the current scope, overwriting the shorthand to the
// normal superglobals.
$GLOBALS = &new SuperGlobal($super_globals);
extract($super_globals, EXTR_OVERWRITE | EXTR_REFS);

// function to test if the overwriting of the superglobals worked as
// expected
function test_scoping() {
    print_r($GLOBALS);
    print_r($_SERVER);
}

test_scoping();

More to come soon.

Nested Sets (Modified Pre-Order Tree Traversal Algorithm)

Many people stay away from nested sets. The whole idea of a node having a left and right identifier to determine nesting confuses people. That or people dislike it because they feel that it de-normalizes their database. In this article I am not going to go into how to implement the mptt algorithm; you can find my own full implementation of mptt in my code folder. Instead, I am going to show how to determine the depth of a node at runtime (in PHP) with only the knowledge of the left and right ids of a node. A quick note: my solution is an iterator and uses neither recursion nor lookbacks or lookaheads.

I would expect that this would be the usual solution to the problem. Most people tend to put the heavy lifting in SQL by doing a COUNT of every node with a left id that is less than the current one and a right id that is greater than the current one (imagine the roots coming off of the trunk of a tree, this starts at the base of a root and looks up to the trunk). That is a sufficient solution; however, it is expensive for a large tree. The solution itself also requires that the depth be calculated for every node in advance of getting the nodes--regardless of whether or not we will use/display them.

So, we know that the data we are working with is a tree. Trees can be represented in many different ways in code. One such way is with a stack. If you remember, I used a stack to parse HTML as the pushing/popping function of a stack cleanly represents entering a new branch and then leaving one. Our approach is known: we will use a stack. But, how will a stack be applied to a result set of nodes?

First, the ordering of the nodes is crucial. Our end goal might be to make a threaded look for the tree (hence the usefulness of knowing the depth) and getting the nodes out in their threading order is simple: they need to be in ascending order of their left ID. If you do not understand why then consult this image on the MySQL website and look closely at the left/right ids. So, we can easily fulfill the first requirement: the nodes need to be in their threading order.

Second, it can be easily shown that for any node on the tree, the formula: (right_id - left_id - 1) / 2 will yield the total number of nodes below it. The total part is important! If the nodes were laid out in a flat array, then given our first assumption--that the nodes are in their threading order--then we can further assume that the result of this formula means that the following X number of nodes in the array after the current node (X being the total number of children of a given node) will be at least one level deeper than our current node.

Okay, so we have an inkling of how this will work. We get to a node, and we know that the next X nodes are children. That's simple enough; however, for nesting above one level, the thinking process gets more confusing: how do we keep track of the number of children there is and that we've already looped over if every parent of a given node includes the number of the children of a given node. Unfortunately, this last sentence might be poorly worded, so I will try to illustrate the problem differently. Consider the following tree:

               +----- child a (1) +----- child b (0)
root (3) ---|
               +----- child c (0)
array(
    'root',
    'child a',
    'child b',
    'child c',
)

Beside each node I have the total number of children that the node has in braces. After that I have a simplified version of the array of the nodes as they would be available from a result set. So, when we get to root we know that the next three nodes are children (child a, child b, child c). When we get to child a we know that the next one node is a child (child b). At child a, we know that we've consumed 1/3 of the children of root. When we're at child b, we know that we've consumed 1/1 children of child b, but how are we to say that we've now consumed 2/3 of the children of root? Finally, when we get to child c, we have consumed 3/3 of the children of root, and can now say that we are back to the same level as root is on.

So, we know that as we go we need to consume nodes. Given that we know how many children a node has, as we iterate over those children, we can reduce that counter by 1 for each node visited so that when the counter hits zero, we know that we're done looking at children and we're back to the parent nodes depth. The only problem is introduced with more than one levels of nesting. So we consume nodes and that affects the counter on the parent node, but it also needs to affect the counters on all parent nodes way up in the tree. Updating the counters of every parent node each time a node is consumed would be a waste of resources. Instead, it can easily be done when we're done looking at all the children of a given node. If we store the number of children twice--once acting as a counter that will be consumed, the other acting as a reference for how many total children were consumed--then we can accurately update parent counters as we go along.

The final solution (in pseudo code) is that at every node we encounter, we push pair($counter, $counter) onto the stack. As we encounter child nodes, we consume a node by reducing top($stack)[0] by one, and leave top($stack)[1] alone. By definition, count($stack) is the current depth that we are on. When top($stack)[0] < = 0, we do $temp = pop($stack) and then account for all the nodes we just consumed by using the second counter: top($stack)[0] -= $temp[1]. We clean up by constantly looking is top($stack)[0] <= 0 and popping those parts off.

That pseudo-code version is lacking but illustrates all of what actually needs to be done. For some, this might have originally seemed like a difficult problem (from the programming side, not from the SQL side). However, the solution is quite simple when we look at the problem for what it is: a tree, and not explicitly what we want: the depth of a node. Go check out the solution to how to find the depth of nodes for a nested set tree in my code folder (mostly comments, includes example at the bottom of the file).

<?php

error_reporting
(E_STRICT);

/**
 * Given a flat array of nodes in a mptt (nested set) tree, in ascending order
 * of their left id, figure out what level each node is on. The only things
 * required to figure out what level a node is on is its left and right ids.
 * This iterator uses a stack to keep track of levels.
 * 
 * @author Peter Goodman
 * @copyright Copyright (c) 2008, Peter Goodman. All rights reserved.
 */
class NestedSetIterator extends IteratorIterator {
    
    const 
LEFT_ID 'left',
          
RIGHT_ID 'right',
          
LEVEL '_level';
    
    
private $nodes,
            
$level,
            
$stack;
    
    
/* Constructor, take in an array or iterator. If an array is passed, turn
       it into an iterator.
     */
    
public function __construct(&$nodes NULL$num NULL) {
        
        if(!(
$nodes instanceof Iterator) && is_array($nodes))
            
$nodes =& new ArrayIterator($nodes);
        
        
parent::__construct($nodes);
        
$this->rewind();
    }

    
/* Push an item onto the stack. The stack holds arrays of pairs that have
       the number of children of a given node. Pair[0] is reduced when an item
       is consumed and pair[1] is kept to reduce the parent stack elements
       pair[0] when the current top stack element is popped.
     */
    
private function push(&$node) {
        
        if(!
is_array($node) && !($node instanceof ArrayAccess))
            
throw new Exception("Node is not an array nor has array access.");
        
        
$num = ($node[self::RIGHT_ID] - $node[self::LEFT_ID] - 1) / 2;
        
$this->stack[] = array($num$num);
        
$this->level++;
    }
    
    
/* Pop a pair off the stack and update the number of consumed nodes of the
       new top node.
     */
    
private function pop() {
        
        
$level array_pop($this->stack);
        
$this->level--;
        
        
/* We have just popped a level off of the stack. Consume works by
           decreasing the number of nodes on the top of the stack, but each
           stack elem knows the *total* number of nodes below it, and so 
           consuming nodes on the top of the stack doesn't account for lower
           elements (parent nodes). Therefor we decrease the current top of
           stack (after the pop) by how many total nodes there were originally
           in the node elem we just popped off.
         */
        
$this->consume($level[1]);
    }
    
    
/* Reduce the number of nodes consumed for the top pair of the stack. */
    
private function consume($num 1) {
        if(isset(
$this->stack[$this->level]))
            
$this->stack[$this->level][0] -= $num;
    }
    
    
/* Clean up after stack elements that have no more nodes to consume. */
    
private function cleanup() {
        
        
/* Move down the stack elems until we hit a level that still has nodes
           that haven't been consumed.
         */
        
for($i $this->level; isset($this->stack[$i]); $i--) {
            
            if(
$this->stack[$i][0] > 0)
                break;
            
            
$this->pop();
        }
    }
    
    
/*
     * Iterator methods
     */
    
    /* Get the current node */
    
public function &current() {
        
        
$node parent::current();
        
        
/* Use cached version of level so that it doesn't need to go through
           the normal consume/push/pop phase.
         */
        
if(isset($node[self::LEVEL]))
            return 
$node;
        
        
/* Cache doesn't exist, consume the node and push a new element onto
           the stack.
         */
        
$this->consume();
        
$this->push(&$node);
        
        
/* Store the level in an unique name */
        
$node[self::LEVEL] = $this->level;
        
        return 
$node;
    }
    
    
/* Move to the next node. */
    
public function next() {
        
$this->cleanup();
        
parent::next();
    }
    
    
/* Reset everything. */
    
public function rewind() {
        
$this->stack = array();
        
$this->level = -1;
        
parent::rewind();
    }
}

Stackless Operator Precedence

I got operator precedence for my parser working properly yesterday and am well into the interpreter (it can interpret expressions!). I thought that the way I solved the problem of generating a tree from some infix expressions was both clever and hackish. The normal solution involves creating a stack, pushing a frame in, parsing any operands/operators we find and putting them into a parse tree on that stack frame. If a '(' is found then a new frame is pushed onto the stack and all operations within the sub-expression are put on the parse tree of the newer stack frame. When a ')' is found, the topmost frame is popped off the stack and its parse tree is merged into the new top frame's parse tree. Interestingly, I managed to achieve the same effect without a stack at all.

The key to this is the tokenizer. My tokenizer is iterative, so I only parse the input text for a token when I ask for a token. For a simple calculator, a tokenizer would not be necessary; however, it was for the language. I have an enumeration of all the possible tokens for my language. I conveniently set it up so that the first token is an unknown token (commonly used as the right link in a cons cell), and then the first twenty tokens are all infix operators. I have a function to register operators along with their left and right precedences, and so they are all included in a convenient operators array. So far it seems as though the tokens are irrelevant. I will explain.

My parse tree is a binary tree laid out using cons cells. Eg: 1 + 2 * 3 becomes:

       (+)
       / \
     (1)  ()
         /  \
        (*)
       /   \
     (2)   ()
          /  \
        (3)

Each empty () is a node with a token of SW_TOKEN_UNKNOWN (token: 0), both + and * are nodes on the tree with token symbols whose values are less than twenty. So, this just shows how I have the parse trees made. Each node knows its parent node, and each node has a left and right child (one or both of which might be null), and whenever a node is added as a child to another, a new leaf is created and the parser is aware of this leaf[1]. Given this information, we can start onto a stackless model.

Imagine that the parser has just accepted the infix operator *. A useful assumption to make is that the parser works properly and that any prior infix operators--in this case adding 1 + 2--have been parsed into the tree correctly. We find the * operator, and given the handy operators table that I mentioned above, we know its left and right binding powers. Since we know what leaf we're on, we can move upward in the tree looking for nodes whose token symbols are <= 20 (unknown nodes and infix operator nodes). If we find a node that has a token symbol greater than twenty, we stop. If we find a node whose right binding power is greater[2] than the left binding power of the infix node that we have parser (the *) then we will record it, overwriting any previously recorded nodes of this sort. If we find a node whose right binding power is less than or equal to the left binding power of the parsed infix operator then we break out of the loop and hope that we've recorded an infix operator in a previous iteration of the loop.

Okay, so that seems like an odd process. Here is another way of looking at this. We have found an infix operator. If no prior infix operators have been parsed then our job is easy: it becomes the root to whatever was on the left of it and whatever is to the right of it. If we have found prior operators then our job is tougher. We need to find exactly which previous operator to bind this with. Consider this: 2^3^4+5. We need to turn this into (2^(3^4))+5. Since the operators effectively become the roots to the expressions on either side of them, then we can say that the + needs to bind with the first ^. An exponent operator has a higher binding power than an addition operator, and it also binds to the expressions to the right of it more strongly then it binds to the expressions to the left of it--hence why we get 2^(3^4) and not (2^3)^4.

When the + is found, we move up in the tree. We find the second ^, and realize that it has a higher binding power than the + operator. This is good because the + might bind to it. We record the second ^ and keep going up the tree. We find the first ^, it has a higher binding power, so we record it. We try to move up the tree--however there is no parent node so the loop is stopped. The node to bind with was the last one recorded--the first ^. Because the ^ has a higher binding power than the +, it means every exponent operation needs to happen before any addition operation. So, we make the tree of 2^(3^4) the child of our + operator. The case is reversed when an operator is found with the same or a lower binding power. The + operator and its left/right expressions become the children of that operator.

Now, on to the key to stackless sub-expressions. This whole scheme works because infix operators' token symbols are represented by a known range of numbers (1-20, including 0 as empty nodes). The above algorithm works by looking for branches with token symbols in that range. If it stumbles upon a branch with any symbol outside of that range then it will simply stop. So, to deal with sub-expressions, all that is needed is to attach another node to the last known leaf of the tree! This way no stack is needed and no merging back onto the tree is needed because the sub-expression is already part of the tree! It works because the sub-expression node, with a token symbol out of the infix operator's range, represents a road block to any infix operations going on within it (acting as the root of the sub-expression).


1. At first, the choice to use cons cells was random. It has, however, paid off significantly. Since every node is a cons cell, every branch has the same format, and all sub-nodes of the branch also follow this format. When a node is added, a cons cell is made. That means that a blank node added as the right child to the parent node. The node we want to add is set as the left child of the cons cell and the right side of the cons cell is set to null.

2. This may seem backward depending on how you have implemented operator precedence in the past. My precedence--or rather: priority-- goes from weak (1) to strong (10).