Refactoring Monitair

Fun fact. Users don’t care how their code is factored. They just want their programs to do what they want. All the time. Programmers have a different perspective. They want code that looks nice and is easy to maintain. This means that you often have to revisit decisions made at the start of the project, and change things to make them better.

Real-world architects don’t have this luxury. If, halfway through the construction of a building, they find a better structure, there is no way they can rebuild it. But programmers can. It is not without its risks, though. Ages ago I found a haiku that I rather liked. I think it went something like this:

Your code was ugly
I refactored it for you
It no longer works

Having said that, I’ve just spent a day re-factoring the Monitair software. Because it was ugly. And it seems to still work. Which is nice.

Snaps now on GitHub

A couple of days ago I got an email from James. He’s been working through my “Begin to Code with C#” book and having fun learning C#, which is great to know. He’s even reached the point where he has spotted an improvement to my code. It occurred to me that what he really wants to do is to put the changes into the GitHub repository where the code was stored, so that everyone could benefit from it.

But, rather stupidly, I’ve never put Snaps on GitHub. Well, that’s easy to fix, and so you can now download the latest version of Snaps from here:

https://github.com/CrazyRobMiles/snaps

If you don’t know what GitHub is, then you’re missing out. GitHub is a way that you can manage data, whether it’s a bunch of code or that book you’re writing. GitHub holds the source and also allows you to make incremental changes, all the while keeping track of the differences so that at any point you can go back to a previous version. It also has fantastic facilities for group working, so that several people can work on a single large project and manage the changes that their different work items produce. Until recently you could only use GitHub for free if you were happy to make your creation public. To have a private GitHub repository cost you a monthly subscription. That’s changed now: users can create private GitHub repositories for free, which is awesome. Three words of advice: Get. Into. GitHub.

As for Snaps, that’s a set of language extensions that make it very easy to create Windows Universal Applications. I created it so that people can learn to program without being hit by a lot of extra stuff that you need to know to create modern applications. It has lots and lots of features, including a sprite engine for game creation. It also has all the sample code from the programming book built into it, so you can play with and modify the sample code very easily.

I’m going to message James and suggest that he propose his modifications and we can put them into place as part of the Snaps release. That will also give him something nice to put on his CV.

Dealing with the impossible

I’ve got a new saying that I quite like. It’s probably not original, but that’s true of a lot of stuff that I do.

If what you are observing is impossible, it is either not happening or not impossible.

I’ve found and fixed my impossible bug. Of course it wasn’t where I thought it was, because if it was, I’d have looked there and mended it. The fault wasn’t even in the code doing the job; it was in the code telling me what the system was doing.

I’d missed a break out of a C switch statement (a frighteningly easy thing to do in that language) so that my reporting code was printing out the wrong messages. Which led me to concoct all kinds of fanciful theories about memory corruption and race conditions - rather than simply checking that what I was being told was happening was actually happening.
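
For anyone who hasn't been bitten by this before, here's a tiny sketch (the names are invented, not the real Monitair code) of how a missing break lets one case fall through into the next:

void reportState(int state)
{
    switch (state)
    {
    case 0:
        Serial.println("Idle");
        // break missing here - execution falls straight through into case 1
    case 1:
        Serial.println("Reading");
        break;
    case 2:
        Serial.println("Sending");
        break;
    }
}

Call reportState(0) and you get "Idle" followed immediately by "Reading", which is exactly the kind of misleading output that sent me off chasing phantoms.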

The important thing about situations like these is that you change the way that you work to reduce the chances of them happening again. So, in future I’ll make sure that I test all the possible outputs of my diagnostic system.

Debugging with Rob

OK. Today I’m going to find out why my program is crashing. I’m going to do this in an “experimental” way. I’m going to put up a hypothesis (a theory about why something happens) and then try to test it. Before I do that though, perhaps I’d better describe the problem. I apologise if some of this explanation is a bit technical:

The program runs for a while and then stops.

The length of time the program runs varies, and I can’t tell what happens at the point of failure because it is inside a device. And I can’t take a look inside the program because those tools aren’t available to me.

So, let’s try my first hypothesis:

Hypothesis 1: The program flashes the neopixel LEDs at the same time as it receives data from the air quality sensor. Perhaps the code that controls the lights is affecting the code that gets the data.

Test:

  • Increased speed of pixel update.

  • Loaded up the serial interface with data.

  • Increased rate of MQTT update to once per second

Results:

Display update slowed right down (as we might expect) but no crash.

On this basis I’ll conclude that this hypothesis is not valid. So let’s try another:

Hypothesis 2: Serial data reception is interfering with the WiFi transmission. When the device has got a complete reading from the air quality sensor it sends a message over WiFi to the server.

Test:

  • Turned off the Pixel updates

  • Loaded up the serial interface with data

  • Added code to confirm successful network message sent

Result:

When the serial data is being transmitted the data transfer is slower because the serial interface steals cycles from the processor. Eventually the transfer collapses and the Publish method starts to return false. Shortly after that the whole system falls over.
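
The code I added to confirm a successful send doesn't need to be anything fancy. Something along these lines does the job - this is a sketch in the Arduino style, assuming a PubSubClient-like MQTT client whose publish call returns true on success, and the names are illustrative rather than the exact Monitair ones:

#include <PubSubClient.h>

extern PubSubClient mqttClient;   // set up elsewhere in the sketch

bool sendReading(const char *topic, const char *payload)
{
    bool sent = mqttClient.publish(topic, payload);  // true means the message went out
    if (!sent)
    {
        Serial.println("MQTT publish failed");
    }
    return sent;
}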

Further Test:

Restored the pixels and loaded up the serial port.

Result:

When the serial port is loaded the performance collapses as before.

So, it is not a good idea to use the network connection while you are receiving data from the serial port.

This is because I’m using a software simulation of the hardware that normally receives serial data. This simulation has specific timing constraints that mean it needs to “lock out” other processes when it runs. And it seems that this is causing the problem. Under normal circumstances the node only sends a network message every six minutes or so, and the chances of interference are small. But when I’m doing “proper” testing - sending lots of messages and receiving loads of data - I notice the problem.

The solution is to re-work the code so that the two things don’t occur at the same time. Which means I get to write more code. Which I quite like.
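
One way of re-working it - and this is a sketch of the general idea rather than the finished code, with made-up names and a made-up timeout - is to remember when the last serial byte arrived and only allow a network send once the sensor has been quiet for a while:

unsigned long lastSerialByteMillis = 0;

void serialByteReceived()
{
    // called whenever a byte arrives from the air quality sensor
    lastSerialByteMillis = millis();
}

bool safeToSendOverNetwork()
{
    // only touch the network if nothing has arrived for at least 100 milliseconds
    return (millis() - lastSerialByteMillis) > 100;
}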

You might like to know (if you’ve read this far) that the notes above actually came from my diary. I always write these things up each day. The idea is that if I get a similar problem in the future I’ll have something to go back to. If you don’t write a diary/log when you do this kind of thing you’re really missing out on a trick.

Pointers to Functions in C# and Python

After the description of pointers to functions in C it occurred to me that you might like to know how to do function pointers in other languages. Then again you might not, in which case just look at the picture above that I took in Singapore a while back.

Pointers to methods in C#

If you want to make a reference to a method in C# you need to create a delegate type that can hold the reference. Instances of the delegate type can then be made to refer to methods. It’s a bit more long-winded than the C way of doing things, but delegates can also be used as the basis of some quite powerful publish and subscribe mechanisms which I might mention later.

using System;

class Program
{
    delegate void SimpleDelegate();

    static void aMethod()
    {
        Console.WriteLine("Hello from a method");
    }

    static void Main(string[] args)
    {
        SimpleDelegate myDelegateInstance = aMethod;
        myDelegateInstance();
    }
}

This is a tiny example of a program that uses a delegate to make a reference to a method. The delegate type that I’ve created is called SimpleDelegate. Variables of this type (I made one called myDelegateInstance) can be made to refer to a method and then called.

If you compare this with the C syntax, it’s not that different. The good news is that the compiler can make sure, before the program ever runs, that a delegate will never be used to call a method incorrectly. Because SimpleDelegate is declared as returning void (i.e. nothing) and accepting no parameters, it is impossible to make instances of SimpleDelegate refer to a method that returns a different type or has a different set of parameters. If you worry about these kinds of things, this is a good thing.

Pointers to Functions in Python

If you want to make a pointer to a function in Python you just do it:

def a_function():
    print("Hello from a function")

function_ref = a_function
function_ref()

The variable function_ref is made to refer to the function a_function and the function_ref variable can then be called as a function. Later in the program function_ref might store an integer, or string, or a reference to an object of type Cheese. That’s how Python is. This means that a Python program can fail in ways that a C# program won’t, because the errors that cause the Python program to fail would be picked up by the C# compiler.

This nicely encapsulates the difference between the cultures of C# and Python. C# is great because it forces you to invest effort to make sure that the objects in your programs can only ever be fitted together correctly. Python takes everything on trust. It lets the programmer just get on with expressing the actions to be performed and assumes that things will turn out OK. If they don’t the program will fail at runtime.

I love the way that C# forces me to think about my code, and I love the way that Python just lets me get on with writing, without me having to worry about adding lots of extra syntax to express what I know I want. You should know about both ways of working.

Pointers to Functions in C

I’m re-writing my neopixel animation software for the new light. I’m using a C feature that is sometimes very useful. I’m using pointers to functions.

Note that this is quite strong magic. If you don’t use it properly your program will just stop with no errors or warnings. But then again, that’s pretty much how it is when you write Arduino code anyway, so feel free to have a go. You won’t break any hardware.

The reason why I’m using function pointers is that I want the program to select between different animations to control the lights. The update behaviour for each animation is a different function. For example, I’ve got two different displays, one which walks colours around the lights and one which just displays the text in a particular colour. That means I’ve got two update functions:

void update_coloured_message()
{
    // update the coloured message
}

void update_text_message()
{
    // update the text message
}

Each of these is a function that accepts no parameters and does not return a result. I can create a C variable that can refer to these functions as follows:

void (*animation_update) ();

This rather convoluted syntax creates a variable called animation_update that can refer to functions that accept no parameters and don’t return a value. I can make this variable refer to one of my update functions by using a simple assignment:

animation_update = update_text_message;

Note that normally when you want to create a reference to a value in C or C++ you need to put an ampersand (&) in front of the thing you are getting a pointer to. In the case of function references you can leave this out. This is either a neat trick to make C programming easier or a nasty trick to make C programming more confusing. I’ll let you decide.
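
Either way, the two forms mean exactly the same thing:

animation_update = update_text_message;    // the function name quietly turns into a pointer
animation_update = &update_text_message;   // the explicit & gives exactly the same result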

Anyhoo, once I’ve set animation_update to refer to a function I can call the pointer just as I would the function:

animation_update();
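
Incidentally, this is where the "strong magic" warning earlier comes in. If animation_update hasn't been pointed at a function yet, calling it is undefined behaviour and the program will typically just stop or reset. A cheap bit of protection (assuming the pointer starts out as NULL) is to check it before calling:

if (animation_update != NULL)
{
    animation_update();
}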

I’ve actually gone one more, in that I’m using function pointers in a structure. Go me. I’ve got a structure that describes an animation in terms of the function that sets up the animation, the function that updates the animation and the number of frames it should run for:

struct Animation_item
{
    void (*animation_setup)();
    void (*animation_update)();
    int duration;
};

I then create an array of these items that contain all the animations that I want to move between:

struct Animation_item animations[] = {
    { setup_coloured_message, update_coloured_message, 600 },
    { setup_walking_colours, update_walking_colours, 600 }
};

Then I just need to count around the array to move between animations:

animations[animation_number].animation_update();

This statement calls the animation update method for the animation that is selected by the value in animation_number. If I add more animations to the array my program just picks them up automatically with no need to change anything else.
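
The counting itself looks something like this (a sketch - the frame counter is my own invention, alongside the animation_number variable used above):

int animation_number = 0;   // which animation is currently running
int frame_count = 0;        // how many frames the current animation has shown

void animation_tick()
{
    animations[animation_number].animation_update();

    frame_count++;
    if (frame_count >= animations[animation_number].duration)
    {
        // move on to the next animation, wrapping back to the start of the array
        animation_number = (animation_number + 1) % (int)(sizeof(animations) / sizeof(animations[0]));
        frame_count = 0;
        animations[animation_number].animation_setup();
    }
}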

Code like an idiot

Hmm. No matter how good they are at programming, developers will still put bugs in their programs. Including me. Lost a couple of hours today because of this little beauty.

if ((tickCount % lights[i].colourSpeed)!=0 || lights[i].colourSpeed==0)
    return;

This code has been working inside HullOS for a couple of years. No problems. However, when I tried to port the code onto a Wemos device it kept crashing with an invalid instruction message.

Wah. The idea is that the code will exit if it is not time to update the colour animation (tickCount is not an exact multiple of colourSpeed) or if no animation is being performed (colourSpeed is zero).

It took me a while to figure out what was happening. Turns out this code is wrong, but under some circumstances it will work perfectly well. 

It's all to do with the % (modulus) operator. This gives the remainder that you get if you divide one number by another (for example 7 % 4 would be 3). If the modulus is zero then we have an exact multiple on our hands. So, every time tickCount reaches the next multiple of colourSpeed the program will perform an update.

Snag is that I've been stupid, in that I use a colourSpeed value of 0 to mean that the light is not being updated. And the behaviour of a C++ program when you do x % 0 is undefined. In other words, any number modulus zero is a stupid question to ask.

To make things worse, the fact that I've put this bad thing in the first half of an or expression means that the zero test never gets a chance to save me. The || operator evaluates its left operand first, so the modulus is always worked out, even when colourSpeed is zero. What happens next is down to the platform: on the hardware I'd been running HullOS on, dividing by zero seems to just produce a meaningless answer and the program carries on, but on the Wemos it raises an invalid instruction and the program explodes.

It's an easy fix: just break the single test into two tests so that I control the order in which they are performed:

if (lights[i].colourSpeed == 0)
    return;
if ((tickCount % lights[i].colourSpeed) != 0 )
    return;

Now we never use a colourSpeed value of 0 in a modulus expression.

Oh well, just goes to show that you can learn stuff by doing stuff.....

C# Quick Question 2 resolution

Ha. Nobody seemed to know this. Or nobody cared....

Anyhoo, the question was:

"When would a value type be stored on the heap?" And no, the answer doesn't involve boxing.

A value type just holds a value. Not a reference to something. If I have an object managed by reference I have to be careful about removing it from memory, because I need to be sure that nothing in the program is referring to the object. This means that objects managed by reference are stored in a "heap" of objects. It is the job of the "garbage collector" to find objects that are not referred to by anything, and remove these. 

However, value types are easy to get rid of. There can't be any references to them. 

Think of a value type as a book in a library. We can take the book away and nobody can use it any more. That's sad. But we can be sure that nobody is using the book when we take it away. Think of a reference type as a web site. We can't just delete a web site because we don't know how many people out there have made references to the site. 

So, when do we need to put a value type on the heap, and treat it a bit like a reference type? One answer is when we "box" a value type to convert it into a reference type, but this is not what I'm after.

The answer is when a value type is used in a "closure". We see closures when we use lambda expressions. OK, so what's a lambda expression? Lambda expressions are a pure way of expressing the "something goes in, something happens and something comes out" part of behaviours.

The types of the parameters and of the result to be returned are inferred from the context in which the lambda expression is used. Instead of writing a method and creating a reference to the method, you can just create a lambda expression.

We get into lambda expressions shortly after we start trying to treat lumps of functional code as we would data. Consider the following method:

static int add(int a, int b)
{
    return a + b;
}

This method is called add; it takes two integers and returns their sum. Maybe, for some reason, I want my program to select between adding, subtracting and some other mathematical operations that take in two integers and return an integer result. C# lets me create pointers to methods. They are called delegates and we would make such a delegate like this:

delegate int IntOperation(int a, int b);

A delegate is another type I can use in my program. So I can create a variable of type IntOperation. And I can make my delegate refer to a method. Watch carefully:

IntOperation thingToDo;

This statement has made a delegate called thingToDo that can refer to methods that accept two integers and return a single integer result. Now I can make this delegate refer to a method:

thingToDo = add;

Now, if I call thingToDo, the add method will run:

int result = thingToDo(1,10);

This would set the value result to 11, because thingToDo presently refers to the add method. With me so far? Good. We can create our methods "the hard way", or we can use a lambda function instead. Look at this statement:

thingToDo = (a, b) => { return a - b; };

Don't panic. It wants to be your friend. Let's take a careful look at what I've done. I've made thingToDo refer to a lump of code that will perform a subtract operation. The code that does the subtraction doesn't live anywhere and has no name. It is a lambda expression, sometimes called an "anonymous function". The sequence => is the lambda operator and it separates the "stuff goes in" from the "stuff comes out" parts of the expression. In this case the parameters a and b go in, and the result of a - b comes out. Because thingToDo accepts integers and returns integers, inside the lambda expression the types of a and b are integers.

OK. We can put these anonymous lumps of code in our program and pass them around on the end of references, just like any other object. What about closures? Take a look at this strange code:

{
    int localValue = 99;
    thingToDo = (a, b) => { return a + b + localValue; };
}

This is stupid, useless code. The statement inside the lambda function is using the local variable localValue for some silly reason. This is quite legal, compiles fine, and gives the compiler a problem to solve. The problem is that the lifetime of localValue now has to exceed the block in which it is declared. Remember that local variables are usually stored on the stack and deleted when the program leaves the block of code in which the local variable is declared. However, thingToDo is still referring to a lambda expression that uses the value of localValue. So localValue can't die. It must live on outside the block in which it is declared.

The compiler solves this problem by putting the localValue variable on the heap, rather than the stack. The heap is where we keep stuff that has to stick around for a while. This extension of the life of a local variable is called a "closure". 

Short question. Long answer. Phew. 

C# Quick Question 1 resolution

This was the question:

Can I make some C# that compiles and contains these two statements? What type are d and x?

d = null;
d += x;

I wrote the question as a result of my surprise that you could do this with delegate types. In other words, you could add things to a null delegate.

However, it turns out that it works with other things too, including strings. The compiler treats d += x as d = d + x, and the + behaviour for both strings and delegates (which is how we get to use += to append strings and add handlers to delegates) is quite happy to be handed a null on the left; it just gives you back something built from the right-hand value. Which makes very good sense I suppose.

Using the StopWatch class properly

Most people don't really care about the speed that their programs run at. Unless they run too slow of course. 

However, if you really do want to know timings, C# provides a rather useful Stopwatch class in the System.Diagnostics namespace that you can use to measure the time it takes your program to do something. You use it like this:

Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
// Do something that takes a while
stopwatch.Stop();
Console.Error.WriteLine("It took: " + stopwatch.ElapsedMilliseconds);

It is especially useful if you're trying to re-write a program so that it uses parallel processing, and you want to find out how long things take to complete. Parallel programming is where your code makes use of all the processors in your computer, not just one of them. It should be faster. 

I'm writing some stuff about parallel features in C# at the moment. For a book that I might have mentioned. So I wrote two versions of the code and then discovered, to my dismay, that the one I'd carefully optimised actually seemed to run slower than the original. I know that for small data sets a parallel solution might not be worth the effort of setting up all the parallel gubbins, and I also know that if there are any shared variables that the parallel code ends up fighting over, this can impact on speed, but whatever I did, the parallel version always took at least as long as the original. Wah. Took me a little while to find the mistake. Here's my code:

Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
// Do single threaded version
stopwatch.Stop();
Console.Error.WriteLine("It took: " + stopwatch.ElapsedMilliseconds);
stopwatch.Start();
// Do parallel version
stopwatch.Stop();
Console.Error.WriteLine("It took: " + stopwatch.ElapsedMilliseconds);

The parallel version always, always takes longer than the serial version. Have you found the mistake yet?

Turns out that stopping and starting a stopwatch doesn't reset it. So the time for the parallel version is added onto the time for the original. And I'm an idiot. The method: 

stopwatch.Restart();

- resets the stopwatch and starts it, so that my second operation is timed correctly. I'm now getting sensible speedups, which is nice...

"In-band" error signalling is dangerous

I've been working on the Hull Pixelbot for what seems like ages (You probably think I've been blogging about it for roughly as long. Don't care. My blog.)

Anyhoo, today I learned the perils of "in-band error signalling". The phrase "in-band" is a radio term. It means that control information is sent in the same channel as the data. My "in-band" errors worked like this:

int findVariablePos (char * name)
{
    int result;
    /* Stuff happens in here to find the variable */
    return result; // return the offset of the variable
}

The function findVariablePos returns the offset into the variable name table of the variable with the given name. In a program a variable is a named location that you can use to store data that is used as the program runs. If my program does this:

i = 99

- this is an attempt to store 99 in a variable called 'i' for no readily apparent reason. The thing running the program needs to have a way of finding out where in the computer the variable i is actually stored. That's the job of findVariablePos. Deep inside the program that actually runs the Hull Pixelbot script there is a statement like this:

int iPos = findVariablePos ("i");

The statement above (we’re writing C by the way) would get me the position in the variable table of the variable with the name 'i'. "Aha", you say. "What if the program doesn't contain a variable called 'i'?" Well, in that case the findVariablePos function returns the value -1 to indicate that the name was not found. This is kind of sensible, because you can't have anything at position -1 in a table; negative numbers are meaningless in this context. All good. To make things clearer I even did this:

#define VARIABLE_NOT_FOUND -1

This gives meaning to the value, so that I can write tests that make sense:

if (iPos == VARIABLE_NOT_FOUND)
{
    Serial.println("Variable not found");
}

All good. Works fine. Then I re-factor the code and add a bunch of new error codes. I then decide it would be nice to have a set of numbers that the user (and other programs) can use to make sense of error messages. And I make the following change:

#define VARIABLE_NOT_FOUND 2

This makes sense in the context of fiddling with my error numbers, but it means that if the program ever puts a variable in location 2 in the variable table, the program will completely fail to find it, because every time it gets back that offset value it will be regarded as meaning that no variable was found.

Which is of course what happened. The problem is caused by a bad design decision (using the data value as a means of signalling errors) and then doing something without considering the consequences. The latest version of findVariablePos looks like this:

int findVariablePos (char * name, int * position)
{
    int result;
     /* Stuff happens in here to find the variable  */
    
    *position = result;
    return FOUND_OK;
}

The result of the call is returned via one channel (the result of the function) and the position value is returned by the function setting the value of the second parameter, which is a pointer to the variable. The call is a bit more complicated:

int iPos;
if (findVariablePos ("i",  &iPos) == FOUND_OK)
{
    Serial.println("Found the variable");
}

However, it now doesn't matter what the error numbers are, and whether or not they clash with any valid variable positions. 

"In-band" error handling is great if your'e in a hurry and you're trying to keep the code simple and quick. But they also leave you open to problems further down the tracks. 

I'm an Idiot

I'm an idiot. Spent half a day chasing a bug that isn't really there. I've been writing the code that lets you download new programs into the HullPixelbot over the network. It worked fine for tiny programs, and then failed for larger ones.

I can hear you thinking "Buffer overrun". Except that I'm not that kind of idiot, and my code is carefully protected against too much data arriving when it shouldn't. And the fault didn't appear at a consistent point in the conversation, which it would if the program was hitting a hard limit somewhere. The failure threshold moved about a bit. Sometimes the program failed with a ten line program, other times it failed with an eight line program.

Any ideas? Took me a little while to figure it out. And of course it was my own stupidity.

It turned out that my over enthusiasm for code instrumentation was my undoing. When you are writing embedded programs you need code instrumentation to stay sane. In simple terms code instrumentation is "putting in print statements to see what is happening". My little program was notifying me each time a data byte arrived so I could convince myself that all was well.

And therein lies the stupidity. It was sending out three bytes of diagnostic information for each incoming byte. Since the send and receive rates are the same, this meant that after a while the program became unable to deal with incoming data because it was taking too long to send out the debug information.

The Arduino will do some buffering of data in and out of the device, but at some point this will fill up, at which point bad things happen. It's not going to happen at a consistent point in the program because the serial timings will vary slightly as other events occur on the processor.

So, I took out the debug code and the program worked perfectly. Strange but true.

And I'm an idiot.