PyCUDA on Mac OS X El Capitan

PyCUDA doesn’t seem to build on El Capitan:

ld: file not found: @rpath/CUDA.framework/Versions/A/CUDA for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Here’s how I fixed it:

tar -xzvf pycuda-2015.1.3.tar.gz
cd pycuda-2015.1.3
python configure.py --cuda-root=/usr/local/cuda --ldflags="-F/Library/Frameworks -framework CUDA" --cxxflags="-arch x86_64"
sudo make install

Hyperbolic Worm Gear in OpenJsCad

I came across the following design for a worm gear on Thingiverse:


In contrast to a normal worm gear, where only a few teeth are in contact at any time, in this design the majority of the teeth mesh. This greatly reduces friction and distributes the forces more evenly.

The author posted some instructions on Reddit describing how to produce the gears. You take a cross-section of the wheel gear and revolve it whilst twisting to produce a toroidal shape. You then subtract that from a cylinder to get the worm gear.

Here’s the torus shape that is to be subtracted:

[Screenshot: the torus to be subtracted]

(Note: shown here with only 10 teeth to make the structure easier to interpret.)

This is then subtracted from a cylinder like this:

[Screenshot: the torus positioned against the cylinder]

To get the final worm gear:

[Screenshot: the final worm gear]

At first this seemed straightforward to achieve with OpenJsCad. You create a polygon for the gear profile, then use solidFromSlices to translate and twist it. But there’s a catch: the polygon must be convex, and the gear profile certainly is not!

I found an alternative: constructing the torus manually, polygon by polygon. This is not too tricky, but it is fairly slow. The torus is built out of a number of discrete steps, set by the steps parameter, which determines both the quality of the result and the time taken to build it. At each step the profile is swept around at a constant radius while its twist angle increases, reaching twistAngle after a full revolution.
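The sweep for a single profile point can be sketched in plain JavaScript. This hypothetical helper uses textbook right-handed rotations rather than OpenJsCad's matrices, so its signs may differ from the real code. What it is meant to show is the order of operations used in worm_torus further down:

1. Stand the profile up.
2. Twist it about its own axis.
3. Move it out to the sweep radius.
4. Revolve it about the worm axis.

```javascript
// Hypothetical helper (not OpenJsCad API): where one 2D profile point (x, y)
// lands at sweep step s. Plain right-handed rotations, angles in degrees.
function sweepPoint(x, y, s, steps, radius, twistAngle) {
  const toRad = (deg) => (deg * Math.PI) / 180;

  // 1. Stand the profile up in the XZ plane (rotate 90 degrees about X).
  let px = x;
  let py = 0;
  let pz = y;

  // 2. Twist about Y, i.e. turn the wheel profile about its own axis.
  //    The twist grows linearly and reaches twistAngle after a full turn.
  const t = toRad((twistAngle * s) / steps);
  [px, pz] = [px * Math.cos(t) + pz * Math.sin(t), -px * Math.sin(t) + pz * Math.cos(t)];

  // 3. Move the profile out to the sweep radius (worm axis to wheel centre).
  px += radius;

  // 4. Revolve about Z (the worm axis) by this step's share of a full turn.
  const a = toRad((s * 360) / steps);
  [px, py] = [px * Math.cos(a) - py * Math.sin(a), px * Math.sin(a) + py * Math.cos(a)];

  return [px, py, pz];
}

// Wheel centre at step 5 of 20: a quarter turn around the worm axis.
console.log(sweepPoint(0, 0, 5, 20, 32.9, 6));
```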

This gives us the vertices that make up the torus shape. The next step is to build polygons between each pair of adjacent profile points on consecutive steps:

var gp1 = gear_points[i];
var gp2 = gear_points[(i+1)%gear_points.length];

var gf1 = new CSG.Vector3D(gp1.x, gp1.y, 0).rotateX(90);
var gf2 = new CSG.Vector3D(gp2.x, gp2.y, 0).rotateX(90);

var gf11 = gf1.transform(M1);
var gf12 = gf1.transform(M2);

var gf21 = gf2.transform(M1);
var gf22 = gf2.transform(M2);

polygons.push(new CSG.Polygon([
    new CSG.Vertex(gf11),
    new CSG.Vertex(gf12),
    new CSG.Vertex(gf22)
]));

polygons.push(new CSG.Polygon([
    new CSG.Vertex(gf11),
    new CSG.Vertex(gf22),
    new CSG.Vertex(gf21)
]));

Here, gp1 and gp2 are adjacent points on the gear profile, first stood up into the XZ plane with rotateX(90). M1 and M2 transform these points for the current step and the next step respectively, yielding four points that form a quad. The final step is to split that quad into two triangles.
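The same triangulation can be sketched without OpenJsCad by working with vertex indices. This hypothetical helper assumes the transformed points for step k are stored as one ring of n points at indices k*n to k*n + n - 1. It emits two triangles per quad in the same winding order as the snippet above.

```javascript
// Hypothetical helper: triangles joining ring k to ring k + 1, where ring k
// holds the n transformed profile points for step k at indices k*n .. k*n+n-1.
function stripTriangles(n, k) {
  const v = (ring, i) => ring * n + (i % n); // i % n closes the profile loop
  const tris = [];
  for (let i = 0; i < n; i++) {
    const p11 = v(k, i);         // gf11: point i at step k
    const p12 = v(k + 1, i);     // gf12: point i at step k+1
    const p22 = v(k + 1, i + 1); // gf22: point i+1 at step k+1
    const p21 = v(k, i + 1);     // gf21: point i+1 at step k
    tris.push([p11, p12, p22], [p11, p22, p21]);
  }
  return tris;
}

console.log(stripTriangles(4, 0));
```

Two triangles per quad keeps every polygon planar, which matters because the four corners of a twisted quad generally are not coplanar.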

One slice of the torus looks like this:

[Screenshot: one slice of the torus]

Here’s the final result:

[Screenshot: the final result]


And the full code:

// Here we define the user editable parameters:
function getParameterDefinitions() {
    return [
        { name: 'numTeeth', caption: 'Number of teeth:', type: 'int', default: 60 },
        { name: 'circularPitch', caption: 'Circular pitch:', type: 'float', default: 3 },
        { name: 'pressureAngle', caption: 'Pressure angle:', type: 'float', default: 20 },
        { name: 'clearance', caption: 'Clearance:', type: 'float', default: 0.2 },
        { name: 'wheelHeight', caption: 'Height of wheel gear:', type: 'float', default: 10 },
        { name: 'stride', caption: 'Gearing stride:', type: 'float', default: 1 },
        { name: 'wormRadius', caption: 'Radius of worm gear:', type: 'float', default: 12 },
        { name: 'wormGearCentreSupportRadius', caption: 'Radius of central support:', type: 'float', default: 6 },
        { name: 'steps', caption: 'Steps used to produce round elements:', type: 'int', default: 20 }
    ];
}

function configureGearParams(params) {
    // angle spanned by 'stride' teeth: the wheel advances this far per turn of the worm
    params.twistAngle = params.stride*360/params.numTeeth;
    var addendum = params.circularPitch / Math.PI;
    var dedendum = addendum + params.clearance;
    params.pitchRadius = params.numTeeth * params.circularPitch / (2 * Math.PI);
    params.baseRadius = params.pitchRadius * Math.cos(Math.PI * params.pressureAngle / 180);
    params.outerRadius = params.pitchRadius + addendum;
    params.rootRadius = params.pitchRadius - dedendum;
    return params;
}

// Main entry point; here we construct our solid:
function main(params) {
    params = configureGearParams(params);

    // set to 0 when using gear profile to subtract
    params.clearance = 0;
    OpenJsCad.log("Wheel outer radius: " + params.outerRadius);

    // debugging: change to if(1) to render a single part (only the first return runs)
    if(0) {
        return worm_screw(params);
        return worm_gear(params);
        return worm_torus(params);
        return wheel_gear(params);
    }

    var wheel = wheel_gear(params).setColor(0.8, 0.2, 0);
    var worm = worm_gear(params);

    // distance from the wheel axis to the worm axis
    var gearRotateRadius = params.baseRadius + params.wormGearCentreSupportRadius;
    return wheel.union(worm.translate([gearRotateRadius, 0, 0]).rotateX(90));
}

function wheel_gear(params) {
    // get the gear cross-section points
    var gear_points = involuteGearProfile(params);
    var gear_poly = new CSG.Polygon2D(gear_points);
    // extrude the profile, twisting it by twistAngle over the wheel height
    return linear_extrude({
        height: params.wheelHeight,
        twist: params.twistAngle,
        slices: params.steps
    }, gear_poly).translate([0, 0, -params.wheelHeight/2]);
}

function worm_torus(params) {
    // get the gear cross-section points
    var gear_points = involuteGearProfile(params);

    // distance from the worm axis to the wheel centre
    var gearRotateRadius = params.baseRadius + params.wormGearCentreSupportRadius;
    // Manually build polygons for each triangle
    var polygons = [];
    for(var s = 0; s < params.steps; s++) {
        // sweep angles about Z for this step and the next
        var a1 = s*360/params.steps;
        var a2 = (s+1)*360/params.steps;
        var mT = CSG.Matrix4x4.translation([gearRotateRadius, 0, 0]);
        var mR1 = CSG.Matrix4x4.rotationZ(a1);
        var mR2 = CSG.Matrix4x4.rotationZ(a2);
        // twist about Y (the wheel's own axis) grows linearly with the step
        var mRTw1 = CSG.Matrix4x4.rotationY(params.twistAngle*s/params.steps);
        var mRTw2 = CSG.Matrix4x4.rotationY(params.twistAngle*(s+1)/params.steps);
        // twist, then move out to the sweep radius, then revolve about Z
        var M1 = mRTw1.multiply(mT).multiply(mR1);
        var M2 = mRTw2.multiply(mT).multiply(mR2);
        for(var i = 0; i < gear_points.length; i++) {
            // adjacent points on the profile, stood up into the XZ plane
            var gp1 = gear_points[i];
            var gp2 = gear_points[(i+1)%gear_points.length];
            var gf1 = new CSG.Vector3D(gp1.x, gp1.y, 0).rotateX(90);
            var gf2 = new CSG.Vector3D(gp2.x, gp2.y, 0).rotateX(90);

            var gf11 = gf1.transform(M1);
            var gf12 = gf1.transform(M2);
            var gf21 = gf2.transform(M1);
            var gf22 = gf2.transform(M2);

            // two triangles spanning the quad between this step and the next
            polygons.push(new CSG.Polygon([
                new CSG.Vertex(gf11),
                new CSG.Vertex(gf12),
                new CSG.Vertex(gf22)
            ]));
            polygons.push(new CSG.Polygon([
                new CSG.Vertex(gf11),
                new CSG.Vertex(gf22),
                new CSG.Vertex(gf21)
            ]));
        }
    }

    var torus = CSG.fromPolygons(polygons);
    return torus;
}

function worm_screw(params) {
    var gearRotateRadius = params.baseRadius + params.wormGearCentreSupportRadius;
    // this portion of the cylinder is the main bit in contact with the wheel;
    // beyond this the worm must be tapered to avoid interfering with the wheel
    var cylinderHeight = Math.sqrt(
        params.outerRadius*params.outerRadius -
        Math.pow(gearRotateRadius - params.wormRadius, 2));
    var coneTop = CSG.cylinder({
        start: [0, 0, cylinderHeight/2],
        end: [0, 0, cylinderHeight/2 + params.wormRadius],
        radiusStart: params.wormRadius,
        radiusEnd: params.wormGearCentreSupportRadius,
        resolution: params.steps
    });
    var coneBottom = CSG.cylinder({
        start: [0, 0, -cylinderHeight/2],
        end: [0, 0, -(cylinderHeight/2 + params.wormRadius)],
        radiusStart: params.wormRadius,
        radiusEnd: params.wormGearCentreSupportRadius,
        resolution: params.steps
    });

    var screw = CSG.cylinder({
        start: [0, 0, -cylinderHeight/2],
        end: [0, 0, cylinderHeight/2],
        radius: params.wormRadius,
        resolution: params.steps
    });

    return screw.union(coneTop).union(coneBottom);
}

function worm_gear(params) {
    var torus = worm_torus(params);
    var screw = worm_screw(params);
    // carve the swept wheel profile out of the worm blank
    return screw.subtract(torus);
}

function involuteGearProfile(params) {
    // tangent length (and involute roll angle) at the tip of the tooth
    var maxtanlength = Math.sqrt(params.outerRadius*params.outerRadius -
        params.baseRadius*params.baseRadius);
    var maxangle = maxtanlength / params.baseRadius;

    var tl_at_pitchcircle = Math.sqrt(params.pitchRadius*params.pitchRadius -
        params.baseRadius*params.baseRadius);
    var angle_at_pitchcircle = tl_at_pitchcircle / params.baseRadius;
    var diffangle = angle_at_pitchcircle - Math.atan(angle_at_pitchcircle);
    var angularToothWidthAtBase = Math.PI / params.numTeeth + 2*diffangle;

    // build a single 2d tooth in the 'points' array:
    var resolution = 5;
    var points = [];
    for(var i = 0; i <= resolution; i++) {
        // first side of the tooth:
        var angle = maxangle * i / resolution;
        var tanlength = angle * params.baseRadius;
        var radvector = CSG.Vector2D.fromAngle(angle);
        var tanvector = radvector.normal();
        var p = radvector.times(params.baseRadius).plus(tanvector.times(tanlength));
        points[i] = p;
        // opposite side of the tooth:
        radvector = CSG.Vector2D.fromAngle(angularToothWidthAtBase - angle);
        tanvector = radvector.normal().negated();
        p = radvector.times(params.baseRadius).plus(tanvector.times(tanlength));
        points[2 * resolution + 1 - i] = p;
    }

    // copy the tooth around the wheel
    var gear_face_points = [];
    var k = 0;
    for(var j = 0; j < params.numTeeth; j++) {
        var toothAngle = j*360/params.numTeeth;
        for(var i = 0; i < points.length; i++) {
            gear_face_points[k++] = points[i].rotateZ(toothAngle);
        }
    }

    return gear_face_points;
}

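To check the numbers without OpenJsCad, this sketch redoes the calculations from configureGearParams and worm_screw as plain functions. The helper names are hypothetical. It also shows why diffangle in involuteGearProfile is the involute function of the pressure angle. At the pitch circle the roll angle is tan(pressureAngle), so diffangle = tan(pressureAngle) - pressureAngle.

```javascript
// Hypothetical helpers that redo the gear arithmetic from the OpenJsCad code.
function gearDimensions(p) {
  const addendum = p.circularPitch / Math.PI;
  const dedendum = addendum + p.clearance;
  const pitchRadius = (p.numTeeth * p.circularPitch) / (2 * Math.PI);
  const baseRadius = pitchRadius * Math.cos((p.pressureAngle * Math.PI) / 180);
  return {
    twistAngle: (p.stride * 360) / p.numTeeth, // degrees spanned by 'stride' teeth
    pitchRadius,
    baseRadius,
    outerRadius: pitchRadius + addendum,
    rootRadius: pitchRadius - dedendum,
  };
}

// diffangle from involuteGearProfile: roll angle at the pitch circle minus
// its arctangent, which equals inv(pressureAngle) = tan(a) - a.
function diffAngle(pitchRadius, baseRadius) {
  const roll = Math.sqrt(pitchRadius ** 2 - baseRadius ** 2) / baseRadius;
  return roll - Math.atan(roll);
}

// Length of the straight section of the worm in worm_screw. If the square
// root's argument goes negative the result is NaN and the worm breaks.
function wormContactHeight(dims, wormRadius, supportRadius) {
  const gearRotateRadius = dims.baseRadius + supportRadius;
  return Math.sqrt(dims.outerRadius ** 2 - (gearRotateRadius - wormRadius) ** 2);
}

const dims = gearDimensions({ numTeeth: 60, circularPitch: 3, pressureAngle: 20,
                              clearance: 0.2, stride: 1 });
console.log(dims);
console.log(diffAngle(dims.pitchRadius, dims.baseRadius));
console.log(wormContactHeight(dims, 12, 6));
```

With the default parameters, the pitch radius comes out at about 28.65 and the base radius at about 26.92. The worm's straight section is about 20.9 long. A large wormGearCentreSupportRadius relative to the wheel makes that square root fail.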

ChibiOS Board Generator

Producing board files for a new board to use with ChibiOS can be a cumbersome process. ChibiStudio features an Eclipse plugin that makes this process much easier.

First, install Eclipse CDT.

Manual Eclipse plugin install

Windows users can use ChibiStudio, which has these plugins installed by default, but it is also possible to install them manually in Eclipse on Linux or Mac.

In Eclipse go to Help > Install New Software. Add a new repository:


Then install the ChibiOS plugin for Eclipse:



Also install the XML editor, which is needed to edit the board config file:


Board config editing + generation

Create a folder for the board with a subfolder called cfg/. Copy the .chcfg board file of a similar board from the ChibiOS boards directory into it to use as a base.

Open the .chcfg board file in Eclipse (make sure it opens with the XML editor) and make the necessary changes. Some changes, such as setting pin names, are easier to make in a text editor, but the XML editor is handy for setting pin functions:

[Screenshot: setting pin functions in the XML editor]

To generate the board files select the .chcfg file in the browser and click the following button in the toolbar:

[Screenshot: the board file generator button in the toolbar]

The console should display something like this:

[Screenshot: console output from the board file generator]

Job done!

Mikromedia-M4 With ChibiOS

For a while I’ve been looking for a development board that meets the following requirements:

  • Small board with LCD + touchscreen
  • STM32, preferably F4
  • Documentation

The last point is quite important. There is a proliferation of no-name Chinese STM32 LCD boards on eBay, but they tend to be poorly documented, and some sellers don't even provide a datasheet or schematic before purchase!

Eventually I stumbled upon the mikromedia for STM32F4, which met my needs. Most attractively, it is a single-board module, unlike the majority of boards on eBay, which are two stacked PCBs.

The nice thing about this module is that it comes well packaged (as you’d expect given the cost). It includes printed documentation and schematics that are almost big enough to read without straining your eyes.

When I ordered this board there happened to be a $15 discount. That made ordering direct from Mikroe cheaper than from Farnell, even including duty and import taxes. It also meant I could get some of the prototyping shields that aren’t available from Farnell. It took about 3-4 days to make its way from Belgrade to Cornwall.




In my opinion there is one big omission with this board, and it’s in the way the LCD is interfaced. The high-density STM32 chips have a peripheral called the FSMC (flexible static memory controller). It maps the display into the memory address space, so reads and writes go out over the bus in either 8- or 16-bit mode. Unfortunately the designers of the board opted to use ordinary GPIO pins for the parallel interface. That costs extra instructions per transfer and gives none of the fine-grained timing control that the FSMC peripheral has. It is a shame, but not the end of the world. Presumably they did this so that they can share code with the other mikromedia kits, which use different microcontrollers.
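To illustrate the cost, here is a rough sketch of a bit-banged 8080-style write. It assumes the display's chip select is held low and that the board drives D/C (RS), WR and the 8 data lines from GPIO. The port is a hypothetical mock that only records operations, so the pin names are illustrative rather than the board's real pin assignments. With the FSMC, each of these byte writes would instead be a single store to a memory-mapped address.

```javascript
// Hypothetical mock GPIO port: it records operations instead of driving pins.
class MockPort {
  constructor() {
    this.ops = [];
  }
  setPin(name, level) {
    this.ops.push([name, level]);
  }
  writeData(byte) {
    this.ops.push(["DATA", byte]); // all 8 data lines updated at once
  }
}

// One 8080-style write: select command/data with D/C, present the byte,
// then pulse WR low and high (the controller latches on the rising edge).
function busWrite(port, isData, byte) {
  port.setPin("DC", isData ? 1 : 0);
  port.writeData(byte & 0xff);
  port.setPin("WR", 0);
  port.setPin("WR", 1);
}

// With an 8-bit bus and 16-bit colour (RGB565), a pixel takes two writes,
// high byte first.
function writePixel(port, rgb565) {
  busWrite(port, true, (rgb565 >> 8) & 0xff);
  busWrite(port, true, rgb565 & 0xff);
}

const port = new MockPort();
writePixel(port, 0xf800); // pure red
console.log(port.ops.length); // 8 recorded operations for a single pixel
```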


The board has a microSD slot wired up via SPI. SDIO (ChibiOS’s SDC driver) would’ve been nice, but SPI works.


The board is clearly designed and sold to be used with Mikroe’s commercial (and non-free) compiler and GUI libraries, but I had other plans for it. I wanted to use the freely available GCC compiler with an RTOS (ChibiOS) and ugfx. ugfx is an embedded GUI project I was quite involved with in its early days, writing and adapting low-level drivers for several displays.

Getting ChibiOS up and running on this was fairly painless, given its excellent support for the STM32 family. All it takes is three steps:

  1. Produce a board definition for the module.
  2. Create a makefile-based project by copying an existing STM32F4 project as a base.
  3. Set the appropriate board and oscillator settings (it took me a while to realise that the mikromedia does not have a high-speed external crystal!).

An empty git repo that makes getting started with this combination easier is provided at the end.
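Without an HSE crystal, the clock tree has to be fed from the internal 16 MHz HSI oscillator. The sketch below is a hypothetical helper that checks a candidate set of PLL dividers. The limits are my reading of the STM32F405/407 reference manual (RM0090) and are worth double-checking against your revision. In ChibiOS these values end up in mcuconf.h as STM32_PLLSRC and the STM32_PLLM_VALUE, STM32_PLLN_VALUE, STM32_PLLP_VALUE and STM32_PLLQ_VALUE settings.

```javascript
// Hypothetical checker for STM32F4 PLL settings fed from the 16 MHz HSI.
// Limits are my reading of RM0090; verify them for your part and revision.
const HSI_HZ = 16e6;

function pllClocks({ m, n, p, q }, srcHz = HSI_HZ) {
  const vcoIn = srcHz / m;     // PLL input after the M divider
  const vcoOut = vcoIn * n;    // VCO output after the N multiplier
  const sysclk = vcoOut / p;   // system clock
  const usb = vcoOut / q;      // USB OTG FS / SDIO / RNG clock
  const problems = [];
  if (vcoIn < 1e6 || vcoIn > 2e6) problems.push("VCO input outside 1-2 MHz");
  if (vcoOut < 192e6 || vcoOut > 432e6) problems.push("VCO output outside 192-432 MHz");
  if (![2, 4, 6, 8].includes(p)) problems.push("P must be 2, 4, 6 or 8");
  if (sysclk > 168e6) problems.push("SYSCLK above 168 MHz");
  if (usb !== 48e6) problems.push("USB clock is not 48 MHz");
  return { vcoIn, vcoOut, sysclk, usb, problems };
}

// M=16, N=336, P=2, Q=7: 168 MHz SYSCLK and 48 MHz for USB, straight from HSI.
console.log(pllClocks({ m: 16, n: 336, p: 2, q: 7 }));
```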

Once the project was set up, I built it by running ‘make’ and programmed the chip using an ST-LinkV2 and OpenOCD 0.7:

openocd -f interface/stlink-v2.cfg -f \
target/stm32f4x_stlink.cfg -c "init" -c "reset init" -c \
"flash write_image erase build/ch.bin 0x08000000" -c \
"reset" -c "shutdown"

Display Driver

This is the fun part. According to a leaflet in the box, recent versions of this board have an ILI9341. I’ve been referring to this datasheet for the module, and the ILI9341 datasheet.

The LCD is wired up in 8-bit mode, with IM3:0 = 0011, which according to the datasheet means it is an ‘8080 MCU 18-bit bus interface’:

But this can’t be correct, as the LCD is only wired up with 8 data lines! So either the schematic is wrong or they’ve bodged it. Given that the data pins go to DB10:17, I think it must be in ‘8080 MCU 8-bit bus interface II’ mode, which means that IM3:0 should be 1001.
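The reasoning can be sketched as a simple consistency check: a mode only fits if every data line it uses is actually connected. This hypothetical helper lists just the two modes discussed above. Their data lines are as I read them from the ILI9341 datasheet: DB17:0 for the 18-bit bus, and DB17:10 for 8-bit interface II.

```javascript
// Inclusive list of pin numbers lo..hi.
function range(lo, hi) {
  return Array.from({ length: hi - lo + 1 }, (_, i) => lo + i);
}

// Only the two IM[3:0] settings discussed above, with the data lines each
// uses (my reading of the ILI9341 datasheet; check it before relying on it).
const MODES = {
  "0011": { name: "8080 MCU 18-bit bus interface I", dataPins: range(0, 17) },
  "1001": { name: "8080 MCU 8-bit bus interface II", dataPins: range(10, 17) },
};

// A mode is consistent with the wiring if every data line it needs is connected.
function consistentModes(wiredPins) {
  const wired = new Set(wiredPins);
  return Object.entries(MODES)
    .filter(([, mode]) => mode.dataPins.every((pin) => wired.has(pin)))
    .map(([im, mode]) => `${im}: ${mode.name}`);
}

// The board only connects DB10..DB17.
console.log(consistentModes(range(10, 17)));
```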

To be continued!

L-BFGS-B MATLAB interface on Mac OS X Lion and Ubuntu

I needed to use L-BFGS-B, which unfortunately uses Fortran code, making it a pain in the arse to compile. I’m also not that keen on its interface, specifically on how limited the options are for passing parameters to the objective and gradient callback functions. If I had time I’d rewrite it entirely in C, like the brilliant LibLBFGS (note, unfortunately, the missing -B suffix), which is so much easier to compile. But I don’t, so here I am faffing to get it working...

In both cases it was necessary to modify matlabstring.h, changing

#include <string>

to

#include <string.h>

On Mac OS X:

  1. Install gfortran
  2. Compile the library from within MATLAB:
    mex('-output', 'lbfgsb', 'arrayofmatrices.cpp', ...
        'matlabexception.cpp', 'matlabprogram.cpp', ...
        'matlabstring.cpp', 'lbfgsb.cpp', 'matlabmatrix.cpp', ...
        'matlabscalar.cpp', 'program.cpp', 'solver.f')

Run examplehs038 to test the install. My output was:

1 573
2 393
3 132
4 11.5
5 1.48
6 1.12
7 0.984
8 0.305
9 0.0847
10 0.0101
11 0.000123
12 0.000113
13 0.000113
14 0.000112
15 0.000109
16 9.75e-05
17 7.93e-05
18 7.17e-05
19 3.55e-05
20 1.46e-05
21 1.71e-06
22 1.94e-07
23 1.47e-07
24 8.84e-10
25 2e-11
26 2.21e-12
27 7.16e-14
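The values in that output head towards zero. This sketch assumes examplehs038 is Hock-Schittkowski problem 38, as its name suggests. That problem is the Colville function with bounds -10 <= x_i <= 10 and a minimum of 0 at (1, 1, 1, 1). The sketch shows what the objective and gradient callbacks have to compute. It also shows the projection onto the bounds that L-BFGS-B uses to keep iterates feasible. It is not the solver itself, and the function names are hypothetical.

```javascript
// Hock-Schittkowski problem 38 (Colville function): objective and gradient.
function hs038(x) {
  const [x1, x2, x3, x4] = x;
  const f = 100 * (x2 - x1 * x1) ** 2 + (1 - x1) ** 2
          + 90 * (x4 - x3 * x3) ** 2 + (1 - x3) ** 2
          + 10.1 * ((x2 - 1) ** 2 + (x4 - 1) ** 2)
          + 19.8 * (x2 - 1) * (x4 - 1);
  const g = [
    -400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
    200 * (x2 - x1 * x1) + 20.2 * (x2 - 1) + 19.8 * (x4 - 1),
    -360 * x3 * (x4 - x3 * x3) - 2 * (1 - x3),
    180 * (x4 - x3 * x3) + 20.2 * (x4 - 1) + 19.8 * (x2 - 1),
  ];
  return { f, g };
}

// Clamp a point into the box [lower, upper], component by component.
function project(x, lower, upper) {
  return x.map((xi, i) => Math.min(upper[i], Math.max(lower[i], xi)));
}

console.log(hs038([-3, -1, -3, -1]).f); // the usual starting point
console.log(hs038([1, 1, 1, 1]).f);     // the minimiser
```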


And on Ubuntu 11.10:

  1. Compile gcc-4.3.4 with support for Fortran, adapted from here:
    # download gcc 4.3.4
    # extract, make and cd into separate build dir
    tar -xvf gcc-4.3.4.tar.bz2
    mkdir gcc-4.3.4-build
    cd gcc-4.3.4-build/
    # Install build dependencies, this should get all of them
    sudo apt-get build-dep gcc-4.5
    # the "fix": Ubuntu's multiarch layout keeps crt*.o and friends here
    export LIBRARY_PATH=/usr/lib/x86_64-linux-gnu
    # configure, set prefix so install will only touch the /opt/gcc-4.3/ directory
    ../gcc-4.3.4/configure --prefix=/opt/gcc-4.3 --program-suffix=4.3 --enable-languages=c,c++,fortran
    # build, specify number of cpu cores after -j
    make -j4
    # install
    sudo make install
  2. Modify the Makefile. The compilers are installed as g++4.3 and gfortran4.3 in /opt/gcc-4.3/bin, so that directory needs to be on your PATH:
    # Linux settings.
    # (recipe lines below must be indented with a real tab)
    MEX = /usr/local/MATLAB/R2011b/bin/mex
    MATLAB_HOME = /usr/local/MATLAB/R2011b
    CXX = g++4.3
    F77 = gfortran4.3
    CFLAGS = -O3 -fPIC -pthread
    FFLAGS = -O3 -fPIC -fexceptions
    TARGET = lbfgsb
    OBJS = solver.o matlabexception.o matlabscalar.o matlabstring.o \
           matlabmatrix.o arrayofmatrices.o program.o matlabprogram.o \
           lbfgsb.o
    CFLAGS += -Wall -ansi -DMATLAB_MEXFILE
    all: $(TARGET)
    %.o: %.cpp
    	$(CXX) $(CFLAGS) -I$(MATLAB_HOME)/extern/include -o $@ -c $^
    %.o: %.f
    	$(F77) $(FFLAGS) -o $@ -c $^
    $(TARGET): $(OBJS)
    	$(MEX) -cxx CXX=$(CXX) CC=$(CXX) FC=$(F77) LD=$(CXX) -lgfortran -lm \
    	-O -output $@ $^
    clean:
    	rm -f *.o $(TARGET)
  3. Run make.

ChibiOS C++ Notification Templates

I’ve found myself using memory pools + mailboxes a lot to pass data between different processes.

For example, I read sensors, pass the data to a process which filters it, and then pass the result to another process which saves it.

Originally I had one memory pool and mailbox per server process, but I found that limiting for a couple of reasons. I didn’t want the main server thread to have to wait if a client was slow processing the data, and I wanted each client to control how much data it buffered. So I swapped it around so that each client owns its buffer, and created a few helper classes to automate the process.
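The idea can be modelled in a few lines. This is a JavaScript sketch with hypothetical names, not the actual ChibiOS classes. Each listener owns a bounded queue whose depth the client chooses, and broadcast never waits on a slow client. I’ve assumed that a listener whose queue is full simply misses the message. Here getData() returns undefined rather than blocking.

```javascript
// Sketch of the pattern only; names are hypothetical, not the ChibiOS classes.
class Notifier {
  constructor() {
    this.listeners = [];
  }
  register(listener) {
    this.listeners.push(listener);
  }
  // The server hands each listener its own copy and never waits on anyone.
  broadcast(msg) {
    for (const listener of this.listeners) listener.offer(msg);
  }
}

class Listener {
  constructor(notifier, depth) {
    this.depth = depth; // how many messages this client is willing to buffer
    this.queue = [];
    this.dropped = 0;
    notifier.register(this);
  }
  // Assumption: a full listener misses the message rather than blocking the server.
  offer(msg) {
    if (this.queue.length < this.depth) this.queue.push({ ...msg });
    else this.dropped++;
  }
  // Oldest buffered message, or undefined if nothing is waiting.
  getData() {
    return this.queue.shift();
  }
}

// A client that keeps up and one that never reads, both with a depth of 2.
const sensors = new Notifier();
const fast = new Listener(sensors, 2);
const slow = new Listener(sensors, 2);
for (let t = 0; t < 3; t++) {
  sensors.broadcast({ t });
  fast.getData();
}
console.log(fast.dropped, slow.dropped); // 0 1
```

The slow client only loses data from its own buffer, and the fast client and the server are unaffected. Making the buffer per-listener is what buys that isolation.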

Firstly I define a message:

class SensorData : public NotifierMsg<SensorData> {
public:
    int16_t _acc[3], _gyro[3], _mag[3];
    float acc[3], gyro[3], mag[3];
    systime_t time;
    uint32_t status;
    bool_t magData(void);
};

Then the server:

class SensorHandler : public EnhancedThread<256>, public Notifier<SensorData> {
    Sensor *acc, *mag, *gyro;
    virtual msg_t Main(void);
public:
    SensorHandler(Sensor*, Sensor*, Sensor*);
};

And then start the process:

SensorHandler sensors(acc, gyro, mag);

The client thread registers a listener on the server, specifying a cache of 2:

Listener<SensorData, 2> sensor_listener(&sensors);

Then in my server thread I populate the message and ‘broadcast’ it:

d.status = (gyro->getData(d._gyro, d.gyro) << SENS_GYR_DATA) |
           (acc->getData(d._acc, d.acc) << SENS_ACC_DATA) |
           (mag->getData(d._mag, d.mag) << SENS_MAG_DATA);

/* broadcast data */
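The status word packs one flag per sensor. Here is a sketch of that packing, assuming each getData() call returns 0 or 1 for "no new data" or "new data". The bit positions are hypothetical stand-ins for SENS_ACC_DATA, SENS_GYR_DATA and SENS_MAG_DATA. hasMagData is only a guess at what a magData()-style check does.

```javascript
// Hypothetical bit positions standing in for SENS_ACC_DATA etc.
const SENS_ACC_DATA = 0;
const SENS_GYR_DATA = 1;
const SENS_MAG_DATA = 2;

// Each sensor's "got new data" result becomes one bit of the status word.
function packStatus(gyrOk, accOk, magOk) {
  return ((gyrOk ? 1 : 0) << SENS_GYR_DATA) |
         ((accOk ? 1 : 0) << SENS_ACC_DATA) |
         ((magOk ? 1 : 0) << SENS_MAG_DATA);
}

// A guess at what a magData()-style check does on the client side.
function hasMagData(status) {
  return (status & (1 << SENS_MAG_DATA)) !== 0;
}

console.log(packStatus(true, true, false)); // 3
console.log(hasMagData(packStatus(false, false, true))); // true
```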

Meanwhile I create a client Listener and handle the data:

Listener<SensorData, 2> sensor_listener(&sensors);

// start at 99 so that the first sample is printed straight away
int i = 99;
while(true) {
    SensorData *data = sensor_listener.getData();
    // print every 100th sample
    if(++i == 100) {
        cPrintf("[%d %d %d] ", data->_acc[0], data->_acc[1], data->_acc[2]);
        cPrintf("[%d %d %d]\r\n", data->_gyro[0], data->_gyro[1], data->_gyro[2]);
        i = 0;
    }
}

The full code can be found on GitHub.