About Me

Michael Zucchi

 B.E. (Comp. Sys. Eng.)

  also known as Zed
  to his mates & enemies!

notzed at gmail >
fosstodon.org/@notzed >


android (44)
beagle (63)
biographical (104)
blogz (9)
business (1)
code (77)
compilerz (1)
cooking (31)
dez (7)
dusk (31)
esp32 (4)
extensionz (1)
ffts (3)
forth (3)
free software (4)
games (32)
gloat (2)
globalisation (1)
gnu (4)
graphics (16)
gsoc (4)
hacking (459)
haiku (2)
horticulture (10)
house (23)
hsa (6)
humour (7)
imagez (28)
java (231)
java ee (3)
javafx (49)
jjmpeg (81)
junk (3)
kobo (15)
libeze (7)
linux (5)
mediaz (27)
ml (15)
nativez (10)
opencl (120)
os (17)
panamaz (5)
parallella (97)
pdfz (8)
philosophy (26)
picfx (2)
players (1)
playerz (2)
politics (7)
ps3 (12)
puppybits (17)
rants (137)
readerz (8)
rez (1)
socles (36)
termz (3)
videoz (6)
vulkan (3)
wanki (3)
workshop (3)
zcl (4)
zedzone (26)
Thursday, 16 December 2010, 04:23

Branchy code

This week I was looking at feature detectors - and one of those I was trying is FAST. This is pretty much the definition of branchy code - a function which is a single if statement of 2900 lines.

I didn't think it would be something that would map to OpenCL particularly well, but I was pleasantly surprised.

I simply took the if statement and wrapped it in a kernel which I call for every pixel, and added a simple list append function at the end (more on that below). With a bit of playing with the kernel work size I got it down to about 75uS for a 1024x768 frame at around 1000 output points.

I still haven't done non-maximum suppression or the like but it certainly lives up to it's name - it's damn FAST. I've been playing with SURF and others and even a partial implementation is licking 2000uS/frame. FAST seems to be very sensitive to noise and camera focus though, so i'm not sure I can use it - hopefully the non-maximum suppression will help.

GPU List Append

One problem with GPU coding is that it particularly likes having large well-defined data-sets to work with, and what I needed to do was just generate points beyond a threshold. In the past i've just had a separate post-process which 'reduces' the data, but that input had already been reduced and wasn't just a whole frame's-worth.

So I came up with something very simple based on atomics. I don't know whether it's the best solution but it seems to work ok in this case.

kernel void somekernel(..., global uint *indexp, global float *posp) {

    // do stuff

    if (result > threshold) {
        uint index = atom_inc(&indexp[0]);

        if (index < 1024) {
            posp[index] = (float2) { x, y };

Anything that then uses the 'index' count just has to limit it to the maximum (e.g. 1024) and away it goes.


As of about an hour ago i'm now on leave until next year. Yay. I've been hanging out for it for the last few days and it's been pretty difficult keeping the motivation level up (despite working on some interesting stuff). Yard should keep me busy and i'll probably get into some hacking before long too. Maybe i'll finally play GT5 and 'upgrade' the PS3 firmware. But for the rest of the day it's time for SFA. And maybe a brewski or twoski. I've got one more wort busy in the cellar, and once i've bottled that next week i'll have over 100 longnecks - and even if that isn't enough for the whole summer it should be a good start.

Tagged opencl.
Gran Turismo 5 | Stuff
Copyright (C) 2019 Michael Zucchi, All Rights Reserved. Powered by gcc & me!