Showing posts with label lidar.

2009-11-21

Natal competitor: Optricam?



For a while I was worried that after Microsoft acquired 3DV Systems and their depth-sensing ZCam, which might otherwise have been available at the beginning of this year, they would release Natal exclusively for the Xbox and severely hamper the possibilities for non-MS-sanctioned applications- and that we would have to suffer delays for software I don't care about when the hardware may be ready now. But similar products from other vendors may be due to arrive soon.

I learned about the PMDTec CamCube a few months ago. It is a lot less expensive than a SwissRanger, but there are no U.S. distributors and nothing on the web from customers who are actually using it.

But this last week there was a press release about a Belgian company, Optrima (Israel, Germany, Switzerland, and now Belgium- all the U.S. flash lidar vendors seem to have missed the boat on low-cost continuous-wave IR LED rangefinding systems, focusing instead on extremely expensive aerospace and high-end mobile robotics applications), teaming with TI and a body motion capture software vendor (Softkinetic) to produce Natal/ZCam-like results in the same application area- games and general computer UI. The press release mentions Beagleboard support, so it seems very likely that, say, Ubuntu will be able to communicate with the device.

Hopefully TI will really get behind it, matching the economies of scale that MS can bring to bear, and it will only cost around $100.

Also, I'm seeing more and more red clothing and jumpsuits- possibly with a specially IR-reflective component- which makes me suspicious about the limitations of these sensors (using them in rooms with windows letting in direct sunlight is probably also out of the question).




2009-06-16

Xbox Project Natal

A little less than a year ago I remember stumbling across the ZCam from 3DV Systems. The company promised two orders of magnitude decreases in the cost of flash array lidar through mass production- the trick being to market it as a device anyone can use, not just as a robotics or general automation tool. They promised to be on the market by the end of 2008, and after emails went unanswered I assumed it was vaporware.



The closest competitor would be the Mesa Imaging SwissRanger, which I think goes for $5000-$10000. Beyond that there are very expensive products from Advanced Scientific Concepts or Ball Aerospace that cost hundreds of thousands of dollars at least. ASC made a deal with iRobot that might bring the price down through economies of scale, though they probably aren't going to put it on the Roomba anytime soon- more likely the PackBot, which already costs $170K, so why not round that up to half a million?

In late 2008 to early 2009 rumors surfaced that Microsoft was going to buy 3DV Systems, and now we have the official announcements about Natal. Of course there is no mention of 3DV Systems (which hasn't updated their webpage in over a year), or even of whether it measures the phase shift or time of flight of light pulses in a sensor array to produce depth images. Given enough processing power, the right software, and good lighting, it would be possible to do everything seen in the Natal videos with a single camera. The next step up would be stereo vision to get depth images- it's possible that's what Natal is, but it seems like they would have mentioned that, since that technology is so conventional.

But that won't stop me from speculating:

Natal is probably a 0.5-2 megapixel webcam combined with a flash lidar with a resolution of 64x64 or 128x128 pixels, and maybe a few dozen depth bins.

The low resolution means there is a ton of software operating on the video image and the depth information to derive the skeletal structure for full body motion capture. All that processing means the speed and precision are going to be somewhat low. It would be great to buy one of these and be able to record body movements for use in 3D animation software, machinima, independent games, or full-body 3D chat (there's no easy way to do intersections or collisions with other people in an intuitive way, so don't get too excited), but I doubt it will capture a lot of nuance.

The lidar might be continuous wave (CW) like the SwissRanger. This has an interesting property: beyond the maximum range of the sensor, objects appear closer again- if the range were 10 feet, an object 12 feet away would be indistinguishable from one 2 feet away, or 22 feet away.
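A toy way to see that wrap-around (just a sketch of the idea, not how any particular sensor reports its data): the reported range is the true range modulo the unambiguous range.

max_range_ft = 10.0
for true_range_ft in (2.0, 12.0, 22.0):
    # every one of these comes back as 2.0 ft
    print('%.1f -> %.1f' % (true_range_ft, true_range_ft % max_range_ft))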

Beyond that, hopefully MS sees the potential for this beyond an Xbox peripheral. It would be criminal not to be able to plug this into a PC and have at least Windows drivers, an SDK + DirectX support. The next most obvious thing would be to use it to promote MS Robotics Studio, and offer a module for that software to use the Natal. If it just has a USB connection then it could be placed on a moderately small mobile robot, and software could use the depth maps for collision avoidance and, with some processing power, be able to compute 3D or 2D grid maps (maybe like this) and figure out when it has returned to the same location.

The next step is to make a portable camera that takes a high-megapixel normal image along with a depth image. Even with the low resolution and limited range (or range that rolls over), the depth information could be passed on to Photosynth to reduce the number of pictures needed to make a good synth. MS doesn't make cameras, but why not license the technology to Nikon or Canon? Once in dedicated cameras, it's on to cell phone integration...

The one downside is that the worst application seems to be as a gaming device, which is bad because I'd like it to be very successful in order to inspire competing products and later generations of the same technology. It is certainly not going to have the precision of a Wii MotionPlus, and maybe not even of a standard Wii controller (granted, it can do some interesting things that a Wii controller can't).

But even if it isn't a huge success, it should be possible to get a device from the first generation, and it's only a matter of time before someone hacks it and produces Linux drivers, right?

2008-11-20

Artoolkit + rangefinder

Since my relatively inexpensive, purely visual depth map approach wasn't that successful, I've tried it again using a rangefinder instead of a visible laser. This means I can point the video camera straight at the marker (which is attached to the rangefinder), while the rangefinder can point at anything, provided I don't tilt it so far that the camera can't see the marker/fiducial.

This is the result:


Artoolkit with a Rangefinder from binarymillenium on Vimeo.

The following plots show the tracked attitude of the rangefinder as measured by ARToolkit:






My left-to-right, bottom-to-top scanning approach is very apparent.


And here is the tracked attitude (as a 3-component vector) plus the range vs. time:



You can see how cyclical it is: as I scan the floor in front of me the range doesn't change much until I reach one end and tilt the tripod up a little, and later on I start to capture the two wheels of the car.
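For reference, the basic idea of turning each reading into a 3D point is just to push the measured range out along the rangefinder's boresight as oriented by the ARToolkit pose. A minimal sketch in Python- the axis conventions and the assumption that the boresight is the marker's +Z axis are mine, not necessarily what my actual code does:

import numpy as np

def range_to_point(rotation_3x3, position_xyz, range_m):
    boresight = np.array([0.0, 0.0, 1.0])  # assumed rangefinder forward axis in the marker frame
    # rotate the boresight into the camera frame, scale by the range, offset by the marker position
    return np.asarray(position_xyz) + range_m * np.asarray(rotation_3x3).dot(boresight)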

2008-08-20

Makeavi


Discovered a neat Windows (and Vista) tool for turning image sequences into videos: http://makeavi.sourceforge.net/


1280x720 in the 'Microsoft Video 1' format worked well, though 57 MB of pngs turned into 135 MB of video. 'Uncompressed' didn't produce a video, just a small 23 KB file. 'Intel IYUV' sort of produced a video, but not correctly. 'Cinepak' only output a single frame. 'VP60 Simple Profile' and 'VP61 Advanced Profile' with the default settings worked, and actually produced video smaller than the source images, though QuickTime Player didn't like those files. Vimeo seems to think VP61 is okay:


More Velodyne Lidar - overhead view from binarymillenium on Vimeo.

This new video is similar to the animated gifs I was producing earlier, but uses a new set of data. Vimeo seems to be acting up this morning: I got 75% through an upload of the entire file (the above is just a subset) and it locked up. I may try to produce a shorter test video to see if it works.

I have around 10 gigs of lidar data from Velodyne, and of course no way to host it.

My process for taking pcap files and exporting the raw data has run into a hitch- Wireshark crashes when trying to 'follow udp stream' on pcap files larger than a couple hundred megabytes. Maybe there is another tool that can do the conversion to raw?
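One candidate I haven't tried yet: reading the pcap directly in Python with the dpkt library and writing out just the UDP payloads, skipping Wireshark entirely. A rough sketch, assuming dpkt is installed and the capture only contains the lidar's UDP traffic (the file names here are made up):

import dpkt

with open('monterey.pcap', 'rb') as f, open('data.raw', 'wb') as out:
    for ts, buf in dpkt.pcap.Reader(f):
        eth = dpkt.ethernet.Ethernet(buf)
        ip = eth.data
        if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.udp.UDP):
            continue  # skip anything that isn't a UDP packet
        out.write(ip.data.data)  # append the raw 1206-byte lidar payload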

2008-08-14

Phase Correlation for Lidar Image Alignment


After going through the manual alignment of the point cloud data noted in the last post, I've gotten the translation step of automatic alignment working using phase correlation.

Phase correlation requires a 2D FFT, which I couldn't find in Processing (I would have to bring in a Java lib for that somehow?). Instead I used Octave, which has fft2, the inverse ifft2, and many other convenient math functions. The Matlab/Octave file is here.

The fundamental phase correlation code is this:

a = double(imread('one.png'));   % height-map images, assumed grayscale
b = double(imread('two.png'));

af = fft2(a);
bf = fft2(b);

% normalized cross-power spectrum
cp = af .* conj(bf) ./ abs(af .* conj(bf));

% the inverse transform has a peak at the translation offset
icp = real(ifft2(cp));

mmax = max(max(icp));
[sy, sx] = find(icp == mmax);    % row index is the y shift, column index the x shift


And sx and sy are the translation to apply to b to make it line up with image a. An additional check is to make sure the largest value in icp is above some threshold- if it is lower than the threshold then there is no good translation to align the data.

I'm a little suspicious that my input pngs are too regular and easy: every frame seems a constant displacement from the previous one, since the vehicle carrying the lidar was moving at a constant velocity.

Rotation is mostly working in isolation, but I need to revisit the proper method for making simultaneous rotation and translation work- there was a paper somewhere I need to dig up.

2008-08-11

Point Cloud Alignment



I took about 1/5th of the png images generated from the Velodyne point cloud data and manually aligned them in Gimp. It's easy to shift-select a bunch of images and open them as individual layers in Gimp, but there are no multiple-layer selection capabilities: it's not possible, for instance, to select all the layers and change their transparency at once.

In all of the images there are a few features, mostly beyond the edges of the road, that can be aligned with the earlier and later images. The closer together the images are in time the easier this is, but I was skipping four images and taking every fifth in order to speed up the process- also, a Gimp image with 300 layers is difficult to handle.

The later portions of the data are purely translational; only at the very beginning are both rotations and translations needed.

I think an automatic process for alignment won't be that hard, but the inherent frame-to-frame inconsistency in the images will introduce a lot of error. Translations correspond to phase shifts in the frequency domain, and I think rotations are almost as simple- and there isn't any scaling to account for.

2008-08-03

Animated gif of height map

animated gif of height map

Source code is here:

http://code.google.com/p/binarymillenium/source/browse/#svn/trunk/processing/velodyne

One interesting thing I discovered is that animated gifs with an alpha channel don't just let the background of the web page show through- they also don't clear the last frame of the gif, which was confusing for this gif until I blackened the background with ImageMagick:

for i in *png; do convert $i -background black -flatten +matte flat_$i; done
convert flat*png velodyne_hgt.gif

Velodyne Lidar Sample Data

Applying the db.xml calibration file

As with the pcap parsing, I originally thought I'd look into Python xml parsing. I'm sure if I were really interested in parsing xml I would have tried some of the available parsers out, but the interface I was hoping to find would look like this:


import pyxmlthing
a = pyxmlthing.load("some.xml")
some_array= a.item('9').rotCorrection


And I would have an array of all the rotCorrections of all the items of type 9. Instead I found several xml parsers that required tons of code to get to a point where I'm still not sure if I could get at the rotCorrections or not. Which may be why I don't care for xml- flat files are it for me.
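In fairness, the standard-library xml.etree.ElementTree probably gets close to that wished-for interface. Here is an untested sketch, where the tag names (item, id_, rotCorrection_) are guesses at the db.xml layout rather than something I've checked against the actual file:

import xml.etree.ElementTree as ET

tree = ET.parse('db.xml')
corrections = {}
for item in tree.getroot().iter('item'):
    laser_id = item.findtext('id_')          # guessed tag name
    rot = item.findtext('rotCorrection_')    # guessed tag name
    if laser_id is not None and rot is not None:
        corrections[int(laser_id)] = float(rot)

print(corrections.get(9))  # rotCorrection for laser 9, if the guesses are right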

So I just used vim to strip all the xml out and leave me with a nice text file that looks like this


0, -3.8, -7.0046468, 20, 21.560343, -2.5999999
1, -1.5, -6.7674689, 26, 21.516994, 2.5999999
2, 5, 0.44408101, 28, 20.617426, -2.5999999
3, 6.8000002, 0.78093398, 32, 20.574717, 2.5999999



Where the first column is the laser index (0-63), the next might be the rotCorrection and so on.

To apply the calibration data, the laser indexing derived from the raw data has to be correct- in particular, reading the 0xDDEE vs. 0xDDFF (or similar) flag that designates the upper or lower laser block.

The Velodyne manual doesn't have a good diagram showing the xyz axes and which direction is positive or negative for the original lidar angle, the correction angles, and the offsets, so some experimentation is necessary there. The manual did mention that in this case the rotCorrection has to be subtracted from the base rotation angle. The vertical and horizontal offsets are pretty minor for my visualization, but obviously important for accurate measurements.
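As a sanity check, the geometry I'm assuming looks roughly like the Python below- the subtraction of rotCorrection follows the manual, but the axis conventions, the degree units, and the omission of the vertical/horizontal offsets are my own simplifications:

import math

def corrected_point(distance_m, base_rotation_deg, rot_correction_deg, vert_correction_deg):
    rot = math.radians(base_rotation_deg - rot_correction_deg)  # manual says subtract rotCorrection
    vert = math.radians(vert_correction_deg)
    horizontal = distance_m * math.cos(vert)  # projection onto the horizontal plane
    x = horizontal * math.sin(rot)
    y = horizontal * math.cos(rot)
    z = distance_m * math.sin(vert)
    return x, y, z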





Processing viewer

Using Aaron Koblin's House of Cards SceneViewer as a starting point, I wrote code to take the point cloud data and display it:


Monterey Full from binarymillenium on Vimeo.

The data was split into files each containing a million data points and spanning a second in time. With some testing I found the lidar was spinning at about 10 Hz (of the possible 5, 10, or 15 Hz), so I would bite off 1/10 of the file and display that in the current frame, then the next 1/10th for the next frame, and then load the next file after 10 frames. A more consistent approach would be to split the data into a new text file for each frame.
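That per-frame split would be simple enough- a sketch of the idea in Python, assuming the points within each one-second file are roughly evenly distributed in time (the file names here are made up):

def split_into_frames(lines, frames_per_file=10):
    # divide one second of points into 10 roughly equal frames
    chunk = len(lines) // frames_per_file
    return [lines[i * chunk:(i + 1) * chunk] for i in range(frames_per_file)]

with open('second_0001.csv') as f:
    for i, frame in enumerate(split_into_frames(f.readlines())):
        with open('frame_%04d.csv' % i, 'w') as out:
            out.writelines(frame)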

Next Steps

Right now I'm running a large job to process each point-cloud frame into png height-map files, as I did for the Radiohead data. It doesn't work as well with 360 degrees of heights- some distances just have to be cut off and ignored to keep the resolution low (although with png compression the large empty spaces don't really take up any additional room). A lot of detail is lost on nearby objects compared to all the empty space between distant objects.

So, either using that data or going back to processing the raw point cloud, I'd like to track features frame-to-frame and derive the vehicle motion from them. I suspect this will be very difficult. Once I have the vehicle motion, I could create per-frame transformations to build one massive point cloud or height map in which still objects are in their proper places and other vehicles probably become blurs.

After that, if I can get a dataset from Velodyne or another source where a moving ground lidar travels in a circle or otherwise intersects its own path somewhere, then the proof that the algorithm works will be whether that point of intersection actually lines up in the big point cloud (though I suspect more advanced logic is needed to re-align the data once it determines that it has encountered features it has seen before).

2008-07-30

Velodyne Lidar- Monterey Data




There's something wrong with how I'm viewing the data: the streaks indicate that every revolution of the lidar results in an angular drift- maybe I'm re-adding a rotational offset and allowing it to accumulate?

No, it turned out to be a combination of things, chiefly applying the wrong calibration data (essentially swapping the lower and upper laser blocks). Now it's starting to look pretty good, though there is a lot of blurriness because the environment is changing over time- that's a car pulling out into an intersection.




Velodyne Lidar - Monterey from binarymillenium on Vimeo.

2008-07-29

Velodyne Lidar Sample Data: Getting a .pcap into Python

Velodyne has provided me with this sample data from their HDL-64E lidar.

Instead of data exported from their viewer software, it's a pcap captured with Wireshark or another network capture tool that uses the standard pcap format.

Initially I was trying to extract the lidar packets with libpcap and Python, using pcapy or similar, and went through a lot of trouble getting the pcap library to build in the Cygwin environment. Python and libpcap were able to load the data from the pcap, but rendered the binary into a long string with escape codes like '\xff'.

I then discovered that Wireshark has a function called 'follow udp stream': right-click on the data part of a packet in Wireshark and then export as 'raw'. The exported data no longer preserves the divisions between packets, but since each packet has a consistent length (1206 bytes) it's easy to parse.



Python does work well for loading binary data from a file:


import array

f = open('data.raw', 'rb') # the data exported from Wireshark in the raw format, opened in binary mode

bin = array.array('B') # set up an array with typecode 'B', unsigned byte

bin.fromfile(f, 1206) # read one packet's worth of data, 1206 bytes



The contents of bin now hold 1206 bytes of data that look like this:


>>> bin
array('B', [255, 221, 33, 39, 5, 9, 67, 178, 8, 116, 160, 13, 126, 222, 13, 63, 217, 8, 162, 204, ...


The 'B' shown in the output is just the typecode; it doesn't prevent indexing into the real data with bin[0], which returns 255, and so on.

The first two bytes indicate whether the data is from the upper or lower laser block, and the next two bytes, 33 and 39, form a little-endian value giving the rotation of the sensor head when the reading was taken: (39*256+33)/100 or 100.17 degrees.

After those first four bytes there are 32 repetitions of a pair of distance bytes followed by a single intensity byte, and then the whole block structure starts over, for a total of 12 blocks... the remaining 6 bytes carry status information that is unnecessary for now. See here for what I currently have for parsing the raw file; later I will get into adding the per-laser calibration data.
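Put together, one packet could be unpacked along these lines- a sketch using the struct module, where the 0xEEFF/0xDDFF values for the upper/lower flag are my reading of the format and worth checking against the manual:

import struct

def parse_packet(packet):
    assert len(packet) == 1206
    blocks = []
    for i in range(12):  # 12 firing blocks of 100 bytes each
        offset = i * 100
        block_id, rotation = struct.unpack_from('<HH', packet, offset)
        returns = [struct.unpack_from('<HB', packet, offset + 4 + j * 3)
                   for j in range(32)]  # 32 (distance, intensity) pairs
        blocks.append({'upper': block_id == 0xEEFF,  # else 0xDDFF, the lower block (assumed values)
                       'rotation_deg': rotation / 100.0,
                       'returns': returns})
    return blocks  # the trailing 6 status bytes are ignored here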

2008-07-26

More HoC: Preprocessing into pngs

house of cards height
house of cards intensity

2008-07-21

Radiohead - House of Cards 2


Radiohead - House of Cards from binarymillenium on Vimeo.

This rendered slowly, at less than 1 fps. One possible speedup would be to pre-process the csv data into binary heightmap files, rather than loading and parsing the csv for each frame.

Processing code is here:
http://code.google.com/p/binarymillenium/source/browse/trunk/processing/hoc/hoc.pde

It would be nice if they uploaded the data from the woman singing, and the party scene.

2008-07-15

Radiohead - House of Cards



I've been playing with the data for a couple of hours in Processing. The main problem is that the points are not consistent across animation frames, so it is necessary to produce a new set of points on a regular grid that can then be tessellated easily. By the end of the week I ought to have a video up in the official group on YouTube, and in higher quality on Vimeo.
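The regridding itself could be as simple as binning each point into a fixed-resolution height map so every frame shares the same vertex layout- a rough numpy sketch, with the grid extents and resolution made up for illustration:

import numpy as np

def to_height_grid(points_xyz, xmin=-10.0, xmax=10.0, ymin=-10.0, ymax=10.0, res=128):
    grid = np.full((res, res), np.nan)  # nan marks cells with no data
    xi = ((points_xyz[:, 0] - xmin) / (xmax - xmin) * (res - 1)).astype(int)
    yi = ((points_xyz[:, 1] - ymin) / (ymax - ymin) * (res - 1)).astype(int)
    ok = (xi >= 0) & (xi < res) & (yi >= 0) & (yi < res)
    grid[yi[ok], xi[ok]] = points_xyz[ok, 2]  # last point to land in a cell wins
    return grid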