2009-11-21
Natal competitor: Optricam?
For a while I was worried that after Microsoft acquired 3DV Systems and their depth-sensing ZCam- which might have been available at the beginning of this year- they would release Natal exclusively for the Xbox, severely hampering the possibilities for non-MS-sanctioned applications, and that we'd have to suffer delays for software I don't care about when the hardware may be ready now. But similar products from other vendors may be due to arrive soon.
I learned about the PMDTec Camcube a few months ago. It is a lot less expensive than a Swissranger, but there are no U.S. distributors and nothing on the web from customers that are using it.
But this last week there was a press release about a Belgian company, Optrima (Israel, Germany, Switzerland, and now Belgium- all the U.S. flash lidar vendors seem to have missed the boat on low-cost continuous-wave IR LED range-finding systems, focusing instead on extremely expensive aerospace and high-end mobile robotics applications), teaming with TI and a body motion capture software vendor (Softkinetic) to produce Natal/ZCam-like results in the same application area- games and general computer UI. There is mention of Beagleboard support in the press release, so having, say, Ubuntu be able to communicate with it is very likely.
Hopefully TI will really get behind it and match the economies of scale that MS can bring to bear, so it will only cost around $100.
Also, I'm seeing more and more red clothing and jumpsuits- possibly with a specially IR-reflective component- which makes me suspicious about the limitations of these sensors (using them in rooms with windows letting in direct sunlight is probably out of the question too).
2009-11-02
Using a canon camera to view jpegs
I tried using my Canon camera as a picture viewer, putting some downloaded jpegs on it- the pictures didn't show up at all when I tried to review them, only the pictures I had taken with the camera. Renaming the pictures to camera-style names like DSC_0025.jpg at least made the camera show a question mark icon for them.
A little searching later I discovered a tool called paint.net, which saves jpegs in the proper format so the Canon will like them. The other trick is to resize the canvas the picture is on so both dimensions are multiples of 8.
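For reference, the same multiple-of-8 cropping in Python with PIL would look roughly like this (untested on the camera- paint.net is what actually worked for me, and the filenames are made up):
from PIL import Image

im = Image.open("downloaded.jpg")  # hypothetical input file
w, h = im.size
# crop so both dimensions are multiples of 8
im = im.crop((0, 0, (w / 8) * 8, (h / 8) * 8))
im.save("DSC_0025.jpg", "JPEG")  # give it a camera-style name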
paint.net itself is somewhat interesting- quicker to load than Gimp- but not so much easier, more intuitive, or Deluxe Paint-like that I'll use it for anything else.
2009-09-12
Instructions for rendering with Processing on Amazon EC2
There are detailed instructions elsewhere on how to get started with EC2 in general, here are the high level things to do for my headless rendering project:
Get a unix command-line environment that has python and ssh; I use cygwin under Windows, and other times I dual boot into Ubuntu.
Get an Amazon EC2 account, create a ~/username.pem file, and make environment variables for the keys (follow the boto instructions).
Make sure the pem file permissions are set to 700.
Edit ssh_config so that StrictHostKeyChecking is set to no, otherwise ssh sessions started by the scripts will ask whether it's okay to connect to every newly created instance- that prompt can also be suppressed per command, as in the sketch after this list.
Make sure there are no carriage returns (\r) in the pem file in Linux.
Get Elasticfox, put your credentials in.
Get boto
Get trajectorset
Create a security group called http that at least allows your IP address to access a webserver on any EC2 instance that uses that group.
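As mentioned above, the host-key prompt can also be handled per command instead of editing ssh_config- a minimal sketch, where the instance address is made up and the pem file is the one from the list above:
import subprocess
# -o StrictHostKeyChecking=no stops ssh from asking about unknown hosts
dns_name = "ec2-1-2-3-4.compute-1.amazonaws.com"  # hypothetical instance address
cmd = "ssh -o StrictHostKeyChecking=no -i ~/username.pem root@" + dns_name + " uptime"
proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
(stdout, stderr) = proc.communicate()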
At this point it should be possible to run ec2start.py, visit the IP address of the head node, and watch the results come in. The ec2start script launches a few instances: one head node that creates noise seeds, sends them to the worker nodes via SQS, and then waits for the workers to process the seeds and send SQS messages back. The head node then copies the result files and renders the graphics, copying the latest results to a folder that index.html can see for web display.
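The SQS exchange itself is only a few lines with boto- this is a trimmed-down sketch rather than the actual ec2start code, and the queue name is made up:
from boto.sqs.connection import SQSConnection
from boto.sqs.message import Message

conn = SQSConnection()  # picks up the AWS keys from the environment variables
queue = conn.create_queue("seed_queue")  # hypothetical queue name

# head node: send a noise seed to the workers
m = Message()
m.set_body("12345")
queue.write(m)

# worker node: read a seed, process it, then delete the message
msg = queue.read()
if msg is not None:
    seed = msg.get_body()
    queue.delete_message(msg)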
My code is mainly for demonstration, so the key things I did that will help with alternate applications follow:
Custom AMI
You can use the AMI I created with the id 'ami-2bfd1d42'; I used one of the Alestic Ubuntu AMIs and added Java, Xvfb, Boto, and a webserver like lighttpd (I forget if Xvfb was already installed or not).
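Launching instances from that AMI with boto looks roughly like this- a sketch, where the key name and instance type are just examples and the security group is the one from the setup list:
from boto.ec2.connection import EC2Connection

conn = EC2Connection()  # reads the AWS keys from the environment variables
reservation = conn.run_instances(
    "ami-2bfd1d42",            # the custom AMI described above
    min_count=1, max_count=1,
    key_name="username",       # matches ~/username.pem
    security_groups=["http"],  # the group created during setup
    instance_type="m1.small")
instance = reservation.instances[0]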
Headless rendering
The EC2 instances lack graphics contexts at first, and trying to run a graphical application like an exported Processing project will not work (TBD did I ever try that?). Xvfb creates a virtual framebuffer that Processing can render to after running these commands:
Xvfb :2
export DISPLAY=:2
Launching processes and detaching from them
I use python subprocess.Popen frequently to execute commands on the instances like this:
cmd = "Xvfb :2"
whole_cmd = "ssh -i ~/lucasw.pem root@" + dns_name + " \"" + cmd + "\""
proc = subprocess.Popen(whole_cmd, shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
(stdout,stderr) = proc.communicate()
The problem is when one wants to run something, close the connection, and leave it running- like Xvfb above, which needs to start and stay running. One method is to leave the ssh connection open, but there is a limit of about 20 ssh sessions.
The trick is to use nohup:
cmd = "nohup Xvfb :2"
Don't put extra quotes around the command to execute, which brings me to the next topic.
Quote escaping
There are a few bash commands that require parts to be in quotes- but in python the bash command is already in quotes, and python will not understand the inner set of quotes unless they are escaped with a backslash:
cmd = "echo \"blah\" > temp.txt";
Then at other times an additional level of quote escaping is required:
cmd = "echo \\\"blah\\\" > temp.txt";
(I do this when I pass all of the cmd variable to be executed by ssh, and ssh wants it in quotes)
One backslash escapes one level of quoting; three escape two levels, because the escaping backslash itself needs to be escaped. This gets confusing fast, and some experimentation with python in interactive mode is required to get it right.
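For example, printing the strings in the interactive interpreter shows what each level of escaping turns into:
>>> print "echo \"blah\" > temp.txt"
echo "blah" > temp.txt
>>> print "echo \\\"blah\\\" > temp.txt"
echo \"blah\" > temp.txt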
Config file driven
It's not currently, at least not as much as it needs to be, which makes it very brittle- changing plots requires making about three different edits, when a single source config file should specify it all.
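Something like the standard ConfigParser module would do it- a sketch with hypothetical file, section, and option names, none of which exist in the scripts yet:
import ConfigParser  # Python 2 standard library

# plots.cfg might contain:
# [plot]
# width = 800
# height = 600
# [ec2]
# num_workers = 4

config = ConfigParser.ConfigParser()
config.read("plots.cfg")
plot_width = config.getint("plot", "width")
plot_height = config.getint("plot", "height")
num_workers = config.getint("ec2", "num_workers")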
2009-08-27
Computing Cloud Rendering with Processing and Amazon EC2
This project is my first experiment with using Amazon EC2 for cloud rendering. The source code is all there and I'll post detailed instructions on how to use it later, but here is a sped-up video of the output:
Computing Cloud Rendering from binarymillenium on Vimeo.
It looks kind of cool, but not too exciting- there's potential for better things though.
What I've done is launch several compute instances on EC2: worker nodes create the individual lines seen in the plots and pass data back to a head node, which creates the plots, puts them on a web page for real-time feedback, and stores all the frames for retrieval at the end of the run.
The plots are aggregations of all the results- blue is the presence of any line, white is a high density of lines, and a greenish tinge signifies the line came from a recently aggregated set. It's interesting that the more lines are aggregated, the less the plot changes, so it becomes increasingly boring.
All the plotting and data generation is done using java applications exported from Processing. 3D graphics are also possible, and something like this earlier video could be ported to the scripts I've made. There is no graphics card accessible on the EC2 machines, but virtual framebuffer software like Xvfb and software rendering (either Processing's P3D or software OpenGL) make it possible to trick the application into thinking there is.
It's not distributed rendering since all the rendering is on one computer, but I think I need to distribute the rendering in order to speed it up.
There is potential for more dynamic applications, involving user interaction through webpages, or simulations that interact with the results of previous simulations, and communicate with other nodes to alter what they are doing.
2009-08-23
Save Image As And Close Tab Firefox Addon
I haven't made a Firefox addon before, but I thought I'd try something simple: combine the context menu "Save Image As..." with closing the current tab. My contribution consists of putting these two lines together:
gContextMenu.saveImage();
gBrowser.removeCurrentTab();
To start out with I used the Firefox/Thunderbird Extension Wizard. Initially I didn't select the 'Create context menu item' option, and that may have caused problems with gContextMenu not being defined- it was either that or the fact that I was trying to embed the commands in the firefoxOverlay.xul file as inline javascript instead of putting them in the overlay.js file.
I found the first function by searching the Firefox source code for the "Save Image As" menuitem and from there finding saveImage. The removeCurrentTab function was harder to find, but this addon provided source code that showed it: Stephen Clavering's CTC.
Tutorial pages I initially found about extension development were helpful, but I didn't see anything that talks about mozilla fundamentals- probably I need to find a book about it.
This addon goes well with the Menu Editor and Download Sort.
There is code in the real Save Image As for determining whether an image is actually selected (my addon shows up regardless) that I should add next, and there should be logic that prevents the close action if the save dialog was canceled (I'm less sure how to do that).
2009-08-05
Quick jmatio in Processing example
1. Download jmatio from the Mathworks file exchange
2. Unzip and put the contents in a folder called jmatio
3. Rename the lib dir to library
4. Rename library/jamtio.jar to library/jmatio.jar
5. Create a mat file in the sketch data dir called veh_x.mat which contains an array called veh_x
6. Run the following code:
import com.jmatio.io.*;
import com.jmatio.types.*;
MatFileReader mfr = null;
try {
mfr = new MatFileReader(sketchPath + "/data/veh_x.mat" );
} catch (IOException e) {
e.printStackTrace();
exit();
}
if (mfr != null) {
double[][] data = ((MLDouble)mfr.getMLArray( "veh_x" )).getArray();
println(data.length +" " + data[0].length + " " + data[0][0]);
}
TBD use getContents instead of requiring the mat file name and array name be the same.
2009-07-03
Building Bundler v0.3 on Ubuntu
'The Office Box' requested help with running Bundler on Linux (specifically Ubuntu 9- I have a VirtualBox VM of Ubuntu 8.04 fully updated to today, but I'll try this out on Ubuntu 9 soon), so I went through the process myself.
The binary version depends on libgfortran.so.3, which I couldn't find with aptitude, so I tried building from source- it turned out to be not that hard. There is no 'configure' for bundler 0.3 to search for dependencies that aren't installed, so I built incrementally and installed packages as I ran into build failures. I might be missing a few I already had installed for other purposes, but do a sudo aptitude install on the following:
build-essential
gfortran-4.2
zlib1g-dev
libjpeg-dev
A missing gfortran produces the cryptic 'error trying to exec 'f951': execvp: No such file or directory)' message.
These might be necessary, I'm not sure:
lapack3
libminpack1
f2c
After that, run the provided makefile, add the bundler bin folder to your LD_LIBRARY_PATH, and then go into the examples/kermit directory and run ../../RunBundler.sh to verify that good ply files appear in the bundle directory. Bundler is a lot slower than Photosynth for big jobs- I haven't tried the Intel math libs though.
The full output from a successful kermit RunBundler run looks like this:
Using directory '.'
0
Image list is list_tmp.txt
[Extracting exif tags from image ./kermit000.jpg]
[Focal length = 5.400mm]
[Couldn't find CCD width for camera Canon Canon PowerShot A10]
[Found in EXIF tags]
[CCD width = 5.230mm]
[Resolution = 640 x 480]
[Focal length (pixels) = 660.803
[Extracting exif tags from image ./kermit001.jpg]
[Focal length = 5.400mm]
[Couldn't find CCD width for camera Canon Canon PowerShot A10]
[Found in EXIF tags]
[CCD width = 5.230mm]
[Resolution = 640 x 480]
[Focal length (pixels) = 660.803
[Extracting exif tags from image ./kermit002.jpg]
[Focal length = 5.400mm]
[Couldn't find CCD width for camera Canon Canon PowerShot A10]
[Found in EXIF tags]
[CCD width = 5.230mm]
[Resolution = 640 x 480]
[Focal length (pixels) = 660.803
[Extracting exif tags from image ./kermit003.jpg]
[Focal length = 5.400mm]
[Couldn't find CCD width for camera Canon Canon PowerShot A10]
[Found in EXIF tags]
[CCD width = 5.230mm]
[Resolution = 640 x 480]
[Focal length (pixels) = 660.803
[Extracting exif tags from image ./kermit004.jpg]
[Focal length = 5.400mm]
[Couldn't find CCD width for camera Canon Canon PowerShot A10]
[Found in EXIF tags]
[CCD width = 5.230mm]
[Resolution = 640 x 480]
[Focal length (pixels) = 660.803
[Extracting exif tags from image ./kermit005.jpg]
[Focal length = 5.400mm]
[Couldn't find CCD width for camera Canon Canon PowerShot A10]
[Found in EXIF tags]
[CCD width = 5.230mm]
[Resolution = 640 x 480]
[Focal length (pixels) = 660.803
[Extracting exif tags from image ./kermit006.jpg]
[Focal length = 5.400mm]
[Couldn't find CCD width for camera Canon Canon PowerShot A10]
[Found in EXIF tags]
[CCD width = 5.230mm]
[Resolution = 640 x 480]
[Focal length (pixels) = 660.803
[Extracting exif tags from image ./kermit007.jpg]
[Focal length = 5.400mm]
[Couldn't find CCD width for camera Canon Canon PowerShot A10]
[Found in EXIF tags]
[CCD width = 5.230mm]
[Resolution = 640 x 480]
[Focal length (pixels) = 660.803
[Extracting exif tags from image ./kermit008.jpg]
[Focal length = 5.400mm]
[Couldn't find CCD width for camera Canon Canon PowerShot A10]
[Found in EXIF tags]
[CCD width = 5.230mm]
[Resolution = 640 x 480]
[Focal length (pixels) = 660.803
[Extracting exif tags from image ./kermit009.jpg]
[Focal length = 5.400mm]
[Couldn't find CCD width for camera Canon Canon PowerShot A10]
[Found in EXIF tags]
[CCD width = 5.230mm]
[Resolution = 640 x 480]
[Focal length (pixels) = 660.803
[Extracting exif tags from image ./kermit010.jpg]
[Focal length = 5.400mm]
[Couldn't find CCD width for camera Canon Canon PowerShot A10]
[Found in EXIF tags]
[CCD width = 5.230mm]
[Resolution = 640 x 480]
[Focal length (pixels) = 660.803
[Found 11 good images]
[- Extracting keypoints -]
Finding keypoints...
1245 keypoints found.
Finding keypoints...
1305 keypoints found.
Finding keypoints...
1235 keypoints found.
Finding keypoints...
1220 keypoints found.
Finding keypoints...
1104 keypoints found.
Finding keypoints...
1159 keypoints found.
Finding keypoints...
949 keypoints found.
Finding keypoints...
1108 keypoints found.
Finding keypoints...
1273 keypoints found.
Finding keypoints...
1160 keypoints found.
Finding keypoints...
1122 keypoints found.
[- Matching keypoints (this can take a while) -]
../../bin/KeyMatchFull list_keys.txt matches.init.txt
[KeyMatchFull] Reading keys took 1.020s
[KeyMatchFull] Matching to image 0
[KeyMatchFull] Matching took 0.010s
[KeyMatchFull] Matching to image 1
[KeyMatchFull] Matching took 0.170s
[KeyMatchFull] Matching to image 2
[KeyMatchFull] Matching took 0.380s
[KeyMatchFull] Matching to image 3
[KeyMatchFull] Matching took 0.560s
[KeyMatchFull] Matching to image 4
[KeyMatchFull] Matching took 0.740s
[KeyMatchFull] Matching to image 5
[KeyMatchFull] Matching took 0.960s
[KeyMatchFull] Matching to image 6
[KeyMatchFull] Matching took 1.060s
[KeyMatchFull] Matching to image 7
[KeyMatchFull] Matching took 1.210s
[KeyMatchFull] Matching to image 8
[KeyMatchFull] Matching took 1.410s
[KeyMatchFull] Matching to image 9
[KeyMatchFull] Matching took 1.600s
[KeyMatchFull] Matching to image 10
[KeyMatchFull] Matching took 1.760s
[- Running Bundler -]
[- Done -]
2009-06-16
Xbox Project Natal
A little less than a year ago I remember stumbling across the ZCam from 3DV Systems; the company promised a two-order-of-magnitude decrease in the cost of flash array lidar through mass production- the trick being to market it as a device anyone can use, not just as a robotics or general automation tool. The company promised to be on the market by the end of 2008, and after emails went unanswered I assumed it was vaporware.
The closest competitor would be the Mesa Imaging SwissRanger, which I think goes for $5000-$10000. Beyond that there are very expensive products from Advanced Scientific Concepts or Ball Aerospace that are in the hundreds of thousands of dollars range at least. ASC made a deal with iRobot that might bring the price down through economies of scale, though they probably aren't going to put it on the Roomba anytime soon. More likely the PackBot, which already costs $170K- why not round that up to half a million?
In late 2008 to early 2009 rumors surfaced that Microsoft was going to buy 3DV Systems, and now we have the official announcements about Natal. And of course no mention of 3DV Systems (which hasn't updated their webpage in over a year) or even how it measures the phase shift or time of flight of light pulses in a sensor array to produce depth images. Given enough processing power, the right software, and good lighting, it would be possible to do everything seen in the Natal videos with a single camera. The next step up would be stereo vision to get depth images- it's possible that's what Natal is, but it seems like they would have mentioned that since that technology is so conventional.
But that won't stop me from speculating:
Natal is probably a 0.5-2 megapixel webcam combined with a flash lidar with a resolution of 64x64 or 128x128 pixels, and maybe a few dozen levels of depth bins.
The low resolution means there is a ton of software operating on the video image and the depth information to derive the skeletal structure for full body motion capture. All that processing means the speed and precision is going to be somewhat low- it would be great to buy one of these and be able to record body movements for use in 3D animation software, machinima, independent games, or full body 3D chat (there's no easy way to do intersections or collisions with other people in an intuitive way so don't get too excited), but I doubt it will capture a lot of nuance.
The lidar might be continuous wave (CW) like the SwissRanger. This has an interesting property where beyond the maximum range of the sensor, objects appear closer again- if the range was 10 feet, an object 12 feet away is indistinguishable from one 2 feet away, or 22 feet away.
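In other words the reported range just wraps around- a quick illustration of the modulo using the 10 foot figure above:
max_range = 10.0  # feet, the unambiguous range in the example
for true_range in [2.0, 12.0, 22.0]:
    apparent = true_range % max_range
    print true_range, "ft reads as", apparent, "ft"  # all three read as 2.0 ft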
Beyond that, hopefully MS sees the potential for this beyond an Xbox peripheral. It would be criminal not to be able to plug this into a PC and have at least Windows drivers, an SDK + DirectX support. The next most obvious thing would be to use it to promote MS Robotics Studio, and offer a module for that software to use the Natal. If it just has a USB connection then it could be placed on a moderately small mobile robot, and software could use the depth maps for collision avoidance and, with some processing power, be able to compute 3D or 2D grid maps (maybe like this) and figure out when it has returned to the same location.
The next step is to make a portable camera that takes a high megapixel normal image along with a depth image. Even with the low resolution and limited range (or range that rolls over), the depth information could be passed on to photosynth to reduce the amount of pictures needed to make a good synth. MS doesn't make cameras, but why not license the technology to Nikon or Canon? Once in dedicated cameras, it's on to cell phone integration...
The one downside is that the worst application seems to be as a gaming device, which is bad because I'd like it to be very successful in order to inspire competing products and later generations of the same technology. It is certainly not going to have the precision of a Wii MotionPlus, and maybe not even a standard Wii controller (granted that it can do some interesting things that a Wii controller can't).
But even if it isn't a huge success, it should be possible to get a device from the first generation, and it's only a matter of time before someone hacks it and produces Linux drivers, right?
2009-04-06
OpenCV example, and why does Google do so poorly?
Take searching for cvGetSpatialMoment:
http://www.google.com/search?hl=en&q=cvGetSpatialMoment&btnG=Google+Search&aq=f&oq=
All the top results are nearly useless, just code that doesn't help much if you don't know what cvGetSpatialMoment does.
The "CV Reference Manual" that comes with an install of OpenCV probably should come up first (the local html files of course aren't google searchable), or any real text explanation or tutorial of the function. So scrolling down further there are some odd but useful sites like http://www.ieeta.pt/~jmadeira/OpenCV/OpenCVdocs/ref/opencvref_cv.htm. I guess the official Willow Garage docs here haven't been linked to enough.
The official OpenCV book on Google is highly searchable, some pages are restricted but many are not.
Through all that frustration I did manage to learn a lot of basics: load an image, process a portion of the image to look for a certain color, and then find the center of the region that has that color.
IplImage* image = cvLoadImage( base_filename, CV_LOAD_IMAGE_COLOR );
split it into two halves for separate processing
IplImage* image_left = cvCreateImage( cvSize( image->width/2, image->height), IPL_DEPTH_8U, 3 );
cvSetImageROI( image, cvRect( 0, 0, image->width/2, image->height ) );
cvCopy( image, image_left );
convert it to hsv color space
IplImage* image_left_hsv = cvCreateImage( cvSize(image_left->width, image_left->height), IPL_DEPTH_8U, 3 );
cvCvtColor(image_left,image_left_hsv,CV_BGR2HSV);
get only the hue component using the COI '[color] Channel Of Interest' function
IplImage* image_left_hue = cvCreateImage( cvSize(image_left->width, image_left->height), IPL_DEPTH_8U, 1 );
cvSetImageCOI( image_left_hsv, 1);
cvCopy(image_left_hsv, image_left_hue);
find only the parts of an image within a certain hue range
cvInRangeS(image_left_hue, cvScalarAll(huemin), cvScalarAll(huemax), image_msk);
erode it down to get rid of noise
cvErode(image_msk,image_msk,NULL, 3);
and then find the centers of mass of the found regions
CvMoments moments;
cvMoments(image_msk, &moments, 1);
double m00, m10, m01;
m00 = cvGetSpatialMoment(&moments, 0,0);
m10 = cvGetSpatialMoment(&moments, 1,0);
m01 = cvGetSpatialMoment(&moments, 0,1);
// TBD check that m00 != 0
float center_x = m10/m00;
float center_y = m01/m00;
Copy the single channel mask back into a three channel rgb image
IplImage* image_rgb = cvCreateImage( cvSize(image_msk->width, image_msk->height), IPL_DEPTH_8U, 3 );
cvSetImageCOI( image_rgb, 2);
cvCopy(image_msk,image_rgb);
cvSetImageCOI( image_rgb, 0);
and draw circles on a temp image where the centers of mass are
cvCircle(image_rgb,cvPoint(int(center_x),int(center_y)), 10, CV_RGB(200,50,50),3);
All the work of setting channels of interest and regions of interest was new to me. I could have operated on the images in place rather than creating many new ones that take up more memory (and that I need to remember to free), but for debugging it's nice to keep the intermediate steps around.
2009-03-29
mewantee example
I've made enough fixes to mewantee to open it up and allow most of it to be viewed without logging in, and creating a user no longer requires activation.
There isn't much on there right now, but I have a good example: there's a project called crossephex I was working on a few months ago, and I'll probably start on it again soon. It's supposed to be a vj/visuals-generating tool for Processing, similar to Gephex. I need a bunch of basic graphics to use as primitives to mix with each other to create interesting effects, so on mewantee I have this request, which asks for help from other people generating those graphics. Each one shouldn't take more than a few minutes to make; of course I could do it myself, but I think it's a good example of what the site might be good for.
2009-03-25
mewantee!
I created a website called mewantee using Google App Engine. It's closed to the public right now, but I need some users to try it out and tell me if they run into any problems using it normally, or give any feedback at all. If you log in with a gmail account (Google handles the login; I won't know anything except your email address, and even that will be hidden from other users), I'll be sent a notification email and I can then activate your account.
What is it about? Mainly I'd like it to incentivize the creation of creative commons and open source content, and it uses a sort of economic model to do that. Even if it turns out to be too strange, or the kind of users needed to make it work don't show up, it was a good exercise for learning python and appengine.
Something else to figure out: I have mewantee.com pointing to mewanteee.appspot.com- is there any way to make it appear as mewantee.com to everyone else, the way this blog is really on blogspot.com but is seen as binarymillenium.com?
2009-02-21
Gephex 0.4.3 updated for Ubuntu 8.10
Since there hasn't been a better version of Gephex since 0.4.3 (though I haven't tried compiling the repository recently, last time was not successful), I've downloaded the source and hacked it until it built on Ubuntu 8.10 updated to today:
http://binarymillenium.googlecode.com/files/gephex-0.4.3updated.tgz
I haven't tested it all the way, especially the video input modules, but it probably works.
Most of the changes have to do with updates to gcc, which now treats classname::method qualifications in cpp files as errors, and some files needed to include stdlib.h or string.h that didn't before. Also a structure definition in libavcodec had to be messed with- the static declaration removed.
nasm, qt3 in the form libqt3-headers, and libxv-dev had to be installed (and other non-standard things for 8.10 that I already had installed for other purposes). For qt3, flags for the include, bin, and lib dir needed to be passed to configure.
I had to run configure in the ffmpeg library and disable mmx with the --disable-mmx flag, putting that flag in the top-level makefile didn't work. My configuration specific makefiles are in the tarball so you would definitely have to rerun configure to override them.
Next I'll be creating a new custom gephex module for my ARToolkit multimarker UI project.
----
Update
I've tested this build more extensively, and have discovered that the Ubuntu visual effects that are on by default cause the gephex output window to flicker. To disable them go to System | Preferences | Appearance | Visual Effects and select none. It's possible I need to build gephex with OpenGL support and these options will co-exist better.
Also, the screencap frei0r module I've depended on extensively in the past updates extremely slowly on the laptop I'm currently using- it may be an ATI thing (I originally developed it on an Nvidia system).
2009-02-18
Marker Tracking as Visualization Interface
My idea is that I could do an ARToolKit-based visualization performance using a clear table with markers I can slide, rotate, add, and remove, with all those movements corresponding to events on screen. Unlike other AR videos, the source video wouldn't necessarily be incorporated into the output- the markers provide an almost infinitely expressive set of UI knobs and sliders.
So far I have this:
AR User Interface from binarymillenium on Vimeo.
The lighting is difficult: the markers need to register as white and black pixels, but the plexiglass tends to produce reflections. Also, if the light source itself is visible, a marker will not be able to sit right on top of it. I need a completely black backdrop under the plexiglass so there are no reflections to obscure the markers, and also more numerous and softer diffuse lights.
One way to solve the reflection problem is to have the camera looking down at a table, though it's a little harder to get the camera up high enough, and I didn't want my hands or body to obscure the markers- the clear table idea is more elegant and self-contained.
The frame rate isn't very high- I need to work on making it all more real-time and responsive. It may have to be that one computer captures video and finds marker positions, sending them to another computer left completely free to visualize them. Also, more interpolation and position prediction could smooth things out and cover up gaps when a marker isn't recognized in a frame, but that could produce more lag.
2009-01-29
Bundler - the Photosynth core algorithms GPLed
[update- the output of bundler is less misaligned looking than this, I was incorrectly displaying the results here and in the video]
Bundler (http://phototour.cs.washington.edu/bundler) takes photographs and can create 3D point clouds and camera positions derived from them, similar to what Photosynth does- this is called structure from motion. It's hard to believe this has been out as long as the publicly available Photosynth and I hadn't heard about it- it seems to be in stealth mode.
Bundler - GPLed Photosynth - Car from binarymillenium on Vimeo.
From that video it is apparent that highly textured flat surfaces do best. The car is reflective and dull grey and so generates few correspondences, but the hubcaps, license plate, parking strip lines, and grass and trees work well. I wonder if this could be combined with a space carving technique to get a better car out of it.
It's a lot rougher around the edges without the Microsoft Live Labs contribution: a few sets I've tried have crashed with messages like "RunBundler.sh: line 60: 2404 Segmentation fault (core dumped) $MATCHKEYS list_keys.txt matches.init.txt", or sometimes individual images throw it with "This application has requested the Runtime to terminate it...", but it appears to plow through (until it reaches the former error).
Images without good EXIF data trip it up. The other day I was trying to search flickr for only images that have EXIF data and allow full view, but I haven't been successful so far. Some search strings supposedly limit results by focal length, which seems like it would limit results to images with EXIF data, but that wasn't the case.
Bundler outputs ply files, which can be read in Meshlab with the modification that these two lines be added to the ply header:
element face 0
property list uchar int vertex_index
Without this Meshlab will give an error about there being no faces, and give up.
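Patching the header can be scripted- a quick python sketch that inserts the two lines before end_header (the filenames are just examples, and it assumes the ascii ply output bundler produces):
lines = open("bundle.ply").readlines()  # example input name
out = open("bundle_fixed.ply", "w")
for line in lines:
    if line.strip() == "end_header":
        # declare an empty face list so Meshlab doesn't complain
        out.write("element face 0\n")
        out.write("property list uchar int vertex_index\n")
    out.write(line)
out.close()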
Also I have some Processing software that is a little less user friendly but doesn't require the editing:
http://code.google.com/p/binarymillenium/source/browse/trunk/processing/bundler/
Bundler can't handle filenames with spaces right now. I think I can fix this myself without too much work- it's mostly a matter of making sure names are passed everywhere with quotes around them.
Multi-megapixel files bog down sift significantly until it crashes after taking a couple of gigabytes of memory (probably unable to get more from Windows):
...
[Found in EXIF tags]
[CCD width = 5.720mm]
[Resolution = 3072 x 2304]
[Focal length (pixels) = 3114.965
[Found 18 good images]
[- Extracting keypoints -]
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
Resizing them to 1600x1200 worked without crashing and took only a few hundred megabytes of memory per image, so more megapixels may work as well.
The most intriguing feature is the incremental option; I haven't tested it yet, but it promises to be able to take new images and incorporate them into existing bundles. Unfortunately each new image has a matching time proportional to the number of previous images- maybe it would be possible to incrementally remove images as well, or remove found points in regions that already have high point densities?
2009-01-24
Nested blocks in Django duplication problems
The official instructions and cursory google searches didn't turn up a good explanation, but I've figured it out for myself. I was confused about nesting blocks, sometimes getting no output or getting duplicate output.
In this example the base has a single level of nesting with two sub-blocks.
base.html:
{% block outer %}
{% block inner1 %}
this is inner1
{% endblock inner1 %}
{% block inner2 %}
this is inner2
{% endblock inner2 %}
{% endblock outer %}
This file duplicates the original block structure but adds a comment:
some.html:
{% extends "base.html" %}
{% block outer %}
{{ block.super }}
new stuff
{% endblock outer %}
The output would be
this is inner1
this is inner2
new stuff
Moving the 'new stuff' line before the block.super would swap the order of the output statements. There is no way to interject the new comment in between inner1 and inner2 without creating a new block that sits between them in the parent base.html file.
Don't try to do this (which is what I thought to do initially):
{% extends "base.html" %}
{% block outer %}
{{ block.super }}
new stuff
{% block inner2 %}
new inner2
{% endblock inner2 %}
{% endblock outer %}
It will result in duplication like this:
this is inner1
new inner2
new stuff
new inner2
Instead, the extending file that wants to alter any parent block should do it in a non-nested way- don't redefine an inherited block while inside another inherited block:
{% extends "base.html" %}
{% block outer %}
{{ block.super }}
new stuff
{% endblock outer %}
{% block inner2 %}
new inner2
{% endblock inner2 %}
And now the output will be without duplication.
this is inner1
new inner2
new stuff
block.super needs to be in there or the redefinition of inner2 won't be applied to anything.
2009-01-04
Laser Scanning
The idea is to project laser lines onto a flat surface, image them, and then put objects in front of the surface and compute the displacement made by the object.
Here is the flat base with a line on it:
Here is the line at the same position with objects intersecting:
Finding depth involves figuring out the 2D projection of the normal line that is perpendicular to the wall at any point along the laser line. I'm working on this, but it's also possible to guess an average line for a low-precision demonstration. The software looks for all points where it believes the laser is shining, then computes the intersection of the normal line with the original object-free laser line, and gets depth from the displacement.
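A bare-bones version of that displacement math might look like this- my own sketch, not the code in the repository, and it assumes the baseline has already been fit as y = m*x + b in image coordinates:
import math

m, b = 0.05, 240.0  # example baseline slope and intercept from the object-free image

def displacement(px, py):
    # signed distance from a detected laser pixel (px, py) to the baseline,
    # measured along the baseline's normal- this is what gets mapped to depth
    return (py - (m * px + b)) / math.sqrt(1.0 + m * m)

print displacement(320, 260)  # a pixel pushed off the baseline by an object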
I had about 8 different images from laser lines; here are the results from two:
The yellow lines are the projected normals from the base line to the found laser line on the backpack and broom. There are some spurious results, and also on the dark woven backpack material the laser was not always reflected strongly enough to register.
The source code is here:
http://code.google.com/p/binarymillenium/source/browse/trunk/processing/laserscan