且构网

Accuracy of depth estimation - stereo vision

Updated: 2022-11-29 19:08:09

I would add that using color is a bad idea even with expensive cameras; just use gray-level intensity gradients. Some producers of high-end stereo cameras (for example, Point Grey) used to rely on color and then switched to gray. Also consider bias and variance as the two components of stereo matching error. This is important because correlation stereo with a large correlation window, for example, averages depth (i.e., models the world as a bunch of fronto-parallel patches). That reduces the variance while increasing the bias, and a small window does the opposite. So there is always a trade-off.
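The following is a minimal sketch of correlation (block-matching) stereo on a single rectified scanline. It makes the "gray gradients, not color" advice concrete. It is not the answer author's code. The BT.601 gray conversion, the SAD cost, winner-take-all selection, the synthetic data, and the names `to_gray`, `gradient`, `match_scanline`, `max_disp` and `radius` are all illustrative assumptions. `radius` is the correlation-window size whose trade-off the paragraph above describes.

```python
import random


def to_gray(rgb_row):
    # Collapse color to gray with ITU-R BT.601 luma weights; color is not used for matching.
    return [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in rgb_row]


def gradient(row):
    # Central-difference horizontal gradient of gray intensity; borders are set to 0.
    g = [0.0] * len(row)
    for x in range(1, len(row) - 1):
        g[x] = (row[x + 1] - row[x - 1]) / 2.0
    return g


def match_scanline(left, right, max_disp, radius):
    """Winner-take-all correlation stereo on one rectified scanline (hypothetical helper).

    Left pixel x is compared with right pixel x - d. The cost is the sum of
    absolute gradient differences (SAD) over a window of 2 * radius + 1 pixels.
    Pixels where the window does not fit are left as None.
    """
    gl, gr = gradient(left), gradient(right)
    n = len(left)
    disp = [None] * n
    for x in range(radius, n - radius):
        best_d, best_cost = None, float("inf")
        for d in range(max_disp + 1):
            if x - d - radius < 0:
                break  # window would fall off the right image
            cost = sum(abs(gl[x + k] - gr[x - d + k]) for k in range(-radius, radius + 1))
            if cost < best_cost:
                best_d, best_cost = d, cost
        disp[x] = best_d
    return disp


# Synthetic example: random colored texture, right view shifted by 3 pixels.
rng = random.Random(0)
scene = [(rng.uniform(0, 255), rng.uniform(0, 255), rng.uniform(0, 255)) for _ in range(60)]
left = to_gray(scene[:50])
right = to_gray(scene[3:53])  # right[x - 3] == left[x]
disp = match_scanline(left, right, max_disp=8, radius=2)
print(disp[10:20])
```

Real matchers work on 2D windows and add sub-pixel refinement. Those steps are omitted here to keep the cost computation visible.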

Beyond the factors you mentioned above, the accuracy of your stereo will depend on the specifics of the algorithm. It is up to the algorithm to validate depth (an important step after stereo estimation) and to gracefully patch the holes in textureless areas. For example, consider back-and-forth validation (matching R to L should produce the same candidates as matching L to R), blob noise removal (the non-Gaussian noise typical of stereo matching, removed with a connected-component algorithm), texture validation (invalidating depth in areas with weak texture), uniqueness validation (requiring a unimodal matching score with no strong second or third candidates; this is typically a shortcut for back-and-forth validation), etc. The accuracy will also depend on sensor noise and on the sensor's dynamic range.
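The validation steps listed above can be sketched as small post-processing functions. This is an illustrative sketch, not the answer's algorithm. It assumes the following:

- Disparity maps are lists of rows, with `None` marking invalid pixels.
- Left pixel `x` matches right pixel `x - d`.
- All function names and thresholds (`tol`, `ratio`, `min_energy`, `max_size`, `max_diff`) are hypothetical.

The blob filter is a region-growing connected-component pass, similar in spirit to OpenCV's `filterSpeckles`.

```python
from collections import deque


def left_right_check(disp_l, disp_r, tol=1):
    """Back-and-forth validation: keep d only if the right-to-left map agrees."""
    out = []
    for row_l, row_r in zip(disp_l, disp_r):
        row = []
        for x, d in enumerate(row_l):
            xr = x - d if d is not None else -1  # left x matches right x - d
            ok = (0 <= xr < len(row_r) and row_r[xr] is not None
                  and abs(row_r[xr] - d) <= tol)
            row.append(d if ok else None)
        out.append(row)
    return out


def is_unique(costs, ratio=0.8):
    """Uniqueness validation: no strong candidate away from the winner."""
    best = min(range(len(costs)), key=costs.__getitem__)
    # Ignore the winner's immediate neighbours (sub-pixel spread of the same peak).
    others = [c for d, c in enumerate(costs) if abs(d - best) > 1]
    return not others or costs[best] < ratio * min(others)


def has_texture(gray_row, x, radius=2, min_energy=10.0):
    """Texture validation: sum of squared horizontal differences around x."""
    lo, hi = max(0, x - radius), min(len(gray_row) - 1, x + radius)
    energy = sum((gray_row[i + 1] - gray_row[i]) ** 2 for i in range(lo, hi))
    return energy >= min_energy


def remove_speckles(disp, max_size=4, max_diff=1):
    """Blob noise removal: invalidate small 4-connected regions of similar disparity."""
    h, w = len(disp), len(disp[0])
    out = [row[:] for row in disp]
    seen = [[False] * w for _ in range(h)]
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or disp[y0][x0] is None:
                continue
            region, queue = [], deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:  # breadth-first region growing
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and disp[ny][nx] is not None
                            and abs(disp[ny][nx] - disp[y][x]) <= max_diff):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) <= max_size:
                for y, x in region:
                    out[y][x] = None
    return out
```

For example, `remove_speckles([[1, 1, 1, 1], [1, 9, 1, 1], [1, 1, 1, 1]])` invalidates the isolated `9` and keeps the surrounding region of `1`s.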

Finally, you have to ask your question about accuracy as a function of depth, since d = f*B/z. Here d is the disparity in pixels, B is the baseline between the cameras, f is the focal length in pixels, and z is the distance along the optical axis. Thus accuracy depends strongly on the baseline and on the distance.
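Let d be the disparity in pixels. Differentiating d = f*B/z gives dz/dd = -f*B/d^2 = -z^2/(f*B). To first order, therefore, |Δz| ≈ z^2/(f*B) * |Δd|. A fixed disparity error thus produces a depth error that grows quadratically with distance and shrinks with a longer baseline or focal length. The short sketch below evaluates this formula. The rig parameters are made-up illustrative values, not the specifications of any particular camera: f = 700 px, B = 0.12 m, and a 0.25 px disparity error.

```python
def depth_from_disparity(d_px, f_px, baseline_m):
    # z = f * B / d  (d and f in pixels, B and z in metres)
    return f_px * baseline_m / d_px


def depth_error(z_m, f_px, baseline_m, disp_err_px=1.0):
    # First-order error propagation: |dz| ~= z^2 / (f * B) * |dd|
    return z_m ** 2 / (f_px * baseline_m) * disp_err_px


# Hypothetical rig, for illustration only.
F_PX, BASELINE_M, DISP_ERR_PX = 700.0, 0.12, 0.25
for z in (0.5, 1.0, 2.0, 4.0):
    err_mm = depth_error(z, F_PX, BASELINE_M, DISP_ERR_PX) * 1000.0
    print(f"z = {z:.1f} m -> depth error ~ {err_mm:.1f} mm")
```

Doubling the distance quadruples the error, while doubling the baseline halves it.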

Kinect provides about 1 mm accuracy (bias), with quite large variance, up to 1 m or so; beyond that, accuracy drops sharply. Kinect has a dead zone up to 50 cm because the two cameras do not overlap sufficiently at close range. And yes, Kinect is a stereo camera in which one of the cameras is simulated by an IR projector.

I am sure that probabilistic stereo methods, such as belief propagation on Markov random fields, can achieve higher accuracy. However, those methods assume strong priors about the smoothness of object surfaces or about particular surface orientations. See this, for example, page 14.
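The following toy illustration shows how a smoothness prior can override a weak local match. It runs min-sum belief propagation on a single scanline modelled as a chain MRF. This is a deliberate simplification, not the method from the referenced page: practical BP stereo runs loopy BP on a 2D pixel grid. The truncated-linear prior, the parameters `lam` and `trunc`, and the function names are assumptions. On a chain (a tree), the forward and backward passes give exact min-marginals, equivalent to dynamic programming.

```python
def smoothness(d1, d2, lam, trunc):
    # Truncated linear prior: neighbouring pixels prefer similar disparities.
    return lam * min(abs(d1 - d2), trunc)


def bp_scanline(data_cost, lam=1.0, trunc=2):
    """Min-sum belief propagation on a chain MRF (one scanline).

    data_cost[x][d] is the matching cost of disparity d at pixel x.
    Returns the disparity minimising each pixel's belief (min-marginal).
    """
    n, labels = len(data_cost), len(data_cost[0])
    fwd = [[0.0] * labels for _ in range(n)]  # messages passed left-to-right
    bwd = [[0.0] * labels for _ in range(n)]  # messages passed right-to-left
    for x in range(1, n):
        for d in range(labels):
            fwd[x][d] = min(data_cost[x - 1][e] + fwd[x - 1][e] + smoothness(e, d, lam, trunc)
                            for e in range(labels))
    for x in range(n - 2, -1, -1):
        for d in range(labels):
            bwd[x][d] = min(data_cost[x + 1][e] + bwd[x + 1][e] + smoothness(e, d, lam, trunc)
                            for e in range(labels))
    beliefs = [[data_cost[x][d] + fwd[x][d] + bwd[x][d] for d in range(labels)]
               for x in range(n)]
    return [min(range(labels), key=b.__getitem__) for b in beliefs]


# True disparity is 2 everywhere; pixel 3 has a misleading local minimum at 0.
cost = [[abs(d - 2) for d in range(4)] for _ in range(7)]
cost[3] = [0.0, 0.5, 0.6, 1.0]
print(bp_scanline(cost))
```

With `lam=0.0` the prior vanishes, and the result reduces to per-pixel winner-take-all. This shows that the accuracy gain comes entirely from the smoothness assumption.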