OpenCV Tutorial: The Basics
Contents
This will continue to be updated as I learn more.
一、Basic Operations
1、Basic operations on images
①、Accessing and modifying pixel values
②、Accessing image properties
③、ROI (region of interest) of an image
④、Splitting and merging image channels
⑤、Making borders for images with cv.copyMakeBorder()
2、Arithmetic operations on images
①、Addition — OpenCV's add() is a saturated operation, while NumPy addition is a modulo operation
②、Image blending — adding two images with given weights
③、Bitwise operations
3、Measuring and improving code performance
①、Measuring Performance with OpenCV
②、Some code optimization techniques
二、Image Processing
1、Changing color spaces
①、Common color spaces
②、Object extraction in HSV
③、How to find HSV values to track
2、Geometric transformations of images
①、Resizing
②、Translation
③、Rotation by an angle
④、Affine transformation
⑤、Perspective transformation
3、Image thresholding
①、Simple thresholding (binarization)
②、Adaptive thresholding
4、Smoothing Images
①、2D convolution (filtering)
②、Image blurring, i.e. image smoothing
5、Morphological Transformations
①、Erosion
②、Dilation
③、Opening and closing
④、Morphological Gradient
⑤、Top Hat and Black Hat
⑥、Structuring Element
6、Image gradients (high-pass filtering)
7、Canny Edge Detection
8、Image Pyramids
9、Contours
①、Introduction
②、Contour features — marking contours on the original image
③、Contour Properties
④、Contours: More Functions
10、Histograms in OpenCV
①、Find, Plot, Analyze!!!
②、Histogram Equalization
③、2D Histograms
④、Histogram Backprojection
11、Image Transforms in OpenCV
①、Fourier Transform
②、Fourier Transform in Numpy
③、Fourier Transform in OpenCV
④、Performance Optimization of DFT
⑤、Why Laplacian is a High Pass Filter?
12、Template Matching
①、Template Matching in OpenCV
②、Template Matching with Multiple Objects
13、Hough Line Transform — line detection
①、Theory
②、Hough Transform in OpenCV
③、Probabilistic Hough Transform
14、Hough Circle Transform — circle detection
15、Image Segmentation with Watershed Algorithm
①、Theory
②、Code
16、GrabCut Algorithm (graph-cut foreground extraction)
①、Theory
②、Demo with labelme (didn't work out)
三、Feature Detection and Description
1、Understanding Features
2、Harris Corner Detection
①、Theory
②、Harris Corner Detector in OpenCV
③、Corner with SubPixel Accuracy
3、Shi-Tomasi Corner Detector & Good Features to Track
4、Introduction to SIFT (Scale-Invariant Feature Transform)
①、Theory
②、SIFT in OpenCV
5、Introduction to SURF (Speeded-Up Robust Features) — has issues
①、Theory
②、SURF in OpenCV
6、FAST Algorithm for Corner Detection
①、Theory
②、FAST Feature Detector in OpenCV
7、BRIEF (Binary Robust Independent Elementary Features)
①、Theory
②、BRIEF in OpenCV
8、ORB (Oriented FAST and Rotated BRIEF)
①、Theory
②、ORB in OpenCV
9、Feature Matching
①、Basics of Brute-Force Matcher
②、FLANN based Matcher
10、Feature Matching + Homography to find Objects
①、Basics
②、Code

①、Accessing and modifying pixel values
You can access a pixel value by its row and column coordinates. For a BGR color image, this returns a sequence of blue, green, and red channel values; for a grayscale image, it returns just the intensity value.
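A minimal sketch (the filename is a placeholder; any BGR image works):

```python
import cv2 as cv

img = cv.imread('messi5.jpg')       # hypothetical filename
px = img[100, 100]                  # [blue, green, red] at row 100, col 100
blue = img[100, 100, 0]             # just the blue channel
img[100, 100] = [255, 255, 255]     # modify a pixel
```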
②、Accessing image properties
③、ROI (region of interest) of an image
In the docs, the ball Messi is kicking is the region of interest, and it is copied to another region of the image.
④、Splitting and merging image channels
Avoid cv.split for separating channels unless you really need it — it is costly. NumPy indexing is usually enough.
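For example (filename is a placeholder):

```python
import cv2 as cv

img = cv.imread('messi5.jpg')   # hypothetical filename
b, g, r = cv.split(img)         # works, but costly
b = img[:, :, 0]                # NumPy indexing: cheaper
img[:, :, 2] = 0                # e.g. zero out the red channel
```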
⑤、Making borders for images with cv.copyMakeBorder()
The passage below explains why the blue appears as red — the colors look inverted overall (OpenCV loads images as BGR while matplotlib displays RGB).
①、Addition — OpenCV's cv.add() is a saturated operation, while NumPy's addition is a modulo operation.
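The classic demonstration from the official tutorial:

```python
import cv2 as cv
import numpy as np

x = np.uint8([250])
y = np.uint8([10])
print(cv.add(x, y))  # 250 + 10 = 260 -> saturated to 255
print(x + y)         # 250 + 10 = 260 % 256 = 4
```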
Adding whole images works the same way; note that the two images must have matching shapes before they can be added.
②、Image blending — adding two images with given weights
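A minimal sketch with cv.addWeighted(), assuming two images that are resized to the same shape (filenames are placeholders):

```python
import cv2 as cv

img1 = cv.imread('ml.png')              # hypothetical filenames
img2 = cv.imread('opencv-logo.png')
img2 = cv.resize(img2, (img1.shape[1], img1.shape[0]))  # match shapes
dst = cv.addWeighted(img1, 0.7, img2, 0.3, 0)  # dst = 0.7*img1 + 0.3*img2 + 0
cv.imwrite('blend.png', dst)
```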
③、Bitwise operations
①、Measuring Performance with OpenCV
Take a timestamp at the start and end of the code and subtract to get the running time in seconds.
The previous snippet took 0.31 s, measured with Python's built-in time.time().
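OpenCV's own timer works the same way; a sketch:

```python
import cv2 as cv

e1 = cv.getTickCount()
# ... code being timed ...
e2 = cv.getTickCount()
seconds = (e2 - e1) / cv.getTickFrequency()  # ticks -> seconds
print(seconds)
```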
②、Some code optimization techniques
1--Avoid loops wherever possible.
2--Vectorize algorithms as much as possible.
3--Cache coherence — :)???? probably not something a scrub like me needs to worry about.
4--Avoid unnecessary copies of arrays.

①、Common color spaces
Only three color spaces come up regularly: BGR, GRAY, and HSV.
Note the HSV ranges in OpenCV: H is [0, 179], while S and V are [0, 255].

②、Object extraction in HSV
It is easier to extract an object by its color in the HSV color space.
The official code tracks a blue object in a video.

③、How to find HSV values to track
The docs give a general method:
compute the HSV value of the blue we tracked earlier.
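This is the trick from the official tutorial — convert a single BGR pixel to HSV:

```python
import cv2 as cv
import numpy as np

blue = np.uint8([[[255, 0, 0]]])           # one BGR pixel
hsv_blue = cv.cvtColor(blue, cv.COLOR_BGR2HSV)
print(hsv_blue)                             # [[[120 255 255]]] -> H = 120
# Then take [H-10, 100, 100] to [H+10, 255, 255] as the tracking range.
```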
120 − 10 = 110 and 120 + 10 = 130, matching the expected range.

①、Resizing
②、Translation
Use cv.warpAffine(); for a shift of (tx, ty), the transformation matrix is M = [[1, 0, tx], [0, 1, ty]].
Note that rows correspond to height and columns to width.
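A minimal sketch (filename is a placeholder):

```python
import cv2 as cv
import numpy as np

img = cv.imread('messi5.jpg', cv.IMREAD_GRAYSCALE)  # hypothetical filename
rows, cols = img.shape
M = np.float32([[1, 0, 100], [0, 1, 50]])  # shift 100 right, 50 down
dst = cv.warpAffine(img, M, (cols, rows))  # note: dsize is (width, height)
```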
③、Rotation by an angle
OpenCV provides a way to rotate an image about a point (e.g. its center) by a given angle.
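A sketch with cv.getRotationMatrix2D() (filename is a placeholder):

```python
import cv2 as cv

img = cv.imread('messi5.jpg', cv.IMREAD_GRAYSCALE)  # hypothetical filename
rows, cols = img.shape
# center, angle in degrees (counter-clockwise), scale
M = cv.getRotationMatrix2D(((cols - 1) / 2.0, (rows - 1) / 2.0), 90, 1)
dst = cv.warpAffine(img, M, (cols, rows))
```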
④、Affine transformation
In an affine transformation, lines that were parallel remain parallel after the transform. Pick three points in the input image and their target positions in the output image, then use cv.getAffineTransform to compute the transformation matrix.
The official example:
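A sketch along the lines of the official example (filename and point values are illustrative):

```python
import cv2 as cv
import numpy as np

img = cv.imread('drawing.png')            # hypothetical filename
rows, cols = img.shape[:2]
pts1 = np.float32([[50, 50], [200, 50], [50, 200]])    # three source points
pts2 = np.float32([[10, 100], [200, 50], [100, 250]])  # their destinations
M = cv.getAffineTransform(pts1, pts2)
dst = cv.warpAffine(img, M, (cols, rows))
```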
⑤、Perspective transformation
Note: of the four points you pick, no three should be collinear. Effect: the four chosen points are mapped onto the four corners of the output image.

①、Simple thresholding (binarization)
Thresholding, simply put: a pixel whose value is below the threshold is set to 0, and a pixel above it is set to the maximum value.
Use cv.threshold. The first value it returns is the threshold that was used; the second is the thresholded image.
Generally, convert to grayscale before thresholding.
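A minimal sketch (filename is a placeholder):

```python
import cv2 as cv

img = cv.imread('gradient.png', cv.IMREAD_GRAYSCALE)  # hypothetical filename
ret, thresh = cv.threshold(img, 127, 255, cv.THRESH_BINARY)
```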
②、Adaptive thresholding
Adaptive thresholding targets uneven lighting: the threshold for each pixel is determined from a small neighborhood around it, so different regions of the image get — and are processed with — different thresholds.
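A sketch (filename is a placeholder):

```python
import cv2 as cv

img = cv.imread('sudoku.png', cv.IMREAD_GRAYSCALE)   # hypothetical filename
img = cv.medianBlur(img, 5)
th = cv.adaptiveThreshold(img, 255, cv.ADAPTIVE_THRESH_GAUSSIAN_C,
                          cv.THRESH_BINARY, 11, 2)   # 11x11 neighborhood, offset 2
```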
③、Otsu's Binarization
Otsu's method avoids having to choose a threshold value and determines it automatically:
it finds an optimal global threshold from the image histogram.
The official example compares three approaches; the third — Gaussian filtering to remove noise first, then Otsu binarization — is clearly the best.
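A sketch of that third approach (filename is a placeholder):

```python
import cv2 as cv

img = cv.imread('noisy.png', cv.IMREAD_GRAYSCALE)  # hypothetical filename
blur = cv.GaussianBlur(img, (5, 5), 0)             # denoise first
# pass 0 as the threshold; the flag tells OpenCV to pick it via Otsu
ret, th = cv.threshold(blur, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
print(ret)  # the threshold Otsu chose
```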
①、2D convolution (filtering)
Convolution works like this (for a 5x5 averaging kernel): center the kernel on a pixel, add up all 25 pixels under the kernel, take the average, and replace the central pixel with that average.
The official example uses an averaging kernel:
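A sketch (filename is a placeholder):

```python
import cv2 as cv
import numpy as np

img = cv.imread('opencv-logo.png')          # hypothetical filename
kernel = np.ones((5, 5), np.float32) / 25   # 5x5 averaging kernel
dst = cv.filter2D(img, -1, kernel)          # -1: same depth as the source
```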
②、Image blurring, i.e. image smoothing
Image blurring is achieved by convolving the image with a low-pass filter kernel. It is useful for removing noise. It actually removes high frequency content (eg: noise, edges) from the image. So edges are blurred a little bit in this operation (there are also blurring techniques which don't blur the edges). OpenCV provides four main types of blurring techniques.
Smoothing is implemented with the 2D convolution above: a low-pass filter removes noise and edges, hence the blurred result. OpenCV provides four blurring methods:
First: Averaging, with cv.blur.
Second: Gaussian Blurring — highly effective at removing Gaussian noise from an image.
Third: Median Blurring — highly effective against salt-and-pepper noise.
Fourth: Bilateral Filtering — cv.bilateralFilter() removes noise while keeping edges sharp.
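All four in one sketch (filename is a placeholder):

```python
import cv2 as cv

img = cv.imread('opencv-logo.png')              # hypothetical filename
avg       = cv.blur(img, (5, 5))                # 1. averaging
gauss     = cv.GaussianBlur(img, (5, 5), 0)     # 2. Gaussian
median    = cv.medianBlur(img, 5)               # 3. median (salt-and-pepper)
bilateral = cv.bilateralFilter(img, 9, 75, 75)  # 4. edge-preserving
```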
①、Erosion
it erodes away the boundaries of foreground object (Always try to keep foreground in white)
What gets eroded is the foreground object; in a binary image, the foreground is usually white (always try to keep the foreground white).
It is useful for removing small white noise (as seen in the colorspace chapter), detaching two connected objects, and so on.
How erosion works: a pixel in the original image (either 1 or 0) is kept as 1 only if all the pixels under the kernel are 1; otherwise it is eroded (set to zero).

②、Dilation
It is just the opposite of erosion.
③、Opening and closing
Opening
Erosion followed by dilation.
It is useful for removing noise, as explained above.
Closing
Dilation followed by erosion.
It is useful for closing small holes inside the foreground objects, or small black points on the object — it seals the little black dots inside the (white) foreground.
④、Morphological Gradient
The result looks like the outline of the object — it keeps the contour of the foreground.
⑤、Top Hat and Black Hat
The top-hat and black-hat transforms are used to extract small elements and details from an image.
The top-hat transform is defined as the difference between the input image and its opening by some structuring element, while the black-hat transform is the difference between the closing and the input image.
Top hat extracts small bright elements;
black hat extracts small dark details.
⑥、Structuring Element
But in some cases, you may need elliptical/circular shaped kernels. So for this purpose, OpenCV has a function, cv.getStructuringElement(). You just pass the shape and size of the kernel, you get the desired kernel.
The docs show examples of rectangular, elliptical, and cross-shaped kernels.
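A sketch pulling the morphology section together (filename is a placeholder):

```python
import cv2 as cv

img = cv.imread('j.png', cv.IMREAD_GRAYSCALE)  # hypothetical filename
kernel = cv.getStructuringElement(cv.MORPH_ELLIPSE, (5, 5))

erosion  = cv.erode(img, kernel, iterations=1)
dilation = cv.dilate(img, kernel, iterations=1)
opening  = cv.morphologyEx(img, cv.MORPH_OPEN, kernel)      # erode then dilate
closing  = cv.morphologyEx(img, cv.MORPH_CLOSE, kernel)     # dilate then erode
gradient = cv.morphologyEx(img, cv.MORPH_GRADIENT, kernel)  # object outline
tophat   = cv.morphologyEx(img, cv.MORPH_TOPHAT, kernel)
blackhat = cv.morphologyEx(img, cv.MORPH_BLACKHAT, kernel)
```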
OpenCV provides three types of gradient filters or High-pass filters, Sobel, Scharr and Laplacian.
Note:
If you want to detect both kinds of edges, the better option is to keep the output datatype in a higher form, like cv.CV_16S or cv.CV_64F.
When using Sobel, choose cv.CV_64F for the output so both black-to-white and white-to-black transitions are detected (a uint8 output would clip the negative slopes to zero).
The official example:
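A sketch (filename is a placeholder):

```python
import cv2 as cv

img = cv.imread('sudoku.png', cv.IMREAD_GRAYSCALE)  # hypothetical filename
laplacian = cv.Laplacian(img, cv.CV_64F)
sobelx = cv.Sobel(img, cv.CV_64F, 1, 0, ksize=5)    # d/dx
sobely = cv.Sobel(img, cv.CV_64F, 0, 1, ksize=5)    # d/dy
```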
Canny Edge Detection is a popular edge detection algorithm.
It is a multi-stage algorithm; the details will be covered later.
The result:
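A minimal sketch (filename is a placeholder):

```python
import cv2 as cv

img = cv.imread('messi5.jpg', cv.IMREAD_GRAYSCALE)  # hypothetical filename
edges = cv.Canny(img, 100, 200)  # minVal=100, maxVal=200 for hysteresis
```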
These set of images with different resolutions are called Image Pyramids (because when they are kept in a stack with the highest resolution image at the bottom and the lowest resolution image at top, it looks like a pyramid).
Image pyramids underlie several other operations.
One application of pyramids is image blending.
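The basic up/down operations, as a sketch (filename is a placeholder):

```python
import cv2 as cv

img = cv.imread('messi5.jpg')     # hypothetical filename
lower = cv.pyrDown(img)    # one level down: half the resolution
higher = cv.pyrUp(lower)   # back up (the lost detail is not restored)
```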
The official example:
My own attempt — something odd happened :<
①、Introduction
Contours is a Python list of all the contours found in the image; each individual contour is a Numpy array of (x, y) coordinates of the boundary points of the object.
How to draw contours:
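A sketch (filename is a placeholder; OpenCV 4.x returns two values from findContours, 3.x returned three):

```python
import cv2 as cv

img = cv.imread('shapes.png')    # hypothetical filename
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
ret, thresh = cv.threshold(gray, 127, 255, cv.THRESH_BINARY)
contours, hierarchy = cv.findContours(thresh, cv.RETR_TREE,
                                      cv.CHAIN_APPROX_SIMPLE)
cv.drawContours(img, contours, -1, (0, 255, 0), 3)  # -1: draw all contours
```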
②、Contour features — marking contours on the original image
Image moments — from the moments you can extract useful data like area, centroid, etc.
Contour Approximation — adjusting the epsilon parameter changes how closely the approximation follows the contour.
Convex Hull — similar in spirit to contour approximation; it finds the outer convex boundary of the object.
Bounding Rectangle — frames the contour with a rectangle.
Rotated Rectangle
Minimum Enclosing Circle
Fitting an Ellipse
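A sketch of these features on one contour (filename is a placeholder; assumes a non-degenerate contour):

```python
import cv2 as cv
import numpy as np

img = cv.imread('shape.png', cv.IMREAD_GRAYSCALE)  # hypothetical filename
_, th = cv.threshold(img, 127, 255, cv.THRESH_BINARY)
contours, _ = cv.findContours(th, cv.RETR_LIST, cv.CHAIN_APPROX_SIMPLE)
cnt = contours[0]

M = cv.moments(cnt)
cx, cy = int(M['m10'] / M['m00']), int(M['m01'] / M['m00'])  # centroid
epsilon = 0.01 * cv.arcLength(cnt, True)
approx = cv.approxPolyDP(cnt, epsilon, True)    # contour approximation
hull = cv.convexHull(cnt)
x, y, w, h = cv.boundingRect(cnt)               # upright bounding rectangle
rect = cv.minAreaRect(cnt)                      # rotated rectangle
(ccx, ccy), radius = cv.minEnclosingCircle(cnt)
ellipse = cv.fitEllipse(cnt)                    # needs >= 5 points
```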
③、Contour Properties
Here we will learn to extract some frequently used properties of objects like Solidity, Equivalent Diameter, Mask image, Mean Intensity etc.
1. Aspect Ratio
2. Extent
3. Solidity
4. Equivalent Diameter
5. Orientation
Orientation is the angle at which object is directed.
6. Mask
mask = np.zeros(imgray.shape, np.uint8)
A mask like this is often passed as a parameter to other functions.
Pixel Points
Maximum Value, Minimum Value and their locations
Mean Color or Mean Intensity
7. Extreme Points
Convexity Defects — as in the figure below, any deviation of the object from its marked (convex hull) outline is a defect.
Point Polygon Test
This function finds the shortest distance between a point in the image and a contour.
It returns the shortest distance between a point and a contour (negative when the point lies outside the contour).
Match Shapes — compares the contours of two shapes and returns a metric; the smaller it is, the more similar the two shapes.
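A sketch of both functions (filenames and the hypothetical helper are placeholders):

```python
import cv2 as cv

def first_contour(path):
    # hypothetical helper: binarize and return the largest contour
    img = cv.imread(path, cv.IMREAD_GRAYSCALE)
    _, th = cv.threshold(img, 127, 255, cv.THRESH_BINARY)
    cnts, _ = cv.findContours(th, cv.RETR_LIST, cv.CHAIN_APPROX_SIMPLE)
    return max(cnts, key=cv.contourArea)

cnt1 = first_contour('star1.png')   # hypothetical filenames
cnt2 = first_contour('star2.png')
print(cv.matchShapes(cnt1, cnt2, cv.CONTOURS_MATCH_I1, 0.0))  # lower = more similar
print(cv.pointPolygonTest(cnt1, (50, 50), True))  # signed distance to the contour
```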
Contours Hierarchy — the hierarchy of contours
What is Hierarchy?
In some cases, some shapes are inside other shapes; we call the outer one the parent and the inner one the child. In this way, the contours in an image are related to each other, and the representation of these relationships is called the hierarchy.

①、Find, Plot, Analyze!!!
Theory
A histogram gives you an overall idea of the intensity distribution of an image:
X axis — the pixel values;
Y axis — the number of pixels with each value.
!!! Remember, this histogram is drawn for a grayscale image, not a color image.
Some terminology related to histograms:
BINS: each sub-range of pixel values is called a "bin". In the first case there were 256 bins (one per value); in the second, only 16. BINS is referred to as histSize in the OpenCV docs — essentially, how many buckets there are along the x axis.
DIMS : It is the number of parameters for which we collect the data. In this case, we collect data regarding only one thing, intensity value. So here it is 1.
RANGE : It is the range of intensity values you want to measure. Normally, it is [0,256], ie all intensity values.
Find
Histogram Calculation in OpenCV
Histogram Calculation in Numpy
Plot
Using Matplotlib
Using OpenCV
What if you want to find the histogram of only some region of an image?
Just create a mask image that is white over the region you want the histogram for, and black elsewhere.
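A sketch (filename and mask region are placeholders):

```python
import cv2 as cv
import numpy as np

img = cv.imread('home.jpg', cv.IMREAD_GRAYSCALE)  # hypothetical filename
hist_full = cv.calcHist([img], [0], None, [256], [0, 256])

mask = np.zeros(img.shape, np.uint8)
mask[100:300, 100:400] = 255                      # white = region of interest
hist_mask = cv.calcHist([img], [0], mask, [256], [0, 256])
```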
②、Histogram Equalization
a good image will have pixels from all regions of the image.
This normally improves the contrast of the image.
Numpy implementation — first, histogram equalization in NumPy.
The original image:
After histogram equalization:
The contrast of the image is clearly improved, and in the histogram the pixel counts are spread more evenly across the intensities.
Histograms Equalization in OpenCV
CLAHE (Contrast Limited Adaptive Histogram Equalization)
CLAHE addresses the loss of important image detail that the plain global equalization above can cause.
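Both variants in one sketch (filename is a placeholder):

```python
import cv2 as cv

img = cv.imread('tsukuba.png', cv.IMREAD_GRAYSCALE)  # hypothetical filename
equ = cv.equalizeHist(img)                  # global equalization

clahe = cv.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
cl = clahe.apply(img)                       # adaptive, contrast-limited
```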
③、2D Histograms
In a one-dimensional histogram we consider only one feature: the gray level.
In a two-dimensional histogram we consider two features — the hue and saturation of every pixel — so the image must first be converted to the HSV color space.
In this code, the x axis is S and the y axis is H.
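A sketch (filename is a placeholder):

```python
import cv2 as cv

img = cv.imread('home.jpg')               # hypothetical filename
hsv = cv.cvtColor(img, cv.COLOR_BGR2HSV)
# channels 0 (H) and 1 (S); 180 hue bins, 256 saturation bins
hist = cv.calcHist([hsv], [0, 1], None, [180, 256], [0, 180, 0, 256])
```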
④、Histogram Backprojection
It is used for image segmentation or finding objects of interest in an image.
In the output image, the object of interest appears in more white compared to the remaining part.
The procedure:
1、Create a histogram of an image containing the object of interest (in this example the ground, leaving out the player and everything else).
2、The object should fill that image as far as possible for better results, and a color histogram is preferred over a grayscale one.
3、"Back-project" this histogram over the test image where the object is to be found — in other words, compute the probability that each pixel belongs to the ground, and display it.
The code:
Backprojection in OpenCV
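A sketch (filenames are placeholders for the region sample and the test image):

```python
import cv2 as cv

roi = cv.imread('ground.png')       # hypothetical: sample of the target region
target = cv.imread('messi5.jpg')    # hypothetical: image to search
hsv_roi = cv.cvtColor(roi, cv.COLOR_BGR2HSV)
hsv_tgt = cv.cvtColor(target, cv.COLOR_BGR2HSV)

roihist = cv.calcHist([hsv_roi], [0, 1], None, [180, 256], [0, 180, 0, 256])
cv.normalize(roihist, roihist, 0, 255, cv.NORM_MINMAX)
dst = cv.calcBackProject([hsv_tgt], [0, 1], roihist, [0, 180, 0, 256], 1)
```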
①、Fourier Transform
My own understanding: there are the spatial and frequency domains, and some image processing is easier in the frequency domain — hence the Fourier transform.
Frequency-domain transforms:
Fourier transform: compute the magnitude and phase spectra and analyze them. Low frequencies carry the background and bulk of the image; high frequencies carry lines and contours; also note where the low and high frequencies sit (whether the spectrum is centered).
Discrete Fourier transform (DFT): has useful properties (separability, translation, superposition, periodicity, symmetry, rotation, scaling, mean value, ...).
Fast Fourier transform (FFT): reduces the computation.
The official docs' explanation:
For a sinusoidal signal x(t) = A·sin(2πft), f is the frequency of the signal, and its frequency domain shows a spike at f. If the signal is sampled to form a discrete signal, we get the same frequency domain, but periodic in the range [−π, π] or [0, 2π] ([0, N] for an N-point DFT). You can consider an image as a signal sampled in two directions, so taking the Fourier transform in both the X and Y directions gives the frequency representation of the image.
More intuitively, if the amplitude of a sinusoidal signal varies quickly over a short time, it is a high-frequency signal; if it varies slowly, it is a low-frequency signal. The same idea extends to images: where does the amplitude vary drastically in an image? At edge points and noise. So edges and noise are the high-frequency content of an image; where the amplitude changes little, we have low-frequency content.

②、Fourier Transform in Numpy
Pattern in the spectrum: white at the center = low-frequency components; black toward the edges = high-frequency components.
A whiter region at the center shows that there is more low-frequency content.
Once you have the frequency transform, you can do some operations in the frequency domain — such as high-pass filtering — and then reconstruct the image.
The code sets the central region of the spectrum to 0, i.e. it filters out the low frequencies and keeps the high ones — a high-pass filter. What remains are the image's edges and contours, which also shows that most of the image's information lives in the low frequencies.
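A sketch of that high-pass filter in NumPy (filename and window size are placeholders):

```python
import cv2 as cv
import numpy as np

img = cv.imread('messi5.jpg', cv.IMREAD_GRAYSCALE)  # hypothetical filename
f = np.fft.fft2(img)
fshift = np.fft.fftshift(f)                 # move the DC component to the center

rows, cols = img.shape
crow, ccol = rows // 2, cols // 2
fshift[crow-30:crow+31, ccol-30:ccol+31] = 0  # zero out the low frequencies

img_back = np.real(np.fft.ifft2(np.fft.ifftshift(fshift)))  # edges remain
```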
The result shows High Pass Filtering is an edge detection operation.
This also shows that most of the image data is present in the low-frequency region of the spectrum.

③、Fourier Transform in OpenCV
This time the high-frequency content is removed instead (a low-pass filter):
accordingly, the image's fine details — edges and contours — are blurred.

④、Performance Optimization of DFT
So if you are worried about the performance of your code, you can modify the size of the array to any optimal size (by padding zeros) before finding DFT.
Changing the array size speeds up the discrete Fourier transform.
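A sketch of zero-padding to an optimal size, using cv.getOptimalDFTSize() (which the next line introduces; filename is a placeholder):

```python
import cv2 as cv
import numpy as np

img = cv.imread('messi5.jpg', cv.IMREAD_GRAYSCALE)  # hypothetical filename
rows, cols = img.shape
nrows = cv.getOptimalDFTSize(rows)
ncols = cv.getOptimalDFTSize(cols)
padded = np.zeros((nrows, ncols), np.uint8)
padded[:rows, :cols] = img           # zero-pad to the optimal size
dft = cv.dft(np.float32(padded), flags=cv.DFT_COMPLEX_OUTPUT)
```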
So how do we find this optimal size? OpenCV provides a function, cv.getOptimalDFTSize(), for this.

⑤、Why Laplacian is a High Pass Filter?
In the official result images, the Laplacian's frequency response is dark at low frequencies — the low frequencies are suppressed and the high ones kept, i.e. it is a high-pass filter.
By the same reasoning, the Gaussian filter is a low-pass filter.
①、Template Matching in OpenCV
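A sketch (filenames are placeholders):

```python
import cv2 as cv

img = cv.imread('messi5.jpg', cv.IMREAD_GRAYSCALE)   # hypothetical filenames
template = cv.imread('face.png', cv.IMREAD_GRAYSCALE)
h, w = template.shape

res = cv.matchTemplate(img, template, cv.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv.minMaxLoc(res)
top_left = max_loc                        # brightest point = best match
bottom_right = (top_left[0] + w, top_left[1] + h)
cv.rectangle(img, top_left, bottom_right, 255, 2)
```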
In one of the result images you can see that the pixel at the rectangle's top-left position is very bright.

②、Template Matching with Multiple Objects
①、Theory
A line can be represented as y = mx + c, or in parametric form as ρ = x·cosθ + y·sinθ, where ρ is the perpendicular distance from the origin to the line and θ is the angle between that perpendicular and the horizontal axis, measured counter-clockwise (the direction depends on how you represent the coordinate system; this representation is used in OpenCV). Check the image below:
So if the line passes below the origin, it has a positive ρ and an angle less than 180°. If it passes above the origin, instead of taking an angle greater than 180°, the angle is taken less than 180° and ρ is taken negative. Any vertical line has θ = 0° and horizontal lines have θ = 90°.
Now let's see how the Hough Transform works for lines. Any line can be represented by the pair (ρ, θ). So first it creates a 2D array, the accumulator (holding the values of the two parameters), initialized to 0. Let rows denote ρ and columns denote θ. The size of the array depends on the accuracy you need: for an angular accuracy of 1 degree you need 180 columns. For ρ, the maximum possible distance is the diagonal length of the image, so taking one-pixel accuracy, the number of rows can be the diagonal length of the image.
Consider a 100x100 image with a horizontal line in the middle. Take the first point of the line; you know its (x, y) values. Now plug θ = 0, 1, 2, ..., 180 into the line equation and check the ρ you get. For every (ρ, θ) pair, increment the corresponding cell in the accumulator by one. So now the cell (50, 90) = 1, along with some other cells.
Now take the second point on the line and do the same, incrementing the cells for the (ρ, θ) values you get. This time the cell (50, 90) = 2 — what you are actually doing is voting for (ρ, θ) values. Continue for every point on the line: the cell (50, 90) keeps getting voted up while other cells may or may not be. At the end, (50, 90) has the maximum votes, so searching the accumulator for the maximum gives (50, 90): there is a line at distance 50 from the origin, at an angle of 90 degrees.
Pretty abstract...

②、Hough Transform in OpenCV
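A sketch (filename is a placeholder):

```python
import cv2 as cv
import numpy as np

img = cv.imread('sudoku.png')            # hypothetical filename
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
edges = cv.Canny(gray, 50, 150)

# rho resolution 1 px, theta resolution 1 degree, accumulator threshold 200
lines = cv.HoughLines(edges, 1, np.pi / 180, 200)
for rho, theta in lines[:, 0]:
    a, b = np.cos(theta), np.sin(theta)
    x0, y0 = a * rho, b * rho
    p1 = (int(x0 + 1000 * (-b)), int(y0 + 1000 * a))
    p2 = (int(x0 - 1000 * (-b)), int(y0 - 1000 * a))
    cv.line(img, p1, p2, (0, 0, 255), 2)
```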
③、Probabilistic Hough Transform
An optimization of the standard Hough transform: it considers only a random subset of the edge points, and returns the endpoints of each detected line segment directly.
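A sketch (filename is a placeholder):

```python
import cv2 as cv
import numpy as np

img = cv.imread('sudoku.png')             # hypothetical filename
edges = cv.Canny(cv.cvtColor(img, cv.COLOR_BGR2GRAY), 50, 150)

lines = cv.HoughLinesP(edges, 1, np.pi / 180, 100,
                       minLineLength=100, maxLineGap=10)
for x1, y1, x2, y2 in lines[:, 0]:        # segment endpoints returned directly
    cv.line(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
```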
①、Theory
Any grayscale image can be viewed as a topographic surface where high intensity denotes peaks and hills while low intensity denotes valleys. You start filling every isolated valley (local minima) with differently colored water (labels). As the water rises, water from different valleys — with different colors, obviously — will start to merge, depending on the peaks (gradients) nearby. To avoid that, you build barriers in the locations where water merges. You continue the work of filling water and building barriers until all the peaks are under water. The barriers you created then give you the segmentation result. This is the "philosophy" behind the watershed; the CMM webpage on watershed explains it with the help of some animations.
That describes the idea behind the algorithm; my own understanding:
A grayscale image can be treated as a terrain: high intensities are peaks and low intensities are valleys. Fill each isolated valley (local minimum) with water of a different color; as the water rises, build barriers wherever waters from different valleys are about to merge, and keep going until everything is submerged. The barriers then separate the basins — and that is the segmentation of the image.
But this approach gives an oversegmented result due to noise or other irregularities in the image. So OpenCV implements a marker-based watershed algorithm where you specify which valley points are to be merged and which are not. It is an interactive image segmentation: we give different labels to the objects we know.

②、Code
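A sketch along the lines of the official marker-based example (filename is a placeholder):

```python
import cv2 as cv
import numpy as np

img = cv.imread('coins.png')              # hypothetical filename
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
ret, thresh = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV + cv.THRESH_OTSU)

kernel = np.ones((3, 3), np.uint8)
opening = cv.morphologyEx(thresh, cv.MORPH_OPEN, kernel, iterations=2)
sure_bg = cv.dilate(opening, kernel, iterations=3)         # sure background

dist = cv.distanceTransform(opening, cv.DIST_L2, 5)
ret, sure_fg = cv.threshold(dist, 0.7 * dist.max(), 255, 0)  # sure foreground
sure_fg = np.uint8(sure_fg)
unknown = cv.subtract(sure_bg, sure_fg)

ret, markers = cv.connectedComponents(sure_fg)
markers = markers + 1          # so that sure background is 1, not 0
markers[unknown == 255] = 0    # the unknown region is marked 0
markers = cv.watershed(img, markers)
img[markers == -1] = [0, 0, 255]  # watershed boundaries are marked -1
```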
①、Theory
How does it work from the user's point of view? Initially the user draws a rectangle around the foreground region (the foreground must be completely inside the rectangle). The algorithm then segments it iteratively to get the best result. In some cases the segmentation won't be perfect — it may mark some foreground as background or vice versa. In that case the user does fine touch-ups: just draw some strokes over the faulty regions. A stroke basically tells the algorithm "this region should be foreground" or "this region should be background", to be corrected in the next iteration.
See the image below. First player and football is enclosed in a blue rectangle. Then some final touchups with white strokes (denoting foreground) and black strokes (denoting background) is made. And we get a nice result.
So what happens in background ?
User inputs the rectangle. Everything outside this rectangle will be taken as sure background (That is the reason it is mentioned before that your rectangle should include all the objects). Everything inside rectangle is unknown. Similarly any user input specifying foreground and background are considered as hard-labelling which means they won't change in the process.
The computer does an initial labelling based on the data we gave: it (hard-)labels the foreground and background pixels — the labels fall into two classes, foreground and background.
Now a Gaussian Mixture Model(GMM) is used to model the foreground and background.
Depending on the data we gave, GMM learns and create new pixel distribution. That is, the unknown pixels are labelled either probable foreground or probable background depending on its relation with the other hard-labelled pixels in terms of color statistics (It is just like clustering).
A graph is built from this pixel distribution. The nodes in the graph are pixels, plus two additional nodes: a Source node and a Sink node. Every foreground pixel is connected to the Source node and every background pixel to the Sink node.
The weights of the edges connecting pixels to the Source/Sink nodes are defined by the probability of a pixel being foreground/background. The weights between pixels are defined by edge information or pixel similarity: a large difference in pixel color gives the edge between them a low weight.
Then a min-cut algorithm segments the graph: it cuts it into two parts, separating the Source and Sink nodes, with a minimum-cost function (the sum of the weights of the edges that are cut). After the cut, all pixels connected to the Source node become foreground and those connected to the Sink node become background.
The process continues until the classification converges.
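A sketch of the rectangle-initialized mode (filename and rectangle are placeholders):

```python
import cv2 as cv
import numpy as np

img = cv.imread('messi5.jpg')             # hypothetical filename
mask = np.zeros(img.shape[:2], np.uint8)
bgdModel = np.zeros((1, 65), np.float64)  # internal GMM state
fgdModel = np.zeros((1, 65), np.float64)

rect = (50, 50, 450, 290)                 # hypothetical rectangle around the object
cv.grabCut(img, mask, rect, bgdModel, fgdModel, 5, cv.GC_INIT_WITH_RECT)

# 0/2 = (probable) background, 1/3 = (probable) foreground
mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
result = img * mask2[:, :, np.newaxis]
```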
②、Demo with labelme (I got it wrong)
A quick aside on how to label a mask yourself:
conda install labelme
Launch it from the conda prompt.
Apparently that's not how you add the labels — lol.
Too much hassle, and I'm not much good with Photoshop either.
Features are vital: once we can find features in one image, we can find the same features in other images.

①、Theory
He took this simple idea to a mathematical form. It basically finds the difference in intensity for a displacement of (u,v) in all directions. This is expressed as below:
The window function is either a rectangular window or a Gaussian window, which gives weights to the pixels underneath.
We have to maximize this function E(u,v) for corner detection, which means maximizing the second term (the part after the window function w). Applying some mathematical steps, we get the final equation:
Here, Ix and Iy are the image derivatives in the x and y directions respectively (easily found with cv.Sobel()).
Then comes the main part. After this, they created a score, basically an equation, which determines if a window can contain a corner or not.
So the magnitudes of these eigenvalues decide whether a region is a corner, an edge, or flat.
So the result of Harris Corner Detection is a grayscale image with these scores. Thresholding for a suitable score gives you the corners in the image. We will do it with a simple image.
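A sketch (filename is a placeholder):

```python
import cv2 as cv
import numpy as np

img = cv.imread('chessboard.png')          # hypothetical filename
gray = np.float32(cv.cvtColor(img, cv.COLOR_BGR2GRAY))

# blockSize=2, Sobel ksize=3, Harris free parameter k=0.04
dst = cv.cornerHarris(gray, 2, 3, 0.04)
img[dst > 0.01 * dst.max()] = [0, 0, 255]  # threshold the score map
```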
③、Corner with SubPixel Accuracy
Sometimes you may need to find corners with maximum accuracy. OpenCV provides cv.cornerSubPix(), which refines the detected corners with sub-pixel accuracy. Below is an example: as usual, we first find the Harris corners, then pass their centroids (there may be a bunch of pixels at a corner; we take their centroid) to be refined. Harris corners are marked in red and refined corners in green.
This is the unrefined result from the previous code — some corners are not marked.
After refinement, the corners that were missed (and the spurious ones) are all handled correctly.
In short, it is a refinement of the previous corner-detection result.
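A sketch of the refinement step (filename is a placeholder):

```python
import cv2 as cv
import numpy as np

img = cv.imread('chessboard.png')          # hypothetical filename
gray = np.float32(cv.cvtColor(img, cv.COLOR_BGR2GRAY))
dst = cv.cornerHarris(gray, 2, 3, 0.04)

ret, dst_t = cv.threshold(dst, 0.01 * dst.max(), 255, 0)
ret, labels, stats, centroids = cv.connectedComponentsWithStats(np.uint8(dst_t))
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 100, 0.001)
corners = cv.cornerSubPix(gray, np.float32(centroids), (5, 5), (-1, -1), criteria)
```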
This function (cv.goodFeaturesToTrack, the Shi-Tomasi detector) is more appropriate for tracking; we will see that when its time comes.
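A sketch (filename is a placeholder):

```python
import cv2 as cv
import numpy as np

img = cv.imread('blox.jpg')                # hypothetical filename
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# 25 strongest corners, quality level 0.01, minimum distance 10 px
corners = cv.goodFeaturesToTrack(gray, 25, 0.01, 10)
for x, y in np.int32(corners).reshape(-1, 2):
    cv.circle(img, (x, y), 3, (0, 0, 255), -1)
```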
SIFT: Scale-Invariant Feature Transform.
①、Theory
In the last couple of chapters we saw corner detectors like Harris. They are rotation-invariant: even if the image is rotated, we find the same corners, since corners remain corners after rotation. But what about scaling? A corner may not be a corner once the image is scaled. Check the simple image below: a corner in a small image is flat when zoomed in and viewed through the same-sized window. So the Harris corner detector is not scale-invariant.
That is, in the small image the window contains a corner, but once the image is enlarged, the same window no longer sees one.
In 2004, D.Lowe, University of British Columbia, came up with a new algorithm, Scale Invariant Feature Transform (SIFT) in his paper.
There are mainly four steps involved in SIFT algorithm. We will see them one-by-one.
1. Scale-space Extrema Detection
Scale-space filtering is used: the Laplacian of Gaussian (LoG) is computed for the image with various σ values. LoG acts as a blob detector, detecting blobs of various sizes as σ changes — in short, σ acts as a scaling parameter. For example, in the image above, a Gaussian kernel with low σ gives a high value for the small corner, while one with high σ fits the larger corner well. So we can find local maxima across both scale and space, which gives a list of (x, y, σ) values: a potential keypoint at (x, y) at scale σ.
But LoG is a little costly, so the SIFT algorithm uses the Difference of Gaussians (DoG), an approximation of LoG. The DoG is obtained as the difference of two Gaussian blurrings of the image with different σ — say σ and kσ. This is done for the different octaves of the image in a Gaussian pyramid, as represented below:
Once the DoG is found, the images are searched for local extrema over scale and space. For example, one pixel is compared with its 8 neighbours as well as the 9 pixels in the next scale and the 9 in the previous scale. If it is a local extremum, it is a potential keypoint — it basically means that keypoint is best represented at that scale, as shown below:
Regarding the parameters, the paper gives some empirical values: number of octaves = 4, number of scale levels = 5, initial σ = 1.6, k = √2, etc.
2. Keypoint Localization
Once potential keypoint locations are found, they are refined for more accurate results: a Taylor series expansion of the scale space gives a more accurate location of the extremum, and if the intensity at this extremum is less than a threshold value, it is rejected.
DoG has a higher response for edges, so edges also need to be removed. For this, a concept similar to the Harris corner detector is used: a 2x2 Hessian matrix (H) computes the principal curvature. We know from the Harris detector that for edges one eigenvalue is much larger than the other, so a simple ratio function is used here.
If this ratio is greater than a threshold (called edgeThreshold in OpenCV; 10 in the paper), the keypoint is discarded.
So low-contrast keypoints and edge keypoints are both eliminated, and what remains are strong interest points.
3. Orientation Assignment
Now an orientation is assigned to each keypoint to achieve invariance to image rotation. A neighbourhood is taken around the keypoint location depending on the scale, and the gradient magnitude and direction are calculated in that region. An orientation histogram with 36 bins covering 360 degrees is created (weighted by gradient magnitude and by a Gaussian-weighted circular window with σ equal to 1.5 times the scale of the keypoint). The highest peak is taken, and any peak above 80% of it is also used to calculate an orientation. This creates keypoints with the same location and scale but different directions, which contributes to the stability of matching.
4. Keypoint Descriptor
Now keypoint descriptor is created. A 16x16 neighbourhood around the keypoint is taken. It is divided into 16 sub-blocks of 4x4 size. For each sub-block, 8 bin orientation histogram is created. So a total of 128 bin values are available. It is represented as a vector to form keypoint descriptor.
5. Keypoint Matching
Keypoints between two images are matched by identifying their nearest neighbours. But in some cases, the second closest-match may be very near to the first. It may happen due to noise or some other reasons. In that case, ratio of closest-distance to second-closest distance is taken. If it is greater than 0.8, they are rejected. It eliminates around 90% of false matches while discards only 5% correct matches, as per the paper.
This is a summary of the SIFT algorithm. For more details and understanding, reading the original paper is highly recommended.

②、SIFT in OpenCV
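A sketch (filename is a placeholder; cv.SIFT_create() requires OpenCV ≥ 4.4 — older builds used cv.xfeatures2d.SIFT_create()):

```python
import cv2 as cv

img = cv.imread('home.jpg')               # hypothetical filename
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

sift = cv.SIFT_create()
kp, des = sift.detectAndCompute(gray, None)
img = cv.drawKeypoints(gray, kp, img,
                       flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
```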
The detected keypoints — this image's features — are marked on the image.
In short:
What is SIFT? Scale-Invariant Feature Transform is a local feature description algorithm used in image processing.
SURF is a speeded-up version of SIFT.

①、Theory
In SIFT, Lowe approximated the Laplacian of Gaussian with the Difference of Gaussians for finding scale-space. SURF goes a little further and approximates LoG with a box filter. The image below shows a demonstration of this approximation. One big advantage is that convolution with a box filter can be calculated easily with the help of integral images, and in parallel for different scales. SURF also relies on the determinant of the Hessian matrix.
SURF uses wavelet responses in horizontal and vertical direction for a neighbourhood of size 6s. Adequate gaussian weights are also applied to it. Then they are plotted in a space as given in below image. The dominant orientation is estimated by calculating the sum of all responses within a sliding orientation window of angle 60 degrees. Interesting thing is that, wavelet response can be found out using integral images very easily at any scale. For many applications, rotation invariance is not required, so no need of finding this orientation, which speeds up the process. SURF provides such a functionality called Upright-SURF or U-SURF. It improves speed and is robust upto ±15∘. OpenCV supports both, depending upon the flag, upright. If it is 0, orientation is calculated. If it is 1, orientation is not calculated and it is faster.
This, represented as a vector, gives the SURF feature descriptor with 64 dimensions in total. A lower dimension means faster computation and matching, at some cost in distinctiveness.
For more distinctiveness, the SURF descriptor has an extended 128-dimension version: the sums of dx and |dx| are computed separately for dy < 0 and dy ≥ 0, and likewise the sums of dy and |dy| are split according to the sign of dx, doubling the number of features without adding much computational complexity. OpenCV supports both via the extended flag: 0 for 64-dim and 1 for 128-dim (the default is 128-dim).
Another important improvement is the use of sign of Laplacian (trace of Hessian Matrix) for underlying interest point. It adds no computation cost since it is already computed during detection. The sign of the Laplacian distinguishes bright blobs on dark backgrounds from the reverse situation. In the matching stage, we only compare features if they have the same type of contrast (as shown in image below). This minimal information allows for faster matching, without reducing the descriptor's performance.
In short, SURF adds a lot of tricks to improve speed at every step. Analysis shows it is 3 times faster than SIFT with comparable performance. SURF is good at handling images with blurring and rotation, but not good at handling viewpoint change and illumination change.

②、SURF in OpenCV
There are version issues so it doesn't work for me (SURF is patented and needs an opencv-contrib build with the nonfree modules enabled); I'll come back to it.

①、Theory
We saw several feature detectors and many of them are really good. But when looking from a real-time application point of view, they are not fast enough. One best example would be SLAM (Simultaneous Localization and Mapping) mobile robot which have limited computational resources.
As a solution to this, the FAST (Features from Accelerated Segment Test) algorithm was proposed.
Feature Detection using FAST
Select a pixel p in the image which is to be identified as an interest point or not. Let its intensity be Ip.
Select appropriate threshold value t.
Consider a circle of 16 pixels around the pixel under test.
Now the pixel p is a corner if there exists a set of n contiguous pixels in the circle (of 16 pixels) which are all brighter than Ip + t, or all darker than Ip − t (shown as white dashed lines in the image above). n was chosen to be 12.
A high-speed test was proposed to exclude a large number of non-corners. It examines only the four pixels at positions 1, 9, 5, and 13 (first 1 and 9 are tested for being too bright or too dark; if so, 5 and 13 are checked). If p is a corner, then at least three of these four must all be brighter than Ip + t or darker than Ip − t; if neither is the case, p cannot be a corner. The full segment test criterion is then applied to the surviving candidates by examining all pixels in the circle. This detector in itself exhibits high performance, but there are several weaknesses:
Of the weaknesses above, the first three are addressed with a machine-learning approach, and the last one with non-maximal suppression.
Machine Learning a Corner Detector
1、Select a set of images for training (preferably from the target application domain)
2、Run the FAST algorithm on every image to find feature points.
3、For every feature point, store the 16 pixels around it as a vector. Do it for all the images to get feature vector P.
4、Each pixel (say x) in these 16 pixels can have one of the following three states:
1、Depending on these states, the feature vector P is subdivided into 3 subsets, Pd, Ps, Pb.
2、Define a new boolean variable, Kp, which is true if p is a corner and false otherwise.
3、Use the ID3 algorithm (a decision-tree classifier) to query each subset using the variable Kp for the knowledge of the true class. It selects the x which yields the most information about whether the candidate pixel is a corner, measured by the entropy of Kp.
4、This is recursively applied to all the subsets until its entropy is zero.
5、The decision tree so created is used for fast detection in other images.
Non-maximal Suppression
Detecting multiple interest points in adjacent locations is another problem. It is solved by using Non-maximum Suppression.
1、Compute a score function V for all the detected feature points. V is the sum of the absolute differences between p and its 16 surrounding pixel values.
2、Consider two adjacent keypoints and compute their V values.
3、Discard the one with the lower V value.
Summary
It is several times faster than other existing corner detectors.
But it is not robust at high levels of noise, and it depends on a threshold.

②、FAST Feature Detector in OpenCV
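A sketch (filename is a placeholder):

```python
import cv2 as cv

img = cv.imread('blox.jpg', cv.IMREAD_GRAYSCALE)  # hypothetical filename
fast = cv.FastFeatureDetector_create()
kp = fast.detect(img, None)

fast.setNonmaxSuppression(False)           # compare: far more keypoints
kp_no_nms = fast.detect(img, None)
print(len(kp), len(kp_no_nms))
```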
Without non-maximal suppression, the number of keypoints increases dramatically.

①、Theory
We know SIFT uses a 128-dim vector for its descriptors. Since it uses floating-point numbers, that is basically 512 bytes; similarly, SURF takes a minimum of 256 bytes (for 64-dim). Creating such vectors for thousands of features takes a lot of memory, which is not feasible for resource-constrained applications, especially embedded systems — and the larger the memory, the longer matching takes.
But all these dimensions may not be needed for actual matching. We can compress them with methods like PCA or LDA, or even use hashing with LSH (Locality Sensitive Hashing) to convert the floating-point SIFT descriptors into binary strings, which are matched by Hamming distance. That speeds things up, because finding a Hamming distance is just an XOR and a bit count — very fast on modern CPUs with SSE instructions. But we still need to compute the descriptors first, which doesn't solve the original memory problem.
BRIEF comes into picture at this moment. It provides a shortcut to find the binary strings directly without finding descriptors. It takes smoothened image patch and selects a set of nd (x,y) location pairs in an unique way (explained in paper). Then some pixel intensity comparisons are done on these location pairs. For eg, let first location pairs be p and q. If I(p)<I(q), then its result is 1, else it is 0. This is applied for all the nd location pairs to get a nd-dimensional bitstring.
This nd can be 128, 256, or 512; OpenCV supports all of these, expressed in bytes as 16, 32, and 64 (32 bytes = 256 bits is the default).
One important point is that BRIEF is a feature descriptor, it doesn't provide any method to find the features. So you will have to use any other feature detectors like SIFT, SURF etc. The paper recommends to use CenSurE which is a fast detector and BRIEF works even slightly better for CenSurE points than for SURF points.
In short, BRIEF is a faster method of feature descriptor calculation and matching.

②、BRIEF in OpenCV
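A sketch pairing BRIEF with the CenSurE (STAR) detector; both classes live in the opencv-contrib package (cv2.xfeatures2d), and the filename is a placeholder:

```python
import cv2 as cv

img = cv.imread('blox.jpg', cv.IMREAD_GRAYSCALE)   # hypothetical filename
star = cv.xfeatures2d.StarDetector_create()        # CenSurE detector
brief = cv.xfeatures2d.BriefDescriptorExtractor_create()

kp = star.detect(img, None)
kp, des = brief.compute(img, kp)
print(brief.descriptorSize())   # 32 bytes = 256 bits by default
```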
Matching will be done in another chapter.
①、Theory
As an OpenCV enthusiast, the most important thing about ORB is that it came from "OpenCV Labs". The algorithm was introduced by Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary R. Bradski in their paper ORB: An efficient alternative to SIFT or SURF.
ORB is basically a fusion of the FAST keypoint detector and the BRIEF descriptor, with many modifications to enhance performance. First it uses FAST to find keypoints, then applies the Harris corner measure to keep the top N of them. It also uses a pyramid to produce multi-scale features. But FAST doesn't compute orientation — so what about rotation invariance? The authors came up with the following modification.
It computes the intensity-weighted centroid of the patch with the detected corner at the center. The direction of the vector from the corner point to the centroid gives the orientation. To improve rotation invariance, the moments are computed with x and y in a circular region of radius r, where r is the size of the patch.
For descriptors, ORB uses BRIEF descriptors. But we have already seen that BRIEF performs poorly under rotation, so what ORB does is "steer" BRIEF according to the orientation of the keypoints.
ORB discretizes the angle in increments of 2π/30 (12 degrees) and constructs a lookup table of precomputed BRIEF patterns. As long as the keypoint orientation θ is consistent across views, the correct set of points Sθ is used to compute its descriptor.
BRIEF has the important property that each bit feature has a large variance and a mean near 0.5, but once oriented along the keypoint direction it loses this property and becomes more spread out. High variance makes a feature more discriminative, since it responds differently to inputs; another desirable property is for the tests to be uncorrelated, so that each contributes to the result. To get all of this, ORB runs a greedy search among all possible binary tests for ones with high variance, a mean close to 0.5, and low correlation. The result is called rBRIEF.
For descriptor matching, multi-probe LSH, an improvement on traditional LSH, is used. The paper says ORB is much faster than SURF and SIFT, and the ORB descriptor works better than SURF's. ORB is a good choice for low-power devices (e.g. embedded systems) for tasks like panorama stitching.
An ORB matching result (image found online):
②、ORB in OpenCV
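A sketch (filename is a placeholder):

```python
import cv2 as cv

img = cv.imread('blox.jpg', cv.IMREAD_GRAYSCALE)  # hypothetical filename
orb = cv.ORB_create(nfeatures=500)
kp, des = orb.detectAndCompute(img, None)
img2 = cv.drawKeypoints(img, kp, None, color=(0, 255, 0))
```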
ORB feature matching — measuring how similar two objects are.

①、Basics of Brute-Force Matcher
The Brute-Force matcher is simple: it takes the descriptor of one feature in the first set and matches it against all the features in the second set using some distance calculation, returning the closest one.
SIFT Descriptors
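A sketch of brute-force matching with SIFT descriptors and the ratio test (filenames are placeholders):

```python
import cv2 as cv

img1 = cv.imread('box.png', cv.IMREAD_GRAYSCALE)        # hypothetical filenames
img2 = cv.imread('box_in_scene.png', cv.IMREAD_GRAYSCALE)

sift = cv.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

bf = cv.BFMatcher()                        # L2 norm, suitable for SIFT
matches = bf.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # ratio test
```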
②、FLANN based Matcher
FLANN stands for Fast Library for Approximate Nearest Neighbors. It contains a collection of algorithms optimized for fast nearest-neighbor search in large datasets and for high-dimensional features, and it works faster than the BFMatcher on large datasets.
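A sketch (filenames are placeholders):

```python
import cv2 as cv

img1 = cv.imread('box.png', cv.IMREAD_GRAYSCALE)        # hypothetical filenames
img2 = cv.imread('box_in_scene.png', cv.IMREAD_GRAYSCALE)
sift = cv.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

FLANN_INDEX_KDTREE = 1
flann = cv.FlannBasedMatcher(dict(algorithm=FLANN_INDEX_KDTREE, trees=5),
                             dict(checks=50))   # checks: search depth
matches = flann.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]
```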
①、Basics
So what did we do in the last session? We used a queryImage, found some feature points in it, took another trainImage, found the features in that image too, and computed the best matches between them. In short, we found the locations of some parts of an object in another cluttered image. This information is sufficient to locate the object exactly in the trainImage.

②、Code
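A sketch that continues from the FLANN snippet above (kp1, kp2, img1, img2, and `good` are reused from it):

```python
import cv2 as cv
import numpy as np

MIN_MATCH_COUNT = 10
if len(good) > MIN_MATCH_COUNT:
    src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    M, mask = cv.findHomography(src_pts, dst_pts, cv.RANSAC, 5.0)

    h, w = img1.shape
    pts = np.float32([[0, 0], [0, h - 1],
                      [w - 1, h - 1], [w - 1, 0]]).reshape(-1, 1, 2)
    dst = cv.perspectiveTransform(pts, M)   # object corners in the scene
    img2 = cv.polylines(img2, [np.int32(dst)], True, 255, 3)
```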