Haar feature plus cascade classifier target detection system

Contents: 1. Recognition system architecture 2. Training method 3. Acceleration method 4. Code practice


1. Recognition system architecture


The figure above shows the architecture of the Haar feature + cascade classifier recognition system. The system is divided into the following parts:

  • Sliding box: a fixed-size box that slides over the original image to obtain sub-images
  • Haar feature extractor: extracts Haar features from the sub-image using the four specified templates (each template yields many features)
  • Cascade classifier: classifies each sub-image based on the selected features and keeps the positive examples

The recognizer converts the target detection problem into a target classification problem: the sliding box moves over the original image, and the recognition part classifies each sub-image, judging whether or not it is a target to be recognized.
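The sliding step described above can be sketched in a few lines of plain Python. This is only a minimal illustration: the function name, the 24-pixel window and the stride of 4 are assumptions of the sketch, not values taken from the system.

```python
def sliding_windows(image, win_size=24, stride=4):
    """Yield (x, y, sub_image) for each window position, scanning row by row.

    image is a 2-D list of pixel rows; win_size and stride are illustrative.
    """
    height, width = len(image), len(image[0])
    for y in range(0, height - win_size + 1, stride):
        for x in range(0, width - win_size + 1, stride):
            sub_image = [row[x:x + win_size] for row in image[y:y + win_size]]
            yield x, y, sub_image


# A blank image 40 pixels wide and 32 pixels high gives 5 x 3 window positions.
image = [[0] * 40 for _ in range(32)]
windows = list(sliding_windows(image))
print(len(windows))
```

Each yielded sub-image would then be passed to the feature extractor and the classifier.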

1.1. Haar features

The Haar feature is a very simple feature. There are four templates, as shown in the figure below, and the size of each template is variable. The Haar feature is the sum of the pixels covered by the black part minus the sum of the pixels covered by the white part:

$$Haar(x) = \sum pic_{black}(x) - \sum pic_{white}(x)$$


For example, the following picture shows:


Take the first 4x4 area and calculate the second type of Haar feature: the sum of the area covered by black is 3 and the sum of the area covered by white is 2, so the Haar feature of this sliding box under the second template is 3 - 2 = 1.

For a given template, a candidate frame yields many features. For example, in a 24x24 sliding frame, the first template (the diagonal template) can be taken at multiple sizes such as 2x2, 3x3, ..., 24x24, and a 2x2 feature alone can be placed at 23x23 positions within the frame. The number of Haar features in each sliding frame is therefore massive. The original paper uses a 24x24 sliding frame, and each frame has over 180,000 candidate features.
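The arithmetic above can be reproduced with a small sketch. The 4x4 patch below is hypothetical (it is not the data from the figure), and treating the right half as black and the left half as white is an assumption of this example.

```python
def region_sum(image, x, y, w, h):
    """Sum of the pixels in the w x h rectangle whose top-left corner is (x, y)."""
    return sum(sum(row[x:x + w]) for row in image[y:y + h])


def haar_two_rect(image, x, y, w, h):
    """Two-rectangle feature: black (right half) sum minus white (left half) sum.

    Which half is black is an assumption of this sketch; w must be even.
    """
    half = w // 2
    white = region_sum(image, x, y, half, h)
    black = region_sum(image, x + half, y, half, h)
    return black - white


def n_positions(frame, w, h):
    """Number of places a w x h feature fits inside a frame x frame window."""
    return (frame - w + 1) * (frame - h + 1)


# Hypothetical 4x4 binary patch: black sum is 3, white sum is 2.
patch = [
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]
feature = haar_two_rect(patch, 0, 0, 4, 4)
positions = n_positions(24, 2, 2)
print(feature, positions)
```

The second value shows why the feature count grows so quickly: a single 2x2 feature already has 23 x 23 placements in a 24x24 frame, before any other size or template is considered.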

1.2. Cascade classifier

The number of Haar features is so large that it exceeds what almost any machine learning algorithm of the time (2001) could accept as input, so training a single classifier directly is not practical. Instead, multiple weak classifiers are combined into a strong classifier. In this system, each weak classifier looks at a single feature:

$$h_j(x) = \begin{cases} 1 & f_j(x) < \theta_j \\ 0 & \text{otherwise} \end{cases}$$

The cascade classifier is trained using the AdaBoost method, which selects features while training the classifier. The final classifier contains as many weak classifiers as selected features (each weak classifier uses only one feature). The final classifier is:

$$h(x) = \begin{cases} 1 & \sum\limits_{t=1}^{T} a_t h_t(x) \geq \frac{1}{2}\sum\limits_{t=1}^{T} a_t \\ 0 & \text{otherwise} \end{cases}$$

T is the number of weak classifiers and also the number of selected features. Haar features that are not used by the classifier do not need to be calculated at all, which reduces the amount of calculation. a_t is the weight of a single weak classifier, which is obtained during training.
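As a rough illustration of the two decision rules above, the sketch below evaluates a strong classifier built from T = 3 weak classifiers. The thresholds, weights and feature values are hypothetical.

```python
def weak_classify(feature_value, theta):
    """h_j(x): 1 when the feature value is below the threshold, else 0."""
    return 1 if feature_value < theta else 0


def strong_classify(feature_values, thetas, alphas):
    """h(x): weighted vote of the weak classifiers against half the total weight."""
    votes = sum(a * weak_classify(f, th)
                for f, th, a in zip(feature_values, thetas, alphas))
    return 1 if votes >= 0.5 * sum(alphas) else 0


# Hypothetical thresholds and weights for T = 3 selected features.
thetas = [10.0, 5.0, 0.0]
alphas = [1.2, 0.7, 0.4]

accepted = strong_classify([8.0, 7.0, -1.0], thetas, alphas)   # votes 1.6 >= 1.15
rejected = strong_classify([12.0, 7.0, 1.0], thetas, alphas)   # votes 0.0 <  1.15
print(accepted, rejected)
```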

2. Training method

The part that needs to be trained is the cascade classifier. Since each weak classifier uses only one feature, the only parameter of each weak classifier is its threshold \theta_j. The training algorithm is shown in the figure below:


First, initialize the sample weights:

$$w_{1,i} = \begin{cases} \frac{1}{2m} & y_i = 0 \\ \frac{1}{2l} & y_i = 1 \end{cases}$$

Here y_i is the label of sample i, where 1 means a positive example, and m and l are the numbers of negative and positive examples. After entering the training loop, for each iteration:

  1. First normalize the sample weights: $w_{t,i} \leftarrow \cfrac{w_{t,i}}{\sum^n_{j=1} w_{t,j}}$
  2. Train a weak classifier h_j(x) for each feature. The cost function used in training depends on the sample weights: $\epsilon_j = \sum_i w_i |h_j(x_i) - y_i|$.
  3. After the weak classifiers for all features have been trained, select the classifier with the lowest cost together with its feature, and remove that feature from the set of candidate features.
  4. Finally, update the sample weights: $w_{t+1,i} = w_{t,i}\beta_t^{1-e_i}$, where $e_i = 0$ if sample x_i is classified correctly and $e_i = 1$ otherwise, and $\beta_t = \cfrac{\epsilon_t}{1-\epsilon_t}$. Since \beta_t < 1 whenever \epsilon_t < 1/2, correctly classified samples lose weight and the next round concentrates on the misclassified ones.

Finally, the classifier h(x) and the weight of each weak classifier, $a_t = \log\cfrac{1}{\beta_t}$, are obtained.
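The training loop can be sketched as follows. This is a simplified illustration rather than the paper's implementation: the threshold search simply tries every observed feature value, the small clamp on the error is an addition to avoid dividing by zero, and the toy data is made up.

```python
import math


def predict(value, theta):
    """Weak classifier: 1 when the feature value is below theta, else 0."""
    return 1 if value < theta else 0


def train_weak(values, labels, weights):
    """Return (theta, weighted error) of the best threshold for one feature."""
    candidates = sorted(set(values))
    candidates.append(candidates[-1] + 1)  # lets every sample be predicted positive
    best_theta, best_err = None, float("inf")
    for theta in candidates:
        err = sum(w * abs(predict(v, theta) - y)
                  for v, y, w in zip(values, labels, weights))
        if err < best_err:
            best_theta, best_err = theta, err
    return best_theta, best_err


def adaboost(features, labels, rounds):
    """features[j][i] is the value of feature j on sample i; labels are 0 or 1."""
    m, l = labels.count(0), labels.count(1)  # negatives, positives
    weights = [1 / (2 * m) if y == 0 else 1 / (2 * l) for y in labels]
    remaining = list(range(len(features)))
    model = []  # (feature index, theta, alpha) per selected weak classifier
    for _ in range(rounds):
        if not remaining:
            break
        total = sum(weights)
        weights = [w / total for w in weights]  # normalize
        trained = {j: train_weak(features[j], labels, weights) for j in remaining}
        j_best = min(trained, key=lambda j: trained[j][1])  # lowest cost
        theta, eps = trained[j_best]
        remaining.remove(j_best)  # a feature is used at most once
        eps = min(max(eps, 1e-10), 1 - 1e-10)  # guard added by this sketch
        beta = eps / (1 - eps)
        # Correctly classified samples are multiplied by beta; the rest keep their weight.
        weights = [w * (beta if predict(v, theta) == y else 1.0)
                   for w, v, y in zip(weights, features[j_best], labels)]
        model.append((j_best, theta, math.log(1 / beta)))
    return model


# Made-up data: 2 features, 5 samples.
features = [[1, 2, 3, 4, 3.5], [5, 1, 6, 2, 4]]
labels = [1, 1, 0, 0, 1]
model = adaboost(features, labels, rounds=2)
print(model)
```

In the first round feature 0 with threshold 3 is selected, because it misclassifies only one positive sample (weighted error 1/6).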

3. Acceleration method

In order to achieve a faster detection speed, the system proposes acceleration schemes for calculating Harr features and cascade classifiers.

3.1. Integral image

The integral image is used to accelerate the calculation of Haar features. The method is to generate a map with the same size as the original image, using the following formula:

$$ii(x, y) = \sum\limits_{x' \leq x,\, y' \leq y} i(x', y')$$

As shown in the figure below, each value of the integral image is the sum of all pixels in the rectangle whose diagonal runs from the upper left corner of the picture to the corresponding position. When the integral image is generated row by row, the value at each position is obtained by adding three terms: the integral image value directly above, the accumulated sum of the current row before this position, and the pixel value at this position. Generating the integral image is therefore relatively simple.
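The row-by-row construction described above might look like this in plain Python (a minimal sketch; the function name and the 3x3 example image are illustrative):

```python
def integral_image(image):
    """Return ii with ii[y][x] = sum of image[y'][x'] for all y' <= y, x' <= x."""
    height, width = len(image), len(image[0])
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0  # accumulated sum of the current row up to column x
        for x in range(width):
            row_sum += image[y][x]
            above = ii[y - 1][x] if y > 0 else 0  # value directly above
            ii[y][x] = above + row_sum
    return ii


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(image)
print(ii)
```

The bottom-right entry equals the sum of the whole image, which is a quick sanity check.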


Calculating Haar features requires a large number of pixel sums over rectangular areas. With the integral image, the sum of the area shown below only requires calculating A - B - C + D, where A, B, C and D are the integral image values at the corresponding corner positions. The sum of any rectangular area can therefore be completed with 3 additions and subtractions, which greatly speeds up the extraction of Haar features.
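A sketch of the four-corner lookup follows. To avoid special cases at the image border it uses an integral image padded with a zero row and column, which is a convenience of this example; the corner names in the figure may be assigned differently.

```python
def padded_integral(image):
    """Integral image with an extra zero row and column at the top and left."""
    height, width = len(image), len(image[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            ii[y + 1][x + 1] = (image[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = padded_integral(image)
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9
```

The cost of rect_sum does not depend on the rectangle size, which is what makes large Haar features as cheap as small ones.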


3.2. Cascade calculation

In the basic classifier, all selected Haar features must be calculated for every sub-image. Although the learning algorithm has already filtered the features, their number is still large. Because most sub-images do not contain the target, the cascade method is proposed:

  • When training and selecting classifiers, instead of choosing the classifier with the smallest error, choose the classifier that misclassifies the fewest positive examples as negative, that is, the classifier with the highest recall. The sample set for the next round is the set that remains after this classifier has eliminated the negative examples it rejects.
  • At runtime, features and classifiers are evaluated sequentially. As soon as a classifier recognizes a sample as negative, the sample is rejected and the remaining features and classifiers do not need to be calculated.
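The runtime behavior can be sketched as a loop with early exit. For brevity each stage here is a single threshold test on one lazily computed feature; the stage layout, feature functions and thresholds are hypothetical.

```python
def cascade_classify(sample, stages):
    """Return (label, number of stages evaluated) for one sub-image.

    Each stage is a (feature_fn, theta) pair; feature_fn is only called
    if every earlier stage has accepted the sample.
    """
    for evaluated, (feature_fn, theta) in enumerate(stages, start=1):
        if not feature_fn(sample) < theta:
            return 0, evaluated  # rejected: later features are never computed
    return 1, len(stages)


# Hypothetical stages; each "feature" just reads one precomputed value.
stages = [
    (lambda s: s[0], 10),
    (lambda s: s[1], 5),
    (lambda s: s[2], 0),
]

print(cascade_classify([12, 1, -3], stages))  # rejected by stage 1
print(cascade_classify([8, 9, -3], stages))   # rejected by stage 2
print(cascade_classify([8, 1, -3], stages))   # passes all 3 stages
```

Since most sub-images are rejected by the first stages, the average cost per window stays far below the cost of evaluating every selected feature.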

4. Code practice

4.1. Use the built-in cascade classifier

OpenCV comes with some cascaded classifiers, which can be used to recognize faces, facial features and human bodies, etc. The method of use in Python is as follows:

import cv2

# gray must already hold the single-channel grayscale image to search
face_cascade = cv2.CascadeClassifier("./haarcascades/haarcascade_frontalface_alt2.xml")
faces = face_cascade.detectMultiScale(
        gray, scaleFactor=1.3, minNeighbors=2, minSize=(60, 60), maxSize=(300, 300))

First, cv2.CascadeClassifier() is called to open a cascade classifier. The xml file loaded here is the face detection cascade classifier that comes with OpenCV. Then the .detectMultiScale() method is called for recognition. The meaning of the parameters is:

  • The first parameter, image: the image to be recognized, which must be a grayscale image (channel=1)
  • scaleFactor: the factor by which the scale changes between successive detection passes; a reasonable range is 1.1~1.4. The smaller the parameter, the finer the detection and the slower the speed
  • minNeighbors: how many neighboring detections each candidate box needs in order to be kept; the larger the parameter, the harder it is for a candidate box to be accepted
  • minSize and maxSize: the minimum and maximum size of the target; targets outside this range are not recognized

This function returns a sequence of detections, where each element has 4 values [x, y, w, h] (upper left corner, width and height), which can be used directly to draw a rectangular box.
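Drawing a box needs two corner points rather than a width and height. The helper below is a small sketch of that conversion; the function name and the sample detections are hypothetical.

```python
def box_corners(box):
    """Convert [x, y, w, h] into ((left, top), (right, bottom)) integer points."""
    x, y, w, h = (int(v) for v in box)
    return (x, y), (x + w, y + h)


# Hypothetical detections in the [x, y, w, h] format described above.
faces = [[30, 40, 100, 120], [200, 50, 80, 80]]
corners = [box_corners(box) for box in faces]
print(corners)
```

Each pair can then be passed to OpenCV, for example cv2.rectangle(img, pt1, pt2, (0, 255, 0), 2), to draw the box on the image.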

4.2. Train the cascade classifier

This section uses the FDDB dataset to train a cascade classifier for faces.

4.2.1. Processing labels

FDDB uses elliptical labels, which provide the center, the major and minor axis radii, and the angle of each ellipse. The original label is <major_axis_radius minor_axis_radius angle center_x center_y detection_score>. First, the label must be converted to the format <left_x top_y width height>. For simplicity, use the following formulas:

$$left\_x = clamp(center\_x - minor\_axis\_radius, 0, -1)$$
$$top\_y = clamp(center\_y - major\_axis\_radius, 0, -1)$$
$$width = 2 \times minor\_axis\_radius$$
$$height = 2 \times major\_axis\_radius$$

These formulas simply convert the ellipse to a rectangle. clamp is the clamping function; the input is limited to the range 0~-1, where -1 means no upper limit. In addition, the rectangle must be restricted to the range of the picture. The code is as follows:

def FDDB2label(source_path, target_path):
    source_list = read_ellipseList(source_path)  # Read the original label file
    target_list = change_label(source_list)  # Convert label format
    save_rec_label(target_list, target_path)  # Save the converted labels

The part that converts the label is as follows:

def change_label(source):
    """source:list[[path,label],...],label:[major_axis_radius minor_axis_radius angle center_x center_y detection_score]"""
    result = []
    for name, label in source:
        name = name + ".jpg"
        # split() with no argument tolerates repeated spaces between fields
        data = [float(x) for x in label.split()]
        # [left_x, top_y, width, height] from the center and the two radii
        data = [int(data[3]-data[1]), int(data[4]-data[0]),
                int(data[1] * 2), int(data[0] * 2)]
        data = check_label(name, data, root="../FDDB-folds/")
        result.append([name, data])
    return result

The inspection part is as follows:

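import os

import cv2


def check_label(name, data, root=""):
    # img_shape is (height, width, channels)
    img_shape = cv2.imread(os.path.join(root, name)).shape
    # clamp the upper left corner to the picture
    if data[0] < 0:
        data[0] = 0
    if data[1] < 0:
        data[1] = 0
    # shrink width/height when the lower right corner leaves the picture
    if data[0] + data[2] > img_shape[1]:
        data[2] = img_shape[1] - data[0] - 1
    if data[1] + data[3] > img_shape[0]:
        data[3] = img_shape[0] - data[1] - 1
    return data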

There are two cases checked:

  • The coordinates of the upper left corner of the box are less than 0
  • The coordinates of the lower right corner of the box exceed the picture boundary
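For experimenting without the dataset on disk, the conversion and both checks can be combined into one function that takes the image size as arguments. This is a sketch that mirrors the logic above; the function name, the ellipse values and the image sizes are illustrative.

```python
def ellipse_to_box(major_r, minor_r, center_x, center_y, img_w, img_h):
    """Convert an ellipse label to [left_x, top_y, width, height], clipped to the image."""
    left = max(int(center_x - minor_r), 0)   # check 1: corner below 0
    top = max(int(center_y - major_r), 0)
    width = int(minor_r * 2)
    height = int(major_r * 2)
    if left + width > img_w:                 # check 2: box leaves the picture
        width = img_w - left - 1
    if top + height > img_h:
        height = img_h - top - 1
    return [left, top, width, height]


# Illustrative ellipse (major radius, minor radius, center x, center y).
ellipse = (123.5833, 85.5495, 269.6934, 161.7812)
fits = ellipse_to_box(*ellipse, img_w=1000, img_h=1000)
clipped = ellipse_to_box(*ellipse, img_w=300, img_h=250)
print(fits, clipped)
```

In the large image the box is unchanged; in the small one the width and height are reduced so the lower right corner stays inside the picture.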

4.2.2. Prepare files

Before training, you need to prepare the data, including positive and negative examples.

Prepare positive examples

The positive examples are generated by opencv_createsamples.exe, which ships with OpenCV. Note that this exe file cannot run on its own, so it cannot be copied elsewhere and used. It depends on other OpenCV files, so it must be called from the OpenCV directory (opencv\build\x64\vc14\bin\opencv_createsamples.exe). This tool converts the positive examples into a .vec file. The main command line parameters are:

  • -vec: The path of the output vec file
  • -info: The path of the positive example description file
  • -num: Number of positive examples generated
  • -w and -h: The width and height of the positive images

Before use, you need to prepare a file that describes the positive examples, info.dat. The format is as follows:

FDDB-folds\2002\08\11\big\img_591.jpg 1 184 38 171 247 
FDDB-folds\2002\07\19\big\img_423.jpg 1 196 46 118 174 
FDDB-folds\2002\08\24\big\img_490.jpg 1 110 23 70 109 
<relative path> <number of targets n> <x,y,w,h of target 1> ... <x,y,w,h of target n>
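A line in this format can be produced from a path and a list of boxes with a few lines of Python. The helper name is hypothetical; the sample values are copied from the first line shown above.

```python
def format_info_line(path, boxes):
    """Build one info.dat line: <path> <n> <x y w h of box 1> ... <x y w h of box n>."""
    fields = [path, str(len(boxes))]
    for x, y, w, h in boxes:
        fields.extend(str(int(v)) for v in (x, y, w, h))
    return " ".join(fields)


line = format_info_line(r"FDDB-folds\2002\08\11\big\img_591.jpg",
                        [[184, 38, 171, 247]])
print(line)
```

Writing one such line per image, for example from the output of the label conversion step, gives the complete description file.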

Then use the tool to generate a positive example file pos.vec.

.\opencv\build\x64\vc14\bin\opencv_createsamples.exe -vec .\pos.vec -info info.dat -num 178 -w 40 -h 40

Prepare negative examples

Negative examples only require a file list, neg_list.dat, with one image path per line:

<relative path>

4.2.3. Model training

Model training uses OpenCV's opencv_traincascade.exe. The main parameters are as follows:

  • -data: The location where the classifier file is finally saved
  • -vec: The path of the positive example vec file
  • -bg: The path of the negative example file list
  • -numPos and -numNeg: The number of positive and negative examples
  • -numStages: The number of stages of the multi-stage classifier
  • -w and -h: The width and height of the positive examples, which must match the values given when generating the samples

The command line used this time is as follows:

.\opencv\build\x64\vc14\bin\opencv_traincascade.exe -data . -vec .\pos.vec -bg .\neg_list.dat -numPos 178 -numNeg 200 -numStages 10 -w 40 -h 40

The final trained model is saved as cascade.xml in the directory given by -data.

4.2.4. Model testing

You can test the model with the official tool opencv_visualisation.exe, which visualizes the test process and prints the type of classifier used. The command line parameters are as follows:

  • --image: Image path used for testing
  • --model: The model used for testing (.xml file)
  • --data: Path to save test results (optional)

The official examples are as follows:

.\opencv\build\x64\vc14\bin\opencv_visualisation --image=\data\object.png --model=\data\model.xml --data=\data\result\


Theoretical part: Viola P, Jones M. Rapid object detection using a boosted cascade of simple features[C]//IEEE Computer Society Conference on Computer Vision & Pattern Recognition. IEEE Computer Society, 2001:511.

Reference: https://cloud.tencent.com/developer/article/1156244