A Study on Verification of CCTV Image Data through Unsupervised Learning Model of Deep Learning

Abnormal behavior is behavior that deviates from the normal standard set by average behavior. Although more public CCTVs are being installed to prevent crime, the crime rate has recently been rising. In response, research on artificial intelligence that uses deep learning to detect abnormal behavior in CCTV footage automatically is growing. Deep learning is a type of artificial intelligence built on artificial neural networks, and the quality of the learning data is crucial for achieving high accuracy when developing artificial intelligence with deep learning. This paper verifies whether a learning dataset being constructed for abnormal behavior detection is suitable as learning data. To do so, it uses MPED-RNN, an autoencoder-based model that performs frame-by-frame binary classification of whether abnormal behavior is present, using a person's skeleton data as input. The experimental results indicate that the unsupervised MPED-RNN model is not suitable for verifying videos, such as those in this dataset, in which the numbers of frames with and without abnormal behavior are similar, and that appropriate results can be obtained only by verification with a supervised learning-based model.


INTRODUCTION
Abnormal behavior is behavior that deviates from the normal standard set by average behavior. Although more public CCTVs (closed-circuit television) are being installed to prevent crime, the crime rate has recently been rising. In response, research on artificial intelligence that uses deep learning to detect abnormal behavior in CCTV footage automatically is growing. Deep learning is a type of artificial intelligence built on artificial neural networks, and the quality of the learning data is crucial for achieving high accuracy when developing artificial intelligence with deep learning [4,5,12,15]. This paper verifies whether a learning dataset being constructed for abnormal behavior detection is suitable as learning data. To do so, it uses MPED-RNN, an autoencoder-based model that performs frame-by-frame binary classification of whether abnormal behavior is present, using a person's skeleton data as input. The experimental results indicate that the unsupervised MPED-RNN model is not suitable for verifying videos, such as those in this dataset, in which the numbers of frames with and without abnormal behavior are similar, and that appropriate results can be obtained only by verification with a supervised learning-based model [8,10,11].

RELATED STUDIES
Autoencoder
The autoencoder is an unsupervised deep learning model consisting of two parts: an encoder and a decoder. The encoder compresses the input data and the decoder restores it, and the autoencoder learns by minimizing the difference between the original data and the restored data [1,2,13]. Fig. 1 shows the structure of the autoencoder.
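To make the encode, restore, and minimize-the-difference loop concrete, here is a minimal sketch in plain Python. It is an illustrative assumption, not the architecture used in this paper: a linear autoencoder that compresses 2-D points to a single number and is trained by full-batch gradient descent on the mean squared reconstruction error. The function names are hypothetical. After training on points that lie along one direction, inputs off that direction are reconstructed poorly, which is the property autoencoder-based anomaly detectors rely on.

```python
def train_linear_autoencoder(data, lr=0.05, epochs=500):
    """Train a 2-D -> 1-D -> 2-D linear autoencoder by full-batch gradient descent."""
    w_enc = [0.1, 0.1]  # encoder weights: x -> z
    w_dec = [0.1, 0.1]  # decoder weights: z -> x_hat
    n = len(data)
    for _ in range(epochs):
        g_enc = [0.0, 0.0]
        g_dec = [0.0, 0.0]
        for x in data:
            z = w_enc[0] * x[0] + w_enc[1] * x[1]           # encode to one number
            x_hat = [w_dec[0] * z, w_dec[1] * z]            # decode back to 2-D
            r = [x_hat[0] - x[0], x_hat[1] - x[1]]          # reconstruction residual
            dz = 2 * (r[0] * w_dec[0] + r[1] * w_dec[1])    # dLoss/dz for this sample
            for i in range(2):
                g_dec[i] += 2 * r[i] * z / n                # dLoss/dw_dec
                g_enc[i] += dz * x[i] / n                   # dLoss/dw_enc
        for i in range(2):
            w_enc[i] -= lr * g_enc[i]
            w_dec[i] -= lr * g_dec[i]
    return w_enc, w_dec


def reconstruction_error(x, w_enc, w_dec):
    """Squared distance between an input and its reconstruction (the anomaly score)."""
    z = w_enc[0] * x[0] + w_enc[1] * x[1]
    return sum((w_dec[i] * z - x[i]) ** 2 for i in range(2))


def mean_error(data, w_enc, w_dec):
    return sum(reconstruction_error(x, w_enc, w_dec) for x in data) / len(data)


# "Normal" data: points on the line y = 2x.
normal = [(t / 10, 2 * t / 10) for t in range(-10, 11)]
w_enc, w_dec = train_linear_autoencoder(normal)
```

A point such as (2, -1), which lies off the learned direction, then receives a much larger reconstruction error than a point such as (1, 2) on it.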

RNN (Recurrent Neural Network)
An RNN is a neural network with a recurrent loop that, at each step, reuses information from the previous step. This loop carries historical information forward so that the network can use it when processing the current input, which improves its performance on sequential data [6,7,9].
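The recurrence can be sketched with a single scalar Elman-style cell, h_t = tanh(w_x x_t + w_h h_(t-1) + b). The weights below are arbitrary illustrative values, not learned ones, and the function name is hypothetical; the point is that an input seen at the first step still influences later hidden states through the loop.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a scalar Elman RNN over a sequence and return every hidden state."""
    h = 0.0  # initial hidden state
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


print(rnn_forward([1.0, 0.0, 0.0]))  # the first input decays but persists
```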

MPED-RNN Model
The MPED-RNN model is an autoencoder-based anomaly detection model that takes skeleton data as input. Its encoder and decoder are recurrent and capture the temporal and spatial patterns of skeleton trajectories [14,16,19,20,21]. The model splits its input skeleton data into two components and learns from both: global body movement, which describes large-scale motion of the body as a whole, such as its position and size, with little regard to shape or deformation; and local body posture, which describes fine-grained motion such as the internal deformation of the skeleton. Frames in which an irregular pattern appears are classified as abnormal behavior.
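The split into global movement and local posture can be illustrated as follows. This sketch assumes that the global component is the centre and size of the skeleton's bounding box and that the local component is each joint expressed relative to that box, which is one natural reading of the decomposition described above; the exact normalization in any given implementation may differ, and the function names are hypothetical.

```python
def decompose_skeleton(joints):
    """Split one frame's joints (list of (x, y)) into global and local parts.

    Assumed scheme: global = bounding-box centre and size,
    local = joint coordinates relative to that box.
    """
    xs = [x for x, _ in joints]
    ys = [y for _, y in joints]
    width = max(max(xs) - min(xs), 1e-6)   # avoid division by zero for degenerate boxes
    height = max(max(ys) - min(ys), 1e-6)
    cx = (max(xs) + min(xs)) / 2
    cy = (max(ys) + min(ys)) / 2
    global_part = (cx, cy, width, height)
    local_part = [((x - cx) / width, (y - cy) / height) for x, y in joints]
    return global_part, local_part


def recompose_skeleton(global_part, local_part):
    """Invert decompose_skeleton: map local coordinates back into the image."""
    cx, cy, width, height = global_part
    return [(cx + u * width, cy + v * height) for u, v in local_part]
```

Because the local coordinates are relative to the box, moving or rescaling the whole person changes only the global part, so each component can model a different kind of irregularity.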

AUROC (Area under the ROC Curve)
In the MPED-RNN model, the default evaluation metric is AUROC. AUROC is the area under the ROC curve, a graph that plots the true positive rate (TPR), the proportion of actually normal frames correctly predicted as normal, on the vertical axis against the false positive rate (FPR), the proportion of actually abnormal frames incorrectly predicted as normal, on the horizontal axis [3,6,7]. Fig. 4 shows the ROC curve.
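AUROC can also be computed without drawing the curve: it equals the probability that a randomly chosen positive frame receives a higher score than a randomly chosen negative frame, with ties counted as one half. The sketch below uses this rank-based (Mann-Whitney) formulation in plain Python. The text above treats normal frames as the positive class; the function itself only assumes that whichever class is labelled 1 should receive higher scores. It is quadratic in the number of frames and meant only for illustration; the function name is hypothetical, and libraries such as scikit-learn provide `roc_auc_score` for real use.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank statistic.

    labels: 1 for the positive class, 0 for the negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative frame")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


print(auroc([0.9, 0.3, 0.6, 0.2], [1, 1, 0, 0]))  # 3 of 4 pairs ordered correctly
```

A value of 0.5 means the scores order positive and negative frames no better than chance; 1.0 means every positive frame outscores every negative one.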

Unsupervised Learning
Unsupervised learning is a type of machine learning concerned with discovering how data are structured. Unlike supervised learning or reinforcement learning, it is not given a target value for each input [2,11,18,22,24].

VERIFICATION OF LEARNING DATA WITH MPED-RNN
MPED-RNN, an anomaly detection model, uses the skeleton data of each person in a video as learning data, and evaluation is performed using the skeleton data together with a frame-level mask that indicates in which frames abnormal behavior occurs. Fig. 6 shows the learning data verification system used in this paper.

Preprocessing of Skeleton Data
To convert the videos in the learning dataset into the skeleton information that the MPED-RNN model uses as learning data, skeleton data were first extracted from the videos. As shown in Fig. 7, the extracted skeleton data is a JSON (JavaScript Object Notation) file containing a frame number, a person number, and that person's joint coordinates. The MPED-RNN model, however, takes as input a CSV file representing the trajectory of each person's skeleton. The extracted skeleton data were therefore split into one file per person, and each file stores, in CSV form, the frames in which that person appears and the coordinates of the 17 joints observed in each of those frames. Fig. 8 shows a preprocessed skeleton file.
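The per-person split can be sketched with Python's standard `json` and `csv` modules. The JSON field names (`frame`, `person`, `keypoints`), the function name, and the row layout (frame number followed by x, y for each of the 17 joints, no header) are assumptions for illustration; the actual extractor output and the CSV layout expected by the MPED-RNN code may differ.

```python
import csv
import io
import json
from collections import defaultdict

NUM_JOINTS = 17


def split_skeletons_by_person(json_text):
    """Group skeleton records by person and render one CSV string per person.

    Hypothetical record format:
    {"frame": 12, "person": 3, "keypoints": [[x0, y0], ..., [x16, y16]]}
    """
    records = json.loads(json_text)
    rows_by_person = defaultdict(list)
    for rec in records:
        # Flatten [[x0, y0], [x1, y1], ...] into x0, y0, x1, y1, ...
        coords = [c for joint in rec["keypoints"] for c in joint]
        if len(coords) != 2 * NUM_JOINTS:
            raise ValueError(f"expected {NUM_JOINTS} joints, got {len(coords) // 2}")
        rows_by_person[rec["person"]].append([rec["frame"]] + coords)

    csv_files = {}
    for person, rows in rows_by_person.items():
        rows.sort(key=lambda row: row[0])  # order each trajectory by frame number
        buf = io.StringIO()
        csv.writer(buf).writerows(rows)
        csv_files[person] = buf.getvalue()
    return csv_files
```

Each value of the returned dictionary can then be written to its own file, one trajectory per person.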

Preprocessing Evaluation Data
The frame_level_mask file used to evaluate abnormal behavior classification in the MPED-RNN model is a binary file that marks each frame with 0 or 1 according to whether abnormal behavior is present. To produce the frame_level_mask for the learning data, the start_frame_index at which abnormal behavior begins and the ends_frame_index at which it ends were extracted from the annotation file provided with the learning videos, and a binary file was generated from this information. Fig. 9 shows an annotation file from the learning data, and Fig. 10 shows a generated frame-level mask.
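Building the mask from an annotated interval is straightforward. In this sketch, 1 marks an abnormal frame and 0 a normal one, and the interval given by start_frame_index and ends_frame_index is assumed to be 0-based and inclusive of both ends. The function names and these conventions are assumptions and should be checked against the annotation file and the evaluation code. The abnormal-frame ratio helper is also hypothetical; it shows how much of a video the abnormal behavior covers.

```python
def make_frame_level_mask(num_frames, start, end):
    """One 0/1 value per frame; 1 marks frames inside the abnormal interval.

    start, end: values of start_frame_index and ends_frame_index
    (assumed 0-based, interval inclusive).
    """
    if not 0 <= start <= end < num_frames:
        raise ValueError("abnormal interval must lie inside the video")
    return [1 if start <= i <= end else 0 for i in range(num_frames)]


def abnormal_ratio(mask):
    """Fraction of frames labelled abnormal."""
    return sum(mask) / len(mask)


mask = make_frame_level_mask(10, 3, 5)
print(mask, abnormal_ratio(mask))
```

The resulting list can then be saved in whatever binary format the evaluation script expects, for example as a NumPy array with `numpy.save`.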

Learning
Training was conducted with the generated skeleton data as input to the unsupervised learning model. Of the 572 videos in total, 454 were used as learning data. Fig. 11 shows part of the learning data, and Fig. 12 shows the training settings. Training ran for 20 epochs, where one epoch is one forward and backward pass of the entire dataset through the neural network; the model therefore learned from the entire dataset 20 times.
During training, each input video is split into frames, and frames with similar skeleton values are grouped together by clustering. If most frames have similar skeleton values but some frames differ from the rest, those frames are classified as abnormal behavior, and training proceeds on that basis.

Evaluation
Because the model in this paper is trained without supervision, its intermediate output before AUROC is a predicted value for each frame. The true positive rate (TPR) and false positive rate (FPR) were calculated from these predicted values at arbitrary classification thresholds, and AUROC, the area under the resulting ROC curve (shown in green), was output.
AUPR, the area under the precision-recall curve, was also output. Here precision is the proportion of frames predicted to be normal that are actually normal, and recall is the proportion of actually normal frames that are predicted to be normal. Fig. 13 shows an example of the predicted values for each frame and the output result.

Figure 13. Predicted values for each frame and the output result
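The quantities named above can be computed from per-frame scores as follows. The sketch counts true and false positives at one threshold to obtain TPR, FPR, and precision (recall is the same quantity as TPR), and estimates AUPR with average precision, one common step-wise approximation of the area under the precision-recall curve. The labels (1 for the positive class) and function names are illustrative assumptions, and ties in score are not treated specially.

```python
def rates_at_threshold(scores, labels, threshold):
    """Rates when frames scoring >= threshold are predicted positive (label 1)."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0        # recall: positives found
    fpr = fp / (fp + tn) if fp + tn else 0.0        # negatives wrongly flagged
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention if nothing predicted positive
    return {"tpr": tpr, "fpr": fpr, "precision": precision, "recall": tpr}


def average_precision(scores, labels):
    """Step-wise AUPR estimate: mean precision at the rank of each positive frame."""
    total_pos = sum(1 for y in labels if y == 1)
    if total_pos == 0:
        raise ValueError("average precision needs at least one positive frame")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = 0
    ap = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            ap += tp / rank  # precision at the recall level reached by this frame
    return ap / total_pos
```

Sweeping the threshold in `rates_at_threshold` over all scores traces the ROC curve point by point.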
The model's evaluation method uses the learned weights to compute an abnormality score for each frame and then divides the frames into abnormal and normal groups by comparing each score with an arbitrarily chosen threshold. Accuracy is obtained by comparing the resulting binary file with the frame-level mask, the ground-truth binary file supplied by the user.
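The thresholding and comparison with the frame-level mask can be sketched as below. This assumes that higher scores mean more abnormal and that 1 marks an abnormal frame; the threshold value and the function names are hypothetical.

```python
def classify_frames(scores, threshold):
    """Binarize per-frame abnormality scores: 1 = abnormal, 0 = normal."""
    return [1 if s >= threshold else 0 for s in scores]


def frame_accuracy(predicted_mask, frame_level_mask):
    """Fraction of frames where the prediction matches the ground-truth mask."""
    if len(predicted_mask) != len(frame_level_mask):
        raise ValueError("masks must cover the same frames")
    correct = sum(p == t for p, t in zip(predicted_mask, frame_level_mask))
    return correct / len(frame_level_mask)


predicted = classify_frames([0.1, 0.7, 0.9, 0.2], threshold=0.5)
print(predicted, frame_accuracy(predicted, [0, 1, 0, 0]))
```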

EXPERIMENTAL RESULTS AND ANALYSIS
Verification of the learning data was performed on a GeForce RTX 2080 with about 11 GB of memory. Fig. 14 shows a source video from the learning data, and Fig. 15 shows the verification results for the learning data. The model expects 17 joint coordinates as input, but only 13 joint coordinates could be extracted from the learning data. The four missing joints were therefore treated as unobserved and filled with '0', while the 13 available joints were entered at the most similar positions in the model's joint layout; the results reflect this input.
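The zero-filling of unobserved joints can be sketched as follows. The paper does not say which four joints were missing; this sketch assumes, purely for illustration, a COCO-style 17-keypoint order in which the eyes and ears are absent from the extractor's output. The joint names and the function name are hypothetical.

```python
# Hypothetical 17-joint layout (COCO-style order). Which four joints were
# actually missing is not stated; eyes and ears are an assumption.
COCO17 = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def pad_to_17_joints(joints):
    """joints: dict name -> (x, y) for the joints the extractor provides.

    Returns a flat list x0, y0, ..., x16, y16 with (0, 0) for unobserved joints.
    """
    unknown = set(joints) - set(COCO17)
    if unknown:
        raise ValueError(f"unmapped joint names: {sorted(unknown)}")
    flat = []
    for name in COCO17:
        x, y = joints.get(name, (0.0, 0.0))  # missing joints are filled with 0
        flat.extend([x, y])
    return flat
```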

AUROC Results
The AUROC value was about 0.6. An AUROC of 0.5 corresponds to random guessing and 1.0 to perfect classification, and a binary classifier is generally regarded as useful only when its AUROC is at least 0.8.

Added Evaluation Index
In addition to AUROC and AUPR, the evaluation metrics that the MPED-RNN model provides by default, the classification threshold was varied to find the optimal threshold, the one yielding the highest accuracy, and that maximum accuracy was output. As Fig. 16 shows, the AUROC of the learning data is about 0.66 and the accuracy at the optimal threshold is about 0.65. The learning data therefore yields lower accuracy than the HR-Avenue dataset used to evaluate the model's performance. This does not indicate low video quality, however. MPED-RNN finds irregular patterns in a video and classifies them as abnormal behavior, and because it is based on an autoencoder trained without supervision, its accuracy drops when the number of frames with abnormal behavior is similar to the number without it, as is the case in this learning data.
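The search for the optimal classification threshold can be sketched as a sweep over every distinct score, keeping the threshold with the highest frame-level accuracy. This is an assumed procedure with a hypothetical function name, not necessarily the exact one used in the experiment; it assumes higher scores mean more abnormal and that 1 in the mask marks an abnormal frame.

```python
def best_threshold_accuracy(scores, frame_level_mask):
    """Return (threshold, accuracy) for the threshold with the highest accuracy."""
    best = (None, -1.0)
    # Every distinct score is a candidate; inf means "predict every frame normal".
    candidates = sorted(set(scores)) + [float("inf")]
    for t in candidates:
        predicted = [1 if s >= t else 0 for s in scores]
        acc = sum(p == m for p, m in zip(predicted, frame_level_mask)) / len(scores)
        if acc > best[1]:
            best = (t, acc)
    return best


print(best_threshold_accuracy([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

When abnormal and normal frames are roughly balanced, labelling every frame the same way already gives an accuracy near 0.5, so an optimal-threshold accuracy of about 0.65 is only moderately above that baseline.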

Abnormal Behavior Evaluation Results
Fig. 17 shows the evaluation results for the child abuse, home invasion, theft, and vehicle theft learning data, respectively.
The evaluation results were very low for home invasion and vehicle theft, in which abnormal behavior occupies more than 50% of the total frames of the learning videos. Because the MPED-RNN model shows very low accuracy on these data, we judge that the learning data is not suitable for verification with this model.

CONCLUSION AND FUTURE RESEARCH
This paper used the MPED-RNN model to verify whether learning data for abnormal behavior detection is suitable as learning data. Owing to the nature of the data, accuracy was not high with the unsupervised MPED-RNN model. However, because the frames containing precursor and abnormal behavior are clearly delimited and the skeleton data are extracted accurately, we judge the data to be valid learning data for supervised learning-based models.
Artificial intelligence is currently one of the most prominent technologies of the Fourth Industrial Revolution; it is being actively researched, and many companies are seeking to apply it in industrial settings. However, because high-quality learning data for artificial intelligence development is difficult to build, we believe that more learning data produced under the leadership of highly reliable national institutions can promote the development and popularization of artificial intelligence technology.
In future work, we will verify the data with other, supervised learning-based models and continue research on applying abnormal behavior detection technology to public CCTVs.