Project “Safe City”: opportunities and core points
Please note that this article was automatically translated from Russian into English, so the translation may not be accurate.
Filin K., Ptitsyn N., Konushin A.
Do you know about the "Safe City" program? Many have heard something about it, but the real results of the program are not widely known. Is the project effective, and does it justify the money invested in it? Not at all. The repeated failures to introduce video analytics into "Safe City" point to a large number of hidden problems and show that the project, in its current form, is not ready for advanced video analysis technology. Only an evolutionary development of the project, from simple to complex, can lead to the desired result. It has already become clear to many that a system of this scale cannot be built and operated without advanced video analytics technology (see The role of video analytics). So what must the concept of video analytics for "Safe City" necessarily include? Let's look at the three groups of video analysis algorithms most in demand in this case:
Service detectors
Service detectors automatically register failures of cameras or lighting equipment so that the CCTV system can be kept in working condition at all times. They identify problems such as loss of the video signal, failure of the auto-iris, and natural soiling of the lens. Service detectors also automatically recognize deliberate sabotage, for example blocking or covering the lens, defocusing, changing the orientation of the camera, and blinding the sensor with light. These functions are sometimes referred to as scene control. Some forms of sabotage, including placing a mirror in front of the camera, are practically impossible for an operator to notice, especially when attention is spread across the many screens of a situation center.
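As an illustration of the checks a service detector performs, here is a minimal sketch in Python. It is not the implementation used in any "Safe City" product: the function names, the thresholds, and the choice of mean brightness plus the variance of a Laplacian response as a sharpness measure are all assumptions made for this example. A frame is assumed to be a grayscale image stored as a list of rows with values from 0 to 255, at least 3 by 3 pixels.

```python
from statistics import mean, pvariance


def laplacian_variance(frame):
    """Variance of a 4-neighbour Laplacian over the interior pixels.

    Low values mean few sharp edges, which is typical of a defocused
    or uniformly covered lens.
    """
    height, width = len(frame), len(frame[0])
    responses = [
        frame[y - 1][x] + frame[y + 1][x] + frame[y][x - 1] + frame[y][x + 1]
        - 4 * frame[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def service_check(frame, dark_level=10.0, glare_level=245.0, blur_level=5.0):
    """Return a coarse status label for one grayscale frame.

    All three thresholds are illustrative assumptions and would have to be
    tuned per camera on real positive and negative examples.
    """
    brightness = mean(pixel for row in frame for pixel in row)
    if brightness < dark_level:
        return "signal_loss_or_covered"
    if brightness > glare_level:
        return "glare"
    if laplacian_variance(frame) < blur_level:
        return "defocused_or_obstructed"
    return "ok"
```

A production detector would also compare the frame with a reference view of the scene to catch a changed camera orientation, which this sketch does not attempt.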
Fig. 1. Service detectors (loss of signal, obstruction, defocusing, illumination)
End users who have heard about the potential of computer programs for automatic recognition of human behavior often underestimate the importance of service detectors. In practice, service detectors are the most useful kind of video analytics, since they require no special configuration and save significant resources on maintaining the system. Implementing high-quality service detectors is not easy because of the considerable variety of equipment and environmental conditions. Like any other video analytics, the detectors should be tested over a long period on an extensive set of positive and negative examples in different settings.
Authentication of the source
At distributed sites (oil and gas pipelines, retail chains, gas stations, ATMs, entrances of apartment buildings, etc.) where a thousand or more surveillance cameras are operated and maintained, reliable protection against possible substitution of the video is highly relevant. An effective protection mechanism involves digitally signing the stream along the entire transmission chain, from the source (camera) to the consumer (operator workstation). Metadata and a hidden digital watermark are embedded in the video; they contain data such as the unique identifier of the camera, the current time, or GLONASS/GPS coordinates. A mismatch in the timestamp or checksum makes it possible to detect substitution immediately. An analog camera, however, cannot by itself apply a cryptographic tag to the video. Other protection mechanisms can be used on the encoder or server side. In particular, an intelligent encoder can respond to video substitution in time by using service detectors, or detect signs of looped video from its statistical features. Even so, it is difficult to avoid false positives when the camera switches from night mode to day mode or when the external environment changes abruptly.
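The idea of tagging each frame with the camera identifier, a timestamp, and a checksum can be sketched as follows. This is only an illustration under stated assumptions: it uses a keyed hash (HMAC with a shared secret) from the Python standard library as a stand-in for the digital signature mentioned above, and the function names and metadata fields are hypothetical. A real deployment would more likely use asymmetric signatures with a private key stored in the camera.

```python
import hashlib
import hmac
import json


def sign_frame(key: bytes, camera_id: str, timestamp: float, frame: bytes) -> dict:
    """Build frame metadata and protect it with an HMAC tag."""
    meta = {
        "camera_id": camera_id,
        "timestamp": timestamp,
        "sha256": hashlib.sha256(frame).hexdigest(),  # checksum of the frame
    }
    payload = json.dumps(meta, sort_keys=True).encode()
    meta["tag"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return meta


def verify_frame(key: bytes, meta: dict, frame: bytes, last_timestamp: float) -> bool:
    """Accept a frame only if tag, checksum, and timestamp order all hold."""
    claimed = {name: meta[name] for name in ("camera_id", "timestamp", "sha256")}
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, meta.get("tag", "")):
        return False  # metadata was altered or signed with another key
    if hashlib.sha256(frame).hexdigest() != meta["sha256"]:
        return False  # frame content does not match its checksum
    # A replayed (looped) frame carries a timestamp that is not newer.
    return meta["timestamp"] > last_timestamp
```

The receiver keeps the timestamp of the last accepted frame per camera, so a recording that is looped back into the channel fails the final check even though its tag is valid.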
In contrast to "smart" IP cameras, authentication of analog cameras remains a sore point to this day.
Indexing events
An event index significantly increases the efficiency of law enforcement agencies when investigating incidents. Using the index, an officer can find the required frame in the video archive 10 or even 100 times faster than with a manual search. The ability to search "by events" considerably simplifies the work of both law enforcement agencies in post-incident analysis of an offense and system operators in identifying and classifying an incident.
Fig. 2. Tracking overlapping people on a busy train station scene
Currently, within "Safe City", the detectors most in demand for protecting housing and communal services are those for the following types of events: appearance of a person, entering or leaving through a door, fast movement, waiting, and face detection.
Fig. 3. Stop detector
Fig. 5. Detector of motion in the wrong direction / location
The accuracy of recognizing entry and exit events can be greatly improved by running the video analytics software synchronously on the internal and external cameras of a doorway. The accuracy of a detector is characterized by the probability of detecting the desired event (sensitivity) and by the rate of false positives. Different indexing scenarios call for different acceptable ratios of these two components of accuracy. An indexing detector marks the time of an event and produces an image to illustrate it. The convenience and efficiency of the detector depend largely on how well this frame is chosen. If the detector takes the first available frame, that frame is likely to be unrepresentative, and the operator will need extra time to review the video. Selecting the optimal angle is particularly important when detecting faces.
Prompt recognition of human behavior
Recently, market participants have discussed video analytics for prompt detection of suspicious or abnormal situations (such as an abandoned object, a fight, fire, smoke, or a falling or lying person) far more than indexing and service detectors. In the minds of end users, recognizing human behavior in "real time" has taken first place. Indeed, preventing a crime while it is being committed, or promptly apprehending a criminal "in hot pursuit", is an important task that specialists in machine vision and artificial intelligence should pursue. Prompt recognition is certainly possible at the current level of computer vision technology, but only in special cases: with a particular camera position, particular lighting, and stable, characteristic human behavior. Unfortunately, in my opinion, there are not yet algorithms generic enough to be economically justified within "Safe City".
Another difficulty for prompt recognition is that criminal acts rarely occur in the field of view of the installed cameras. Offenders are well aware of where the cameras are and what they can observe. The cameras cover only a small fraction of the housing area, and an overtly illegal act within that fraction is highly unlikely. At the same time, security staff often use the cameras retrospectively to reconstruct the course of events from circumstantial evidence. Herein lies the main advantage of public video surveillance systems, and this is why indexing detectors are so important. Thus, the "Safe City" program now faces tasks that are simpler than prompt recognition, namely: ensuring video quality, comprehensive monitoring of system health, fast search through the video archive, and remote access for diagnostics. Finished products that meet these requirements are already on the market; what remains is to choose correctly and to carry out proper system integration. As I mentioned above, only the evolutionary development of the "Safe City" project can lead to the desired result: high efficiency. Obviously, for this evolutionary development the project engineers need to work directly with domestic suppliers of modern video analysis technology, who continuously improve their algorithms and guarantee a phased introduction of video analytics into already deployed projects. Among other things, when choosing a platform it is important to understand clearly the cost of upgrading the "mathematics" in the future. It is also essential to provide a reserve of hardware resources for analytics and to choose a technology for centralized software updates on all nodes of a complex system. So that "interested" parties do not accuse me of bias and partiality, I asked two specialists to comment on the situation I have described: Nikolai Ptitsyn (Ph.D., Bauman MSTU) and Anton Konushin (Cand. Sc. in Physics and Mathematics, Lomonosov Moscow State University).
How ready, in your opinion, are Russian technologies in general, and in computer vision in particular, to work outside laboratory conditions? Are the technologies in their current form up to the challenges of security at the city level?
Anton Konushin: At the leading scientific conferences on computer vision and image analysis, Russian scientists are very poorly represented; their publications can literally be counted on one's fingers. This raises questions about the level of the algorithms offered by domestic companies, since most modern, efficient algorithms for analyzing images and video were first proposed in the academic environment. Tasks such as finding abandoned items, detecting a stopped object, or detecting movement in the wrong direction can be solved with widely known background subtraction algorithms proposed about 10 years ago (for example, modeling the color at each pixel with a mixture of normal distributions). Russian technologies are therefore most likely quite suitable for such tasks. Tracking overlapping people in a busy scene, event detectors, recognizing people by their clothing: these problems are being actively studied at foreign universities but are still far from solved. Published algorithms are not yet reliable enough and often require configuration too complex to be applied in real urban environments. In practice, the proposed algorithms work today only under fairly strong constraints on the technical parameters of the input video (noise, camera resolution, etc.) and on the characteristics of the motion. This makes it possible to demonstrate impressive results on selected examples but causes problems when the algorithms are used in real conditions.
Nikolai Ptitsyn: The readiness of analytics for use in the field is most conveniently assessed with a table.
Let us map three categories of scenes (low, medium, and high density of moving targets) to three relevant scenarios, or "roles", of video analytics: first, recording events; second, a prompt alert on a suspicious trajectory or speed; and third, a prompt alert on an abandoned object.
Detectors of running, stopping, and intrusion into a forbidden zone can be used in the first and second scenarios, but the accuracy requirements differ. The "prompt alert" scenario is more sensitive to the number of false positives. The common basis of the detectors in these two scenarios is a tracker of non-splitting targets, that is, an algorithm for tracking whole objects in the field of view. The third scenario differs in that detecting abandoned objects requires a much more sophisticated tracker that works with splitting targets (Split Target Tracking). The algorithmic complexity of recognizing targets increases from left to right, from a sparse scene to a busy one, and from bottom to top, from the role of recording events to the role of prompt recognition. As the table shows, detectors suitable for use in a sterile zone are available today for virtually all of these roles. On the other hand, the most complex machine vision task is detecting abandoned objects in busy scenes. According to our estimates, commercial technology that solves this problem will appear no sooner than in 3 years.
What can you say about methods for tracking movement and detecting abandoned items? Are there any standards or generally accepted definitions?
Anton Konushin: One of the reasons for the recent rapid development of computer vision algorithms is precisely the increased attention paid to evaluating and comparing the quality of existing algorithms. It has become common practice in computer vision to hold special workshops that compare recent developments in a particular area. To this end, the organizers prepare a test dataset in advance and publish part of it on the Internet, so that authors can test their algorithms and systems on it. The algorithms' results are then sent to the organizers, who run tests on the closed part of the dataset. The results of such comparisons become the de facto standard for quality assessment.
In the surveillance field, the PETS (Performance Evaluation of Tracking and Surveillance) workshops are held regularly. In 2007 the workshop theme was a comparison of methods for finding abandoned objects, detecting luggage theft, and so on. The 2009 workshop was devoted to the analysis of groups of people: counting the people in a group and detecting events such as "flight" and "crowd formation". We are not aware of any Russian scientists who took part in these workshops. Judging by the results of the comparisons, existing algorithms can solve these problems successfully, but not in all cases.
Nikolai Ptitsyn: The basic algorithmic techniques for tracking in streaming video have been known for some time: statistical modeling of targets and background, multiscale analysis, spatio-temporal correlation, optical flow, hidden Markov models, dynamic programming, and so on. I believe that all the problems touched on in our discussion will eventually be solved with known algorithms. In other words, the main problem is not conceptual but an engineering one (optimizing performance, distributing computation, and choosing the right architecture). As for classifying (recognizing) behavior (for example, running, stopping, or movement in a forbidden zone), simple rules and conditions suffice, and there is no need for more sophisticated artificial intelligence methods such as neural networks, support vector machines, or fuzzy logic. For example, to recognize running it is enough to set a threshold on average speed and distance traveled. As for measuring accuracy and generally accepted standards, the most comprehensive materials, including videos, expert annotation, and a testing methodology, have been prepared by the organizers of the annual PETS (Performance Evaluation of Tracking and Surveillance) conference and by the i-LIDS (Imagery Library for Intelligent Detection Systems) group, part of the scientific division of the UK Home Office.
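The rule-based recognition of running described above (a threshold on average speed and on distance traveled) can be sketched in a few lines of Python. The function name, the track format, and the threshold values are assumptions made for this example; in particular, the sketch assumes that the tracker already reports positions in meters on the ground plane, which requires camera calibration.

```python
from math import hypot


def is_running(track, min_speed=3.0, min_distance=5.0):
    """Classify a track as running by two simple thresholds.

    track: list of (t_seconds, x_meters, y_meters) samples in time order.
    min_speed (m/s) and min_distance (m) are illustrative values only.
    """
    if len(track) < 2:
        return False
    # Path length: sum of distances between consecutive positions.
    distance = sum(
        hypot(x2 - x1, y2 - y1)
        for (_, x1, y1), (_, x2, y2) in zip(track, track[1:])
    )
    duration = track[-1][0] - track[0][0]
    if duration <= 0:
        return False
    average_speed = distance / duration
    # The distance threshold suppresses short bursts caused by tracker jitter.
    return distance >= min_distance and average_speed >= min_speed
```

For example, a track that covers 8 meters in 2 seconds passes both thresholds, while a pedestrian moving at about 1.2 m/s does not, however long the track is.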
Which is better, in your opinion: to run everything on a PC video server (x86), with the video server handling the full range of tasks from compression to display, or to perform video analysis on a separate device?
Anton Konushin: Standard definition is not enough for analyzing and recognizing people in video, so high-resolution cameras are increasingly used. An uncompressed data stream from such a camera at a high frame rate can easily saturate even a gigabit network, while a compressed signal is much harder to analyze. In addition, the need for a good communication channel makes the cameras harder to install. Wireless communication (WiFi, WiMAX) is too unreliable and has a low sustained bandwidth, which also makes it hard to transfer high-definition video. The trend is therefore to increase the intelligence of the camera itself: more powerful embedded processors and so on. Today their power is sufficient only for compression and simple processing algorithms, but in the long run even the most complex modern video processing algorithms will be implementable on an embedded computer. So my answer to your question is yes: performing the analysis on a separate device is unquestionably promising (in the future such systems will show better results than those that try to analyze the compressed stream), although for now it is difficult to implement because of the comparative weakness of today's hardware.
Nikolai Ptitsyn: Mass-market video analytics for "Safe City" will certainly be embedded in peripheral devices such as cameras and encoders, because this architecture provides higher recognition accuracy and better scalability than a server architecture. High-quality tracking of objects on the server with megapixel cameras that have long-range or wide-coverage optics is practically impossible. On the other hand, multi-tracking analytics will use the resources of the camera and the server simultaneously, that is, a distributed computing architecture will be implemented.
How far is the theory from current implementations, and what will we be talking about tomorrow in terms of detection and classification of objects?
Anton Konushin: Computer vision depends on pattern recognition algorithms based on machine learning. The real breakthrough in computer vision at the end of the 1990s was caused precisely by the advent of powerful new algorithms such as boosting of weak classifiers (Boosting), the support vector machine (SVM), and the random forest (Random Forest). Boosting of weak classifiers was the basis for the Viola-Jones face detection method, named after its authors, which is the de facto standard; it solves the problem so well that all subsequent methods outperform it only slightly. The simplicity and efficiency of this method made it possible to embed it even in consumer cameras and cell phones. Only a few years passed between the publication of the method and its appearance in real commercial products. The problems of recognizing people's behavior, especially in a large group, detecting carried objects, and so on have proved harder because of the greater intraclass variability compared with faces. But new algorithms are also appearing very quickly thanks to progress in graphical models for image analysis, such as Markov random fields, and fast algorithms for solving them (for example, based on graph cuts).







