
3D video surveillance: tracking people in busy scenes using multiple cameras

Please note that this article was automatically translated from Russian into English, so the translation may not be accurate.

Nikolai Ptitsyn

Automatic recognition of objects and situations from streaming video in security surveillance systems is an important scientific and engineering challenge. Combining video analytics with three-dimensional modeling makes it possible to analyze people's behavior more accurately in busy scenes, where standard video analytics operating in 2D or 2.5D space fails.

Why is 3D surveillance needed?

We have already touched on multi-camera tracking in the article "The future of video surveillance: multiple camera tracking", where we analyzed technical approaches to automatically determining people's trajectories in three-dimensional space, for example in a high-rise building. A key feature of such a system is the ability to "hand over" a tracked object from one camera to another.

Today we consider how to build a video surveillance system with overlapping camera views for accurate tracking of people in a busy scene, such as a terminal hall or a shopping mall. The problem of calculating and analyzing individual trajectories of human movement arises in many areas of city life: preventing and investigating crime, transport management, marketing, retail, and placing advertising on digital displays (see the table).

Table. Tasks solved by 3D video analytics in public places
Task group / Examples
Real-time alerts on suspicious behavior or abnormal situations
  • Detection of crowds;
  • Detection of movement against the flow of people or of exit in the wrong place;
  • Detection of abandoned objects;
  • Detection of loitering (staying in a zone longer than a preset time)
Real-time support for the operator while monitoring suspects
  • Prompt data on a person's movements before observation began;
  • Following a person in a crowd;
  • Search for other persons with whom the observed person has interacted
Indexing of video archives
  • Automatic control of a PTZ camera to obtain a high-quality photo of each visitor;
  • Linking a track to a passenger's registered transactions at the check-in counter or baggage claim;
  • Determining the moments when a person talks on a cell phone;
  • Determining the moments of entry and exit through a given door
Data collection for statistical processing (service quality improvement, market research, sociology)
  • Counting passengers / customers;
  • Estimating waiting time;
  • Determining types of behavior;
  • Comparing movement data with purchases made

 

The application of video analytics algorithms in public places is significantly complicated by the density of people, their mutual occlusion and the complex geometry of the space (Fig. 1). Compared with monitoring a perimeter or the entrances of residential buildings, automated monitoring of public places calls for a more intelligent video analytics system, one that can effectively extract information from a large data stream on the one hand and fill in missing data on the other.

Fig. 1. The camera system in the arrivals hall of London Gatwick Airport. Cameras 2, 3 and 4 overlap in the areas shaded in gray [1]

Algorithms that process the stream from only a single camera in 2D space (see the article "Video motion detectors") or in 2.5D space (see the article "2.5D space: restoring image depth parameters from a single camera") cannot cope with a group of people, let alone heavy traffic. Fig. 2 shows an example of a 2.5D tracking system in which the people in a group are recognized as a single object. The desired result, an individual trajectory for each person, is shown in Fig. 3. A major shortcoming of 2.5D systems is the significant error in measuring the "depth" of the scene and the real size of the observed targets (Fig. 4).
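The growth of the depth error with distance can be illustrated with a small sketch. It assumes one common single-camera setup, which the article does not spell out: a pinhole camera at a known height, tilted down towards a flat floor, with the distance to a person estimated from the image row of their feet. The function names, camera height, tilt and focal length below are hypothetical values chosen only for illustration.

```python
from math import atan, radians, tan


def ground_distance(row_offset_px: float, focal_px: float,
                    height_m: float, tilt_deg: float) -> float:
    """Distance along the floor to a point seen row_offset_px below the image centre.

    Assumes a flat floor and a pinhole camera mounted height_m above it,
    with the optical axis tilted tilt_deg below the horizontal.
    """
    angle_below_horizon = radians(tilt_deg) + atan(row_offset_px / focal_px)
    return height_m / tan(angle_below_horizon)


def depth_error_per_pixel(row_offset_px: float, focal_px: float,
                          height_m: float, tilt_deg: float) -> float:
    """Change in estimated distance caused by a one-pixel error in the feet row."""
    near = ground_distance(row_offset_px + 1, focal_px, height_m, tilt_deg)
    far = ground_distance(row_offset_px, focal_px, height_m, tilt_deg)
    return abs(far - near)


# Hypothetical camera: 3 m high, tilted 10 degrees, focal length 1000 px.
error_far = depth_error_per_pixel(0, 1000.0, 3.0, 10.0)     # person far away
error_near = depth_error_per_pixel(400, 1000.0, 3.0, 10.0)  # person close by
```

Under these assumptions the same one-pixel segmentation error costs far more depth accuracy for a distant person than for a nearby one, which is one way to read the shortcoming mentioned above.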

Fig. 2. A general-purpose video detector and tracker cannot track people individually in a busy scene in 2.5D space. A group of people is segmented as a single whole.

Fig. 3. Desired result: a specialized detector and a 3D tracker in a multi-camera system track people individually in a busy scene.

Operating principle

Fig. 4. Ray paths of camera C observing objects 1 and 2. The zone of possible overlap with other objects is highlighted in red.

Fig. 5. Ray paths of cameras A, B and C observing objects 1 and 2. Objects 1 and 2 merge (occlude each other) in the field of view of camera B, but are clearly distinguishable in the fields of view of cameras A and C. The coordinates and sizes of the objects are localized in three-dimensional space.

Deploying cameras with overlapping fields of view (Fig. 5) largely solves the problem of insufficient information and ambiguity in 2D video analytics. First, a multi-camera system increases the likelihood of successful segmentation and tracking of an object in the field of view of at least one camera, thanks to the larger number of viewing angles. Second, the system can calculate the depth and the real dimensions of objects with high accuracy.

Because the tasks are computationally intensive, centralized processing on a video server is not a promising architecture. What is required is a decentralized scheme in which the video stream is processed on a processor built directly into the camera or into the encoder that compresses the image. In this scheme the server's task is to manage the processors and to exchange information between them. An important task is to synchronize time across the processors and to keep the spatial calibration up to date on all of them.

The processors embedded in the cameras must exchange the 2D coordinates and attributes of tracked objects with each other at high speed. The transfer delay for this data should not exceed the processing time of a few frames. Moreover, for a dense flow of people it is important to capture and process a color image at a high frame rate. This makes direct data exchange between the cameras relevant. The topology of the peer-to-peer (P2P) interaction is determined by the three-dimensional model of camera placement at the monitored site.
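As a rough illustration of how a placement model could determine the P2P topology, the sketch below links cameras whose floor coverage overlaps and checks a transfer delay against a budget of a few frames. The circular footprints, the camera names, the function names and the three-frame budget are assumptions made for illustration; they are not taken from the system described here.

```python
from itertools import combinations
from math import hypot
from typing import Dict, Set, Tuple

# Hypothetical site model: each camera covers a circular patch of the floor,
# given as (centre_x, centre_y, radius) in metres.
Footprint = Tuple[float, float, float]


def peer_topology(footprints: Dict[str, Footprint]) -> Dict[str, Set[str]]:
    """Link every pair of cameras whose floor footprints overlap."""
    peers: Dict[str, Set[str]] = {name: set() for name in footprints}
    for (a, fa), (b, fb) in combinations(footprints.items(), 2):
        distance = hypot(fa[0] - fb[0], fa[1] - fb[1])
        if distance < fa[2] + fb[2]:  # the two circles intersect
            peers[a].add(b)
            peers[b].add(a)
    return peers


def within_latency_budget(delay_s: float, fps: float, max_frames: int = 3) -> bool:
    """Check that a transfer delay does not exceed max_frames frame periods.

    Assumes each frame is processed in one frame period (1 / fps seconds).
    """
    return delay_s <= max_frames / fps


site = {"cam1": (0.0, 0.0, 5.0), "cam2": (8.0, 0.0, 5.0), "cam3": (30.0, 0.0, 5.0)}
topology = peer_topology(site)
```

In this hypothetical layout only cam1 and cam2 need a direct link, because cam3 shares no floor area with either of them. A real site model would use the calibrated 3D fields of view rather than circles.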

Consider a possible sequence of algorithmic operations in a multi-camera tracking system:

  1. Motion detection (2D) is performed by standard methods, e.g. based on the codec's motion vectors or by subtracting a statistical background model from the current frame.
  2. Object detection (a person's head-and-shoulders silhouette in 2D) is possible using digital filters and classifiers, by analogy with widely used face detectors. The object detector can combine data on the shape and on the variability of the analyzed image area. Variability is determined by the motion detector in the previous step. An important nuance is that, to cover the entire area of a large room, cameras are installed at different angles to the horizontal. The digital filters and classifiers should therefore be generated automatically from the three-dimensional calibration.
  3. Object modeling (2D) involves accumulating data on the shape, color and variability of the tracked person. Algorithms similar to those used for background modeling in motion detection can be applied.
  4. Prediction of the object's position (3D) at the current time is based on the 3D position, velocity and acceleration of the tracked person calculated in the previous cycle. The three-dimensional coordinates are then converted into two-dimensional ones using the camera's calibration information.
  5. Refinement of the 2D coordinates consists of correlating the object model with the current frame around the predicted location of the object. The maximum correlation value corresponds to the most probable location. The reliability of the result is determined by comparing the correlation values at different points. Thus, if the tracked object is temporarily lost from view, the reliability drops sharply.
  6. Grouping of 2D object data involves collecting and matching the 2D attributes of objects from different cameras, taking into account the three-dimensional model of the cameras' mutual arrangement. Data with low reliability are discarded. For each tracked object, this operation produces the input data for calculating its coordinates in three-dimensional space.
  7. Calculation of the real position of the object (3D) consists of solving a system of equations that minimizes the mean square error of the coordinate transformation between two-dimensional and three-dimensional space.
  8. Feature calculation (2D/3D) is necessary for matching objects observed by different cameras (step 6). The simplest features are size and color in 3-4 zones (headwear, face, upper and lower clothing). Velocity and acceleration are calculated from the sequence of coordinates in 3D space and are used for predicting the position (step 4).
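Steps 4 and 7 above can be sketched in a few lines. The sketch assumes that each camera's 2D detection has already been converted, via its calibration, into a viewing ray in world coordinates. It then finds the 3D point closest to all rays in the least-squares sense. This is a simple ray-based stand-in for the system of equations mentioned in step 7, not necessarily the formulation used in a real product. The function names and the constant-acceleration motion model are likewise assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def solve3(a: List[List[float]], b: List[float]) -> Vec3:
    """Solve a 3x3 linear system a * x = b by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("rays are parallel; the position is not determined")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return (m[0][3] / m[0][0], m[1][3] / m[1][1], m[2][3] / m[2][2])


def triangulate(rays: Sequence[Tuple[Vec3, Vec3]]) -> Vec3:
    """Return the 3D point minimising the sum of squared distances to all rays.

    Each ray is (camera_origin, direction); directions need not be unit length.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for origin, direction in rays:
        norm = sum(c * c for c in direction) ** 0.5
        d = [c / norm for c in direction]
        for i in range(3):
            for j in range(3):
                # Projector onto the plane orthogonal to the ray: I - d d^T
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                a[i][j] += p
                b[i] += p * origin[j]
    return solve3(a, b)


def predict(pos: Vec3, vel: Vec3, acc: Vec3, dt: float) -> Vec3:
    """Constant-acceleration prediction of the next 3D position (step 4)."""
    x, y, z = (p + v * dt + 0.5 * a * dt * dt for p, v, a in zip(pos, vel, acc))
    return (x, y, z)


# Two hypothetical cameras, both 3 m high, looking at the same head position.
rays = [((0.0, 0.0, 3.0), (2.0, 3.0, -1.3)), ((10.0, 0.0, 3.0), (-8.0, 3.0, -1.3))]
head = triangulate(rays)
```

With more than two cameras the same function averages out small disagreements between rays, and a ray from a camera where the person is occluded can simply be left out, as step 6 suggests for low-reliability data.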

Using high-definition (HD) cameras can reduce the number of cameras required, but it significantly increases the load on the processor in each camera. Sometimes it is justified to use a larger number of cameras and/or a higher frame rate at a lower resolution.
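The trade-off can be made concrete with simple arithmetic on raw pixel throughput. The resolutions and frame rates below are hypothetical examples, and pixels per second is only a crude proxy for the real processing load, which also depends on the algorithms used.

```python
def pixel_rate(width: int, height: int, fps: float, cameras: int = 1) -> float:
    """Pixels per second to be processed across `cameras` identical cameras."""
    return width * height * fps * cameras


# Hypothetical configurations for the same area.
one_hd = pixel_rate(1920, 1080, 25.0)             # a single HD camera
one_sd = pixel_rate(704, 576, 25.0)               # a single lower-resolution camera
four_sd = pixel_rate(704, 576, 25.0, cameras=4)   # four lower-resolution cameras

per_camera_ratio = one_hd / one_sd
```

With these example figures the HD camera's embedded processor has to handle roughly five times more pixels per second than each lower-resolution camera, while four lower-resolution cameras together still process fewer pixels than the single HD camera.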

To detect people at different distances, and to optimize performance, it is appropriate to apply multiscale analysis methods at all stages of image processing.
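One common form of multiscale analysis is an image pyramid, in which the same detector is run on progressively smaller copies of the frame so that nearby (large) and distant (small) people are both covered. The sketch below builds such a pyramid by 2x2 averaging. It is an illustrative assumption about how multiscale processing might be organized, not a description of a specific product, and the function names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def downsample(image: Image) -> Image:
    """Halve the resolution by averaging each 2x2 block of pixels."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(image: Image, min_size: int = 2) -> List[Image]:
    """Return [full, half, quarter, ...] until a side would drop below min_size."""
    levels = [image]
    while len(levels[-1]) // 2 >= min_size and len(levels[-1][0]) // 2 >= min_size:
        levels.append(downsample(levels[-1]))
    return levels


frame = [[1.0] * 8 for _ in range(8)]  # tiny synthetic 8x8 frame
pyramid = build_pyramid(frame)
```

Coarse levels are cheap to process and suit large, close targets, while the full-resolution level is needed only for small, distant ones, which is how a pyramid can also help with performance.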

Current challenges for developers of 3D video analytics

Motion capture technology using cameras has been used successfully in related fields. For example, in the film and computer game industries, believable character animation is obtained by recording the movements of a live actor in a studio (Fig. 6).

Fig. 6. Capturing an actor's motion using a system of cameras and fluorescent markers

At the same time, introducing 3D video analytics into the security field requires considerable adaptation of mathematical algorithms, software and hardware. Security applications impose more stringent requirements on fault tolerance, tracking accuracy, cost and scalability. These are the problems that the organizations leading the development of 3D video analytics are now working to solve.

 


[1] Materials were used from the website of the UK Home Office Scientific Development Branch (HOSDB), section Imagery Library for Intelligent Detection Systems (i-LIDS), http://scienceandresearch.homeoffice.gov.uk/hosdb/cctv-imaging-technology/video-based-detection-systems/i-lids/