
Computer Vision
The course develops an engineering understanding of the methods used to analyse image and video data in modern artificial intelligence systems. Its central idea is that computer vision systems are part of broader human‑machine decision‑making systems. The learning process follows the chain: physical world → sensors → image → algorithms → interpretation → decision.
Modern digital systems increasingly interact with the physical world through sensors, cameras, and other observation devices. Computer vision methods play a key role in converting this visual information into data suitable for analysis, interpretation, and decision‑making.
The programme covers three levels of computer vision system analysis:
- Physical level: the nature of image formation, including imaging geometry, lighting, the optical properties of the camera, and the influence of observation conditions. Understanding these factors is essential for explaining algorithmic errors.
- Algorithmic level: image processing methods, including filtering, edge detection, segmentation, feature extraction, and object recognition.
- Architectural level: modern computer vision system architectures, including convolutional neural networks (CNNs), detection and segmentation architectures, and multimodal AI systems.
The practical part of the course is based on the experimental study of algorithm behaviour. Students carry out laboratory work with real cameras and sensors: they conduct lighting experiments, analyse the impact of capture parameters, train computer vision models, and study sources of error. This approach builds a practical understanding of how computer vision technologies work and where they fail.
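A lighting experiment of the kind described above can be approximated in simulation before working with a real camera. The sketch below is illustrative only: the synthetic scene, the gain values, the noise level, and the fixed threshold are all assumptions chosen for the example, and the helper names (`make_scene`, `simulate_capture`, `edge_pixel_count`) are hypothetical. It applies a 3x3 Sobel operator to the same scene at decreasing brightness and counts how many pixels still exceed a fixed edge threshold.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from 3x3 Sobel kernels (valid region only, no padding)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    # Accumulate the weighted, shifted copies of the image (cross-correlation).
    for i in range(3):
        for j in range(3):
            patch = img[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)

def edge_pixel_count(img, threshold):
    """Number of pixels whose gradient magnitude exceeds a fixed threshold."""
    return int((sobel_magnitude(img) > threshold).sum())

def make_scene(size=64, low=50.0, high=200.0):
    """Synthetic scene (assumption): dark left half, bright right half."""
    scene = np.full((size, size), low)
    scene[:, size // 2:] = high
    return scene

def simulate_capture(scene, gain, noise_sigma, rng):
    """Crude capture model (assumption): scale brightness, add Gaussian noise, clip."""
    noisy = scene * gain + rng.normal(0.0, noise_sigma, scene.shape)
    return np.clip(noisy, 0.0, 255.0)

rng = np.random.default_rng(0)
scene = make_scene()
THRESHOLD = 200.0  # fixed edge threshold, chosen for illustration

for gain in (1.0, 0.5, 0.2):
    img = simulate_capture(scene, gain, noise_sigma=2.0, rng=rng)
    print(f"gain={gain}: {edge_pixel_count(img, THRESHOLD)} edge pixels")
```

In this setup the Sobel response at the step is roughly four times the brightness difference, so at gain 0.2 the response (about 120) falls below the fixed threshold and the edge is no longer detected. This is one simple way to see why a threshold tuned under good lighting may not transfer to darker conditions.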
Large language models (LLMs) are used throughout the learning process in three roles:
- As an analytical tool: to analyse algorithm architectures, generate hypotheses, and explain experimental results;
- As a development tool: to generate program code, design system architectures, and analyse algorithm errors;
- As a critical analysis tool: to verify experimental results, search for alternative solutions, and evaluate model limitations.
Upon completing the course, students will know the main computer vision tasks, image processing and feature extraction methods, neural network architectures for image analysis, and the typical error sources and limitations of the technology. They will be able to analyse images and video data, apply processing algorithms, train computer vision models, analyse errors, and design system architectures. Practical skills include proficiency in image analysis methods, model development tools, and techniques for integrating computer vision into AI systems, preparing graduates to tackle real‑world engineering challenges in artificial intelligence.
OBJECTIVES
- Understand the physical and algorithmic principles of image formation;
- Master image processing methods;
- Study algorithms for extracting features and objects;
- Master deep learning methods for computer vision tasks;
- Understand the architecture of computer vision systems;
- Analyse the limitations and errors of computer vision systems;
- Develop skills in designing applied image analysis systems.
KEY TASKS
- Study the physical nature of image formation;
- Master methods of image processing and analysis;
- Study feature extraction algorithms;
- Master methods of object recognition;
- Study modern neural network architectures for computer vision tasks;
- Analyse the errors and limitations of computer vision systems;
- Master methods for the experimental study of algorithm behaviour;
- Develop skills for integrating computer vision into applied AI systems.
MAIN TOPICS
1. Introduction to Computer Vision and Real‑World Applications. Explores real‑world problems requiring visual data analysis (quality control, video surveillance, robotics, medical diagnostics) and introduces the concept of an image as a numerical matrix (pixel values), highlighting the difference between human vision and computer data processing.
2. Image Feature Extraction and Classical Processing Methods. Covers the concept of feature extraction for industrial tasks (e.g., defect detection), demonstrating classical methods like the Sobel operator and Canny edge detector, and examines how lighting, noise, and scale affect edge detection robustness.
3. Convolutional Neural Networks (CNNs) for Image Analysis. Introduces CNN architecture, explaining convolution, feature maps, and pooling operations, and guides students through training a simple CNN to understand the full model cycle (data preparation, training, testing) and the phenomenon of overfitting.
4. Image Classification and Model Training Principles. Focuses on automatic object classification (used in retail and logistics), covering loss functions, accuracy metrics, and optimization, while analysing model errors and systematic bias caused by imbalanced datasets.
5. Object Detection: Principles and Modern Architectures. Examines object detection tasks (people counting, traffic analysis), explains how detections are represented as bounding boxes, and covers modern architectures such as YOLO and Faster R‑CNN. Students test system robustness under challenging conditions (occlusion, poor lighting).
6. Image Segmentation: Semantic and Instance Approaches. Covers image segmentation for tasks in medical diagnostics and robotics, distinguishing between semantic and instance segmentation, and uses models like U‑Net or SAM to identify objects at the pixel level and analyse segmentation errors.
7. Video Data Analysis and Object Tracking. Explores video analysis systems (surveillance, industrial monitoring), introducing object tracking and motion analysis methods, and demonstrates how algorithms associate objects across video frames, analysing complex scenarios (rapid motion, intersecting trajectories).
8. Vision‑Language Models and Multimodal Systems. Introduces multimodal vision‑language models (e.g., CLIP) used in document analysis and intelligent assistants, and applies image captioning models to generate textual descriptions, examining errors in ambiguous or complex scenes.
9. Computer Vision System Architecture and Data Workflow. Covers the full data processing pipeline (image acquisition, preparation, training, inference, integration), introduces datasets, annotation, and quality metrics, and demonstrates how data quality and annotation affect model performance through hands‑on dataset creation and training.
10. Integration of Computer Vision into Real‑World Systems. Focuses on deploying computer vision in practical applications (smart cameras, robotics, quality control), discusses technology limitations (computing resources, latency, data dependence), and culminates in a mini‑project (e.g., object counting or defect detection system) to apply learned concepts in a real‑world context.
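Several of the topics above rely on measuring how much two bounding boxes overlap: detection quality in topic 5 and frame-to-frame association in topic 7. A minimal sketch of intersection over union (IoU) and a greedy association step follows. The box format `(x_min, y_min, x_max, y_max)`, the `min_iou` value, and the function names are assumptions made for illustration; practical trackers generally use more elaborate assignment strategies and motion models.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def associate(prev_boxes, curr_boxes, min_iou=0.3):
    """Greedily pair boxes from consecutive frames, highest IoU first.

    Returns a list of (prev_index, curr_index) pairs; boxes whose best
    overlap is below min_iou are left unmatched.
    """
    candidates = sorted(
        ((iou(p, c), i, j)
         for i, p in enumerate(prev_boxes)
         for j, c in enumerate(curr_boxes)),
        reverse=True,
    )
    used_prev, used_curr, matches = set(), set(), []
    for score, i, j in candidates:
        if score < min_iou:
            break  # remaining candidates overlap even less
        if i in used_prev or j in used_curr:
            continue
        used_prev.add(i)
        used_curr.add(j)
        matches.append((i, j))
    return matches

# Two objects that each move by one pixel between frames (made-up coordinates).
previous = [(0, 0, 10, 10), (20, 20, 30, 30)]
current = [(21, 21, 31, 31), (1, 1, 11, 11)]
print(sorted(associate(previous, current)))
```

Greedy matching of this kind tends to break down in exactly the scenarios named in topic 7, such as rapid motion (little overlap between frames) and intersecting trajectories (several candidates with similar overlap).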