Data Structures and Databases

The course is offered in the second semester and is a logical continuation of the "Algorithms and Data Structures" course. It is closely integrated with the cross-cutting tools course "Algorithmization and Programming Languages". The course leads students from abstract data structures to their physical implementation, then to storage systems, and finally to data architecture in AI systems.

On completing the course, the student should understand how data storage systems are designed and operated: the principles of physical data organization, indexing mechanisms (such as B-tree and hash structures), the fundamentals of cost-based optimization, transaction models, and isolation levels. The student will recognize the differences between relational and non-relational data models, be familiar with methods for storing and searching embeddings, and understand key architectural trade-offs, particularly the balance between scalability and data consistency.
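To illustrate the difference between the two index families named above, here is a minimal sketch in Python. It is not a real B-tree: a sorted list searched with `bisect` stands in for the ordered leaf level of a B-tree, and a `dict` stands in for a hash index. The table contents and the names `rows`, `lookup_equal`, and `lookup_range` are hypothetical and chosen only for this example.

```python
import bisect

# Hypothetical toy table: (user_id, age) rows.
rows = [(1, 34), (2, 19), (3, 27), (4, 42), (5, 27), (6, 51)]

# Hash-style index: age -> list of row positions. Good for equality lookups only.
hash_index = {}
for pos, (_, age) in enumerate(rows):
    hash_index.setdefault(age, []).append(pos)

# Ordered index: sorted (age, position) pairs, a stand-in for B-tree leaf entries.
ordered_index = sorted((age, pos) for pos, (_, age) in enumerate(rows))


def lookup_equal(age):
    """Equality lookup through the hash-style index."""
    return [rows[p] for p in hash_index.get(age, [])]


def lookup_range(low, high):
    """Rows with low <= age <= high, found by binary search on the ordered index."""
    start = bisect.bisect_left(ordered_index, (low, -1))
    end = bisect.bisect_right(ordered_index, (high, len(rows)))
    return [rows[p] for _, p in ordered_index[start:end]]


print(lookup_equal(27))
print(lookup_range(20, 40))
```

The hash-style index answers `age = 27` directly but cannot serve `age BETWEEN 20 AND 40` without visiting every key, while the ordered index handles both, which is one reason ordered structures are a common default in relational systems.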

In terms of practical abilities, the student will be able to design database schemas and work with indexes (create and analyze them), interpret query execution plans, and perform performance measurements. They will be capable of reproducing and analyzing transaction conflicts, selecting the most suitable storage model for specific tasks, designing hybrid storage architectures, and critically assessing recommendations from large language models (LLMs).
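As a small illustration of creating an index and reading a query plan, the sketch below uses SQLite through Python's standard `sqlite3` module. The course itself may use a different DBMS; the `orders` table, its columns, and the index name `idx_orders_customer` are assumptions made up for this example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = 42"


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(QUERY)  # no index on customer_id yet: a full table scan is expected
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(QUERY)   # the planner can now search through the new index

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the two plans is only the first step; a claim that the index helps should still be backed by timing measurements on realistic data volumes.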

Furthermore, the student will gain hands-on experience with experimental analysis methodology and disciplined engineering measurement, learn to justify architectural decisions, and use SQL, NoSQL, and vector databases in practice. They will also learn to use LLMs both as a supportive assistant and as a critical opponent during system design.

Main topics of the course:

1. Persistent storage vs. RAM. Physics of I/O. Latency measurement, I/O profiling, plotting how access time grows, implementing a file index.

2. Page-based storage. Buffering. Impact of page size, emulating different page sizes, I/O cost calculation, choosing the optimal size.

3. Indexes: B+tree and hash. Full scan vs index scan comparison, index creation, performance measurements.

4. Normalization and schema design. Demonstrating anomalies, conversion to 3NF, schema design, constraint testing.

5. SQL and JOIN algorithms. Analysis of suboptimal queries, optimization, demonstrating the speedup.

6. Cost‑based optimizer. Statistics and plan selection, index changes, EXPLAIN analysis.

7. Transactions and isolation. Deadlocks. Dirty reads, lock conflicts, deadlock reproduction, log analysis.

8. Document and key-value stores. CAP theorem. SQL vs. document database comparison, schema redesign, performance measurements.

9. Graph databases. SQL traversal vs graph query, graph construction, path finding, time comparison.

10. Embeddings and similarity search. Cosine similarity, search implementation, error analysis.

11. Vector DB and ANN. Brute force vs approximate nearest neighbors (ANN), index setup, latency/quality comparison.

12. ETL and data pipelines. Batch vs streaming, ETL implementation, logging.

13. Replication and fault tolerance. Failover, crash behavior analysis, recovery runbook.

14. Integration of SQL + NoSQL + Vector. Hybrid storage architecture, component integration, load testing.
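As an illustration of topics 10 and 11, the sketch below shows brute-force similarity search with cosine similarity in plain Python. It is the exact baseline against which approximate nearest neighbor (ANN) indexes are usually compared, not an ANN method itself. The three-dimensional vectors and the names `docs` and `top_k` are hypothetical; real embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Brute-force search: score every stored vector, return the k best names."""
    scored = [(cosine_similarity(query, v), name) for name, v in vectors.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical toy "embeddings" for three documents.
docs = {
    "btree": [0.9, 0.1, 0.0],
    "hash": [0.8, 0.3, 0.1],
    "replication": [0.0, 0.2, 0.9],
}

print(top_k([1.0, 0.0, 0.0], docs))
```

Brute force scores every stored vector on each query, so its cost grows linearly with collection size; that growth is what motivates trading some recall for latency with an ANN index.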

OBJECTIVES

Developing an engineering understanding of the physical organization of data storage;

Mastering indexing, transactional consistency, and query optimization mechanisms;

Developing skills in experimental performance analysis;

Building the ability to choose a storage model under architectural trade-offs;

Preparing to design data stores for AI-oriented systems;

Learning to use LLMs deliberately and critically in engineering analysis.

KEY TASKS

Mastering the physical storage model (pages, indexes, logs);

Studying the relational model and optimization mechanisms;

Analyzing query execution plans;

Mastering the transactional model and isolation levels;

Comparing SQL and NoSQL models;

Mastering the principles of vector search and embedding storage;

Developing skills in experimental hypothesis testing;

Developing engineering argumentation for architectural decisions;

Building a culture of working with LLMs as an analytical tool.
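As a small illustration of the transactional model mentioned in the tasks above, the sketch below shows atomicity with SQLite through Python's standard `sqlite3` module: a transfer that violates a constraint is rolled back in full. The `accounts` table, the names, and the `transfer` helper are hypothetical. Isolation levels and lock conflicts need several concurrent connections and are not covered by this example.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so transactions are controlled explicitly with BEGIN / COMMIT / ROLLBACK.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])


def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(src, dst, amount):
    """Move funds atomically: both updates are applied, or neither is."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; undo the earlier credit as well.
        conn.execute("ROLLBACK")
        return False


print(transfer("alice", "bob", 500), balances())
print(transfer("alice", "bob", 30), balances())
```

The credit is deliberately applied before the debit so that the failed transfer leaves a partial change behind for the rollback to undo.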
