City: Saint Petersburg Novosibirsk Kazan Language: Русский English

Modern algorithms for parallel, streaming and query-based data processing
Saint Petersburg / spring 2017, посмотреть все семестры

Enroll in the course to get notifications and to be able to submit home assignments.
Register to enroll now Login

In this mini-course we will cover recent developments in approximate algorithms for processing very large data sets. We will focus on fundamental techniques that can be used in a variety of different computational settings, including MapReduce-like batch parallel computation, algorithms for streaming data and data that can be accessed through queries. The list of topics will include: — Linear sketching: a powerful paradigm for compressing data that is based on carefully chosen random linear projections. Linear sketching allows to construct approximate arrays, estimate the size of cuts in graphs, compute approximate matchings, reduce dimension of vectors, solve linear regression and other linear-algebraic problems, etc.

— Core-sets: a baseline approach to problems involving geometric data core-sets allow to solve a variety of clustering and related problems. Core-sets are constructed by representing a dataset as a carefully selected subset of its points.

— Beyond linear sketches and core-sets: we will see how to design algorithms for massive data processing in situations when linear sketches and core-sets fall short of achieving optimal performance.

— Query-based sampling methods: we will see how queries can be used to design approximate decision-making algorithms for testing properties of high-dimensional and noisy data.

Date and time Class|Name Venue|short Materials
03 June
Лекция 1, Lecture ПОМИ РАН slides,  video
03 June
Лекция 2, Lecture ПОМИ РАН video
04 June
Лекция 3, Lecture ПОМИ РАН slides,  video
04 June
Лекция 4, Lecture ПОМИ РАН slides,  video