Introduction to mining massive datasets

1. Subject title

Introduction to mining massive datasets

(Macedonian title: Вовед во рударење на масивни податоци)

2. Code

F23L3W154

3. Study program

Software Engineering and Information Systems; Computer Science; Computer Engineering; Internet, Networks and Security; Application of Information Technologies; Informatics Education; Professional Studies in Programming; Bioinformatics; Software Engineering

4. Organizer of the study program (unit, institute, department, division)

Faculty of Information Sciences and Computer Engineering

5. Study cycle (first, second, third)

First cycle

6. Academic year / semester

4 / Winter

7. Number of ECTS credits

6.0

8. Instructor

Assoc. Prof. Dr. Eftim Zdravevski, Prof. Dr. Gjorgji Madjarov

9. Prerequisites for enrollment

Algorithms and Data Structures, or Application of Algorithms and Data Structures

10. Subject goals and competencies:

Students will become familiar with data mining and machine learning algorithms and techniques for analyzing large data sets. The course focuses on distributed platforms and on how to design and implement algorithms for processing and analyzing very large data sets.

11. Subject content:

Lectures:
1. Introduction to MapReduce
2. Frequent itemsets and association rules
3. Nearest-neighbor search in high-dimensional data, locality-sensitive hashing
4. Dimensionality reduction (SVD and CUR)
5. Recommender systems based on collaborative filtering
6. Content-based and hybrid recommender systems
7. Clustering of massive data sets
8. Supervised learning on massive data sets (k-nearest neighbors, approximate k-nearest neighbors)
9. Supervised learning on massive data sets (classification and regression trees)
10. Supervised learning on massive data sets (deep neural networks)
11. Window-based continuous processing of massive data
12. Mining graph data

Exercises:
1. MapReduce examples
2. Building association rules
3. Using locality-sensitive hashing and nearest-neighbor search in high-dimensional data
4. Using SVD and CUR
5. Working with recommendation algorithms based on collaborative filtering
6. Working with content-based recommendation algorithms and collaborative filtering
7. Using clustering methods on massive data sets
8. Using k-nearest neighbors methods on real problems
9. Working with classification and regression trees
10. Working with deep neural networks to analyze massive data sets (self-supervised learning)
11. Working with window-based algorithms for continuous processing of massive data
12. Working with graph neural networks

12. Learning methods:

Lectures with presentations, interactive lectures, exercises (using equipment and software packages), teamwork, case studies, invited guest lecturers, and independent preparation and defense of a project assignment and a seminar paper.

13. Total available hours

6.0 ECTS x 30 hours = 180 hours

14. Time distribution

30 + 45 + 0 + 30 + 75 = 180 hours

15. Forms of teaching activities

15.1. Lectures (theoretical teaching)

30 hours

15.2. Exercises (laboratory, classroom), seminars, teamwork

45 hours

16. Other forms of activities

16.1. Project tasks

30 hours

16.2. Independent tasks

0 hours

16.3. Homework

75 hours

17. Grading method

17.1. Tests

0 points

17.2. Seminar work / project (presentation: written and oral)

30 points

17.3. Activities and learning

0 points

17.4. Final exam

60 points

18. Grading criteria (points / grade)

up to 50 points: 5 (five) (F)
from 51 to 60 points: 6 (six) (E)
from 61 to 70 points: 7 (seven) (D)
from 71 to 80 points: 8 (eight) (C)
from 81 to 90 points: 9 (nine) (B)
from 91 to 100 points: 10 (ten) (A)

19. Conditions for obtaining the signature and taking the final exam

Completed activities

20. Language of instruction

Macedonian and English

21. Quality assurance method

Internal evaluation mechanism and surveys

22. Literature

22.1. Mandatory literature

1. Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman. Mining of Massive Datasets. Cambridge University Press, 2020.

2. Avrim Blum, John Hopcroft, Ravindran Kannan. Foundations of Data Science. Draft version, 2017.

3. Jiawei Han, Micheline Kamber, Jian Pei. Data Mining: Concepts and Techniques, Third Edition. Morgan Kaufmann, 2011.

22.2. Additional literature
