Data Mining

1.

Subject title

Data Mining

Податочно рударење

2.

Code

F23L3S150

3.

Study program

Application of Information Technologies; Software Engineering and Information Systems; Computer Science; Computer Engineering; Internet, Networks and Security; Informatics Education; Professional Studies in Programming; Statistics and Data Analytics

4.

Organizer of the study program (unit, institute, department, division)

Faculty of Information Sciences and Computer Engineering

5.

Study cycle (first, second, third)

First cycle

6.

Academic year / semester

3 / Summer

7.

Number of ECTS credits

6.0

8.

Instructor

Prof. Dr. Biljana Tojtovska Ribarski, Asst. Prof. Dr. Bojan Ilijoski

9.

Prerequisites for enrollment

Probability and Statistics, or Mathematics 3, or Fundamentals of Information Theory, or Business Statistics

10.

Subject goals and competencies:


Familiarity with methods for identifying valid, novel, useful, and understandable patterns in data and for discovering new knowledge. Data preprocessing. Introduction to predictive models learned from data: classification and regression. Cluster detection. Mastery of techniques for collecting data, transforming it into a form suitable for internal use, and storing it.

11.

Subject content:


1. Introduction and examples of applications of data mining methods. Data types. Similarity and distance measures between data objects: measures for nominal and binary attributes. Standardization of numerical attributes. Distance measures for numerical attributes: Minkowski distance and its special cases, cosine distance, squared Euclidean distance, Mahalanobis distance. Distance between data objects with mixed attributes.
2. Data preprocessing techniques: data quality, data cleaning, data integration, data reduction, data transformation, and data discretization.
3. Advanced data visualization.
4. Prediction models and regression models. Advanced concepts (analysis of residuals, confounding, adjustment, interpretation of results).
5. Classification. Supervised and unsupervised learning. Classification with decision trees. LDA classification.
6. Clustering and evaluation of clusters.
7. Detection and handling of outliers.
8. Evaluation of models.
9. Analysis of temporal data and time series.
10. Extracting, transforming, and loading data (Extract, Transform, Load).
11. Extracting, transforming, and loading data (Extract, Transform, Load).
12. Final project.
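As an illustration of the distance measures named in topic 1, the following is a minimal sketch in plain Python, not course material. The function names (`standardize`, `minkowski`, `squared_euclidean`, `cosine_distance`) and the sample vectors are assumptions chosen for this example. Mahalanobis distance is omitted because it needs a covariance matrix inverse.

```python
import math


def standardize(values):
    """Z-score standardization of one numerical attribute (population std).

    Assumes the values are not all identical, otherwise std is zero.
    """
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def minkowski(x, y, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def squared_euclidean(x, y):
    """Squared Euclidean distance (no square root taken)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def cosine_distance(x, y):
    """One minus the cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1 - dot / (norm_x * norm_y)


# Hypothetical sample objects with three numerical attributes each.
x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(minkowski(x, y, 1))       # Manhattan: 3 + 4 + 0
print(minkowski(x, y, 2))       # Euclidean: sqrt(9 + 16 + 0)
print(squared_euclidean(x, y))  # 9 + 16 + 0
print(cosine_distance(x, y))
print(standardize([2.0, 4.0, 6.0]))
```

Standardizing each attribute before computing distances keeps an attribute with a large numeric range from dominating the result, which is presumably why the topic lists standardization alongside the distance measures.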

12.

Learning methods:


Lectures, classroom exercises, laboratory exercises, project tasks, homework, and development of a software package implementing data mining methods

13.

Total available time fund

6.0 ECTS x 30 hours = 180 hours

14.

Time distribution

30 + 45 + 0 + 30 + 75 = 180 hours

15.

Forms of teaching activities

15.1.

Lectures - theoretical teaching

30 hours

15.2.

Exercises (laboratory, classroom), seminars, team work

45 hours

16.

Other forms of activities

16.1.

Project tasks

30 hours

16.2.

Independent tasks

0 hours

16.3.

Homework

75 hours

17.

Grading method

17.1.

Tests

10 points

17.2.

Seminar work / project (presentation: written and oral)

30 points

17.3.

Activities and learning

10 points

17.4.

Final exam

60 points

18.

Grading criteria (points / grade)

up to 50 points

5 (five) (F)

from 51 to 60 points

6 (six) (E)

from 61 to 70 points

7 (seven) (D)

from 71 to 80 points

8 (eight) (C)

from 81 to 90 points

9 (nine) (B)

from 91 to 100 points

10 (ten) (A)

19.

Condition for signature and taking final exam

Completed activities from items 15 and 16

20.

Language of instruction

Macedonian and English

21.

Quality assurance method

Internal evaluation mechanism and surveys

22.

Literature

22.1.

Mandatory literature

No.

Author

Title

Publisher

Year

4673

Pang-Ning Tan, Michael Steinbach, Vipin Kumar

Introduction to Data Mining

Pearson Education Limited

2021

4674

Ralph Kimball, Joe Caserta

The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data

Wiley

2004

22.2.

Additional literature

No.

Author

Title

Publisher

Year