COURSE INFORMATION
Course Title: DATA MINING AND MACHINE LEARNING
Code Course Type Regular Semester Theory Practice Lab Credits ECTS
BIDS 406 B 2 3 0 2 4 8
Academic staff member responsible for the design of the course syllabus (name, surname, academic title/scientific degree, email address and signature) Dr. Florenc Skuka fskuka@epoka.edu.al
Main Course Lecturer (name, surname, academic title/scientific degree, email address and signature) and Office Hours: Dr. Florenc Skuka fskuka@epoka.edu.al
Second Course Lecturer(s) (name, surname, academic title/scientific degree, email address and signature) and Office Hours: M.Sc. Ferit Melih Akaybicen fakaybicen@epoka.edu.al
Language: English
Compulsory/Elective: Compulsory
Study program: (the study for which this course is offered) Master of Science in Business Intelligence and Data Science
Classroom and Meeting Time:
Teaching Assistant(s) and Office Hours: NA
Code of Ethics: Code of Ethics of EPOKA University
Regulation of EPOKA University "On Student Discipline"
Attendance Requirement: Attendance is mandatory. Students must attend at least 75% of classes.
Course Description: Machine learning is the process of automatically building mathematical models that explain and generalise datasets. It integrates elements of statistics and algorithm development into the same discipline. Data mining is a discipline within knowledge discovery that seeks to facilitate the exploration and analysis of large quantities of data, by automatic and semiautomatic means. This subject provides a practical and technical introduction to machine learning and data mining. Topics to be covered include problems of discovering patterns in the data, classification, regression, feature extraction and data visualisation. Also covered are analysis, comparison and usage of various types of machine learning techniques and statistical techniques.
Course Objectives: This course provides an introductory yet comprehensive overview of machine learning concepts and techniques for Master's students in Fintech and Business Intelligence. Using Orange Data Mining as a visual, no-code tool, students will learn supervised and unsupervised learning methods, model evaluation, feature engineering, text mining, and responsible AI practices. The course emphasizes real-world financial applications including fraud detection, credit scoring, customer segmentation, and sentiment analysis. Students will gain hands-on experience building, evaluating, and interpreting ML models without requiring programming skills.
BASIC CONCEPTS OF THE COURSE
1 Machine Learning fundamentals: supervised, unsupervised, and reinforcement learning
2 Data quality, preprocessing, normalization, and feature engineering
3 Classification algorithms: Decision Trees, k-Nearest Neighbors, and Support Vector Machines
4 Model evaluation: accuracy, precision, recall, F1-score, ROC/AUC, and cross-validation
5 Regression: linear and polynomial regression with metrics (MAE, RMSE, R-squared)
6 Ensemble methods: Random Forests (bagging) and Boosting (AdaBoost, Gradient Boosting)
7 Clustering: k-Means and hierarchical clustering for unsupervised learning and customer segmentation
8 Dimensionality reduction: Principal Component Analysis (PCA) and t-SNE visualization
9 Text mining and NLP: bag of words, TF-IDF, and sentiment analysis on financial text
10 Responsible AI: algorithmic bias, fairness, explainability (SHAP/LIME), and AI regulation (GDPR, EU AI Act)
COURSE OUTLINE
Week Topics
1 What is Machine Learning? + Orange Setup. AI vs ML vs Deep Learning — demystifying the buzzwords. Types of ML (supervised, unsupervised, reinforcement). Real-world Fintech/BI examples (fraud detection, credit scoring, customer segmentation). Lab: Install Orange, tour the interface, load sample datasets, connect widgets.
2 Data — The Foundation of Everything. What makes good data? Features vs targets. Data types (numerical, categorical, text, time series). Data quality issues (missing values, outliers, duplicates). Intro to financial datasets. Lab: Load a financial dataset, use Data Table, Feature Statistics, Box Plot, Scatter Plot widgets to explore.
3 Data Preprocessing & Feature Engineering. Student presentations (3 case studies). Why preprocessing matters: normalization, standardization, handling missing data, encoding categorical variables. Feature selection — which columns matter? Lab: Preprocess widget, Select Columns, Rank widget for feature importance.
4 Classification I — Decision Trees & kNN. Student presentations (3 case studies). What is classification? Decision Trees and k-Nearest Neighbors. Overfitting vs underfitting — the bias-variance tradeoff. Lab: Build a credit risk classifier, Tree Viewer widget, compare kNN and Tree using Test & Score.
5 Classification II — Support Vector Machines (SVMs). Student presentations (3 case studies). SVMs in depth: maximal margin classifiers, soft margins, kernel trick (polynomial, RBF). Intro to model evaluation (accuracy, precision, recall). Lab: SVM widget, visualize decision boundaries, Confusion Matrix widget.
6 Model Evaluation & Comparison. Student presentations (3 case studies). Why accuracy alone is misleading. Precision, Recall, F1-Score, ROC curves, AUC. Cross-validation explained. The cost of errors in finance. Lab: Test & Score with cross-validation, ROC Analysis widget, compare 3-4 models, Lift curves.
7 Regression — Predicting Numbers. Student presentations (3 case studies). Classification vs Regression. Linear Regression, Polynomial Regression. Evaluation metrics (MAE, RMSE, R-squared). Lab: Linear Regression on a housing/financial dataset, Predictions widget, Scatter Plot of predicted vs actual.
8 Midterm Review + Fun Competition. Student presentations (3 case studies). Quick recap of Weeks 1-7. Q&A session. Lab: In-class mini competition — students build the best classifier/regressor in Orange within 60-90 minutes on a new financial dataset. Leaderboard and prizes.
9 Ensemble Methods — Random Forests & Boosting. Student presentations (3 case studies). Wisdom of crowds — why combining models works. Bagging (Random Forest), Boosting (AdaBoost, Gradient Boosting). Lab: Random Forest widget, compare single Decision Tree vs Random Forest, AdaBoost widget.
10 Clustering — Finding Hidden Groups. Student presentations (3 case studies). Unsupervised learning. k-Means clustering, Hierarchical clustering. Silhouette scores. Applications: customer segmentation, market segmentation, anomaly detection. Lab: k-Means on customer data, Hierarchical Clustering with dendrograms, Silhouette Plot.
11 Dimensionality Reduction & Visualization. Student presentations (3 case studies). The curse of dimensionality. PCA (Principal Component Analysis). t-SNE for visualization. Why reducing dimensions helps models and humans. Lab: PCA widget, t-SNE widget, FreeViz and Radviz for interactive exploration.
12 Text Mining & Sentiment Analysis. Student presentations (3 case studies). Unstructured data in finance (news, earnings calls, social media). Bag of words, TF-IDF. Sentiment analysis. Lab: Orange Text Mining add-on, Corpus widget, Preprocess Text, Bag of Words, Word Cloud, Sentiment Analysis on financial tweets.
13 Responsible AI & Practical Considerations. Student presentations (3 case studies). Bias in ML (credit scoring discrimination, hiring algorithms). Fairness, explainability, transparency. GDPR and AI regulations. Model deployment basics. Lab: Explain widget (SHAP-style explanations), individual prediction analysis.
14 The Future of ML in Finance + Course Wrap-Up. Final student presentations (4 case studies). Large Language Models and Generative AI in finance. AutoML — the future of no-code ML. Career paths in ML/Fintech. Resources for continued learning. Lab: Free exploration — students try any algorithm or dataset.
Prerequisite(s): None.
Textbook(s): Aurélien Géron, "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow", 3rd Edition, O'Reilly Media, 2022.
Additional Literature:
1 Orange Data Mining Documentation (https://orangedatamining.com/docs/)
2 StatQuest YouTube Channel by Josh Starmer: beginner-friendly ML explanations
3 UCI Machine Learning Repository: datasets for practice
4 Kaggle: datasets and competitions for hands-on learning
Laboratory Work: Weekly hands-on lab sessions using Orange Data Mining for data exploration, preprocessing, model building, evaluation, and visualization.
Computer Usage: Orange Data Mining software is used extensively throughout the course for all lab exercises and projects.
Others: No
COURSE LEARNING OUTCOMES
1 Explain the fundamental concepts of machine learning, including supervised, unsupervised, and reinforcement learning, and distinguish between AI, ML, and deep learning.
2 Perform data preprocessing and feature engineering tasks including handling missing values, normalization, encoding, and feature selection using Orange Data Mining.
3 Build and apply classification models (Decision Trees, kNN, SVMs) to solve real-world problems such as credit risk assessment and fraud detection.
4 Evaluate and compare machine learning models using appropriate metrics including accuracy, precision, recall, F1-score, ROC curves, and cross-validation.
5 Apply regression techniques (linear and polynomial) to predict continuous outcomes and assess model performance using MAE, RMSE, and R-squared.
6 Understand and apply ensemble methods (Random Forests, AdaBoost, Gradient Boosting) and explain why combining models improves predictive performance.
7 Perform unsupervised learning tasks including k-Means and hierarchical clustering for customer segmentation and anomaly detection in financial data.
8 Apply dimensionality reduction techniques (PCA, t-SNE) to visualize and simplify high-dimensional datasets.
9 Conduct text mining and sentiment analysis on financial text data using bag of words, TF-IDF, and classification techniques.
10 Critically assess ethical considerations in ML including bias, fairness, explainability, and regulatory compliance (GDPR, AI Act) in financial applications.
COURSE CONTRIBUTION TO... PROGRAM COMPETENCIES
(Blank : no contribution, 1: least contribution ... 5: highest contribution)
No Program Competencies Cont.
Master of Science in Business Intelligence and Data Science Program
1 Demonstrate understanding of the value of data-driven decision making. 4
2 Graduates will acquire the ability to make informed decisions based on data analysis and interpretation. 5
3 Identify the basic concepts that underpin today’s organizational IT infrastructures, such as concepts of databases, information systems, operations and processes, cloud computing, data warehousing and Big Data, Data Mining and Machine Learning. 2
4 Students will develop advanced skills in data analysis techniques, including statistical analysis, data mining, data visualization, and predictive modeling. 5
5 Apply data mining/analytics (statistical and machine-learning) in order to solve real-world business problems. 5
6 Develop skills related to the data analytics pipeline, from collection and processing to analysis and interpretation. 3
7 Graduates will develop strong communication skills to effectively present complex data analysis findings to diverse stakeholders. 2
8 Effectively communicate to top management the results and implications arising from data analytics, security risk assessments, and emerging technologies. 2
9 Demonstrate professionalism and leadership by taking initiatives within their domain of responsibility while working effectively with other team members. 2
10 The program offers practical training and exposure to industry-standard software and tools used in business intelligence and data analysis. 4
COURSE EVALUATION METHOD
Method Quantity Percentage
Homework 10 3
Project 1 35
Final Exam 1 35
Total Percent: 100%
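The weighting above can be checked with a short calculation. The Python sketch below assumes that the homework percentage (3) applies to each of the 10 homeworks, which is the reading under which the components add up to 100%. The function names and the example scores are hypothetical and are not part of the official assessment rules.

```python
# Assessment components: (method, quantity, percentage per item).
# Assumption: the homework percentage (3) is per homework, so 10 x 3 = 30%.
ASSESSMENT = [
    ("Homework", 10, 3),
    ("Project", 1, 35),
    ("Final Exam", 1, 35),
]


def component_weights(rows):
    """Return the total percentage weight of each assessment method."""
    return {name: quantity * pct for name, quantity, pct in rows}


def final_grade(scores, rows=ASSESSMENT):
    """Weighted course grade from the average score (0-100) per method."""
    weights = component_weights(rows)
    return sum(scores[name] * weight for name, weight in weights.items()) / 100


if __name__ == "__main__":
    print(component_weights(ASSESSMENT))
    # Hypothetical scores, for illustration only.
    print(final_grade({"Homework": 80, "Project": 90, "Final Exam": 70}))
```

With these hypothetical scores the weighted grade is 80.0 (24 points from homework, 31.5 from the project, 24.5 from the final exam).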
ECTS (ALLOCATED BASED ON STUDENT WORKLOAD)
Activities Quantity Duration(Hours) Total Workload(Hours)
Course Duration (Including the exam week: 16x Total course hours) 16 4 64
Hours for off-the-classroom study (Pre-study, practice) 14 4 56
Mid-terms 0
Assignments 10 3 30
Final examination 1 20 20
Other 1 30 30
Total Work Load: 200
Total Work Load/25(h): 8
ECTS Credit of the Course: 8
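The ECTS figure follows from the workload table by simple arithmetic: each activity contributes quantity times duration, and the total is divided by 25 hours per credit. A minimal Python sketch of that calculation is given below. The names are illustrative, and the duration of 0 for the mid-terms row is an assumption, since the table lists only a quantity of 0.

```python
# Workload rows from the ECTS table: (activity, quantity, hours per unit).
# The figures come from the table; the names are illustrative.
WORKLOAD = [
    ("Course duration (incl. exam week)", 16, 4),
    ("Off-the-classroom study", 14, 4),
    ("Mid-terms", 0, 0),  # assumption: no duration is listed, so 0 is used
    ("Assignments", 10, 3),
    ("Final examination", 1, 20),
    ("Other", 1, 30),
]

HOURS_PER_ECTS = 25  # divisor from the "Total Work Load/25(h)" row


def total_workload(rows):
    """Sum quantity * duration over all activities, in hours."""
    return sum(quantity * hours for _, quantity, hours in rows)


def ects_credits(rows, hours_per_credit=HOURS_PER_ECTS):
    """Convert the total workload in hours into ECTS credits."""
    return total_workload(rows) / hours_per_credit


if __name__ == "__main__":
    print(total_workload(WORKLOAD))  # 200
    print(ects_credits(WORKLOAD))    # 8.0
```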
CONCLUDING REMARKS BY THE COURSE LECTURER

This course is designed to make machine learning accessible and engaging for Master's students in Fintech and Business Intelligence without a programming background. Using Orange Data Mining as a visual, no-code platform, students gain hands-on experience with real-world financial datasets throughout the semester. The emphasis is on building intuition and critical thinking rather than memorisation: students are encouraged to experiment, ask questions, and connect every technique to practical Fintech and BI applications. Weekly student presentations on real-world ML case studies foster peer learning and discussion. The course concludes with a forward-looking perspective on Generative AI, AutoML, and responsible AI practices, preparing students to be informed, ethical consumers and practitioners of machine learning in their professional careers.