Epoka University

FACULTY OF ECONOMICS AND ADMINISTRATIVE SCIENCES
DEPARTMENT OF BUSINESS ADMINISTRATION
COURSE SYLLABUS

2025-2026 ACADEMIC YEAR

COURSE INFORMATION

Course Title: BIG DATA MANAGEMENT

Code	Course Type	Regular Semester	Theory	Practice	Lab	Credits	ECTS
BIDS 407	C	1	3	0	2	3	5

Academic staff member responsible for the design of the course syllabus (name, surname, academic title/scientific degree, email address and signature)	Dr. Vilma Çekani vcekani@epoka.edu.al
Main Course Lecturer (name, surname, academic title/scientific degree, email address and signature) and Office Hours:	Dr. Vilma Çekani vcekani@epoka.edu.al, Wednesday 10:00 AM - 12:00 PM
Second Course Lecturer(s) (name, surname, academic title/scientific degree, email address and signature) and Office Hours:	NA
Language:	English
Compulsory/Elective:	Elective
Study program: (the study for which this course is offered)	Master of Science in Business Intelligence and Data Science
Classroom and Meeting Time:	NA
Teaching Assistant(s) and Office Hours:	NA
Code of Ethics:	Code of Ethics of EPOKA University; Regulation of EPOKA University "On Student Discipline"
Attendance Requirement:	75%
Course Description:	This course provides an in-depth understanding of the management, storage, and processing of large-scale data. Students will explore the key concepts and technologies behind Big Data systems, including distributed computing, NoSQL databases, data warehousing, and data pipelines. The course will focus on how to efficiently manage and analyze massive datasets using modern tools like Hadoop, Spark, and cloud platforms. By the end of the course, students will have the skills to design, implement, and maintain Big Data solutions, addressing both the technical and business challenges associated with Big Data.
Course Objectives:	Understand Big Data Fundamentals: Gain foundational knowledge of what constitutes Big Data and the challenges associated with managing it. Explore Distributed Computing: Learn how distributed systems like Hadoop and Spark facilitate the processing of large datasets. Work with NoSQL Databases: Understand the use of NoSQL databases such as MongoDB, Cassandra, and HBase in handling unstructured data at scale. Master Data Pipelines: Learn how to design and manage data pipelines for ingesting, processing, and analyzing large datasets. Implement Cloud-Based Big Data Solutions: Explore how cloud platforms like AWS, Azure, and Google Cloud can be used for scalable data management. Apply Data Warehousing Concepts: Understand the architecture and management of data warehouses and their role in Big Data environments. Develop Skills for Big Data Analytics: Learn how to use Big Data tools to perform analytics and generate actionable insights from large datasets. Address Big Data Security and Privacy Concerns: Understand the importance of securing Big Data systems and managing privacy issues when handling sensitive information.

BASIC CONCEPTS OF THE COURSE

1	Big Data Management for Data Science - Introduction and Motivation
2	Big Data Design
3	Distributed Data Management and Processing
4	Document Stores
5	Property Graphs
6	Big Data Architectures
7	Distributed File Systems
8	Storage Formats
9	MapReduce
10	Spark

COURSE OUTLINE

Week	Topics
1	Introduction to Big Data and Data Management. This week provides an overview of Big Data, its key characteristics, and the fundamental challenges associated with managing large datasets. Students will understand the differences between structured, semi-structured, and unstructured data, and explore the technologies required for handling Big Data.
2	Understanding Distributed Computing and Parallel Processing. This week, students will learn the concepts of distributed computing and the critical role it plays in Big Data management. We will explore parallel processing, the significance of distributed systems, and how systems like Hadoop and Spark use them to handle massive amounts of data.
3	Introduction to Hadoop and the Hadoop Ecosystem. This week focuses on Apache Hadoop, one of the foundational technologies for Big Data. Students will learn about the Hadoop Distributed File System (HDFS), its architecture, and its role in storing large datasets. Additionally, they will explore the Hadoop ecosystem, including tools like Hive and Pig for querying and processing data.
4	Introduction to Apache Spark. Students will be introduced to Apache Spark, an in-memory computing system for Big Data processing. We will cover its architecture, components, and why Spark has become a popular tool for large-scale data analytics. Students will also work with Spark using Python and SparkSQL.
5	NoSQL Databases for Big Data. This week, students will explore NoSQL databases, which are designed to handle unstructured or semi-structured data. We will cover types of NoSQL databases (e.g., document-based, column-family, key-value stores), focusing on MongoDB, Cassandra, and HBase, and their use cases in Big Data management.
6	Data Warehousing and Big Data Integration. This week covers data warehousing concepts, including data modeling, ETL (Extract, Transform, Load) processes, and the integration of Big Data into traditional data warehouses. Students will understand how to manage large-scale data integration using platforms like Apache NiFi and Talend.
7	Midterm
8	Cloud-Based Big Data Management. Students will learn how cloud platforms like AWS, Google Cloud, and Microsoft Azure support Big Data management. This week will focus on cloud storage services, distributed computing models, and how to scale Big Data operations in the cloud. Students will also explore cloud-based tools such as AWS EMR and Google BigQuery.
9	Big Data Storage Solutions. This week dives deeper into Big Data storage solutions. Topics include HDFS, Amazon S3, and distributed file storage systems. Students will learn how to efficiently store and manage data in Big Data environments, focusing on performance, security, and scalability.
10	Big Data Processing Frameworks. This week focuses on Big Data processing frameworks such as Apache Spark and Apache Flink. Students will gain hands-on experience with these tools to process large volumes of data efficiently, leveraging both batch and stream processing techniques.
11	Data Pipelines and Workflow Automation. Students will learn how to design and manage data pipelines that automate data processing tasks. We will cover tools like Apache Airflow and Luigi for workflow orchestration, as well as best practices for building and maintaining data pipelines in a production environment.
12	Big Data Analytics and Machine Learning. This week, students will learn how to apply analytics to Big Data. Topics include data visualization, descriptive analytics, and predictive modeling using tools such as Spark MLlib. Students will also explore the integration of machine learning algorithms with Big Data processing frameworks.
13	Big Data Security and Privacy. This week focuses on the security and privacy concerns involved in managing Big Data. Students will learn about encryption, access control, data anonymization, and legal compliance frameworks such as GDPR for ensuring secure and privacy-compliant data handling.
14	Real-Time Big Data Processing. This week covers the processing of real-time data streams using technologies like Apache Kafka and Apache Storm. Students will learn how to handle data that is generated continuously, such as sensor data or user activity logs, and perform real-time analysis and decision-making.

Prerequisite(s):	-
Textbook(s):	"How Data Happened: A History from the Age of Reason to the Age of Algorithms" by Chris Wiggins and Matthew L. Jones (2023); "On the Evolution of Data Science and Machine Learning" by Ibraheem Azeem (2025); "Access Rules: Freeing Data from Big Tech for a Better Future" by Viktor Mayer-Schönberger and Thomas Ramge (2022); "Implementing Data Mesh" by Jean-Georges Perrin and Eric Broda (2024)
Additional Literature:	"Principles of Distributed Database Systems" by M. Tamer Özsu and Patrick Valduriez (Springer, 2020); "Data Mesh for All Ages" by Jean-Georges Perrin (2023); "Augmented Intelligence: Making Better Decisions with Data & AI" by Thomas Ramge (2023)
Laboratory Work:	2
Computer Usage:	Yes
Others:	No

COURSE LEARNING OUTCOMES

1	Apply mathematical and computational analysis to social, business, and economic networks, drawing on the underlying theory and optimization algorithms.
2	Make use of databases and cloud computing.
3	Work with Big Data using data mining techniques.
4	Create visualizations of information according to each type of data.
5	Organize information in a visual and understandable form, starting from the selection and qualification of the data.
6	Handle high-dimensional data environments, knowing their limitations and how to present the results.
7	Collaborate in a computing environment that requires structuring and planning.
8	Present information visually and in an orderly manner to improve decision-making.

COURSE CONTRIBUTION TO... PROGRAM COMPETENCIES
(Blank : no contribution, 1: least contribution ... 5: highest contribution)

No	Program Competencies	Cont.
Master of Science in Business Intelligence and Data Science Program
1	Demonstrate understanding of the value of data-driven decision making.	4
2	Graduates will acquire the ability to make informed decisions based on data analysis and interpretation.	4
3	Identify the basic concepts that underpin today’s organizational IT infrastructures, such as concepts of databases, information systems, operations and processes, cloud computing, data warehousing and Big Data, Data Mining and Machine Learning.	2
4	Students will develop advanced skills in data analysis techniques, including statistical analysis, data mining, data visualization, and predictive modeling.	5
5	Apply data mining/analytics (statistical and machine-learning) in order to solve real-world business problems.	5
6	Develop skills related to the data analytics pipeline, from collection and processing to analysis and interpretation.	4
7	Graduates will develop strong communication skills to effectively present complex data analysis findings to diverse stakeholders.	2
8	Effectively communicate to top management the results and implications arising from data analytics, security risk assessments, and emerging technologies.	2
9	Demonstrate professionalism and leadership by taking initiatives within their domain of responsibility while working effectively with other team members.	2
10	The program offers practical training and exposure to industry-standard software and tools used in business intelligence and data analysis.	2

COURSE EVALUATION METHOD

Method	Quantity	Percentage
Midterm Exam(s)	1	15
Presentation	1	15
Project	1	60
Laboratory	2	5
	Total Percent:	100%

ECTS (ALLOCATED BASED ON STUDENT WORKLOAD)

Activities	Quantity	Duration(Hours)	Total Workload(Hours)
Course Duration (Including the exam week: 16x Total course hours)	16	3	48
Hours for off-the-classroom study (Pre-study, practice)	1	47	47
Mid-terms	1	2	2
Assignments	1	4	4
Final examination	1	3	3
Other	1	21	21
Total Work Load:			125
Total Work Load/25(h):			5
ECTS Credit of the Course:			5

CONCLUDING REMARKS BY THE COURSE LECTURER

Studying Big Data Management is important because it equips individuals with the skills to analyze and leverage vast amounts of data to make informed decisions, drive innovation, and gain a competitive edge in today's data-driven world.