RGPV Computer Science and Engineering VII Semester | Unit-wise Notes, Syllabus, Important Questions & PYQ Resources
Data Mining and Warehousing is an open elective subject in the 7th semester of RGPV CSE. It covers data warehousing, OLAP systems, data preprocessing, data mining, classification, clustering and association rule mining.
Unit 1 (Data Warehousing): introduction, delivery process, data warehouse architecture, preprocessing, cleaning, integration, reduction, design, partitioning, data marts, metadata and the multidimensional data model.
Unit 2 (OLAP Systems): basic OLAP concepts, OLAP queries, types of OLAP servers, OLAP operations, data warehouse hardware and operational design, including security, backup and recovery.
Unit 3 (Introduction to Data and Data Mining): data types, quality of data, preprocessing, similarity measures, summary statistics, distributions, data mining tasks, KDD, issues in data mining and fuzzy logic.
Unit 4 (Supervised Learning): classification, statistical-based algorithms, distance-based algorithms, decision tree algorithms, neural network algorithms, rule-based algorithms and probabilistic classifiers.
Unit 5 (Clustering and Association Rule Mining): hierarchical algorithms, partitional algorithms, BIRCH, DBSCAN, CURE, and the Apriori and FP-Growth algorithms for association rule mining.
| Unit | Topics |
|---|---|
| Unit 1 | Data Warehousing: introduction, delivery process, architecture, preprocessing, data cleaning, integration, transformation, reduction, design, schema, partitioning strategy, implementation, data marts, metadata, multidimensional data model and pattern warehousing. |
| Unit 2 | OLAP Systems: basic concepts, OLAP queries, types of OLAP servers, OLAP operations, data warehouse hardware, operational design, security, backup and recovery. |
| Unit 3 | Introduction to Data and Data Mining: data types, quality of data, preprocessing, similarity measures, summary statistics, data distributions, basic data mining tasks, data mining vs KDD, issues in data mining, fuzzy sets and fuzzy logic. |
| Unit 4 | Supervised Learning: classification, statistical-based algorithms, distance-based algorithms, decision tree-based algorithms, neural network-based algorithms, rule-based algorithms and probabilistic classifiers. |
| Unit 5 | Clustering and Association Rule Mining: hierarchical algorithms, partitional algorithms, clustering large databases, BIRCH, DBSCAN, CURE, Apriori and FP-Growth algorithms. |
For RGPV exams, focus on data warehouse architecture, OLAP operations, data preprocessing, KDD, classification, decision trees, clustering, Apriori and FP-Growth. These topics are suitable for 7-mark and 14-mark answers.
The subject is generally considered scoring, because many questions are theory-based and diagram-based.
Units 1, 2, 4 and 5 are especially important for exam preparation.
Suggested study order: start with data warehouse basics, then OLAP, then data mining and KDD, and finally classification and clustering.