MU Computer Engineering (Semester 8)
Data Warehouse & Mining
May 2013
(1) Assume appropriate data and state your reasons
(2) Marks are given to the right of every question
(3) Draw neat diagrams wherever necessary

1 (a) Differentiate between a data warehouse and a data mart.
5 M
1 (b) For a supermarket chain, consider the following dimensions: product, store, time and promotion. The schema contains a central fact table, sales facts, with the measures unit_sales, dollars_sales and dollar_cost. Design a star schema for this supermarket example.
5 M
1 (c) Calculate the maximum number of base fact table records for a warehouse with the following values:
- Time period: 5 years
- Store: 300 stores reporting daily sales
- Product: 40,000 products in each store (about 4,000 sell in each store daily)
5 M
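The arithmetic behind 1 (c) can be sketched as below. This is only one possible reading of the question: it assumes 365 days per year (leap days ignored) and one fact record per product per store per day. Two figures are computed because the question gives both the number of products stocked and the number that actually sell each day; all variable names are illustrative.

```python
# Assumption: 365 days per year, leap days ignored.
DAYS_PER_YEAR = 365

years = 5
stores = 300
products_stocked = 40_000    # products carried by each store
products_sold_daily = 4_000  # products that actually sell per store per day

days = years * DAYS_PER_YEAR

# One fact record per (day, store, product that sold).
records_sold_daily = days * stores * products_sold_daily

# Theoretical ceiling if every stocked product sold in every store every day.
records_all_products = days * stores * products_stocked

print(f"{records_sold_daily:,}")    # 2,190,000,000
print(f"{records_all_products:,}")  # 21,900,000,000
```

Under these assumptions the base fact table holds about 2 billion records when only the products that sell are recorded.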
1 (d) Illustrate how the supermarket can use clustering methods to improve sales.
5 M

Define the following terms, giving an example of each:-
2 (a) Factless fact tables
5 M
2 (b) Snowflake schema
5 M
2 (c) Web Structure Mining
5 M
2 (d) Concept Hierarchy
5 M

3 (a) Apply agglomerative hierarchical clustering and draw the single-link and average-link dendrograms for the following distance matrix.

     A    B    C    D    E
A    0    2    6   10    9
B    2    0    3    9    8
C    6    3    0    7    5
D   10    9    7    0    4
E    9    8    5    4    0
10 M
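The merge order for 3 (a) can be checked with a small sketch like the one below. It is a plain, unoptimised agglomerative procedure written for this matrix only; the function names (`linkage_distance`, `agglomerate`) are illustrative and not taken from any library. The heights it records are the levels at which the dendrogram branches join.

```python
from itertools import combinations

LABELS = ["A", "B", "C", "D", "E"]
MATRIX = [
    [0, 2, 6, 10, 9],
    [2, 0, 3, 9, 8],
    [6, 3, 0, 7, 5],
    [10, 9, 7, 0, 4],
    [9, 8, 5, 4, 0],
]
# Point-to-point distances keyed by label pair.
DIST = {(a, b): MATRIX[i][j]
        for i, a in enumerate(LABELS)
        for j, b in enumerate(LABELS)}


def linkage_distance(c1, c2, method):
    """Distance between two clusters under the chosen linkage."""
    pair_dists = [DIST[(a, b)] for a in c1 for b in c2]
    if method == "single":
        return min(pair_dists)                    # closest pair
    if method == "average":
        return sum(pair_dists) / len(pair_dists)  # mean over all pairs
    raise ValueError(f"unknown linkage: {method}")


def agglomerate(method):
    """Return the merges as (new cluster, merge height) in order."""
    clusters = [(label,) for label in LABELS]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda p: linkage_distance(p[0], p[1], method))
        height = linkage_distance(c1, c2, method)
        merged = tuple(sorted(c1 + c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        merges.append(("".join(merged), height))
    return merges


SINGLE = agglomerate("single")
AVERAGE = agglomerate("average")
print(SINGLE)
print(AVERAGE)
```

With this matrix, single link merges AB at 2, ABC at 3, DE at 4 and everything at 5. Average link merges AB at 2, DE at 4, ABC at 4.5 and everything at 8.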
3 (b) Explain the PageRank technique along with its algorithm.
10 M

4 (a) Consider a data warehouse for a hospital with three dimensions, (1) Doctor (2) Patient (3) Time,
and two measures, (1) Count and (2) Fees.
For this example, create an OLAP cube and describe the following OLAP operations:
(1) Slice (2) Dice (3) Roll-up (4) Drill-down (5) Pivot
10 M
4 (b) Consider the following transaction database:
TID Items
01 A,B,C,D
02 A,B,C,D,E,G
03 A,C,G,H,K
04 B,C,D,E,K
05 D,E,F,H,L
06 A,B,C,D,L
07 B,I,E,K,L
08 A,B,D,E,K
09 A,E,F,H,L
10 B,C,D,F

Apply the Apriori algorithm with minimum support of 30% and minimum confidence of 70%, and find all the association rules in the data set.
10 M
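A minimal Apriori sketch for the transaction database in 4 (b) is given below, as a way to cross-check a hand-worked answer. It assumes that 30% support over 10 transactions means a count of at least 3, and it generates candidates by joining frequent itemsets and pruning any candidate with an infrequent subset. The function and variable names are illustrative only.

```python
from itertools import combinations

TRANSACTIONS = [
    set("ABCD"), set("ABCDEG"), set("ACGHK"), set("BCDEK"), set("DEFHL"),
    set("ABCDL"), set("BIEKL"), set("ABDEK"), set("AEFHL"), set("BCDF"),
]
MIN_COUNT = 3          # assumption: 30% of 10 transactions
MIN_CONFIDENCE = 0.70


def support_count(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def frequent_itemsets():
    """Level-wise search: frequent k-itemsets build the (k+1)-candidates."""
    items = sorted({i for t in TRANSACTIONS for i in t})
    level = [frozenset([i]) for i in items
             if support_count(frozenset([i])) >= MIN_COUNT]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support_count(itemset)
        k += 1
        # Join step: unions of frequent sets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in frequent
                        for s in combinations(c, k - 1))
                 and support_count(c) >= MIN_COUNT]
    return frequent


def association_rules(frequent):
    """Rules antecedent -> consequent meeting the confidence threshold."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                confidence = count / frequent[antecedent]
                if confidence >= MIN_CONFIDENCE:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


FREQUENT = frequent_itemsets()
RULES = association_rules(FREQUENT)
for lhs, rhs, conf in sorted(RULES, key=lambda r: -r[2]):
    print(sorted(lhs), "->", sorted(rhs), f"{conf:.2f}")
```

On this data the sketch finds 28 frequent itemsets, the largest being {A, B, C, D} with a support count of 3. A rule such as B -> D (confidence 6/7) passes the threshold, while A -> B (confidence 4/6) does not.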

5 (a) A simple example from the stock market involving only discrete ranges has Profit as the categorical class attribute, with values {Up, Down} (the last column below). The training data is:

Old Yes Software Down
Old No Software Down
Old No Hardware Down
Mid Yes Software Down
Mid Yes Hardware Down
Mid No Hardware Up
Mid No Software Up
New Yes Software Up
New No Hardware Up
New No Software Up

Apply a decision tree algorithm and show the generated rules.
10 M
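One way to check a hand-built tree for 5 (a) is the ID3-style sketch below, which picks the split with the highest information gain at each node. The paper does not name the first three columns, so they are referred to by position (0, 1, 2), which is an assumption of this sketch. ID3 is also only one of several decision tree algorithms the question would accept.

```python
from collections import Counter
from math import log2

# Columns 0-2 are the unnamed attributes; the last column is Profit.
ROWS = [
    ("Old", "Yes", "Software", "Down"),
    ("Old", "No", "Software", "Down"),
    ("Old", "No", "Hardware", "Down"),
    ("Mid", "Yes", "Software", "Down"),
    ("Mid", "Yes", "Hardware", "Down"),
    ("Mid", "No", "Hardware", "Up"),
    ("Mid", "No", "Software", "Up"),
    ("New", "Yes", "Software", "Up"),
    ("New", "No", "Hardware", "Up"),
    ("New", "No", "Software", "Up"),
]


def entropy(rows):
    """Shannon entropy of the class label (last column)."""
    total = len(rows)
    counts = Counter(r[-1] for r in rows)
    return -sum(c / total * log2(c / total) for c in counts.values())


def info_gain(rows, col):
    """Entropy reduction obtained by splitting on the given column."""
    total = len(rows)
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r for r in rows if r[col] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(rows) - remainder


def build_rules(rows, cols, path=()):
    """Grow the tree recursively; return one (conditions, label) per leaf."""
    if len({r[-1] for r in rows}) == 1 or not cols:
        majority = Counter(r[-1] for r in rows).most_common(1)[0][0]
        return [(path, majority)]
    best = max(cols, key=lambda c: info_gain(rows, c))
    rules = []
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        remaining = [c for c in cols if c != best]
        rules += build_rules(subset, remaining, path + ((best, value),))
    return rules


ROOT_GAINS = [info_gain(ROWS, c) for c in (0, 1, 2)]
RULES = build_rules(ROWS, [0, 1, 2])
for conditions, label in RULES:
    text = " AND ".join(f"col{c} = {v}" for c, v in conditions)
    print(f"IF {text} THEN Profit = {label}")
```

On this data the first column has the highest gain at the root (0.6, against roughly 0.12 and 0 for the other two), and only its Mid branch needs a further split, on the second column.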
5 (b) What is meant by ETL? Explain the ETL process in detail.
10 M

6 (a) Define multidimensional and multilevel association mining.
10 M
6 (b) Explain the role of metadata in a data warehouse.
10 M

Write detailed notes on:-
7 (a) Data Warehouse Architecture.
10 M
7 (b) K-Means Clustering.
10 M
