GTU Information Technology (Semester 7)
Data Warehousing And Data Mining
December 2014
INSTRUCTIONS
(1) Assume appropriate data and state your reasons
(2) Marks are given to the right of every question
(3) Draw neat diagrams wherever necessary


1 (a) Define KDD. How are data mining techniques applied to multimedia, temporal, and spatial databases to extract useful knowledge?
7 M
1 (b) What is a concept hierarchy? List and explain the types of concept hierarchies in detail.
7 M

2 (a) What is data cleaning? Discuss various ways of handling missing values during data cleaning.
7 M
2 (b) (i) Explain the Star and Fact Constellation (Galaxy) schemas used in a data warehouse for multidimensional databases.
3 M
2 (b) (ii) Differentiate between OLAP and OLTP.
4 M
2 (c) (i) What is a cuboid? Explain the various OLAP operations on a data cube with a suitable example.
3 M
2 (c) (ii) Differentiate between a fact table and a dimension table.
4 M

3 (a) Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order):
13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72
i) Use min-max normalization to transform the value 45 for age onto the range [0.0, 1.0].
ii) Use z-score normalization to transform the value 45 for age, where the standard deviation of age is 20.64 years.
7 M
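The two transformations asked for in 3 (a) can be checked with a short script. This is only a sketch for verifying hand calculations, not a model answer; the helper names `min_max` and `z_score` are illustrative, and the mean is computed from the listed ages while the standard deviation (20.64) is taken from the question as given.

```python
from statistics import mean

# Age values from question 3 (a)
ages = [13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72]

def min_max(value, data, new_min=0.0, new_max=1.0):
    """Min-max normalization of `value` onto [new_min, new_max]."""
    lo, hi = min(data), max(data)
    return (value - lo) / (hi - lo) * (new_max - new_min) + new_min

def z_score(value, data_mean, std_dev):
    """Z-score normalization: distance from the mean in standard deviations."""
    return (value - data_mean) / std_dev

min_max_45 = min_max(45, ages)
z_45 = z_score(45, mean(ages), 20.64)  # 20.64 is the value stated in the question

print(f"min-max: {min_max_45:.4f}")  # min-max: 0.5424
print(f"z-score: {z_45:.3f}")        # z-score: 0.478
```

With min = 13 and max = 72, the min-max result is (45 - 13) / (72 - 13) = 32/59; the mean of the listed ages is 527/15, about 35.13.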
3 (b) State the Apriori property. Generate the large (frequent) itemsets and association rules using the Apriori algorithm on the following data set, with minimum support and minimum confidence set to 50% and 75% respectively.
TID    Items Purchased
T101   Cheese, Milk, Cookies
T102   Butter, Milk, Bread
T103   Cheese, Butter, Milk, Bread
T104   Butter, Bread
7 M
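A compact sketch of the level-wise Apriori procedure on the four transactions of 3 (b) is given below, intended only for checking a hand-worked answer. The function names (`support`, `apriori`, `rules`) are illustrative choices, and the implementation favours readability over efficiency: it rescans all transactions for every support count.

```python
from itertools import combinations

# Transactions from question 3 (b)
transactions = {
    "T101": {"Cheese", "Milk", "Cookies"},
    "T102": {"Butter", "Milk", "Bread"},
    "T103": {"Cheese", "Butter", "Milk", "Bread"},
    "T104": {"Butter", "Bread"},
}
MIN_SUPPORT = 0.50
MIN_CONFIDENCE = 0.75

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def apriori():
    """Return {frequent itemset: support}, built level by level."""
    items = sorted(set().union(*transactions.values()))
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c) >= MIN_SUPPORT]
    return frequent

def rules(frequent):
    """Return (antecedent, consequent, confidence) for every strong rule."""
    strong = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = sup / frequent[lhs]
                if confidence >= MIN_CONFIDENCE:
                    strong.append((set(lhs), set(itemset - lhs), confidence))
    return strong

frequent_itemsets = apriori()
strong_rules = rules(frequent_itemsets)
for lhs, rhs, conf in strong_rules:
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.0%}")
```

On this data the sketch finds nine frequent itemsets (four 1-itemsets, four 2-itemsets, and {Butter, Bread, Milk}) and five rules that meet the 75% confidence threshold, each with 100% confidence.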
3 (c) What is noise? Explain data smoothing by binning as a noise removal technique. Partition the given data into equal-frequency bins of size 3, then smooth by bin means, by bin medians, and by bin boundaries. Consider the data: 10, 2, 19, 18, 20, 18, 25, 28, 22
7 M
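The binning steps in 3 (c) can be sketched as follows, as a way to check hand calculations. One assumption needs stating: in this data the middle value of every bin is equidistant from both boundaries, and the question does not say how to break such ties, so the sketch arbitrarily sends ties to the lower boundary. The helper name `smooth_by_boundaries` is illustrative.

```python
from statistics import mean, median

# Data from question 3 (c)
data = [10, 2, 19, 18, 20, 18, 25, 28, 22]
BIN_SIZE = 3

# Equal-frequency partitioning: sort, then cut into consecutive bins of 3 values
sorted_data = sorted(data)
bins = [sorted_data[i:i + BIN_SIZE] for i in range(0, len(sorted_data), BIN_SIZE)]

def smooth_by_boundaries(bin_values):
    """Replace each value with the nearer of the bin's min and max.

    Assumption: a value equidistant from both boundaries goes to the lower one.
    """
    lo, hi = bin_values[0], bin_values[-1]
    return [lo if v - lo <= hi - v else hi for v in bin_values]

by_means = [[mean(b)] * len(b) for b in bins]
by_medians = [[median(b)] * len(b) for b in bins]
by_boundaries = [smooth_by_boundaries(b) for b in bins]

print("bins:         ", bins)
print("by means:     ", by_means)
print("by medians:   ", by_medians)
print("by boundaries:", by_boundaries)
```

The sorted data partitions into [2, 10, 18], [18, 19, 20], and [22, 25, 28]; the bin means and bin medians both come out as 10, 19, and 25.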
3 (d) List two shortcomings of the Apriori algorithm that motivated work on improving its efficiency. Discuss any TWO variations of the Apriori algorithm that improve its efficiency.
7 M

4 (a) How does the K-Means clustering method differ from the K-Medoids clustering method? Discuss the process of K-Means clustering. Also outline the major drawbacks of the K-Means clustering technique.
7 M
4 (b) Explain how the accuracy of a classifier can be measured. How does the bagging strategy help improve classifier accuracy?
7 M
4 (c) What is supervised learning? Using the given table, show how the ROOT splitting attribute is selected using the information gain (InfoGain) measure during decision tree induction.
Attributes
No.  Outlook   Temperature  Humidity  Windy  Class
1    Sunny     Hot          High      FALSE  N
2    Sunny     Hot          High      TRUE   N
3    Overcast  Hot          High      FALSE  P
4    Rain      Mild         High      FALSE  P
5    Rain      Cool         Normal    FALSE  P
6    Rain      Cool         Normal    TRUE   N
7    Overcast  Cool         Normal    TRUE   P
8    Sunny     Mild         High      FALSE  N
9    Sunny     Cool         Normal    FALSE  P
10   Rain      Mild         Normal    FALSE  P
11   Sunny     Mild         Normal    TRUE   P
12   Overcast  Mild         High      TRUE   P
13   Overcast  Hot          Normal    FALSE  P
14   Rain      Mild         High      TRUE   N
7 M
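The root-selection step in 4 (c) can be checked with the sketch below, which computes the information gain of each attribute over the 14 tuples in the table. The helper names `entropy` and `info_gain` are illustrative, and the code covers only the choice of the root attribute, not full tree induction.

```python
from collections import Counter
from math import log2

COLUMNS = ["Outlook", "Temperature", "Humidity", "Windy", "Class"]
# Tuples 1-14 from question 4 (c); the last field is the class label (P or N)
ROWS = [
    ("Sunny", "Hot", "High", "FALSE", "N"),
    ("Sunny", "Hot", "High", "TRUE", "N"),
    ("Overcast", "Hot", "High", "FALSE", "P"),
    ("Rain", "Mild", "High", "FALSE", "P"),
    ("Rain", "Cool", "Normal", "FALSE", "P"),
    ("Rain", "Cool", "Normal", "TRUE", "N"),
    ("Overcast", "Cool", "Normal", "TRUE", "P"),
    ("Sunny", "Mild", "High", "FALSE", "N"),
    ("Sunny", "Cool", "Normal", "FALSE", "P"),
    ("Rain", "Mild", "Normal", "FALSE", "P"),
    ("Sunny", "Mild", "Normal", "TRUE", "P"),
    ("Overcast", "Mild", "High", "TRUE", "P"),
    ("Overcast", "Hot", "Normal", "FALSE", "P"),
    ("Rain", "Mild", "High", "TRUE", "N"),
]

def entropy(labels):
    """Expected information (in bits) needed to classify a tuple in `labels`."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def info_gain(attr_index):
    """Entropy of the whole set minus the weighted entropy after splitting."""
    labels = [row[-1] for row in ROWS]
    partitions = {}
    for row in ROWS:
        partitions.setdefault(row[attr_index], []).append(row[-1])
    remainder = sum(len(p) / len(ROWS) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

gains = {name: info_gain(i) for i, name in enumerate(COLUMNS[:-1])}
root = max(gains, key=gains.get)

for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} gain = {gain:.3f}")
print("root attribute:", root)
```

With 9 tuples of class P and 5 of class N, the entropy of the full set is about 0.940 bits. Outlook has the largest gain (about 0.247), followed by Humidity (about 0.152), Windy (about 0.048), and Temperature (about 0.029), so Outlook is selected as the root.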
4 (d) Explain the linear regression and non-linear regression techniques for prediction.
7 M

5 (a) What is a web log? Explain web structure mining and web usage mining in detail.
7 M
5 (b) Discuss the applications of data warehousing and data mining in the government sector.
7 M
5 (c) Explain the information retrieval methods used in text mining.
7 M
5 (d) What are neural networks? Describe the factors that make them useful for classification and prediction in data mining. Explain how the topology of a neural network is designed.
7 M


