GTU Data Warehousing And Data Mining - December 2014 Exam Question Paper

GTU Information Technology (Semester 7)
Data Warehousing And Data Mining
December 2014

INSTRUCTIONS
(1) Assume appropriate data and state your reasons
(2) Marks are given to the right of every question
(3) Draw neat diagrams wherever necessary

1 (a) Define KDD. How are data mining techniques applied to multimedia, temporal and spatial databases to extract useful knowledge?

7 M

1 (b) What is concept hierarchy? List and explain types of concept hierarchy in detail.

7 M

2 (a) What is data cleaning? Discuss various ways of handling missing values during data cleaning.

7 M

2 (b) (i) Explain the Star and Fact Constellation (Galaxy) schemas used in a data warehouse for multidimensional databases.

3 M

2 (b) (ii) Differentiate between OLAP and OLTP.

4 M

2 (c) (i) What is a cuboid? Explain various OLAP operations on a data cube with a suitable example.

3 M

2 (c) (ii) Differentiate between a fact table and a dimension table.

4 M

3 (a) Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order):
13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72
i) Use min-max normalization to transform the value 45 for age onto the range [0.0, 1.0].
ii) Use z-score normalization to transform the value 45 for age, where the standard deviation of age is 20.64 years.

7 M
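
The two transformations asked for in 3 (a) can be checked with a minimal sketch like the one below. The helper names `min_max` and `z_score` are illustrative, not part of the question. The mean is computed from the listed ages, and the standard deviation 20.64 is taken from the question (it matches the sample standard deviation of the listed values).

```python
from statistics import mean, stdev

# Age values exactly as listed in the question.
ages = [13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72]


def min_max(v, values, new_min=0.0, new_max=1.0):
    """Min-max normalization of v onto [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    return (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


def z_score(v, values, sigma=None):
    """Z-score normalization; sigma defaults to the sample standard deviation."""
    if sigma is None:
        sigma = stdev(values)
    return (v - mean(values)) / sigma


mm = min_max(45, ages)               # (45 - 13) / (72 - 13)
z = z_score(45, ages, sigma=20.64)   # sigma as stated in the question
print(f"min-max: {mm:.3f}, z-score: {z:.3f}")
```

Passing `sigma` explicitly keeps the sketch faithful to the figure given in the question rather than to whichever estimator a library happens to use.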

3 (b) State the Apriori property. Generate large itemsets and association rules using the Apriori algorithm on the following data set, with minimum support and minimum confidence set to 50% and 75% respectively.

TID	Items Purchased
T101	Cheese,Milk,Cookies
T102	Butter,Milk,Bread
T103	Cheese,Butter,Milk,Bread
T104	Butter,Bread

7 M
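
As a rough cross-check for 3 (b), the level-wise search can be sketched as follows. This is a small illustrative implementation under the thresholds stated in the question, not a reference solution. The function names `support`, `apriori` and `rules` are assumptions made for this sketch.

```python
from itertools import combinations

# Transactions exactly as given in the question.
transactions = {
    "T101": {"Cheese", "Milk", "Cookies"},
    "T102": {"Butter", "Milk", "Bread"},
    "T103": {"Cheese", "Butter", "Milk", "Bread"},
    "T104": {"Butter", "Bread"},
}
MIN_SUPPORT = 0.5
MIN_CONFIDENCE = 0.75


def support(itemset):
    """Fraction of transactions that contain every item of itemset."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def apriori():
    """Return {frequent itemset: support}, built level by level."""
    items = sorted({i for t in transactions.values() for i in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= MIN_SUPPORT]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        level = [c for c in candidates
                 if all(frozenset(sub) in frequent
                        for sub in combinations(c, k - 1))
                 and support(c) >= MIN_SUPPORT]
    return frequent


def rules(frequent):
    """Return (antecedent, consequent, confidence) for rules above the threshold."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = sup / frequent[lhs]
                if conf >= MIN_CONFIDENCE:
                    found.append((set(lhs), set(itemset - lhs), conf))
    return found


frequent = apriori()
found = rules(frequent)
for lhs, rhs, conf in found:
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}")
```

The sketch recounts support by scanning all transactions for each candidate, which is fine for four transactions but is exactly the cost that the efficiency variations of Apriori aim to reduce.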

3 (c) What is noise? Explain data smoothing by binning as a noise removal technique. Partition the given data into equal-frequency bins of size 3, then smooth by bin means, by bin medians and by bin boundaries. Consider the data: 10, 2, 19, 18, 20, 18, 25, 28, 22

7 M
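
The binning steps in 3 (c) can be sketched as below. The helper names are illustrative. One assumption is made loudly: when a value is equally far from both bin boundaries, the sketch picks the lower boundary. The question does not specify a tie-break rule, and such ties do occur in this data.

```python
from statistics import mean, median

# Data exactly as given in the question.
data = [10, 2, 19, 18, 20, 18, 25, 28, 22]
BIN_SIZE = 3


def partition(values, size):
    """Sort the values and split them into equal-frequency bins."""
    ordered = sorted(values)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def smooth_by(func, bins):
    """Replace every value in a bin by func(bin), e.g. the mean or median."""
    return [[func(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value by the closer of its bin's minimum and maximum.

    ASSUMPTION: ties go to the lower boundary.
    """
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


bins = partition(data, BIN_SIZE)
by_means = smooth_by(mean, bins)
by_medians = smooth_by(median, bins)
by_boundaries = smooth_by_boundaries(bins)

print("bins:         ", bins)
print("by means:     ", by_means)
print("by medians:   ", by_medians)
print("by boundaries:", by_boundaries)
```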

3 (d) List two shortcomings of the Apriori algorithm whose remedies improve its efficiency. Discuss any TWO variations of the Apriori algorithm that improve its efficiency.

7 M

4 (a) How does the K-Means clustering method differ from the K-Medoids clustering method? Discuss the process of K-Means clustering. Also outline the major drawbacks of the K-Means clustering technique.

7 M

4 (b) Explain how the accuracy of a classifier can be measured. How does the bagging strategy help improve classifier accuracy?

7 M

4 (c) What is supervised learning? Using the given table, show how the ROOT splitting attribute is selected using the information gain (InfoGain) measure in the overall process of decision tree induction.

Attributes
No.	Outlook	Temperature	Humidity	Windy	Class
1	Sunny	Hot	High	FALSE	N
2	Sunny	Hot	High	TRUE	N
3	Overcast	Hot	High	FALSE	P
4	Rain	Mild	High	FALSE	P
5	Rain	Cool	Normal	FALSE	P
6	Rain	Cool	Normal	TRUE	N
7	Overcast	Cool	Normal	TRUE	P
8	Sunny	Mild	High	FALSE	N
9	Sunny	Cool	Normal	FALSE	P
10	Rain	Mild	Normal	FALSE	P
11	Sunny	Mild	Normal	TRUE	P
12	Overcast	Mild	High	TRUE	P
13	Overcast	Hot	Normal	FALSE	P
14	Rain	Mild	High	TRUE	N

7 M
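
The root selection asked for in 4 (c) can be sketched as follows: compute the entropy of the class column, subtract the weighted entropy after splitting on each attribute, and take the attribute with the largest gain. The names `entropy` and `info_gain` are illustrative, and the rows are copied from the table in the question.

```python
from collections import Counter
from math import log2

COLUMNS = ["Outlook", "Temperature", "Humidity", "Windy"]
# Rows 1 to 14 from the question; the last field is the class (P or N).
ROWS = [
    ("Sunny", "Hot", "High", "FALSE", "N"),
    ("Sunny", "Hot", "High", "TRUE", "N"),
    ("Overcast", "Hot", "High", "FALSE", "P"),
    ("Rain", "Mild", "High", "FALSE", "P"),
    ("Rain", "Cool", "Normal", "FALSE", "P"),
    ("Rain", "Cool", "Normal", "TRUE", "N"),
    ("Overcast", "Cool", "Normal", "TRUE", "P"),
    ("Sunny", "Mild", "High", "FALSE", "N"),
    ("Sunny", "Cool", "Normal", "FALSE", "P"),
    ("Rain", "Mild", "Normal", "FALSE", "P"),
    ("Sunny", "Mild", "Normal", "TRUE", "P"),
    ("Overcast", "Mild", "High", "TRUE", "P"),
    ("Overcast", "Hot", "Normal", "FALSE", "P"),
    ("Rain", "Mild", "High", "TRUE", "N"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def info_gain(rows, col):
    """Entropy of the class minus the weighted entropy after splitting on col."""
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[-1] for r in rows]) - remainder


gains = {name: info_gain(ROWS, i) for i, name in enumerate(COLUMNS)}
root = max(gains, key=gains.get)
for name, g in gains.items():
    print(f"Gain({name}) = {g:.3f}")
print("Root splitting attribute:", root)
```

The same `info_gain` call can be repeated on each subset of rows below the root to continue the induction, although the question only asks for the root.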

4 (d) Explain Linear Regression and Non-linear Regression techniques of prediction.

7 M

5 (a) What is a web log? Explain web structure mining and web usage mining in detail.

7 M

5 (b) Discuss the applications of data warehousing and data mining in the government sector.

7 M

5 (c) Explain the information retrieval methods used in text mining.

7 M

5 (d) What are neural networks? Describe the various factors that make them useful for classification and prediction in data mining. Explain how the topology of a neural network is designed.

7 M
