MU Information Technology (Semester 6)
Data Mining & Business Intelligence
May 2015
Total marks: --
Total time: --
INSTRUCTIONS
(1) Assume appropriate data and state your reasons
(2) Marks are given to the right of every question
(3) Draw neat diagrams wherever necessary


1 (a) Describe the different types of attributes one may come across in a data mining data set with two examples of each type.
5 M
1 (b) Explain the different distance measures that can be used to compute distance between two clusters.
5 M
1 (c) Define "Business Intelligence" and "Support System" with examples.
5 M
1 (d) Define "Outlier". What are the different types of outliers that occur in a dataset?
5 M

2 (a) Consider the following data points: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
i) What is the mean of the data? What is the median?
ii) What is the mode of the data?
iii) What is the mid-range of the data?
iv) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
v) Show a box plot of the data.
10 M
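The quantities asked for in parts i) to iv) can be checked numerically. The following is a minimal Python sketch, not a model answer: the variable names and the helper `rough_quartiles` are hypothetical, and the quartile convention (medians of the lower and upper halves, excluding the overall median) is an assumption, since other textbook conventions give slightly different values.

```python
from statistics import mean, median, multimode

# Data points from question 2 (a); a comma between 30 and 33 is assumed.
values = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
          33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def rough_quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    When the count is odd, the overall median is left out of both halves.
    This is one common convention, chosen here as an assumption.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + (len(ordered) % 2):]
    return median(lower), median(upper)


data_mean = mean(values)                      # part i)
data_median = median(values)                  # part i)
data_modes = multimode(values)                # part ii), all most-frequent values
mid_range = (min(values) + max(values)) / 2   # part iii)
q1, q3 = rough_quartiles(values)              # part iv)

# Five-number summary that a box plot for part v) would be drawn from.
five_number_summary = (min(values), q1, data_median, q3, max(values))
print(data_mean, data_median, data_modes, mid_range, five_number_summary)
```

The data has two values tied for the highest frequency, which is why `multimode` is used rather than `mode`.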
2 (b) Design a BI system for fraud detection. Describe all the steps from Data collection to Decision Making clearly.
10 M

3 (a) Illustrate any one classification technique for the following data set. Show how to classify a new tuple with (Homeowner=Yes; Status=Employed; Income=Average).
Id Homeowner Status Income Defaulted
1 Yes Employed High No
2 No Business Average No
3 No Employed Low No
4 Yes Business High No
5 No Unemployed Average Yes
6 No Business Low No
7 Yes Unemployed High No
8 No Employed Average Yes
9 No Business Low No
10 No Employed Average Yes
10 M
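The question leaves the choice of technique open. As one possible illustration, the sketch below uses a naive Bayes classifier without smoothing; this choice, the function name `naive_bayes_scores`, and the variable names are assumptions made for the example, not something the paper prescribes.

```python
from collections import Counter
from fractions import Fraction

# Training tuples from question 3 (a): (Homeowner, Status, Income, Defaulted).
rows = [
    ("Yes", "Employed", "High", "No"),
    ("No", "Business", "Average", "No"),
    ("No", "Employed", "Low", "No"),
    ("Yes", "Business", "High", "No"),
    ("No", "Unemployed", "Average", "Yes"),
    ("No", "Business", "Low", "No"),
    ("Yes", "Unemployed", "High", "No"),
    ("No", "Employed", "Average", "Yes"),
    ("No", "Business", "Low", "No"),
    ("No", "Employed", "Average", "Yes"),
]


def naive_bayes_scores(training_rows, query):
    """Return {class: P(class) * product of P(attribute value | class)}.

    No smoothing is applied (an assumption), so a zero count for any
    attribute value forces that class score to zero.
    """
    class_counts = Counter(row[-1] for row in training_rows)
    scores = {}
    for label, n_label in class_counts.items():
        score = Fraction(n_label, len(training_rows))  # prior P(class)
        for i, value in enumerate(query):
            matches = sum(
                1 for r in training_rows if r[-1] == label and r[i] == value
            )
            score *= Fraction(matches, n_label)  # likelihood P(value | class)
        scores[label] = score
    return scores


# New tuple from the question: Homeowner=Yes, Status=Employed, Income=Average.
query = ("Yes", "Employed", "Average")
scores = naive_bayes_scores(rows, query)
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

None of the defaulters in the table is a homeowner, so the score for Defaulted=Yes is exactly zero here. Adding a smoothing correction can change the outcome, which is worth stating if this technique is chosen.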
3 (b) Why is Data Preprocessing required? Explain the different steps involved in Data Preprocessing.
10 M

4 (a) Use K-means to cluster the following data set into 3 clusters.
Protein 20 21 15 22 20 25 26 20 18 20
Fat 9 9 7 17 8 12 14 9 9 9
10 M
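The result of K-means depends on the initial centroids, which the question does not specify. The sketch below runs the standard assign-and-update iteration on the ten (Protein, Fat) points, with three starting centroids picked from the data purely as an assumption; the function name `kmeans` and all variable names are hypothetical.

```python
# (Protein, Fat) pairs from question 4 (a), in the order given.
points = [(20, 9), (21, 9), (15, 7), (22, 17), (20, 8),
          (25, 12), (26, 14), (20, 9), (18, 9), (20, 9)]

# Assumed starting centroids (three of the data points); a different
# choice may lead to a different final clustering.
initial_centroids = [(15, 7), (20, 9), (26, 14)]


def kmeans(data, centroids, max_iter=100):
    """Assign each point to its nearest centroid, then recompute the means.

    Stops when the assignment no longer changes. Distances are squared
    Euclidean; an empty cluster keeps its previous centroid.
    """
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(
                range(len(centroids)),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2,
            )
            for p in data
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids


labels, centroids = kmeans(points, initial_centroids)
print(labels)
print(centroids)
```

With these starting centroids the assignment stabilises after the first update. In a written answer, each iteration's distance table and centroid update would be shown explicitly.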
4 (b) Describe the different visualization techniques that can be used in Data Mining.
10 M

5 (a) Consider the following transaction database:
TID Items
01 A,B,C,D
02 A,B,C,D,E,G
03 A,C,G,H,K
04 B,C,D,E,K
05 D,E,F,H,L
06 A,B,C,D,L
07 B,I,E,K,L
08 A,B,D,E,LK
09 A,E,E,H,L
10 B,C,D,F

Apply the Apriori algorithm with minimum support of 30% and minimum confidence of 70% and find all the association rules in the data set.
10 M
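The level-wise search and rule generation can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the transactions are taken exactly as printed (so "LK" in TID 08 is treated as a single item and the repeated E in TID 09 collapses into one), thresholds are passed as integer percentages, and the names `apriori` and `association_rules` are hypothetical.

```python
from itertools import combinations

# Transactions from question 5 (a), copied as printed (see assumptions above).
raw = ["A,B,C,D", "A,B,C,D,E,G", "A,C,G,H,K", "B,C,D,E,K", "D,E,F,H,L",
       "A,B,C,D,L", "B,I,E,K,L", "A,B,D,E,LK", "A,E,E,H,L", "B,C,D,F"]
transactions = [set(row.split(",")) for row in raw]


def apriori(baskets, min_support_pct):
    """Return {frozenset: support count} for every frequent itemset."""
    n = len(baskets)
    items = sorted({item for basket in baskets for item in basket})
    current = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting minimum support.
        level = {}
        for cand in current:
            count = sum(1 for basket in baskets if cand <= basket)
            if count * 100 >= min_support_pct * n:
                level[cand] = count
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def association_rules(frequent, min_conf_pct):
    """Return (antecedent, consequent, confidence) for each strong rule."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # confidence = support(itemset) / support(antecedent)
                if support * 100 >= min_conf_pct * frequent[lhs]:
                    rules.append((lhs, itemset - lhs, support / frequent[lhs]))
    return rules


frequent = apriori(transactions, 30)
found_rules = association_rules(frequent, 70)
for lhs, rhs, conf in found_rules:
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

Integer percentages are used for the thresholds to avoid floating-point comparisons at the boundary (3 of 10 transactions is exactly 30%).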
5 (b) Explain the different methods that can be used to evaluate and compare the accuracy of different classification algorithms.
10 M

6 (a) DBSCAN clustering algorithm with an example.
10 M
6 (b) Multilevel and Multidimensional Association rules.
10 M


