1 (a)
Describe the different types of attributes one may come across in a data mining data set with two examples of each type.
5 M
1 (b)
Explain the different distance measures that can be used to compute distance between two clusters.
5 M
1 (c)
Define "Business Intelligence" and "Support System" with examples.
5 M
1 (d)
Define "Outlier". What are the different types of outliers that can occur in a dataset?
5 M
2 (a)
Consider the following data points: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
i) What is the mean of the data? What is the median?
ii) What is the mode of the data?
iii) What is the mid-range of the data?
iv) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
v) Show a box plot of the data.
10 M
2 (b)
Design a BI system for fraud detection. Describe all the steps from Data collection to Decision Making clearly.
10 M
3 (a)
Illustrate any one classification technique for the data set below. Show how a new tuple (Homeowner = Yes; Status = Employed; Income = Average) can be classified.
Id | Homeowner | Status | Income | Defaulted |
1 | Yes | Employed | High | No |
2 | No | Business | Average | No |
3 | No | Employed | Low | No |
4 | Yes | Business | High | No |
5 | No | Unemployed | Average | Yes |
6 | No | Business | Low | No |
7 | Yes | Unemployed | High | No |
8 | No | Employed | Average | Yes |
9 | No | Business | Low | No |
10 | No | Employed | Average | Yes |
10 M
3 (b)
Why is Data Preprocessing required? Explain the different steps involved in Data Preprocessing.
10 M
4 (a)
Use K-means to cluster the following data set into 3 clusters.
Protein | 20 | 21 | 15 | 22 | 20 | 25 | 26 | 20 | 18 | 20 |
Fat | 9 | 9 | 7 | 17 | 8 | 12 | 14 | 9 | 9 | 9 |
10 M
4 (b)
Describe the different visualization techniques that can be used in Data Mining.
10 M
5 (a)
Consider the following transaction database:
Apply the Apriori algorithm with minimum support of 30% and minimum confidence of 70% and find all the association rules in the data set.
TID | Items |
01 | A,B,C,D |
02 | A,B,C,D,E,G |
03 | A,C,G,H,K |
04 | B,C,D,E,K |
05 | D,E,F,H,L |
06 | A,B,C,D,L |
07 | B,I,E,K,L |
08 | A,B,D,E,LK |
09 | A,E,E,H,L |
10 | B,C,D,F |
10 M
5 (b)
Explain the different methods that can be used to evaluate and compare the accuracy of different classification algorithms.
10 M
6 (a)
Explain the DBSCAN clustering algorithm with an example.
10 M
6 (b)
Explain multilevel and multidimensional association rules.
10 M