1(a)
Define 'Data Mining'. Enumerate five example applications that can benefit from Data Mining.
5 M
1(b)
What is Data Preprocessing? Explain the different methods for the Data Cleansing phase.
5 M
1(c)
What is hierarchical clustering? Explain any two techniques for finding distance between the clusters in hierarchical clustering.
5 M
1(d)
Explain the concept of a decision support system with the help of an example application.
5 M
2(a)
Partition the given data into 4 bins using the equi-depth binning method and perform smoothing according to the following methods:
Smoothing by bin mean
Smoothing by bin median
Smoothing by bin boundaries.
Data: 11,13,13,15,15,16,19,20,20,20,21,21,22,23,24,30,40,45,45,45,71,72,73,75.
10 M
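One way to check a hand-worked answer to 2(a) is the minimal sketch below. It assumes Python 3.8+ and a data size divisible by the number of bins. The function names are illustrative only, and a value equidistant from both bin boundaries is assumed to go to the lower boundary, since the question does not fix a tie-breaking rule.

```python
from statistics import mean, median

data = [11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
        22, 23, 24, 30, 40, 45, 45, 45, 71, 72, 73, 75]

def equi_depth_bins(values, n_bins):
    """Split the sorted values into n_bins bins holding the same number of values."""
    ordered = sorted(values)
    depth = len(ordered) // n_bins  # assumes len(values) is divisible by n_bins
    return [ordered[i * depth:(i + 1) * depth] for i in range(n_bins)]

def smooth_by_mean(bins):
    """Replace every value by the mean of its bin (rounded to 2 decimals)."""
    return [[round(mean(b), 2)] * len(b) for b in bins]

def smooth_by_median(bins):
    """Replace every value by the median of its bin."""
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value by the closer of the bin's minimum and maximum."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are sorted, so these are the boundaries
        # Assumption: ties go to the lower boundary.
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

bins = equi_depth_bins(data, 4)
for name, fn in [("mean", smooth_by_mean),
                 ("median", smooth_by_median),
                 ("boundaries", smooth_by_boundaries)]:
    print(name, fn(bins))
```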
2(b)
For the same set of data points as in question 2(a):
(a) Find the Mean, Median and Mode.
(b) Show a boxplot of the data, clearly indicating the five-number summary.
10 M
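The quantities asked for in 2(b) can be cross-checked with a short sketch like the one below, assuming Python 3.8+. Quartile conventions differ between textbooks and libraries. This sketch assumes the "median of each half" convention, so a course that uses a different convention may expect slightly different Q1 and Q3 values.

```python
from statistics import mean, median, multimode

data = sorted([11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
               22, 23, 24, 30, 40, 45, 45, 45, 71, 72, 73, 75])

# Central tendency. multimode returns every value tied for the highest count.
data_mean = mean(data)
data_median = median(data)
data_modes = multimode(data)

# Five-number summary, assuming Q1 and Q3 are the medians of the lower and
# upper halves (the overall median is left out of both halves when n is odd).
half = len(data) // 2
lower, upper = data[:half], data[len(data) - half:]
five_number = {
    "min": data[0],
    "q1": median(lower),
    "median": data_median,
    "q3": median(upper),
    "max": data[-1],
}

print("mean:", round(data_mean, 2))
print("median:", data_median)
print("mode(s):", data_modes)
print("five-number summary:", five_number)
```

The five values in `five_number` are the ones a boxplot marks: the whisker ends, the box edges and the line inside the box.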
3(a)
The table below shows a sample dataset of whether a customer responds to a survey or not. 'Outcome' is the class label.
Construct a Decision Tree Classifier for the dataset. For a new example (Rural, Semi-Detached, Low, No), what will be the predicted class label?
District | House Type | Income | Previous Customers | Outcome |
Suburban | Detached | High | No | Nothing |
Suburban | Detached | High | Yes | Nothing |
Suburban | Detached | High | No | Responded |
Urban | Semi-Detached | High | No | Responded |
Urban | Semi-Detached | Low | No | Responded |
Urban | Semi-Detached | Low | No | Nothing |
Rural | Semi-Detached | Low | Yes | Responded |
Suburban | Terrace | High | No | Nothing |
Suburban | Semi-Detached | Low | No | Responded |
Urban | Terrace | Low | No | Responded |
Suburban | Terrace | Low | Yes | Responded |
Rural | Terrace | High | Yes | Responded |
Rural | Detached | Low | No | Responded |
Urban | Terrace | High | Yes | Nothing |
10 M
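The first step of a decision-tree construction for 3(a), choosing the root attribute, can be sketched as below. The question does not name a splitting criterion, so ID3-style information gain is an assumption here. The rows are transcribed from the table with letter case normalized, and the function names are illustrative.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["District", "House Type", "Income", "Previous Customers"]

# Rows transcribed from the question; the last field is the class label.
ROWS = [
    ("Suburban", "Detached", "High", "No", "Nothing"),
    ("Suburban", "Detached", "High", "Yes", "Nothing"),
    ("Suburban", "Detached", "High", "No", "Responded"),
    ("Urban", "Semi-Detached", "High", "No", "Responded"),
    ("Urban", "Semi-Detached", "Low", "No", "Responded"),
    ("Urban", "Semi-Detached", "Low", "No", "Nothing"),
    ("Rural", "Semi-Detached", "Low", "Yes", "Responded"),
    ("Suburban", "Terrace", "High", "No", "Nothing"),
    ("Suburban", "Semi-Detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "Low", "No", "Responded"),
    ("Suburban", "Terrace", "Low", "Yes", "Responded"),
    ("Rural", "Terrace", "High", "Yes", "Responded"),
    ("Rural", "Detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "High", "Yes", "Nothing"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr_index):
    """Entropy of the whole set minus the weighted entropy after the split."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)  # candidate root attribute under this criterion

for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("split on:", best)
```

Repeating the same calculation on each branch's subset of rows, until a branch is pure or no attributes remain, would give the rest of the tree.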
3(b)
Briefly explain Bagging and Boosting of Classifiers
10 M
4(a)
Use the Apriori algorithm to identify the frequent item-sets in the following database. Then extract the strong association rules from these sets.
Min. Support = 30% Min. Confidence=75%
TID | Items |
01 | A, B, D, E, F |
02 | B, C, E |
03 | A, B, D, E |
04 | A, B, C, E |
05 | A, B, C, D, E, F |
06 | B, C, D |
07 | A, B, D, E |
10 M
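A hand-worked answer to 4(a) can be checked against a small level-wise sketch such as the one below. It is a plain, unoptimized rendering of the join and prune steps, not a reference implementation. The function names are illustrative, and the thresholds are compared in integer percentages to avoid floating-point edge cases at exactly 75%.

```python
from itertools import combinations

transactions = [
    {"A", "B", "D", "E", "F"},
    {"B", "C", "E"},
    {"A", "B", "D", "E"},
    {"A", "B", "C", "E"},
    {"A", "B", "C", "D", "E", "F"},
    {"B", "C", "D"},
    {"A", "B", "D", "E"},
]
MIN_SUPPORT_PCT = 30
MIN_CONFIDENCE_PCT = 75

def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def is_frequent(itemset):
    return support_count(itemset) * 100 >= MIN_SUPPORT_PCT * len(transactions)

def apriori():
    """Return {frequent itemset: support count}, built level by level."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items if is_frequent(frozenset([i]))]
    frequent, k = {}, 1
    while level:
        for itemset in level:
            frequent[itemset] = support_count(itemset)
        # Join step: unite pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        previous = set(level)
        level = [c for c in candidates
                 if all(frozenset(s) in previous for s in combinations(c, k))
                 and is_frequent(c)]
        k += 1
    return frequent

def strong_rules(frequent):
    """Return (antecedent, consequent, confidence) for rules meeting the threshold."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                if count * 100 >= MIN_CONFIDENCE_PCT * frequent[antecedent]:
                    rules.append((antecedent, itemset - antecedent,
                                  count / frequent[antecedent]))
    return rules

frequent = apriori()
rules = strong_rules(frequent)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), f"{confidence:.0%}")
```

With 7 transactions, a 30% minimum support means an itemset must appear in at least 3 transactions.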
4(b)
Explain multidimensional and multi level Association rules with examples.
10 M
5(a)
Use any hierarchical clustering algorithm to cluster the following 8 examples into 3 clusters:
A1=(2, 10), A2=(2, 5), A3=(8, 4), A4=(5, 8),
A5=(7, 5), A6=(6, 4), A7=(1, 2), A8=(4, 9)
10 M
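Since 5(a) allows any hierarchical algorithm, the sketch below picks agglomerative clustering with single linkage and Euclidean distance. Both choices are assumptions, and other linkages may produce different clusters. It assumes Python 3.8+ for `math.dist`, and the function names are illustrative.

```python
from math import dist

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}

def single_link(c1, c2):
    """Single-linkage distance: the closest pair of points across two clusters."""
    return min(dist(points[p], points[q]) for p in c1 for q in c2)

def agglomerate(names, k):
    """Merge the two closest clusters repeatedly until only k clusters remain."""
    clusters = [frozenset([n]) for n in names]
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        # Pick the pair of clusters with the smallest linkage distance.
        i, j = min(pairs, key=lambda ij: single_link(clusters[ij[0]],
                                                     clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters

clusters = agglomerate(sorted(points), 3)
for cluster in clusters:
    print(sorted(cluster))
```

Recording the linkage distance at each merge would give the heights needed to draw the dendrogram by hand.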
5(b)
What is an outlier? Describe methods that can be used for outlier analysis.
10 M
6(a)
Consider the following case study: An international chain of hotels wants to analyze and improve its performance using several performance indicators, such as quality of rooms, service facilities, check-in, breakfast, popular times of visit, duration of stay, etc.
For this case study, design a BI system, clearly explaining all steps from data collection to decision making.
10 M
6(b)
Clearly explain the working of the DBSCAN algorithm using appropriate diagrams.
10 M