1(a)
Define "Data Mining". Enumerate five example applications that can benefit from using data mining.
5 M
1(b)
Clearly explain the data preprocessing phase for data mining.
5 M
1(c)
Describe one hierarchical clustering algorithm using an example dendrogram.
5 M
1(d)
Explain the concept of a decision support system with the help of an example application.
5 M
2(a)
Partition the given data into 4 bins using the equi-depth binning method and perform smoothing according to the following methods.
Smoothing by bin mean
Smoothing by bin median
Smoothing by bin boundaries
Data: 11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21, 22, 23, 24, 30, 40, 45, 45, 71, 72, 73, 75
10 M
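A worked answer to 2(a) can be checked with a short Python sketch such as the one below. The 23 values do not divide evenly into 4 bins, so the sketch assumes bin sizes of 6, 6, 6 and 5 (extra values go to the earlier bins). It also assumes that a value equidistant from both boundaries moves to the lower one. Neither convention is stated in the question, and the function names are illustrative only.

```python
from statistics import mean, median

data = [11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
        22, 23, 24, 30, 40, 45, 45, 71, 72, 73, 75]

def equi_depth_bins(values, k):
    """Split the sorted values into k bins whose sizes differ by at most one."""
    values = sorted(values)
    base, extra = divmod(len(values), k)
    bins, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # assumption: earlier bins take the extras
        bins.append(values[start:start + size])
        start += size
    return bins

def smooth_by_mean(bins):
    """Replace every value with its bin mean (rounded to 2 decimals)."""
    return [[round(mean(b), 2)] * len(b) for b in bins]

def smooth_by_median(bins):
    """Replace every value with its bin median."""
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value with the closer of its bin's minimum and maximum."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]
        # assumption: ties go to the lower boundary
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

bins = equi_depth_bins(data, 4)
for name, fn in [("mean", smooth_by_mean),
                 ("median", smooth_by_median),
                 ("boundaries", smooth_by_boundaries)]:
    print(name, fn(bins))
```

A different split of the 23 values (for example 5, 6, 6, 6) is equally defensible and changes the smoothed values, so the chosen split should be stated in the answer.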
2(b)
For the same set of data points as in question 2(a):
a) Find the mean, median and mode.
b) Show a boxplot of the data, clearly indicating the five-number summary.
10 M
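The quantities asked for in 2(b) can be cross-checked with Python's standard `statistics` module, as in the sketch below. Quartile conventions differ between textbooks. This sketch uses the module's default "exclusive" method, which is an assumption and may not match the convention expected by the examiner. The 1.5 x IQR fences for boxplot whiskers are likewise a common convention rather than something the question specifies.

```python
from statistics import mean, median, multimode, quantiles

data = [11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
        22, 23, 24, 30, 40, 45, 45, 71, 72, 73, 75]

mean_value = mean(data)
median_value = median(data)
modes = multimode(data)  # list of all most-frequent values

# Quartiles with the default method="exclusive" (assumed convention).
q1, q2, q3 = quantiles(data, n=4)
five_number = (min(data), q1, q2, q3, max(data))

# Conventional boxplot fences: points outside them are drawn as outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("mean:", round(mean_value, 2))
print("median:", median_value)
print("mode(s):", modes)
print("five-number summary:", five_number)
print("outliers:", outliers)
```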
3(a)
The table below shows a sample dataset of whether or not a customer responds to a survey. "Outcome" is the class label. Construct a Naive Bayes classifier for the dataset. For a new example (Rural, Semi-detached, Low, No), what will be the predicted class label?
District | House Type | Income | Previous Customer | Outcome |
Suburban | Detached | High | No | Nothing |
Suburban | Detached | High | Yes | Nothing |
Rural | Detached | High | No | Responded |
Urban | Semi-detached | High | No | Responded |
Urban | Semi-detached | Low | No | Responded |
Urban | Semi-detached | Low | Yes | Nothing |
Rural | Semi-detached | Low | Yes | Responded |
Suburban | Terrace | High | No | Nothing |
Suburban | Semi-detached | Low | No | Responded |
Urban | Terrace | Low | No | Responded |
Suburban | Terrace | Low | Yes | Responded |
Rural | Terrace | High | Yes | Responded |
Rural | Detached | Low | No | Responded |
Urban | Terrace | High | Yes | Nothing |
10 M
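The hand calculation for 3(a) can be verified with a minimal Naive Bayes sketch like the one below. It multiplies the class prior by the conditional frequency of each attribute value, using exact fractions and no smoothing. With Laplace smoothing, which the question does not mention, the zero probability for the "Nothing" class would become a small positive value. The function name and the tuple layout are illustrative assumptions, and the "Responded" label is written with its corrected spelling.

```python
from collections import Counter
from fractions import Fraction

# (District, House Type, Income, Previous Customer, Outcome)
rows = [
    ("Suburban", "Detached", "High", "No", "Nothing"),
    ("Suburban", "Detached", "High", "Yes", "Nothing"),
    ("Rural", "Detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "Yes", "Nothing"),
    ("Rural", "Semi-detached", "Low", "Yes", "Responded"),
    ("Suburban", "Terrace", "High", "No", "Nothing"),
    ("Suburban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "Low", "No", "Responded"),
    ("Suburban", "Terrace", "Low", "Yes", "Responded"),
    ("Rural", "Terrace", "High", "Yes", "Responded"),
    ("Rural", "Detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "High", "Yes", "Nothing"),
]

def naive_bayes_scores(rows, query):
    """Return {label: P(label) * product of P(attribute value | label)}."""
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for label, n_label in class_counts.items():
        score = Fraction(n_label, len(rows))  # prior P(label)
        for i, value in enumerate(query):
            matches = sum(1 for r in rows if r[-1] == label and r[i] == value)
            score *= Fraction(matches, n_label)  # unsmoothed likelihood
        scores[label] = score
    return scores

query = ("Rural", "Semi-detached", "Low", "No")
scores = naive_bayes_scores(rows, query)
predicted = max(scores, key=scores.get)
print(scores, "->", predicted)
```

No "Nothing" row has District = Rural, so that class scores zero here and the unsmoothed classifier predicts "Responded".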
3(b)
Briefly explain regression-based classifiers.
10 M
4(a)
Use the Apriori algorithm to identify the frequent itemsets in the following database. Then extract the strong association rules from these sets. Min. Support = 30%, Min. Confidence = 75%
TID | Items |
01 | A, B, D, E, F |
02 | B, C, E |
03 | A, B, D, E |
04 | A, B, C, E |
05 | A, B, C, D, E, F |
06 | B, C, D |
07 | A, B, D, E |
10 M
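A level-wise Apriori pass over the seven transactions in 4(a) can be sketched as below, to cross-check a hand-worked answer. With 7 transactions, 30% support means an itemset must appear in at least 3 of them. The function names `apriori` and `strong_rules` are illustrative, and the sketch favours readability over efficiency.

```python
from itertools import combinations

transactions = [
    {"A", "B", "D", "E", "F"},
    {"B", "C", "E"},
    {"A", "B", "D", "E"},
    {"A", "B", "C", "E"},
    {"A", "B", "C", "D", "E", "F"},
    {"B", "C", "D"},
    {"A", "B", "D", "E"},
]

def apriori(transactions, min_support):
    """Return {frozenset: support count} for every frequent itemset."""
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def strong_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for every strong rule."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

frequent = apriori(transactions, min_support=0.30)
rules = strong_rules(frequent, min_confidence=0.75)
for antecedent, consequent, confidence in sorted(
        rules, key=lambda r: (len(r[0]) + len(r[1]), sorted(r[0]), sorted(r[1]))):
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 2))
```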
4(b)
Explain multidimensional and multilevel association rules with examples.
10 M
5(a)
What is clustering? Explain the k-means clustering algorithm. Suppose the data for clustering is {2, 4, 10, 12, 3, 20, 11, 25}. Consider k = 2 and cluster the given data using the k-means algorithm.
10 M
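The iterations of k-means for the data in 5(a) can be traced with a small one-dimensional sketch like the one below. The question does not give initial centroids, so the sketch assumes the first two data points, 2 and 4, as starting centroids. A different initialisation may take a different number of iterations or converge to different clusters. The function name `kmeans_1d` is illustrative.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain k-means on 1-D data; returns (clusters, centroids)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (ties go to the lower-indexed centroid).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # no change, so the algorithm has converged
            break
        centroids = new_centroids
    return clusters, centroids

points = [2, 4, 10, 12, 3, 20, 11, 25]
clusters, centroids = kmeans_1d(points, centroids=[2, 4])  # assumed initial centroids
print(clusters, centroids)
```

Under this assumed initialisation the run settles on the clusters {2, 3, 4} and {10, 11, 12, 20, 25}.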
5(b)
What is an outlier? Describe methods that can be used for outlier analysis.
10 M
6(a)
Consider the following case study: A telecom company wants to analyze and improve its performance by introducing a series of innovative mobile payment plans. For this case study, design a BI system, clearly explaining all steps from data collection to decision making.
10 M
6(b)
Clearly explain the working of the DBSCAN algorithm using appropriate diagrams.
10 M