SPPU Computer Engineering (Semester 7)
Data Mining Techniques and Applications
December 2016
Total marks: --
Total time: --
INSTRUCTIONS
(1) Assume appropriate data and state your reasons
(2) Marks are given to the right of every question
(3) Draw neat diagrams wherever necessary

Solve any one question: Q.1(a,b,c) or Q.2(a,b,c)
1(a) In real-world data, tuples with missing values for some attributes are a common occurrence. Describe various methods for handling this problem.
6 M
1(b) Explain the following terms:
i) Constraint-based rule mining.
ii) Closed and maximal frequent itemsets.
6 M
1(c) Consider the following data for a binary class problem.
 A    B    Class
 T    F    P
 T    T    P
 T    T    N
 T    F    P
 T    T    P
 F    F    N
 F    F    N
 F    F    N
 T    T    P
 T    F    N

i) Compute the information gain for A1 and A2.
ii) What is the best split between A1 and A2 according to Information gain?
iii) Compute the Gini index for A1 and A2.
iv) What is the best split between A1 and A2 according to Gini index?
8 M
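The impurity measures asked for in 1(c) can be checked with a short script. This is only a sketch under stated assumptions: it takes A1 and A2 in the question to mean columns A and B of the table, uses base-2 entropy, and the helper names (`entropy`, `gini`, `weighted_impurity`, `info_gain`, `gini_split`) are illustrative, not taken from any library.

```python
from collections import Counter
from math import log2

# Rows copied from the table in 1(c): (A, B, Class).
# Assumption: A1 and A2 in the question refer to columns A and B.
rows = [
    ("T", "F", "P"), ("T", "T", "P"), ("T", "T", "N"), ("T", "F", "P"),
    ("T", "T", "P"), ("F", "F", "N"), ("F", "F", "N"), ("F", "F", "N"),
    ("T", "T", "P"), ("T", "F", "N"),
]


def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_impurity(rows, col, impurity):
    """Size-weighted impurity of the partitions induced by column `col`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])
    return sum(len(g) / len(rows) * impurity(g) for g in groups.values())


def info_gain(rows, col):
    """Parent entropy minus the weighted entropy after splitting on `col`."""
    return entropy([r[-1] for r in rows]) - weighted_impurity(rows, col, entropy)


def gini_split(rows, col):
    """Weighted Gini index after splitting on `col` (lower is better)."""
    return weighted_impurity(rows, col, gini)


for name, col in (("A", 0), ("B", 1)):
    print(name, round(info_gain(rows, col), 4), round(gini_split(rows, col), 4))
```

Under these assumptions the split on A gives an information gain of about 0.396 and a weighted Gini index of about 0.286, against about 0.125 and 0.417 for B, so both criteria prefer A.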

2(a) Consider the market basket transactions shown below:
 Transaction ID    Items bought
 T1                {M, A, B, D}
 T2                {A, D, C, B, F}
 T3                {A, C, B, F}
 T4                {A, B, D}
Assuming a minimum support of 50% and a minimum confidence of 80%:
i) Find all frequent itemsets using Apriori algorithm.
ii) Find all association rules using Apriori algorithm.
6 M
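A level-wise Apriori pass over the four transactions in 2(a) might be sketched as follows. The function names `support`, `apriori` and `association_rules` are hypothetical helpers written for this illustration, and the join step is a simple union of frequent (k-1)-itemsets rather than the prefix-based join found in textbook pseudocode.

```python
from itertools import combinations

# Transactions T1..T4 from question 2(a).
transactions = [
    {"M", "A", "B", "D"},
    {"A", "D", "C", "B", "F"},
    {"A", "C", "B", "F"},
    {"A", "B", "D"},
]
MIN_SUPPORT = 0.5      # 50% minimum support
MIN_CONFIDENCE = 0.8   # 80% minimum confidence


def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori():
    """Return {frequent itemset: support}, built one level at a time."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items]
    level = [s for s in level if support(s) >= MIN_SUPPORT]
    frequent = {}
    k = 1
    while level:
        frequent.update((s, support(s)) for s in level)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset must be frequent, then check support.
        level = [
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c) >= MIN_SUPPORT
        ]
    return frequent


def association_rules(frequent):
    """Return (antecedent, consequent, confidence) for rules meeting MIN_CONFIDENCE."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[lhs]
                if confidence >= MIN_CONFIDENCE:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


frequent = apriori()
rules = association_rules(frequent)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

With a 50% threshold an itemset needs to appear in at least two of the four transactions; on this data the sketch finds 19 frequent itemsets, the largest being {A, B, C, F}.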
2(b) What are the major tasks in data preprocessing? Explain them in brief.
6 M
2(c) Explain with a suitable example:
i) k-Nearest-Neighbor Classifier
ii) Scalable decision tree
8 M

Solve any one question: Q.3(a,b,c) or Q.4(a,b,c)
3(a) Consider the following six points: P1(0.40, 0.53), P2(0.22, 0.38), P3(0.35, 0.32), P4(0.26, 0.19), P5(0.08, 0.41) and P6(0.45, 0.30). Perform single-link hierarchical clustering and show your results by drawing a dendrogram.
8 M
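The merge sequence needed for the dendrogram in 3(a) can be reproduced with a naive agglomerative loop. This is a minimal sketch assuming Euclidean distance between points; `single_link` is a hypothetical helper written for this example and not a library routine.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Points from question 3(a).
points = {
    "P1": (0.40, 0.53), "P2": (0.22, 0.38), "P3": (0.35, 0.32),
    "P4": (0.26, 0.19), "P5": (0.08, 0.41), "P6": (0.45, 0.30),
}


def single_link(points):
    """Naive single-link clustering; returns a list of (cluster_a, cluster_b, height)."""
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: distance between the two closest members.
                d = min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


merges = single_link(points)
for a, b, height in merges:
    print(a, "+", b, "at", round(height, 3))
```

The merge heights come out at roughly 0.102, 0.143, 0.143, 0.158 and 0.216, starting with P3 and P6. With the coordinates as printed, d(P2, P5) and d(P2, P3) both equal sqrt(0.0205), so the order of the second and third merges depends on how ties are broken; the heights themselves are unaffected.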
3(b) Explain the k-medoids algorithm with a suitable example.
6 M
3(c) What are the requirements of clustering in data mining?
3 M

4(a) What is meant by cluster analysis?
4 M
4(b) Explain the K-means algorithm with a suitable example.
5 M
4(c) Differentiate between the following clustering methods:
ii) Hierarchical and partitioning.
8 M

Solve any one question: Q.5(a,b,c) or Q.6(a,b,c)
5(a) Precision and recall are two essential quality measures of an information retrieval system.
i) Why is it usual practice to trade one measure for the other? Explain.
ii) Why is the F-score a good measure of the balance between precision and recall?
6 M
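For 5(a), the relationship between the two measures and the F-score can be illustrated numerically. The sketch below assumes the balanced F1 form (harmonic mean of precision and recall); the counts passed in are made-up illustrative values, and `precision_recall_f1` is a hypothetical helper that assumes at least one true positive.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts.

    Assumes tp > 0, so neither denominator is zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only (not from the question paper).
balanced = precision_recall_f1(tp=30, fp=10, fn=30)   # precision 0.75, recall 0.5
lopsided = precision_recall_f1(tp=10, fp=0, fn=90)    # precision 1.0, recall 0.1

for name, (p, r, f1) in (("balanced", balanced), ("lopsided", lopsided)):
    arithmetic = (p + r) / 2
    print(name, round(p, 3), round(r, 3), "F1:", round(f1, 3),
          "arithmetic mean:", round(arithmetic, 3))
```

In the lopsided case the arithmetic mean is 0.55 while F1 is about 0.18: the harmonic mean stays close to the smaller of the two values, so a system cannot score well by maximizing one measure at the expense of the other.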
5(b) Compare the different text mining approaches.
5 M
5(c) Explain the following terms:
i) Bag of words
ii) Feature vector
6 M

6(a) What is Web usage mining? Explain in brief.
6 M
6(b) Differentiate between document selection and document ranking methods of information retrieval.
5 M
6(c) Explain the following terms:
i) Authoritative Web pages
ii) Hub pages
iii) Document Object Model (DOM) structure
6 M

Solve any one question: Q.7(a,b,c) or Q.8(a,b,c)
7(a) What is meant by machine learning? Differentiate between supervised and unsupervised machine learning.
6 M
7(b) What are the similarities and differences between reinforcement learning and artificial intelligence algorithms?
5 M
7(c) Write a short note on mining of big data.
5 M

8(a) What is meant by wholistic learning?
4 M
8(b) Briefly explain reinforcement learning.
6 M
8(c) What is meant by multi-perspective decision making? Explain.
6 M
