SPPU Computer Engineering (Semester 7)
Data Mining Techniques and Applications
May 2017
Total marks: --
Total time: --
(1) Assume appropriate data and state your reasons
(2) Marks are given to the right of every question
(3) Draw neat diagrams wherever necessary

Solve any one question from Q.1(a,b,c) & Q.2(a,b,c)
1(a) What are the different data normalization methods? Explain them in brief.
6 M
1(b) Consider the training examples shown in the table below for a binary classification problem.
Instance A1 A2 Class
1 T T Yes
2 T T Yes
3 T F No
4 F F Yes
5 F T No
6 F T No
7 F F No
8 T F Yes
9 F T No

i) What is the entropy of this collection of training examples with respect to the 'Yes' class?
ii) What are the information gains of A1 and A2 relative to these training examples?
6 M
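The quantities asked for in 1(b) can be checked numerically. The following is a minimal sketch, assuming the standard base-2 entropy and the usual definition of information gain (parent entropy minus the weighted entropy of the partitions); the names `rows`, `entropy` and `info_gain` are illustrative and not part of the question paper.

```python
from collections import Counter
from math import log2

# Training examples from the table in 1(b): (A1, A2, Class)
rows = [
    ("T", "T", "Yes"), ("T", "T", "Yes"), ("T", "F", "No"),
    ("F", "F", "Yes"), ("F", "T", "No"),  ("F", "T", "No"),
    ("F", "F", "No"),  ("T", "F", "Yes"), ("F", "T", "No"),
]

def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(data, attr_index):
    """Parent entropy minus the weighted entropy after splitting on one attribute."""
    n = len(data)
    remainder = 0.0
    for value in set(r[attr_index] for r in data):
        subset = [r[-1] for r in data if r[attr_index] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([r[-1] for r in data]) - remainder

print(round(entropy([r[-1] for r in rows]), 4))  # 0.9911
print(round(info_gain(rows, 0), 4))              # gain of A1: 0.2294
print(round(info_gain(rows, 1), 4))              # gain of A2: 0.0072
```

Under these assumptions A1 gives the larger gain, so it would be the preferred attribute for the first split.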
1(c) Explain, with a suitable example, frequent itemset generation in the Apriori algorithm.
8 M

2(a) What is data preprocessing? Explain the different steps in data preprocessing.
6 M
2(b) Explain the K-Nearest-Neighbor classifier with an example.
6 M
2(c) Explain the following terms:
i) Support count
ii) Support
iii) Frequent itemset
iv) Closed itemset.
8 M

Solve any one question from Q.3(a,b,c) & Q.4(a,b,c)
3(a) What are interval-scaled variables? Describe the distance measures that are commonly used for computing the dissimilarity of objects described by such variables.
8 M
3(b) What is meant by complete link hierarchical clustering?
6 M
3(c) Consider the following vectors x and y: x = [1,1,1,1], y = [2,2,2,2]. Calculate:
i) Cosine Similarity
ii) Euclidean distance.
3 M
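The two measures in 3(c) can be verified with a short script. This is a minimal sketch, assuming the usual definitions (cosine similarity as the dot product divided by the product of the vector norms, Euclidean distance as the square root of the summed squared differences); the function names are illustrative.

```python
from math import sqrt

x = [1, 1, 1, 1]
y = [2, 2, 2, 2]

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = sqrt(sum(p * p for p in a))
    norm_b = sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Square root of the sum of squared component differences."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

print(cosine_similarity(x, y))   # 1.0
print(euclidean_distance(x, y))  # 2.0
```

The cosine similarity is 1 because y is a positive multiple of x, so the two vectors point in the same direction even though they are a distance of 2 apart.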

4(a) Explain the K-medoids algorithm with a suitable example.
8 M
4(b) Differentiate between the following:
i) Partitioning and hierarchical clustering
ii) Centroid and average link hierarchical clustering
iii) Symmetric and asymmetric binary variables.
6 M
4(c) How is the Manhattan distance between two objects calculated?
3 M

Solve any one question from Q.5(a,b,c) & Q.6(a,b)
5(a) What is Web content mining? Explain in brief.
7 M
5(b) Assume 'd' is the set of documents and 't' is the term. Write the formulas to determine:
i) Term frequency freq(d, t)
ii) Weighted term frequency TF (d, t)
iii) Inverse document frequency (IDF) (t)
iv) TF-IDF measure TF-IDF (d, t)
8 M
5(c) What is Web crawler?
2 M

6(a) Compare the different text mining approaches.
9 M
6(b) Explain the following terms:
i) Recommender systems
ii) Inverted index
iii) Feature vector
iv) Signature file.
8 M

Solve any one question from Q.7(a,b) & Q.8(a,b)
7(a) Explain, with a neat diagram, the systematic machine learning framework.
8 M
7(b) Write short notes on:
i) Big data
ii) Multi-perspective decision making.
8 M

8(a) What is reinforcement learning? Explain.
8 M
8(b) Write short notes on:
i) Wholistic learning
ii) Machine learning
8 M
