Solve any one question from Q.1(a,b,c) & Q.2(a,b,c)
1(a)
What are the different data normalization methods? Explain them in brief.
6 M
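For reference, the normalization methods usually expected here are min-max normalization, z-score normalization, and normalization by decimal scaling; a sketch of the standard formulas (notation as in Han and Kamber):

\begin{aligned}
\text{Min-max:} \quad & v' = \frac{v - \min_A}{\max_A - \min_A}\,(new\_max_A - new\_min_A) + new\_min_A \\
\text{Z-score:} \quad & v' = \frac{v - \bar{A}}{\sigma_A} \\
\text{Decimal scaling:} \quad & v' = \frac{v}{10^{j}}, \text{ where } j \text{ is the smallest integer such that } \max(|v'|) < 1
\end{aligned}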
1(b)
Consider the training examples shown in the table below for a binary classification problem.
i) What is the entropy of this collection of training examples with respect to the 'Yes' class?
ii) What are the information gains of A1 and A2 relative to these training examples?
Instance | A1 | A2 | Class |
1 | T | T | Yes |
2 | T | T | Yes |
3 | T | F | No |
4 | F | F | Yes |
5 | F | T | No |
6 | F | T | No |
7 | F | F | No |
8 | T | F | Yes |
9 | F | T | No |
6 M
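A worked sketch for 1(b), assuming base-2 logarithms (the collection has 4 'Yes' and 5 'No' examples):

\begin{aligned}
E(S) &= -\tfrac{4}{9}\log_2\tfrac{4}{9} - \tfrac{5}{9}\log_2\tfrac{5}{9} \approx 0.9911 \\
\mathrm{Gain}(A_1) &= 0.9911 - \tfrac{4}{9}\,E(3/4, 1/4) - \tfrac{5}{9}\,E(1/5, 4/5) \approx 0.9911 - \tfrac{4}{9}(0.8113) - \tfrac{5}{9}(0.7219) \approx 0.2294 \\
\mathrm{Gain}(A_2) &= 0.9911 - \tfrac{5}{9}\,E(2/5, 3/5) - \tfrac{4}{9}\,E(2/4, 2/4) \approx 0.9911 - \tfrac{5}{9}(0.9710) - \tfrac{4}{9}(1.0) \approx 0.0072
\end{aligned}

A1 therefore has the larger information gain and would be preferred as the splitting attribute.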
1(c)
Explain, with a suitable example, frequent itemset generation in the Apriori algorithm.
8 M
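A minimal Python sketch of Apriori frequent itemset generation; the transactions and min_support value below are illustrative, not part of the question:

from itertools import combinations

def apriori(transactions, min_support):
    """Return every frequent itemset (frozenset) with its support count."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}   # candidate 1-itemsets
    frequent, k = {}, 1
    while current:
        # Scan the transactions to count the support of each candidate
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates, then prune
        # any candidate that has an infrequent k-item subset
        keys, k = list(level), k + 1
        current = {a | b for a in keys for b in keys if len(a | b) == k}
        current = {c for c in current
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

# Illustrative market-basket data
data = [{'bread', 'milk'}, {'bread', 'diapers', 'beer', 'eggs'},
        {'milk', 'diapers', 'beer', 'cola'}, {'bread', 'milk', 'diapers', 'beer'},
        {'bread', 'milk', 'diapers', 'cola'}]
for itemset, count in apriori(data, min_support=3).items():
    print(sorted(itemset), count)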
2(a)
What is data preprocessing? Explain the different steps in data preprocessing.
6 M
2(b)
Explain, with an example, the K-Nearest-Neighbor classifier.
6 M
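A minimal sketch of the K-Nearest-Neighbor idea in Python, using Euclidean distance and a majority vote; the toy training data is illustrative:

import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((1.0, 1.0), 'A'), ((1.2, 0.8), 'A'), ((0.9, 1.1), 'A'),
         ((3.0, 3.2), 'B'), ((3.1, 2.9), 'B')]
print(knn_predict(train, (1.1, 1.0), k=3))   # -> 'A'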
2(c)
Explain the following terms:
i) Support count
ii) Support
iii) Frequent itemset
iv) Closed itemset.
8 M
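For an itemset X over a set of N transactions T = {t_1, ..., t_N}, the four terms can be summarized as:

\begin{aligned}
\text{Support count:} \quad & \sigma(X) = |\{\, t_i \in T \mid X \subseteq t_i \,\}| \\
\text{Support:} \quad & s(X) = \sigma(X) / N \\
\text{Frequent itemset:} \quad & X \text{ is frequent iff } s(X) \geq minsup \\
\text{Closed itemset:} \quad & X \text{ is closed iff no proper superset } Y \supset X \text{ has } \sigma(Y) = \sigma(X)
\end{aligned}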
Solve any one question from Q.3(a,b,c) & Q.4(a,b,c)
3(a)
What are interval-scaled variables? Describe the distance measures that are commonly used for computing the dissimilarity of objects described by such variables.
8 M
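The commonly expected distance measures are special cases of the Minkowski distance over the p interval-scaled attributes:

\begin{aligned}
\text{Minkowski:} \quad & d(i, j) = \Big(\sum_{k=1}^{p} |x_{ik} - x_{jk}|^{q}\Big)^{1/q} \\
\text{Manhattan } (q = 1)\text{:} \quad & d(i, j) = \sum_{k=1}^{p} |x_{ik} - x_{jk}| \\
\text{Euclidean } (q = 2)\text{:} \quad & d(i, j) = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^{2}}
\end{aligned}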
3(b)
What is meant by complete link hierarchical clustering?
6 M
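In complete-link (farthest-neighbor) clustering, the distance between two clusters is the maximum distance over all cross-cluster pairs:

d(C_i, C_j) = \max_{x \in C_i,\; y \in C_j} d(x, y)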
3(c)
Consider the following vectors x and y: x = [1, 1, 1, 1], y = [2, 2, 2, 2]. Calculate:
i) Cosine Similarity
ii) Euclidean distance.
3 M
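A worked sketch for 3(c):

\begin{aligned}
\cos(x, y) &= \frac{x \cdot y}{\|x\|\,\|y\|} = \frac{2 + 2 + 2 + 2}{\sqrt{4}\,\sqrt{16}} = \frac{8}{2 \cdot 4} = 1 \\
d(x, y) &= \sqrt{(2-1)^2 + (2-1)^2 + (2-1)^2 + (2-1)^2} = \sqrt{4} = 2
\end{aligned}

The vectors point in the same direction, so their cosine similarity is 1 even though they lie a Euclidean distance of 2 apart.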
4(a)
Explain, with a suitable example, the K-medoids algorithm.
8 M
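A compact Python sketch of the K-medoids (PAM-style) swap idea, assuming Manhattan distance; the points and k below are illustrative:

def k_medoids(points, k, max_iter=100):
    """Greedily swap medoids with non-medoids while the total cost drops."""
    dist = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
    cost = lambda meds: sum(min(dist(p, m) for m in meds) for p in points)
    medoids = list(points[:k])                  # naive initialization
    best = cost(medoids)
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for p in points:                    # try every non-medoid as a swap
                if p in medoids:
                    continue
                trial = medoids[:i] + [p] + medoids[i + 1:]
                c = cost(trial)
                if c < best:
                    medoids, best, improved = trial, c, True
        if not improved:                        # no swap lowered the cost
            break
    return medoids, best

pts = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
print(k_medoids(pts, k=2))   # one medoid per natural cluster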
4(b)
Differentiate between the following:
i) Partitioning and hierarchical clustering
ii) Centroid and average link hierarchical clustering
iii) Symmetric and asymmetric binary variables.
6 M
4(c)
How is the Manhattan distance between two objects calculated?
3 M
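A sketch of the usual formula for p attributes, with a tiny example:

d(i, j) = \sum_{k=1}^{p} |x_{ik} - x_{jk}|, \qquad d\big((1, 2), (4, 6)\big) = |1 - 4| + |2 - 6| = 7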
Solve any one question from Q.5(a,b,c) & Q.6(a,b)
5(a)
What is Web content mining? Explain in brief.
7 M
5(b)
Assume 'd' is a document from the set of documents and 't' is a term. Write the formulas to determine:
i) Term frequency freq(d, t)
ii) Weighted term frequency TF (d, t)
iii) Inverse document frequency (IDF) (t)
iv) TF-IDF measure TF-IDF (d, t)
8 M
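A sketch of the formulas in the convention used by Han and Kamber (assuming |d| denotes the total number of documents and |d_t| the number of documents containing t):

\begin{aligned}
\mathrm{freq}(d, t) &= \text{number of occurrences of term } t \text{ in document } d \\
\mathrm{TF}(d, t) &= \begin{cases} 0 & \text{if } \mathrm{freq}(d, t) = 0 \\ 1 + \log\big(1 + \log \mathrm{freq}(d, t)\big) & \text{otherwise} \end{cases} \\
\mathrm{IDF}(t) &= \log \frac{1 + |d|}{|d_t|} \\
\text{TF-IDF}(d, t) &= \mathrm{TF}(d, t) \cdot \mathrm{IDF}(t)
\end{aligned}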
5(c)
What is Web crawler?
2 M
6(a)
Compare the different text mining approaches.
9 M
6(b)
Explain the following terms:
i) Recommender systems
ii) Inverted index
iii) Feature vector
iv) Signature file.
8 M
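As an illustration of term (ii), a minimal inverted index in Python; the toy documents are not part of the question:

from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {1: 'data mining finds patterns',
        2: 'text mining mines text data',
        3: 'web mining crawls the web'}
index = build_inverted_index(docs)
print(index['mining'])   # -> [1, 2, 3]
print(index['data'])     # -> [1, 2]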
Solve any one question from Q.7(a,b) & Q.8(a,b)
7(a)
Explain, with a neat diagram, the systematic machine learning framework.
8 M
7(b)
Write short notes on:
i) Big data
ii) Multi-perspective decision making.
8 M
8(a)
What is reinforcement learning? Explain.
8 M
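For reference, one common concrete instance is the Q-learning update rule, where \alpha is the learning rate, \gamma the discount factor, r the immediate reward, and s' the next state:

Q(s, a) \leftarrow Q(s, a) + \alpha \big[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \big]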
8(b)
Write short notes on:
i) Wholistic learning
ii) Machine learning
8 M