Solve any one question from Q.1(a,b,c) & Q.2(a,b,c)

1(a)
What are the different data normalization methods? Explain them in brief.

6 M

1(b)
Consider the training examples shown in the table below for a binary classification problem.

i) What is the entropy of this collection of training examples with respect to the 'Yes' class?

ii) What are the information gains of A1 and A2 relative to these training examples?

Instance | A1 | A2 | Class
1 | T | T | Yes
2 | T | T | Yes
3 | T | F | No
4 | F | F | Yes
5 | F | T | No
6 | F | T | No
7 | F | F | No
8 | T | F | Yes
9 | F | T | No

6 M
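The two requested quantities can be checked with a short Python sketch (not part of the paper; it encodes the table above and reports entropies in bits, rounded to four decimals):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(attr_values, labels):
    """Information gain from splitting `labels` on `attr_values`."""
    n = len(labels)
    split = {}
    for v, y in zip(attr_values, labels):
        split.setdefault(v, []).append(y)
    remainder = sum(len(part) / n * entropy(part) for part in split.values())
    return entropy(labels) - remainder

# The 9 instances from the table in Q.1(b):
A1 = list("TTTFFFFTF")
A2 = list("TTFFTTFFT")
cls = ["Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No"]

print(round(entropy(cls), 4))        # 0.9911 (4 Yes, 5 No)
print(round(info_gain(A1, cls), 4))  # 0.2294
print(round(info_gain(A2, cls), 4))  # 0.0072
```

So A1 yields the larger information gain and would be chosen first by a decision-tree learner.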

1(c)
Explain with a suitable example the frequent itemset generation in the Apriori algorithm.

8 M
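A minimal sketch of level-wise frequent-itemset generation (the transactions below are the classic illustrative market-basket set, not data from this paper; candidate k-itemsets are built from frequent (k-1)-itemsets and pruned by the subset test before counting support):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset generation in the Apriori style."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = list(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unions of frequent (k-1)-itemsets that give size-k candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        current = [c for c in candidates if support(c) >= min_support]
        frequent.extend(current)
        k += 1
    return frequent

# Illustrative transactions (a common textbook example, assumed here):
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
for itemset in sorted(apriori(transactions, 0.6), key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset))  # 4 frequent 1-itemsets, then 4 frequent 2-itemsets
```

With min_support = 0.6 (3 of 5 transactions), no 3-itemset survives: e.g. {bread, milk, diapers} appears in only 2 transactions, and the other candidates are pruned because one of their 2-subsets is already infrequent.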

2(a)
What is data preprocessing? Explain the different steps in data preprocessing.

6 M

2(b)
Explain with an example the K-Nearest-Neighbor classifier.

6 M
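The idea can be sketched in a few lines: rank the training points by distance to the query and take a majority vote among the k closest (the 2-D points and class labels below are made up for illustration):

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of ((x, y), label) pairs."""
    neighbours = sorted(train, key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical training set: two well-separated classes A and B.
train = [((1, 1), "A"), ((2, 1), "A"), ((1, 2), "A"),
         ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]
print(knn_predict(train, (2, 2), k=3))  # "A": all 3 nearest neighbours are class A
```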

2(c)
Explain the following terms:

i) Support count

ii) Support

iii) Frequent itemset

iv) Closed itemset.

8 M

Solve any one question from Q.3(a,b,c) & Q.4(a,b,c)

3(a)
What are interval-scaled variables? Describe the distance measures that are commonly used for computing the dissimilarity of objects described by such variables.

8 M

3(b)
What is meant by complete link hierarchical clustering?

6 M

3(c)
Consider the following vectors x and y: x = [1, 1, 1, 1], y = [2, 2, 2, 2]. Calculate:

i) Cosine Similarity

ii) Euclidean distance.

3 M
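Both quantities can be verified directly from their definitions (a quick check, not part of the paper):

```python
from math import sqrt

def cosine_similarity(x, y):
    """cos(x, y) = (x . y) / (|x| * |y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

def euclidean(x, y):
    """Euclidean (L2) distance: sqrt of summed squared differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

x, y = [1, 1, 1, 1], [2, 2, 2, 2]
print(cosine_similarity(x, y))  # 1.0: the vectors point in the same direction
print(euclidean(x, y))          # 2.0 = sqrt(4 * 1^2)
```

Note the contrast: the cosine similarity is maximal because y is just a scaling of x, yet the Euclidean distance is nonzero.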

4(a)
Explain with a suitable example the K-medoids algorithm.

8 M
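A simplified alternating variant can be sketched as follows (a Voronoi-iteration style sketch, not the full PAM algorithm, which greedily swaps medoids with non-medoids; the 2-D point set is a small example commonly used to illustrate k-medoids, assumed here):

```python
def k_medoids(points, k, n_iter=10):
    """Simplified alternating k-medoids: assign points to the nearest
    medoid, then re-pick each cluster's medoid; repeat."""
    def d(a, b):
        # Manhattan distance, often used with k-medoids
        return sum(abs(x - y) for x, y in zip(a, b))

    medoids = points[:k]  # deterministic seed: the first k points
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for p in points:
            clusters[min(medoids, key=lambda m: d(m, p))].append(p)
        # Update step: the new medoid of each cluster is the member
        # minimising the total distance to the other members.
        medoids = [min(c, key=lambda m: sum(d(m, q) for q in c))
                   for c in clusters.values() if c]
    return medoids

# Illustrative 2-D point set:
points = [(2, 6), (3, 4), (3, 8), (4, 7), (6, 2),
          (6, 4), (7, 3), (7, 4), (8, 5), (7, 6)]
print(sorted(k_medoids(points, k=2)))  # converges to medoids (2, 6) and (7, 4)
```

Unlike k-means, the cluster representative is always an actual data point, which makes the method less sensitive to outliers.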

4(b)
Differentiate between the following:

i) Partitioning and hierarchical clustering

ii) Centroid and average link hierarchical clustering

iii) Symmetric and asymmetric binary variables.

6 M

4(c)
How is the Manhattan distance between two objects calculated?

3 M
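The calculation is simply the sum of absolute coordinate differences; a one-line sketch (example values made up):

```python
def manhattan(x, y):
    """Manhattan (city-block, L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

print(manhattan([1, 3, 5], [4, 1, 6]))  # |1-4| + |3-1| + |5-6| = 3 + 2 + 1 = 6
```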

Solve any one question from Q.5(a,b,c) & Q.6(a,b)

5(a)
What is Web content mining? Explain in brief.

7 M

5(b)
Assume 'd' is the set of documents and 't' is the term. Write the formulas to determine:

i) Term frequency freq(d, t)

ii) Weighted term frequency TF(d, t)

iii) Inverse document frequency IDF(t)

iv) TF-IDF measure TF-IDF(d, t)

8 M
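One common formulation (the Han–Kamber textbook form is assumed here; the logarithm base varies by convention, natural log is used below) can be sketched as:

```python
from math import log

def tf(freq):
    """Weighted term frequency TF(d, t): 0 if the term is absent,
    otherwise 1 + log(1 + log(freq(d, t)))."""
    return 0.0 if freq == 0 else 1 + log(1 + log(freq))

def idf(n_docs, n_docs_with_t):
    """Inverse document frequency IDF(t) = log((1 + |d|) / |d_t|),
    where |d_t| is the number of documents containing t."""
    return log((1 + n_docs) / n_docs_with_t)

def tf_idf(freq, n_docs, n_docs_with_t):
    """TF-IDF(d, t) = TF(d, t) * IDF(t)."""
    return tf(freq) * idf(n_docs, n_docs_with_t)

# Hypothetical numbers: a term occurring 5 times in one document,
# in a collection of 100 documents, 10 of which contain the term.
print(round(tf_idf(5, 100, 10), 2))  # ≈ 4.53
```

The weighting damps raw counts (a term occurring 50 times is not 50 times as important) while IDF down-weights terms that appear in many documents.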

5(c)
What is a Web crawler?

2 M

6(a)
Compare the different text mining approaches.

9 M

6(b)
Explain the following terms:

i) Recommender systems:

ii) Inverted index

iii) Feature vector

iv) Signature file.

8 M
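For term (ii), an inverted index maps each term to the documents containing it; a minimal sketch (document texts made up for illustration):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {t: sorted(ids) for t, ids in index.items()}

docs = {1: "data mining techniques", 2: "web mining", 3: "text data"}
index = build_inverted_index(docs)
print(index["mining"])  # [1, 2]
print(index["data"])    # [1, 3]
```

Answering a query then reduces to intersecting the posting lists of the query terms instead of scanning every document.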

Solve any one question from Q.7(a,b) & Q.8(a,b)

7(a)
Explain with a neat diagram the systematic machine learning framework.

8 M

7(b)
Write short notes on:

i) Big data

ii) Multi-perspective decision making.

8 M

8(a)
What is reinforcement learning? Explain.

8 M

8(b)
Write short notes on:

i) Wholistic learning

ii) Machine learning

8 M
