1(a)
What are the three Vs of Big Data? Give two examples of big data case studies. Indicate which Vs are satisfied by these case studies.
5 M
1(b)
What is the role of a "combiner" in the MapReduce framework? Explain with the help of an example.
5 M
1(c)
Illustrate, through an example, how the triangular array can be used to optimally store and count pairs in a frequent itemset mining algorithm.
5 M
1(d)
List the different issues and challenges in data stream query processing.
5 M
2(a)
What are the different data architecture patterns in NoSQL? Explain the "key-value" store and "document" store patterns with relevant examples.
10 M
2(b)
Show the MapReduce implementation for the following two tasks using pseudocode.
i) Multiplication of two matrices
ii) Computing Group-by and aggregation of a relational table.
10 M
3(a)
Give a formal definition of the Nearest Neighbor problem. Show how finding plagiarism in documents is a Nearest Neighbor problem. What similarity measures can be used?
10 M
3(b)
Clearly explain the concept of a Bloom Filter with the help of an example.
10 M
4(a)
Suppose a data stream consists of the integers 3, 1, 4, 1, 5, 9, 2, 6, 5. Let the hash function be h(x) = (3x + 1) mod 5. Show how the Flajolet-Martin algorithm will estimate the number of distinct elements in this stream.
10 M
4(b)
Clearly explain how the CURE algorithm can be used to cluster big data sets.
10 M
5(a)
Define collaborative filtering. Using the example of an e-commerce site like Flipkart or Amazon, describe how it can be used to provide recommendations to users.
10 M
5(b)
Define PageRank. Using the web graph shown below, compute the PageRank of every node at the end of the second iteration. Use teleport factor = 0.8.
[Image: web graph]
10 M
6(a)
Explain clearly with diagrams how the PCY algorithm helps to perform frequent itemset mining for large datasets.
10 M
6(b)
For the graph given below, use the betweenness factor to find all communities.
[Image: graph]
10 M