VTU Computer Science (Semester 7)
Advanced Computer Architectures
December 2016
Total marks: --
Total time: --
INSTRUCTIONS
(1) Assume appropriate data and state your reasons
(2) Marks are given to the right of every question
(3) Draw neat diagrams wherever necessary


1(a) Define the term instruction set architecture (ISA). How is computer architecture related to the ISA? Correlate them.
4 M
1(b) Elaborate on the different parameters that determine the cost of an IC. Give the equation for each parameter separately and explain them.
6 M
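For reference, the cost relations usually quoted for this question are shown below, in the form used by the Hennessy and Patterson textbook. The die-yield model differs between textbook editions, so treat its exact form as an assumption; N is a process-complexity factor, and the second term of the dies-per-wafer equation corrects for partial dies along the circular edge of the wafer.

```latex
\text{Cost of IC} = \frac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging and final test}}{\text{Final test yield}}

\text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}

\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^{2}}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}

\text{Die yield} = \text{Wafer yield} \times \frac{1}{(1 + \text{Defects per unit area} \times \text{Die area})^{N}}
```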
1(c) Define the term CPI and derive the equation for the total number of processor cycles needed to execute a program. Consider the execution of an object code with 200,000 instructions on a 20 MHz processor. The program consists of four major types of instructions. The instruction mix and the number of cycles per instruction (CPI) needed for each instruction type, based on the results of a program trace experiment, are given below.
Sl. No. | Instruction type                  | CPI | Instruction mix
1       | Arithmetic and logic              | 1   | 68%
2       | Load/store with cache hit         | 2   | 8%
3       | Branch                            | 4   | 14%
4       | Memory reference with cache miss  | 8   | 10%
i) Find the total number of cycles required to execute the program.
ii) Calculate the average C.P.I when the program is executed on a uniprocessor system with the above trace results.
iii) Calculate the corresponding MIPS rate based on the CPI obtained in (ii) above.
10 M
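The arithmetic in 1(c) can be checked with a short Python sketch. The fourth instruction class is taken to be memory reference with a cache miss (an assumption based on its 8-cycle CPI), and the names `MIX`, `total_cycles`, `average_cpi` and `mips_rate` are illustrative. Exact fractions are used so that the percentages introduce no rounding error.

```python
from fractions import Fraction

# Instruction mix from question 1(c): class -> (CPI, percentage of instructions)
MIX = {
    "Arithmetic and logic": (1, 68),
    "Load/store with cache hit": (2, 8),
    "Branch": (4, 14),
    "Memory reference with cache miss": (8, 10),
}
INSTRUCTION_COUNT = 200_000
CLOCK_HZ = 20_000_000  # 20 MHz


def total_cycles(mix, count):
    """Total cycles = sum over classes of (instructions in that class) x CPI of the class."""
    return sum(Fraction(count * pct, 100) * cpi for cpi, pct in mix.values())


def average_cpi(mix, count):
    """Average CPI = total cycles / instruction count."""
    return total_cycles(mix, count) / count


def mips_rate(clock_hz, cpi):
    """MIPS = clock rate / (CPI x 10^6)."""
    return Fraction(clock_hz) / (cpi * 10**6)


cycles = total_cycles(MIX, INSTRUCTION_COUNT)
cpi = average_cpi(MIX, INSTRUCTION_COUNT)
print(cycles)                                     # 440000
print(float(cpi))                                 # 2.2
print(round(float(mips_rate(CLOCK_HZ, cpi)), 2))  # 9.09
```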

2(a) Discuss the various kinds of data dependencies that can disrupt the smooth flow of instructions through a pipeline. Give a supporting example in each case and explain, with an example, how these dependencies can be eliminated.
10 M
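A minimal Python sketch of the three dependence types, using a hypothetical `Instr` record that lists only register names (memory operands are ignored). `hazards` classifies an ordered pair of instructions, and `rename` shows how giving every destination a fresh physical register (P0, P1, ...) removes the WAR and WAW name dependences while preserving the true RAW dependences. The instruction sequence is an illustrative floating-point example, not taken from the question.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]    # register written, or None (e.g. a store)
    srcs: Tuple[str, ...]  # registers read


def hazards(earlier: Instr, later: Instr) -> List[str]:
    """Dependences of `later` on `earlier` when they appear in this program order."""
    found = []
    if earlier.dest is not None and earlier.dest in later.srcs:
        found.append("RAW")  # true data dependence: later reads what earlier writes
    if later.dest is not None and later.dest in earlier.srcs:
        found.append("WAR")  # antidependence: later overwrites a register earlier reads
    if earlier.dest is not None and earlier.dest == later.dest:
        found.append("WAW")  # output dependence: both write the same register
    return found


def rename(program: List[Instr]) -> List[Instr]:
    """Map each destination to a fresh register P<n>; sources use the latest mapping."""
    mapping = {}
    renamed = []
    for n, ins in enumerate(program):
        srcs = tuple(mapping.get(r, r) for r in ins.srcs)  # read before redefining dest
        dest = None
        if ins.dest is not None:
            dest = f"P{n}"
            mapping[ins.dest] = dest
        renamed.append(Instr(ins.op, dest, srcs))
    return renamed


prog = [
    Instr("DIV.D", "F0", ("F2", "F4")),
    Instr("ADD.D", "F6", ("F0", "F8")),
    Instr("S.D", None, ("F6", "R1")),
    Instr("SUB.D", "F8", ("F10", "F14")),
    Instr("MUL.D", "F6", ("F10", "F8")),
]
print(hazards(prog[0], prog[1]))  # ['RAW']
print(hazards(prog[1], prog[3]))  # ['WAR']
print(hazards(prog[1], prog[4]))  # ['WAW']
new = rename(prog)
print(hazards(new[1], new[3]), hazards(new[1], new[4]))  # [] []
```

Renaming cannot remove the RAW dependence between DIV.D and ADD.D; that one must be handled by forwarding or stalling.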
2(b) Explain the principle of loop unrolling. Demonstrate normal loop execution and loop unrolling for the following C code segment by translating it into MIPS assembly language code. C code: for (i = 1000; i > 0; i = i - 1) X[i] = X[i] + s; where s is a scalar value.
i) Calculate the number of clock cycles required per element for both the unscheduled and the scheduled loop in the normal case, considering stalls/idle clock cycles.
ii) Repeat the above step for the unrolled loop, with and without scheduling.
iii) Calculate the average number of clock cycles per element for (i) and (ii).
10 M
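The source-level effect of unrolling can be sketched in Python (the question itself asks for MIPS assembly, which is not reproduced here). The unrolled version performs the same four additions per iteration but only one index update and one loop test per four elements, which is where the saved overhead and the extra scheduling freedom come from. It assumes the trip count is a multiple of 4, as 1000 is; the function names are illustrative.

```python
def add_scalar_normal(x, s):
    """for (i = 1000; i > 0; i = i - 1) X[i] = X[i] + s;  with len(x) == 1001."""
    i = len(x) - 1
    while i > 0:
        x[i] = x[i] + s
        i -= 1  # one index update and one branch test per element


def add_scalar_unrolled_by_4(x, s):
    """Same loop unrolled four times; X[0] is untouched, as in the original loop."""
    i = len(x) - 1
    if i % 4 != 0:
        raise ValueError("trip count must be a multiple of 4 (otherwise add a cleanup loop)")
    while i > 0:
        # Four independent element updates: a scheduler can interleave their loads/adds/stores
        x[i] = x[i] + s
        x[i - 1] = x[i - 1] + s
        x[i - 2] = x[i - 2] + s
        x[i - 3] = x[i - 3] + s
        i -= 4  # one index update and one branch test per four elements


a = [float(k) for k in range(1001)]
b = list(a)
add_scalar_normal(a, 2.5)
add_scalar_unrolled_by_4(b, 2.5)
print(a == b, a[0], a[1000])  # True 0.0 1002.5
```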

3(a) With reference to Branch Target Buffers (BTBs), explain: i) the purpose of each BTB entry and ii) the meaning of the following terms and the subsequent action taken for each of the following events. Case (1): Branch entry found in the BTB and predicted branch not taken. Case (2): Branch entry found in the BTB and predicted branch not taken.
8 M
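A simplified Python model of a BTB that, as in the usual textbook scheme, stores only branches predicted taken: a hit supplies the predicted target at fetch, and a miss means falling through to PC + 4. The 2-cycle misprediction penalty, the 4-byte instruction size and the class name are assumptions for illustration; a real BTB is a small tagged, indexed table rather than an unbounded dictionary.

```python
MISPREDICT_PENALTY = 2  # assumed penalty in clock cycles
INSTR_BYTES = 4         # assumed fixed instruction size


class BranchTargetBuffer:
    def __init__(self):
        self.entries = {}  # branch PC -> predicted target PC (only taken branches are kept)

    def predict(self, pc):
        """At fetch: a hit returns the stored target, a miss predicts the next sequential PC."""
        return self.entries.get(pc, pc + INSTR_BYTES)

    def resolve(self, pc, taken, target):
        """When the branch resolves: return the penalty and update the buffer."""
        predicted = self.predict(pc)
        actual = target if taken else pc + INSTR_BYTES
        if taken:
            self.entries[pc] = target   # enter (or refresh) the branch and its target
        else:
            self.entries.pop(pc, None)  # found but not taken: delete the entry
        return 0 if predicted == actual else MISPREDICT_PENALTY


btb = BranchTargetBuffer()
print(btb.resolve(0x100, True, 0x200))   # 2: not in BTB, but taken
print(btb.resolve(0x100, True, 0x200))   # 0: in BTB, predicted taken, taken
print(btb.resolve(0x100, False, 0x200))  # 2: in BTB, predicted taken, not taken
print(btb.resolve(0x100, False, 0x200))  # 0: not in BTB, not taken
```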
3(b) Name the different instruction delivery techniques used to obtain high performance from pipelines with multiple issue. What are integrated instruction fetch units (IIFUs)? Highlight the basic functions of such fetch units.
8 M
3(c) Compare the register renaming technique with the reorder buffer technique used in hardware speculation.
4 M

4(a) With appropriate timing diagrams, explain the concept of the delayed branch technique used in RISC processors. What are its limitations? Demonstrate the scheduling of the branch delay slot with suitable examples.
10 M
4(b) What is branch penalty? Discuss the different techniques used to reduce branch penalties.
6 M
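The cost of branch penalties is usually summarised by two standard relations, sketched below in Python under the assumptions of an ideal CPI of 1, perfectly balanced stages and no stalls other than branches. The function names are illustrative, and the frequency and penalty values in the usage lines are made-up inputs, not taken from the question.

```python
from fractions import Fraction


def effective_cpi(ideal_cpi, branch_frequency, branch_penalty):
    """CPI = ideal CPI + (fraction of instructions that are branches) x (stall cycles per branch)."""
    return ideal_cpi + branch_frequency * branch_penalty


def speedup_with_branch_stalls(depth, branch_frequency, branch_penalty):
    """Speedup over an unpipelined machine = depth / (1 + branch frequency x branch penalty)."""
    return Fraction(depth) / effective_cpi(1, branch_frequency, branch_penalty)


# Illustrative values: 20% branches, 3-cycle penalty, 5-stage pipeline
print(effective_cpi(1, Fraction(1, 5), 3))                # 8/5
print(speedup_with_branch_stalls(5, Fraction(1, 5), 3))   # 25/8
```

Every technique in 4(b) (predicted-not-taken, delayed branches, static or dynamic prediction, BTBs) works by lowering either the effective branch frequency that stalls or the penalty per branch in these formulas.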
4(c) Consider a non-pipelined RISC processor. Assume that it has a 1 ns clock cycle and that it uses 4 cycles for ALU operations and branches and 5 cycles for memory operations. Assume that the relative frequencies of these operations are 40%, 20% and 40%, respectively. Suppose that, due to clock skew and setup, pipelining the processor adds 0.2 ns of overhead to the clock. Ignoring any latency impact, how much speedup in the instruction execution rate is achieved from pipelining?
4 M
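The calculation for 4(c), sketched in Python with exact fractions. It assumes the pipelined processor completes one instruction per clock (the question says to ignore latency) and that the three frequencies apply to ALU operations, branches and memory operations, in that order. The function name is illustrative.

```python
from fractions import Fraction

CLOCK_NS = Fraction(1)         # unpipelined clock cycle: 1 ns
OVERHEAD_NS = Fraction(2, 10)  # clock skew and setup overhead: 0.2 ns
# (relative frequency, cycles) for ALU operations, branches and memory operations
MIX = [(Fraction(40, 100), 4), (Fraction(20, 100), 4), (Fraction(40, 100), 5)]


def pipelining_speedup(clock_ns, mix, overhead_ns):
    # Unpipelined: average instruction time = clock x average cycles per instruction
    unpipelined_ns = clock_ns * sum(freq * cycles for freq, cycles in mix)
    # Pipelined: one instruction completes per (longer) clock cycle
    pipelined_ns = clock_ns + overhead_ns
    return unpipelined_ns / pipelined_ns


speedup = pipelining_speedup(CLOCK_NS, MIX, OVERHEAD_NS)
print(speedup, round(float(speedup), 2))  # 11/3 3.67
```

That is, 4.4 ns per instruction unpipelined versus 1.2 ns pipelined.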

5(a) Explain, with the help of appropriate pseudo-statements, the principle of spinlocks using the EXCH synchronization primitive, and highlight its demerits. Explain the modified spinlock primitive pseudocode and its merits.
10 M
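Python has no user-visible atomic exchange instruction, so the sketch below emulates one: `EmulatedWord.exch` uses a `threading.Lock` purely to stand in for the hardware atomicity of EXCH, and `store` goes through the same lock so that a release cannot interleave with an exchange. `spin_lock` is the simple version (every probe is an EXCH, that is, a write that generates coherence traffic); `spin_lock_ttas` is a modified test-and-test-and-set version, which spins on an ordinary load (a local cache hit on real hardware) and attempts the EXCH only when the lock looks free. All names are hypothetical.

```python
import threading
import time


class EmulatedWord:
    """A shared memory word with an emulated atomic exchange (hypothetical helper)."""

    def __init__(self, value=0):
        self.value = value
        self._atomic = threading.Lock()  # stands in for the hardware atomicity of EXCH

    def load(self):
        return self.value

    def store(self, new_value):
        with self._atomic:
            self.value = new_value

    def exch(self, new_value):
        """Atomically write new_value and return the previous contents."""
        with self._atomic:
            old, self.value = self.value, new_value
            return old


def spin_lock(lock):
    # Simple spinlock: keep exchanging 1 into the lock until the old value was 0 (free).
    # Every iteration is a write, so waiting processors keep invalidating each other's copies.
    while lock.exch(1) != 0:
        time.sleep(0)  # Python-only: yield the interpreter; hardware would simply retry


def spin_lock_ttas(lock):
    # Modified spinlock: spin on a plain read, which hits in the local cache on real
    # hardware, and only attempt the EXCH once the lock appears to be free.
    while True:
        while lock.load() != 0:
            time.sleep(0)  # Python-only yield, as above
        if lock.exch(1) == 0:
            return


def spin_unlock(lock):
    lock.store(0)  # release: store 0 into the lock word


counter = 0
lock_word = EmulatedWord()


def worker():
    global counter
    for _ in range(200):
        spin_lock_ttas(lock_word)
        counter += 1  # critical section
        spin_unlock(lock_word)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 800
```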
5(b) Explain the meaning of the following terms used in cache-controller state transition diagrams: i) Exclusive, ii) Shared and iii) Invalid. Draw the state transition diagram for: i) processor (CPU) requests for each cache block;
ii) bus requests for each cache block, and list all the responses to the events for (i) and (ii) in tabular form.
10 M
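The transitions asked for in 5(b) can be written down as two Python functions, following the three-state (Invalid, Shared, Exclusive) write-invalidate snooping protocol as it is commonly presented in textbooks. `hit` distinguishes an access to the address currently held in this block frame from one that maps to the same frame but a different address (which forces replacement). The function names and action strings are illustrative; some presentations place a write miss rather than an invalidate on the bus for a write hit to a Shared block.

```python
from enum import Enum


class State(Enum):
    INVALID = "Invalid"
    SHARED = "Shared"        # clean, possibly cached by other processors
    EXCLUSIVE = "Exclusive"  # read/write, only copy, may be dirty


def cpu_request(state, op, hit):
    """Processor 'read' or 'write' to this block frame; hit=False means a different address.

    Returns (next_state, bus actions)."""
    if op == "read":
        if state is State.INVALID:
            return State.SHARED, ["place read miss on bus"]
        if hit:
            return state, []  # read hit in Shared or Exclusive: no bus traffic
        writeback = ["write back block"] if state is State.EXCLUSIVE else []
        return State.SHARED, writeback + ["place read miss on bus"]
    if op == "write":
        if hit and state is State.EXCLUSIVE:
            return State.EXCLUSIVE, []  # write hit on an owned block
        if hit and state is State.SHARED:
            return State.EXCLUSIVE, ["place invalidate on bus"]
        writeback = ["write back block"] if state is State.EXCLUSIVE else []
        return State.EXCLUSIVE, writeback + ["place write miss on bus"]
    raise ValueError(f"unknown processor operation: {op}")


def bus_request(state, msg):
    """Reaction to a 'read miss', 'write miss' or 'invalidate' on the bus for this block's address."""
    if msg not in ("read miss", "write miss", "invalidate"):
        raise ValueError(f"unknown bus request: {msg}")
    if state is State.INVALID:
        return State.INVALID, []  # no copy: nothing to do
    owner_actions = ["write back block", "abort memory access"] if state is State.EXCLUSIVE else []
    if msg == "read miss":
        return State.SHARED, owner_actions  # owner supplies the data and drops to Shared
    return State.INVALID, owner_actions     # write miss or invalidate: give up the copy


# Print the bus-side response table for each state
for state in State:
    for msg in ("read miss", "write miss"):
        nxt, actions = bus_request(state, msg)
        print(f"{state.value:9} | {msg:10} | {nxt.value:9} | {', '.join(actions) or 'no action'}")
```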

6(a) Explain the different compiler optimization techniques used to reduce miss rate.
10 M
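Two of the standard compiler optimizations, loop interchange and blocking, are sketched below in Python. Python lists of lists are not laid out contiguously, so this shows only the source transformations (which must leave results unchanged), not the cache effect; in C, with row-major arrays, the interchanged loop walks memory with unit stride and the blocked multiply reuses `block` x `block` submatrices while they are still cached. The function names are illustrative.

```python
def scale_column_order(x):
    # Before interchange: inner loop strides down a column (large stride in row-major memory)
    for j in range(len(x[0])):
        for i in range(len(x)):
            x[i][j] = 2 * x[i][j]


def scale_row_order(x):
    # After loop interchange: inner loop walks along a row (unit stride, better spatial locality)
    for i in range(len(x)):
        for j in range(len(x[0])):
            x[i][j] = 2 * x[i][j]


def matmul_naive(y, z):
    n = len(y)
    return [[sum(y[i][k] * z[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matmul_blocked(y, z, block):
    """x = y * z for n x n matrices, working on block x block tiles to improve temporal locality."""
    n = len(y)
    x = [[0] * n for _ in range(n)]
    for jj in range(0, n, block):
        for kk in range(0, n, block):
            for i in range(n):
                for j in range(jj, min(jj + block, n)):
                    r = 0
                    for k in range(kk, min(kk + block, n)):
                        r += y[i][k] * z[k][j]
                    x[i][j] += r  # accumulate the partial product from this tile
    return x


y = [[i * 7 + j for j in range(7)] for i in range(7)]
z = [[(i + 2 * j) % 5 for j in range(7)] for i in range(7)]
print(matmul_blocked(y, z, 3) == matmul_naive(y, z))  # True
```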
6(b) Explain the process of: i) Protection via virtual memory
ii) Protection via virtual machines.
10 M

7(a) Explain any four memory hierarchy questions in detail.
8 M
7(b) Explain the different techniques used to improve memory performance inside a DRAM chip.
8 M
7(c) A parallel processing system C has a degree of parallelism of 10. Let f be the fraction of operations performed by C that are strictly scalar (cannot be processed in parallel). The speedup for the tasks under consideration is 6.5, assuming that all other operations are processed at the maximum possible (vector) rate. i) What is f?
ii) By how much must f be reduced to increase the speedup to 9.0?
4 M
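Question 7(c) is an application of Amdahl's law. Assuming that a degree of parallelism n = 10 means the vector (parallel) part runs 10 times faster than scalar code, the speedup is S = 1 / (f + (1 - f)/n), which can be solved for f. A Python sketch with exact fractions (the function name is illustrative):

```python
from fractions import Fraction


def scalar_fraction(speedup, n):
    """Solve S = 1 / (f + (1 - f) / n) for f, the strictly scalar fraction."""
    s = Fraction(speedup)
    return (1 / s - Fraction(1, n)) / (1 - Fraction(1, n))


f_now = scalar_fraction(Fraction(13, 2), 10)  # S = 6.5
f_target = scalar_fraction(9, 10)             # S = 9.0
print(f_now, round(float(f_now), 4))          # 7/117 0.0598
print(f_target, round(float(f_target), 4))    # 1/81 0.0123
print(f_now - f_target, round(float(f_now - f_target), 4))  # 50/1053 0.0475
```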

Write short notes on any four of the following, Q.8(a) to 8(e):
8(a) Benchmarks
5 M
8(b) Detecting and enhancing loop level parallelism.
5 M
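One standard way of detecting loop-carried dependences is the GCD test. For a loop that stores to X[a*i + b] and loads from X[c*i + d], a dependence is possible only if GCD(c, a) divides (d - b). The test is conservative: passing it means a dependence may exist, while failing it proves there is none. A Python sketch (the function name is illustrative, and loop bounds are ignored):

```python
from math import gcd


def may_depend(a, b, c, d):
    """GCD test for a store to X[a*i + b] and a load from X[c*i + d] in the same loop."""
    g = gcd(a, c)
    if g == 0:  # both indices are constants: they conflict only if equal
        return b == d
    return (d - b) % g == 0


# X[2*i + 3] = X[2*i] * 5.0 : a=2, b=3, c=2, d=0 -> GCD 2 does not divide -3
print(may_depend(2, 3, 2, 0))  # False: no dependence
# X[i + 1] = X[i] + s : a=1, b=1, c=1, d=0 -> GCD 1 divides -1
print(may_depend(1, 1, 1, 0))  # True: a dependence may exist
```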
8(c) Hardware support for compiler speculation.
5 M
8(d) Memory consistency models.
5 M
8(e) What makes pipelining hard to implement?
5 M


