- 1) You have been asked to investigate the relative performance of a banked versus pipelined L1 data cache for a new microprocessor. Assume a 64 KB two-way set associative cache with 64-byte blocks. The pipelined cache would consist of three pipe stages, similar in capacity to the Alpha 21264 data cache. A banked implementation would consist of two 32 KB two-way set associative banks. Use CACTI and assume a 65 nm (0.065 μm) technology to answer the following questions. The cycle time output in the web version of CACTI indicates the maximum frequency at which the cache can operate without any bubbles in the pipeline. What is the cycle time of the cache compared to its access time, and how many pipe stages will the cache take up (to two decimal places)? Compare the area and the total dynamic read energy per access of the pipelined design versus the banked design. State which takes up less area and which requires more power, and explain why that might be.
- 2) A cache acts as a filter. For example, for every 1000 instructions of a program, an average of 20 memory accesses may exhibit low enough locality that they cannot be serviced by a 2 MB cache. The 2 MB cache is then said to have an MPKI (misses per thousand instructions) of 20, and this holds largely regardless of the smaller caches that precede it. Assume the following cache-size/latency/MPKI values: 32 KB/1/100, 128 KB/2/80, 512 KB/4/50, 2 MB/8/40, 8 MB/16/10. Assume that accessing the off-chip memory system requires 200 cycles on average. For each of the following cache configurations, calculate the average time spent accessing the cache hierarchy (a sketch of this calculation follows the list below). What do you observe about the downsides of a cache hierarchy that is too shallow or too deep?
a. 32 KB L1; 8 MB L2; off-chip memory
b. 32 KB L1; 512 KB L2; 8 MB L3; off-chip memory
c. 32 KB L1; 128 KB L2; 2 MB L3; 8 MB L4; off-chip memory
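A minimal sketch of the hierarchy-time calculation in Python, under the common assumption that each of the 1000 instructions makes one L1 access and that a level's MPKI gives the number of accesses forwarded to the next level; `hierarchy_cycles` and the table encoding are illustrative, with all latency/MPKI values taken from the problem statement:

```python
# Per-1000-instruction cycles spent in the cache hierarchy, assuming one
# L1 access per instruction and that a level's MPKI equals the number of
# accesses that reach the next level.

MEM_LATENCY = 200  # average off-chip access latency, in cycles

# (latency in cycles, MPKI) for each cache size, from the problem statement
CACHES = {
    "32KB": (1, 100), "128KB": (2, 80), "512KB": (4, 50),
    "2MB": (8, 40), "8MB": (16, 10),
}

def hierarchy_cycles(levels):
    """Cycles spent accessing the hierarchy per 1000 instructions."""
    accesses = 1000  # assumption: every instruction accesses the L1
    cycles = 0
    for name in levels:
        latency, mpki = CACHES[name]
        cycles += accesses * latency
        accesses = mpki  # misses are forwarded to the next level
    return cycles + accesses * MEM_LATENCY  # last-level misses go off-chip

for label, cfg in [("a", ["32KB", "8MB"]),
                   ("b", ["32KB", "512KB", "8MB"]),
                   ("c", ["32KB", "128KB", "2MB", "8MB"])]:
    print(label, hierarchy_cycles(cfg), "cycles")  # a: 4600, b: 4200, c: 4480
```

Under this model, the two-level hierarchy (a) pays heavily for off-chip accesses that a mid-sized cache could have filtered, while the four-level hierarchy (c) pays extra latency at every intermediate level; the three-level configuration (b) comes out lowest.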
- 3) You are designing a PMD and optimizing it for low energy. The core, including an 8 KB L1 data cache, consumes 1 W whenever it is not in hibernation. If the core has a perfect L1 hit rate, it achieves an average CPI of 1 for a given task, that is, 1000 cycles to execute 1000 instructions. Each additional cycle spent accessing the L2 and beyond adds a stall cycle for the core. Based on the following specifications, which L2 cache size achieves the lowest energy for the PMD (core, L1, L2, memory) on that task? (A worked-energy sketch follows the list below.)
- The core frequency is 1 GHz, and the L1 has an MPKI of 100.
- A 256 KB L2 has a latency of 10 cycles, an MPKI of 20, a background power of 0.2 W, and each L2 access consumes 0.5 nJ.
- A 1 MB L2 has a latency of 20 cycles, an MPKI of 10, a background power of 0.8 W, and each L2 access consumes 0.7 nJ.
- The memory system has an average latency of 100 cycles, a background power of 0.5 W, and each memory access consumes 35 nJ.
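A minimal sketch of the energy comparison, normalized to 1000 instructions and assuming that the background power of the core, the L2, and memory is drawn for the entire runtime (base cycles plus stalls); `energy_nj` is an illustrative helper, and all figures come from the specifications above:

```python
# Total energy per 1000 instructions for each L2 option, assuming
# background power (core + L2 + memory) is drawn for the whole runtime
# including stall cycles, at a 1 GHz core clock (1 ns per cycle).

CYCLE_NS = 1.0        # 1 GHz core clock
L1_MPKI = 100         # L2 accesses per 1000 instructions
MEM_LATENCY = 100     # average memory latency, in cycles
MEM_ACCESS_NJ = 35.0  # energy per memory access
CORE_POWER_W = 1.0
MEM_BG_POWER_W = 0.5

def energy_nj(l2_latency, l2_mpki, l2_bg_w, l2_access_nj):
    """Energy (nJ) to run 1000 instructions with the given L2."""
    stall = L1_MPKI * l2_latency + l2_mpki * MEM_LATENCY
    time_ns = (1000 + stall) * CYCLE_NS
    background = (CORE_POWER_W + l2_bg_w + MEM_BG_POWER_W) * time_ns  # W*ns = nJ
    dynamic = L1_MPKI * l2_access_nj + l2_mpki * MEM_ACCESS_NJ
    return background + dynamic

print("256 KB L2:", energy_nj(10, 20, 0.2, 0.5), "nJ")  # ~7550 nJ
print("1 MB  L2:", energy_nj(20, 10, 0.8, 0.7), "nJ")   # ~9620 nJ
```

Note that both options stall for the same 3000 cycles per 1000 instructions under these numbers, so the comparison comes down to background power and per-access energy.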
- 4) The ways of a set can be viewed as a priority list, ordered from high priority to low priority. Every time the set is touched, the list can be reorganized to change block priorities. With this view, cache management policies can be decomposed into three sub-policies: Insertion, Promotion, and Victim Selection. Insertion defines where newly fetched blocks are placed in the priority list. Promotion defines how a block’s position in the list is changed every time it is touched (a cache hit). Victim Selection defines which entry of the list is evicted to make room for a new block when there is a cache miss. (A sketch of this decomposition follows part b below.)
a. Can you frame the LRU cache policy in terms of the Insertion, Promotion, and Victim Selection sub-policies?
b. Can you define other Insertion and Promotion policies that may be competitive and worth exploring further?
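A minimal sketch of this decomposition, with LRU expressed as one instantiation of the three sub-policies (insert at the MRU position, promote a hit back to MRU, evict the LRU entry); the class and method names are illustrative rather than taken from any particular simulator:

```python
# One cache set viewed as an explicit priority list (index 0 = highest
# priority / MRU). A management policy is the triple of sub-policies
# below; this instantiation is LRU. Names are illustrative only.

class PriorityListSet:
    def __init__(self, ways):
        self.ways = ways
        self.blocks = []  # block tags ordered from high to low priority

    # --- sub-policies (LRU instantiation) ---
    def insert_position(self):
        return 0  # Insertion: new blocks enter at the MRU position

    def promote(self, old_pos):
        return 0  # Promotion: a hit moves the block back to MRU

    def select_victim(self):
        return len(self.blocks) - 1  # Victim Selection: evict the LRU entry

    # --- generic access path, shared by all policies ---
    def access(self, tag):
        if tag in self.blocks:  # hit: apply Promotion
            old_pos = self.blocks.index(tag)
            self.blocks.pop(old_pos)
            self.blocks.insert(self.promote(old_pos), tag)
            return True
        if len(self.blocks) == self.ways:  # miss in a full set: evict
            self.blocks.pop(self.select_victim())
        self.blocks.insert(self.insert_position(), tag)  # apply Insertion
        return False
```

Other instantiations fit the same skeleton by overriding only the three sub-policy methods, e.g., inserting new blocks at or near the LRU position (as in LIP/BIP-style insertion policies) or promoting a hit by a single position instead of all the way to MRU.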