Index of Cache Architectures

1. Introduction

The cache is a special buffer memory which is situated between the processor and main memory.

There are different cache types:

  • L1: On modern CPUs this cache is accessed in a single CPU cycle (very fast). There are normally two separate caches, one for data and one for instructions (Harvard architecture).
  • L2: Since L1 cache is very expensive, a slower but larger second-level cache is added. Especially for multiprogramming, a large L2 cache is needed to hold the data of several processes.
  • L3: Was introduced as a shared cache for multiprocessors; it supports the cache-coherence protocol that keeps data consistent across the caches of the individual cores.

1.1 Principle of locality

  • Spatial locality: the program code runs in a small area of memory at any given time.
  • Temporal locality: there is a high probability that the same memory word is accessed more than once within the same code region (local variables, loop counters, etc.)
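Both properties above show up in almost any loop; the following minimal sketch (variable names are illustrative) marks where each kind of locality occurs:

```python
# Both kinds of locality in one simple loop (names are illustrative):
data = list(range(1000))

total = 0                    # temporal locality: `total` and `i` are
for i in range(len(data)):   # reused on every iteration
    total += data[i]         # spatial locality: consecutive elements of
                             # `data` are accessed one after another
print(total)                 # -> 499500
```

A real cache benefits from this because each miss loads a whole line of neighbouring words, which the following iterations then hit.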

1.2 How it works

The cache is a very fast memory, but it is small compared to the RAM. When the processor tries to access a memory word:

  1. Look up the word in the cache. If the word is in the cache, it is returned to the processor (1 cycle). This is called a hit.
  2. If the word is not in the cache (a miss), the request is passed over the bus to main memory. When memory returns the word, it is first stored in the cache and then returned to the processor.
    1. If the place in the cache is not empty and the word there was modified (dirty), its content must be written back to memory first.
    2. Put the memory word into the cache.
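The steps above can be sketched as a tiny simulation, assuming a direct-mapped, write-back cache; the sizes and names are illustrative, not a real CPU design:

```python
# Minimal sketch of the lookup / miss / write-back steps, assuming a
# direct-mapped, write-back cache (sizes and names are illustrative).

CACHE_LINES = 4                      # tiny cache for illustration
cache = [None] * CACHE_LINES         # each slot: dict(tag, word, dirty) or None
memory = {addr: addr * 10 for addr in range(32)}  # fake main memory

def read(addr):
    index = addr % CACHE_LINES       # which cache slot the address maps to
    tag = addr // CACHE_LINES
    slot = cache[index]
    if slot is not None and slot["tag"] == tag:
        return slot["word"], "hit"   # step 1: hit, return immediately
    # step 2: miss -> fetch from memory over the "bus"
    if slot is not None and slot["dirty"]:
        # step 2.1: the occupied slot is dirty, write it back to memory first
        old_addr = slot["tag"] * CACHE_LINES + index
        memory[old_addr] = slot["word"]
    # step 2.2: put the memory word into the cache, then return it
    cache[index] = {"tag": tag, "word": memory[addr], "dirty": False}
    return memory[addr], "miss"

def write(addr, value):
    read(addr)                       # make sure the line is cached
    index = addr % CACHE_LINES
    cache[index]["word"] = value
    cache[index]["dirty"] = True     # modified: written back only on eviction

print(read(3))     # miss: fetched from memory
print(read(3))     # hit: served from the cache
write(3, 99)
print(read(7))     # maps to the same slot -> dirty word 99 is written back
print(memory[3])   # -> 99: the write-back updated memory
```

The write-back policy is what makes step 2.1 necessary: a modified word exists only in the cache until its slot is reused.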

The hit rate is very important for the performance of a processor. Assume, for instance, that a memory access takes about 3 wait states and the hit rate is about 90%; then

10% x 3 = 0.3 wait states per memory cycle

If the hit rate is only 80%, the value above doubles (20% x 3 = 0.6).

If an instruction takes on average 2 cycles without wait states and 5 cycles with wait states, and the hit rate is 80%, then the execution of 10 instructions takes about

10 x [(0.8 x 2) + (0.2 x 5)] = 26 cycles.

This means that the processor spends about 40% of its time on the 20% of accesses that miss. During this time the bus is occupied transferring the data from memory to the processor.
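The two calculations above can be checked directly; the hit rates and cycle counts are the ones from the text:

```python
# Worked check of the two examples above (90%/80% hit rates, 3 wait
# states, and 2/5 cycle costs are taken from the text).

wait_states = 3                       # cost of a memory access in wait states
for hit_rate in (0.90, 0.80):
    avg_wait = (1 - hit_rate) * wait_states
    print(f"hit rate {hit_rate:.0%}: {avg_wait:.1f} wait states per memory cycle")
# -> 0.3 for 90%, 0.6 for 80% (the value doubles)

hit_cycles, miss_cycles, hit_rate = 2, 5, 0.80
avg = hit_rate * hit_cycles + (1 - hit_rate) * miss_cycles   # 1.6 + 1.0 = 2.6
print(f"10 instructions take about {10 * avg:.0f} cycles")   # -> 26 cycles
miss_share = (1 - hit_rate) * miss_cycles / avg              # 1.0 / 2.6
print(f"about {miss_share:.0%} of the time goes to the 20% that miss")
```

The last line shows where the "about 40%" figure comes from: the misses contribute 1.0 of the 2.6 average cycles per instruction, roughly 38%.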

The better the hit rate, the better the performance of the processor; in multiprocessors, the performance of the entire system can improve dramatically.