Tuesday, June 25, 2013

Sniper Simulation Cache Anomaly

I have been running some simulations with Sniper, and I have come across an anomaly that I have not yet been able to explain. For a particular program, doubling the size of the L3 cache from 2 MiB to 4 MiB, while keeping all other simulation parameters equal, causes a noticeable slowdown. The figure below shows that with the 2 MiB cache the run time is 13.1 ms, while with the 4 MiB cache it is 14.4 ms (about a 10% increase). The same pattern appears at other cache sizes: at 8 MiB and 32 MiB the run time drops back down to 13.1 ms, but at 16 MiB and 64 MiB it jumps back up to 14.4 ms.




The cache statistics from the 2 MiB and 4 MiB simulations are given below. The number of L3 misses is only slightly higher with the 4 MiB L3 than with the 2 MiB L3. The difference of 15 misses (a 0.064% increase) does not seem large enough to account for the difference in run time. The data also shows that the number of L3 accesses dropped noticeably with the larger cache (by 9245 accesses, or 4.4%). Because the L3 is accessed when the L2 misses, this suggests that the number of L2 misses has decreased, which is indeed the case (by 4517 misses, or 3.6%). This is a bit counterintuitive, since changing the L3 size would not be expected to affect L2 activity. The L1 instruction cache activity changes as well (9685 more accesses and 232 more misses). The only quantity that remains constant is the number of accesses to the L1 data cache.

From a question I asked on the Sniper mailing list in May, it has become clear that the way cache stats are counted is not as simple as one might first imagine. For instance, the number of L3 accesses reported below is well above the number of L2 misses. It is important to keep such points in mind when reviewing this data.

There is a post on the Sniper mailing list about a similar problem, in which someone observed performance decreasing as the L2 size increased. However, that case involved a parallel application, and according to the discussion, parallelism appears to have been a root cause of the unexpected behavior. In my case, I am running a single, sequential application.



2 MiB L3:
Execution Time (ns) 13050531

Cache L1-I
num cache accesses 3430983
num cache misses 3105

Cache L1-D
num cache accesses 14697313
num cache misses 134331

Cache L2
num cache accesses 228931
num cache misses 126219

Cache L3
num cache accesses 211220
num cache misses 23432

4 MiB L3:
Execution Time (ns) 14356859

Cache L1-I
num cache accesses 3440668
num cache misses 3337

Cache L1-D
num cache accesses 14697313
num cache misses 134352

Cache L2
num cache accesses 229190
num cache misses 121702

Cache L3
num cache accesses 201975
num cache misses 23447
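
The absolute and percentage differences quoted earlier can be recomputed directly from these counters. Below is a small Python sketch I put together for that purpose. It is not part of Sniper, and the counter labels (such as "L2 misses") are simply names I chose for the values listed above, not Sniper's own statistic names.

```python
"""Compare two Sniper runs (2 MiB vs 4 MiB L3) using the counters listed above."""

# Values copied by hand from the statistics above; the key names are my own labels.
RUN_2MIB = {
    "Execution Time (ns)": 13050531,
    "L1-I accesses": 3430983, "L1-I misses": 3105,
    "L1-D accesses": 14697313, "L1-D misses": 134331,
    "L2 accesses": 228931, "L2 misses": 126219,
    "L3 accesses": 211220, "L3 misses": 23432,
}

RUN_4MIB = {
    "Execution Time (ns)": 14356859,
    "L1-I accesses": 3440668, "L1-I misses": 3337,
    "L1-D accesses": 14697313, "L1-D misses": 134352,
    "L2 accesses": 229190, "L2 misses": 121702,
    "L3 accesses": 201975, "L3 misses": 23447,
}


def compare(base: dict, new: dict) -> dict:
    """Return {counter: (absolute change, percent change)} going from base to new."""
    result = {}
    for key, old in base.items():
        delta = new[key] - old
        result[key] = (delta, 100.0 * delta / old)
    return result


if __name__ == "__main__":
    # One line per counter: name, signed absolute change, signed percent change.
    for key, (delta, pct) in compare(RUN_2MIB, RUN_4MIB).items():
        print(f"{key:<22} {delta:>+10} {pct:>+8.3f}%")
```

Running it shows the roughly 10% increase in execution time next to the 4.4% drop in L3 accesses and the 3.6% drop in L2 misses, which makes it easier to see that the slowdown is not explained by the L3 miss count alone.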




Update (3 July): I received a response from one of the Sniper developers, who recommended setting the "traceinput/address_randomization" configuration parameter to "false". I did so, and the new results show the same execution time for every case in which the L3 cache was larger than 1 MiB. The new results are shown in the figure below.