Tuesday, October 14, 2014

gem5: Instructions for Enabling Full System Memory Tracing

  1. Set up your proxy.
    • export http_proxy=http://myproxy.myserver.com:1234
  2. Clone the gem5-stable repo.
    •  hg clone http://repo.gem5.org/gem5-stable
  3. Install gem5's dependencies (http://www.m5sim.org/Dependencies).
    • Note that on CentOS 6, the standard repos don't provide a new enough version of SWIG. I was able to download SWIG from the SWIG website and build it.
    • On CentOS 6, the protobuf package doesn't include gzip_stream.h, which gem5 will ask for if you want to do memory tracing. I posted on StackOverflow about this, which eventually led me to the EPEL bug-tracking system; a bug was filed for this problem over two years ago, but it hasn't been fixed yet. To work around it, I downloaded the protobuf source from Google and built it myself.
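    • For example, building protobuf from the downloaded source into a local install directory looks roughly like this (the version number and install path are only examples; SWIG builds the same way with configure/make):
cd protobuf-2.5.0/
./configure --prefix=/home/user123/protobuf-install
make
make install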
  4. Add the bin directories for SWIG and protobuf builds to your PATH. 
  5. Add the lib directory for protobuf to your LIBRARY_PATH (note that this is not LD_LIBRARY_PATH).
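    • Assuming example install locations of /home/user123/swig-install and /home/user123/protobuf-install (adjust these to wherever you actually installed SWIG and protobuf), the exports would look like:
export PATH=/home/user123/swig-install/bin:/home/user123/protobuf-install/bin:$PATH
export LIBRARY_PATH=/home/user123/protobuf-install/lib:$LIBRARY_PATH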
  4. Apply this patch to make the necessary modifications to the code: link.
    • Copy it into the gem5-stable/ directory and run "hg import memTraceExample.patch".
    • You will need to modify the path given on line 416 to point to your protobuf include directory:
main.Append(CPPPATH=[Dir('<path to protobuf install directory>/include/')])
    • This patch adds the necessary modifications to insert CommMonitors that track all accesses to L1, to main memory, and to the data TLB. Because the L1 log grows rapidly, you will want the option to turn it off for longer simulations. To do so, edit configs/common/CacheConfig.py and change the call to the CommMonitor constructor for the L1 to set trace_enable=False.
    • This patch also modifies the code that does virtual to physical memory translation so that it prints the mapping of a virtual to physical address when a new page is allocated. By default, the code doesn't print the mapping in this case.
  7. Test in system call emulation mode.
    • This isn't full system simulation yet. Although in the end we want full system simulation, a run in SE mode will verify that everything works up to this point. Run:
./build/X86/gem5.opt --debug-flags=MMU --debug-file=mmu_trace.log ./configs/example/se.py --cpu-type=timing --caches --l2cache -c tests/test-progs/hello/bin/x86/linux/hello 
    • The "--debug-flags=MMU --debug-file=mmu_trace.log" options will create the debug file "mmu_trace.log" in the m5out/ directory which logs the virtual to physical address translations.
    • The trace files CT_mon{1,2,3}.trc.gz are also written to the m5out directory. The debug file shows the mappings of virtual to physical addresses, while the CT_mon files contain the traces of memory accesses to L1, main memory, and the data TLB.
    • To view the traces of memory accesses in the CT_mon files, you will use a script provided by gem5.
    • To use this script, you will need to set up the Python protobuf library. Go to the directory containing the protobuf package you downloaded in the earlier step, go into the python subdirectory, and follow the steps in the README file.
    • To install to a directory other than the default, first run "export PYTHONPATH=<protobuf install directory>/lib/python2.6/site-packages/". Then, when you get to the step that runs "python setup.py install", instead run "python setup.py install --prefix=<protobuf install directory>".
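    • Putting those two bullets together, the sequence looks roughly like this (the bracketed paths are placeholders, and the python/README inside the protobuf package is the authoritative reference):
cd <protobuf-2.5.0 package directory>/python/
export PYTHONPATH=<protobuf install directory>/lib/python2.6/site-packages/
python setup.py build
python setup.py install --prefix=<protobuf install directory>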
    • Go back to your gem5-stable directory and run the commands:

export PYTHONPATH=$PYTHONPATH:<protobuf-2.5.0 package directory>/python/

./util/decode_packet_trace.py m5out/CT_mon1.trc.gz outputfile.log

    • This should translate the protobuf output log into an ASCII file named "outputfile.log" that contains the trace of L1 accesses.
  8. Test in full system mode.
    • To run in full system mode, you will need to download some more files from the gem5 website.
    • First, you'll need to download the full system files and the config files for X86. They are available here: http://www.m5sim.org/Download. Extract both of these to a directory where they will live (for our example, we'll assume this directory is /home/user123/gem5-util/); they need to be somewhere gem5 can see them while it is running. When you extract these files, you will get two directories, "binaries" and "disks".
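    • For example, the extraction might look like this (the archive names and compression format below are placeholders; use whatever the download page actually provides):
mkdir -p /home/user123/gem5-util/
tar xjf <x86 full system archive>.tar.bz2 -C /home/user123/gem5-util/
tar xjf <x86 config files archive>.tar.bz2 -C /home/user123/gem5-util/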
    • In the "disks" directory, there is a file called "linux-x86.img". This needs to be renamed to "x86root.img".
    • gem5 also needs a file called "linux2-bigswap.img" to be in the disks directory. The only place I've found this is in the full system files for the Alpha architecture posted on the gem5 website, here: http://www.m5sim.org/Download. Download the Alpha system files, extract them, and move the file "linux2-bigswap.img" from this package to the "disks" directory containing the file "x86root.img".
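    • For reference, the renaming and copying described in the last two bullets might look like this (assuming the Alpha archive was extracted to /home/user123/alpha-system/ and keeps its images in a disks/ subdirectory):
cd /home/user123/gem5-util/disks/
mv linux-x86.img x86root.img
cp /home/user123/alpha-system/disks/linux2-bigswap.img .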
    • Now, set the environment variable M5_PATH to point to /home/user123/gem5-util (or wherever you put the "binaries" and "disks" directories).
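    • Using the example path above:
export M5_PATH=/home/user123/gem5-util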
    • Run gem5 in full system mode: 
./build/X86/gem5.opt --debug-flags=TLB --debug-file=tlb_trace.log ./configs/example/fs.py --cpu-type=timing --caches --l2cache

    • Here again, the "--debug-flags" and "--debug-file" options will trace virtual to physical address translations. We use the "TLB" flag here to log these translations rather than the "MMU" flag we used earlier, because in full system simulation a different class performs these translations. Note that because you are logging all virtual to physical address translations, the "tlb_trace.log" file will get very large very fast.
    • To mitigate this problem of the exploding "tlb_trace.log" file, I modified the arch/x86/tlb.cc file to log translations only when page faults occur. Logging these translations will be important when we start executing our own applications: if we want to print out an address and map it back to the memory trace, we'll need to see how these translations take place. The modifications to tlb.cc with restricted logging can be obtained through these diff files: tlb.cc, tlb.hh.
    • In another terminal, connect to the gem5 simulation using the m5term utility:
    • cd into the directory util/term/ and run "./m5term 3456". This will connect you to the simulation. After the simulator finishes booting the system, you should see a Linux command prompt, from which you can run standard Linux commands. When you are done, shut down the simulation and make sure that you got a TLB trace in m5out/tlb_trace.log.
    • Switch back over to the terminal where the gem5 simulation is running and hit CTRL-C. This shuts down the simulator. Examine the log file at the top of the gem5-stable directory. This log file will have the name that you chose when you opened the ofstream in src/mem/comm_monitor.cc, and it should contain a trace of the memory accesses performed by the entire simulated system during the last run.
  9. Run a custom program in the full system simulation.
    • We will want to run our own custom programs with the gem5 full system simulator. Although we can compile programs from within the simulator, this is fairly slow and tedious. Instead, we can compile the programs natively, mount the disk image from Linux, and copy the compiled program from our native Linux into the mounted disk image, where it can be accessed by the simulated system.
    • The instructions for mounting the image are given on the gem5 website here: http://www.m5sim.org/Disk_images#Mounting_an_image. Run:
sudo mount -o loop,offset=32256 /home/user123/gem5-util/disks/x86root.img /home/user123/gem5-util/fsmount/
    • (This assumes you've already created the fsmount directory).
    • Now, within your natively running Linux, write a program - call it test.c, for example - that you want to run in the gem5 simulation. Compile it using gcc, and copy the binary into /home/user123/gem5-util/fsmount/bin/.
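    • For example (the -static flag is my own assumption, to avoid depending on whatever shared libraries the disk image happens to contain; unmounting afterwards keeps the image consistent before you boot gem5 from it):
gcc -static -o test test.c
sudo cp test /home/user123/gem5-util/fsmount/bin/
sudo umount /home/user123/gem5-util/fsmount/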
    • Restart the gem5 simulation, and you should be able to run the test executable that you placed in the bin directory from within the gem5 simulation.
    • After running your program, you will probably want to be able to look in the memory trace to see how it is using memory. However, if you print out addresses in your program, they will print out as virtual addresses. The memory traces log physical addresses. Hence, you need to use the tlb_trace.log file to find out what mappings take place. Say you printed out some address 0x2ac97c7a8010 used by your program. Look in tlb_trace.log for an address starting with 0x2ac97c7XXXXX to see where the page containing your address was faulted in. This will tell you which physical address your virtual address maps to.
    • Now, using the decode_packet_trace.py script described in a previous step, translate one of your CT_mon files to ASCII. Next, search this ASCII file for your physical address. However, these addresses are logged in decimal, not hex, so translate your physical address to decimal and search the file for it. It should be in there. Note that these addresses are going to be accessed on block boundaries (just as the pages were during page faults). So, for instance, if you are looking at the log of accesses to external memory and the block size of your last-level cache is 64B, you will see in the log an access to the 64B block containing your physical address.
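    • A quick way to do the hex-to-decimal conversion from the shell (the address here is made up for illustration):
printf '%d\n' 0x1a2b3c40    # prints 439041088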


Caveats

  1. Last time I tried, if you copy files into the disk image while gem5 is running, gem5 won't be able to see them. I had to reboot gem5 for it to see the files I copied in. 
  2. I am still working on understanding the memory trace. For example, my log contains long strings of writes to the address 0x20000000000000b0, which I don't yet understand.



Results of Trials with other Tools


Before trying gem5, I tried to perform memory tracing with some other tools. Here are some notes on what I learned from trying those tools.

QEMU: The vCSIMx86 Version

A version of QEMU has been released with modifications specifically for gathering system-wide memory traces, available here. I was able to get it running and gather some traces, but I ran into some difficulties with it and found that it had significant limitations, including the following:

  1. It works only with a very old version of QEMU (v. 1.1.0).
  2. After booting the virtual machine (which takes quite a while, even on a machine with fairly good specs), only the first run of an application will successfully initiate tracing.
  3. It isn't well-supported, as there doesn't appear to be a mailing list or other such resource for offering help with it.
  4. I didn't ever gain confidence that it was actually tracing all memory accesses in the system.
  5. Note that I contacted the author, and he confirmed that #1 and #2 are known limitations of the tool.
